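What we have here is a state machine. A given endpoint is either Fine, Failing(x), Failed, or Recovering(x). The Failing and Recovering states each come in numbered copies, Failing(0) through Failing(n) and Recovering(0) through Recovering(n), so the state itself records how far along the current streak is.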

Possible transitions:

Fine & !ping(ok) => Failing(0)

Failing(x) & ping(ok) => Fine
Failing(x < n) & !ping(ok) => Failing(x + 1)
Failing(x == n) => Failed

Failed & ping(ok) => Recovering(0)

Recovering(x < n) & ping(ok) => Recovering(x + 1)
Recovering(x) & !ping(ok) => Failed
Recovering(x == n) => Fine

Or, in English: if we're in Fine and we get a bad ping, we go into Failing. Any good ping in Failing takes us back to Fine. Otherwise, each bad ping takes us one step closer to Failed. After n bad pings in a row we move into Failed (and this transition sends a message). In Failed, the situation is reversed. A good ping in Failed puts us in Recovering, and any bad ping in Recovering puts us back into Failed (no notification). After n good pings in a row we move back into Fine (and probably trigger a notification).
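The English description can be sketched as a single step function. This is only a rough sketch in Python, not a settled implementation: the names `State` and `step` are made up for illustration, the counter is 1-based so that exactly n consecutive pings trigger the transition (the table above numbers from 0), and the notification on recovery is assumed to happen, which the notes above only call probable.

```python
from enum import Enum


class State(Enum):
    FINE = "fine"
    FAILING = "failing"
    FAILED = "failed"
    RECOVERING = "recovering"


def step(state, count, ok, n):
    """Apply one ping result and return (new_state, new_count, notify).

    count is the number of consecutive bad pings seen while FAILING, or
    good pings seen while RECOVERING. It is 0 in FINE and FAILED.
    """
    if state is State.FINE:
        if ok:
            return State.FINE, 0, False
        # The first bad ping starts a failing streak of length 1.
        return (State.FAILED, 0, True) if n <= 1 else (State.FAILING, 1, False)
    if state is State.FAILING:
        if ok:
            return State.FINE, 0, False  # any good ping takes us back to fine
        if count + 1 >= n:
            return State.FAILED, 0, True  # n bad pings in a row: send a message
        return State.FAILING, count + 1, False
    if state is State.FAILED:
        if not ok:
            return State.FAILED, 0, False
        # The first good ping starts a recovery streak of length 1.
        return (State.FINE, 0, True) if n <= 1 else (State.RECOVERING, 1, False)
    # Remaining case: State.RECOVERING
    if not ok:
        return State.FAILED, 0, False  # back to failed, no notification
    if count + 1 >= n:
        return State.FINE, 0, True  # n good pings in a row (notification assumed)
    return State.RECOVERING, count + 1, False


# Usage: three bad pings in a row with n = 3 end in FAILED with a notification.
state, count = State.FINE, 0
for ok in [False, False, False]:
    state, count, notify = step(state, count, ok, n=3)
```

Returning the notify flag rather than sending the message inside `step` keeps the function pure, so the caller decides how a notification is actually delivered.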

Each endpoint needs to track its current state, and the length and sign (good/bad) of the current streak. (Streaks have a minimum length of 1, since they're driven by the ping that has just returned.)

StateNames enum (Fine, Failing, Failed, Recovering)
PingResult enum (Ok, NotOk)
Streak struct {uint count, PingResult sign}
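For illustration, these types could look roughly like this in Python. It's a sketch under assumptions: Python has no `uint`, so `count` is a plain `int` that the code keeps at 1 or more, and `update_streak` is a hypothetical helper that isn't part of the design above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StateName(Enum):
    FINE = 1
    FAILING = 2
    FAILED = 3
    RECOVERING = 4


class PingResult(Enum):
    OK = 1
    NOT_OK = 2


@dataclass(frozen=True)
class Streak:
    count: int        # length of the current run of identical results, >= 1
    sign: PingResult  # whether the run is made of good or bad pings


def update_streak(streak: Optional[Streak], result: PingResult) -> Streak:
    """Hypothetical helper: fold the ping that just returned into the streak."""
    if streak is not None and streak.sign is result:
        return Streak(streak.count + 1, result)
    # A different sign (or no history yet) starts a new streak of length 1.
    return Streak(1, result)
```

Because a new streak always starts at 1, the minimum-length rule holds by construction and no zero-length streak ever needs representing.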