Watchdog


Why a Watchdog Is More Than Rebooting a Frozen System

8-minute read

Many devices have a watchdog. The common explanation is simple: if the system freezes, the watchdog times out and reboots it.

That is not wrong, but it is too shallow.

A useful watchdog is not merely a timed reset mechanism. It asks a more specific question: are the critical paths that must keep making progress actually still making progress?

If the watchdog is fed from the wrong place, the application thread that does the real work may be deadlocked while something else keeps feeding the watchdog on time, so the reset never comes. If the timeout is too short for real scheduling and I/O behavior, the system may reset even though it would have recovered on its own.
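One way to picture "feeding from the right place" is a small supervisor that feeds the watchdog only while every critical task keeps reporting progress. The sketch below is illustrative rather than a definitive implementation: `ProgressSupervisor`, `register`, `checkin`, `poll`, and the `feed` callback are hypothetical names, and `feed` stands in for whatever call actually kicks the hardware or OS watchdog on a given platform.

```python
import time
from typing import Callable, Dict, List


class ProgressSupervisor:
    """Feeds a watchdog only while every registered critical task checks in."""

    def __init__(self, feed: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._feed = feed    # hypothetical hook that kicks the real watchdog
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._max_silence: Dict[str, float] = {}
        self._last_seen: Dict[str, float] = {}

    def register(self, task: str, max_silence_s: float) -> None:
        """Declare a critical task and how long it may go without progress."""
        self._max_silence[task] = max_silence_s
        self._last_seen[task] = self._clock()

    def checkin(self, task: str) -> None:
        """Called by the task itself after it completes a unit of real work."""
        if task not in self._max_silence:
            raise KeyError(f"unregistered task: {task}")
        self._last_seen[task] = self._clock()

    def stalled_tasks(self) -> List[str]:
        """Return the tasks that have been silent longer than their limit."""
        now = self._clock()
        return [task for task, limit in self._max_silence.items()
                if now - self._last_seen[task] > limit]

    def poll(self) -> bool:
        """Feed the watchdog only if no critical task is stalled."""
        if self.stalled_tasks():
            return False  # withhold the feed so the watchdog can expire
        self._feed()
        return True
```

In this arrangement each task calls `checkin` from inside its own work loop, and a periodic caller invokes `poll`. A deadlocked task stops checking in, `poll` stops feeding, and the watchdog expires. Each `max_silence_s` value is an assumption that has to be sized against the task's worst-case scheduling and I/O latency; a limit that is too tight reproduces the spurious-reset problem described above.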
