Why Timers and Clocks Affect Timeout Behavior

Engineering code often contains calls like these:

wait_event_timeout(..., 1000);
sleep(1);
select(fd + 1, &rfds, NULL, NULL, &tv);

They all look like “wait for a while.” But waiting inside an operating system does not mean the CPU sits still and counts time.

A timeout usually passes through several steps: the program submits a wait request, the kernel places the current thread on a wait queue, a timer records the latest wakeup time, the scheduler gives the CPU to another thread, hardware clocks or timer interrupts advance time, and the thread is woken when the condition is satisfied or the timeout expires.

So “wait for 1 second” does not mean “continue exactly 1 second later.” A better model is: timers create expiration events, clocks provide the time base, and the scheduler decides when a woken thread actually runs.

application waits
-> kernel registers condition and timeout
-> current thread sleeps
-> hardware timer / tick advances time
-> condition becomes true or timeout expires
-> kernel wakes the thread
-> scheduler lets it run

This path explains many behaviors: why sleep(1) may sleep longer, why changing system time affects some timeouts, why periodic tasks jitter under load, and why an RTOS tick limits delay precision.
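The first of these behaviors is easy to observe. The following is a minimal sketch, assuming a POSIX system; the helper names monotonic_ns and measure_sleep_ns are made up for this illustration. It times a sleep request with CLOCK_MONOTONIC and returns how long the call actually took.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC as nanoseconds (hypothetical helper). */
static int64_t monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep for request_ns and return how long the call actually took. */
static int64_t measure_sleep_ns(int64_t request_ns)
{
    struct timespec req = {
        .tv_sec = (time_t)(request_ns / 1000000000LL),
        .tv_nsec = (long)(request_ns % 1000000000LL),
    };
    int64_t start = monotonic_ns();

    /* On EINTR, nanosleep() stores the unslept remainder; retry with it. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;

    return monotonic_ns() - start;
}
```

Calling measure_sleep_ns(1000000) on a loaded machine and comparing the result with the request shows the overshoot directly. The difference is timer granularity plus scheduling delay, not an error in the timer.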

A Timer Is Not Busy Waiting

The crudest way to wait is busy polling:

while (!done) {
    /* keep polling */
}

This keeps consuming CPU. It may be acceptable for very short and controlled hardware waits, but it is not suitable for ordinary tasks, threads, or application-level timeouts.

Timed waiting in an operating system usually does not busy-wait. If a condition is not currently true, the thread attaches itself to a wait queue or timer structure and gives up the CPU. The CPU can then run other tasks.

thread A waits for an event
-> thread A sleeps
-> CPU runs thread B
-> timer expires or event occurs
-> thread A is woken

That is why sleep() does not burn a whole CPU core. Sleeping is not counting time in place; the thread temporarily stops participating in execution.
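One way to check this claim is to compare elapsed time with CPU time. The sketch below assumes a POSIX system that provides CLOCK_THREAD_CPUTIME_ID; the function names are invented for this example. It reports how much CPU time the calling thread consumed while it was asleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read any POSIX clock as nanoseconds (hypothetical helper). */
static int64_t clock_ns(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* CPU time (ns) the calling thread consumed across a sleep of sleep_ms. */
static int64_t cpu_ns_used_while_sleeping(long sleep_ms)
{
    struct timespec req = {
        .tv_sec = sleep_ms / 1000,
        .tv_nsec = (sleep_ms % 1000) * 1000000L,
    };
    int64_t cpu_before = clock_ns(CLOCK_THREAD_CPUTIME_ID);

    nanosleep(&req, NULL);   /* the thread is off the CPU during this call */

    return clock_ns(CLOCK_THREAD_CPUTIME_ID) - cpu_before;
}
```

For a 50 ms sleep, the returned CPU time should be a small fraction of 50 ms. A busy-polling loop of the same duration would report close to the full interval.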

Time Has Different Meanings

Many timeout bugs come from treating all “time” as the same thing.

Systems usually need to distinguish several kinds of time:

wall-clock time: the user-visible date and time, which can jump when NTP steps the clock or someone sets it manually; timezone configuration changes how it is displayed
monotonic time: time that only moves forward, suitable for intervals and timeouts
uptime: elapsed time since system boot
CPU time: time actually spent executing on behalf of a process or thread
hardware counters: counter sources provided by an SoC, RTC, HPET, ARM generic timer, or similar hardware

If wall-clock time is used for timeouts, adjusting the system clock forward or backward can make waits end early or last too long. Network timeouts, lock waits, and retry intervals are usually better based on monotonic time.

That is why many APIs distinguish CLOCK_REALTIME from CLOCK_MONOTONIC. The former is like a clock on the wall; the latter is better for measuring elapsed time.
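A common way to apply this is to compute a deadline once on the monotonic clock and compare against it later. The sketch below assumes POSIX clock_gettime; the helpers timespec_add_ms, timespec_reached, and deadline_after_ms are illustrative names, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Return t + ms with tv_nsec kept in [0, 1e9). Assumes ms >= 0. */
static struct timespec timespec_add_ms(struct timespec t, long ms)
{
    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/* True when now is at or past deadline. */
static bool timespec_reached(struct timespec now, struct timespec deadline)
{
    if (now.tv_sec != deadline.tv_sec)
        return now.tv_sec > deadline.tv_sec;
    return now.tv_nsec >= deadline.tv_nsec;
}

/* Deadline timeout_ms from now, taken from CLOCK_MONOTONIC so that
 * setting the wall clock does not move it. */
static struct timespec deadline_after_ms(long timeout_ms)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return timespec_add_ms(now, timeout_ms);
}
```

A retry loop would call deadline_after_ms once, then read CLOCK_MONOTONIC on each pass and stop when timespec_reached returns true. Computing the same deadline from CLOCK_REALTIME would make the loop sensitive to clock adjustments.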

The Tick Limits Time Granularity in Many RTOSes

Many RTOS kernels use a periodic tick interrupt to advance system time. For example, a tick may fire every 1 ms or 10 ms. The kernel uses it to update counters, check delayed tasks, and trigger scheduling.

If the system tick is 10 ms, a request such as delay(1 ms) cannot reliably produce a true 1 ms sleep. The delay is usually rounded to tick boundaries and then stretched further by scheduling, interrupt-disabled sections, and higher-priority tasks.

tick: 0ms  10ms  20ms  30ms
task requests sleep 12ms
actual wakeup may happen near 20ms
actual execution still waits for scheduling
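The rounding in this example can be written down as arithmetic. The sketch below is a simplified model, not the behavior of any particular RTOS: it assumes the kernel expires a delay on the Nth tick interrupt after the request, and the names ms_to_ticks_ceil and wake_window_ms are hypothetical. Real conversion macros may truncate instead of rounding up.

```c
#include <stdint.h>

/* Convert ms to ticks, rounding up. Assumes tick_period_ms > 0 and that
 * ms + tick_period_ms does not overflow. */
static uint32_t ms_to_ticks_ceil(uint32_t ms, uint32_t tick_period_ms)
{
    return (ms + tick_period_ms - 1) / tick_period_ms;
}

/* Range of possible wakeup times, in ms after the request. */
struct wake_window {
    uint32_t earliest_ms;
    uint32_t latest_ms;
};

/* The request can land anywhere inside the current tick period, so the
 * first counted tick may arrive almost immediately or a full period later. */
static struct wake_window wake_window_ms(uint32_t ticks, uint32_t tick_period_ms)
{
    struct wake_window w;
    w.earliest_ms = ticks > 0 ? (ticks - 1) * tick_period_ms : 0;
    w.latest_ms = ticks * tick_period_ms;
    return w;
}
```

With a 10 ms tick, a 12 ms request becomes 2 ticks. If the request lands exactly on a tick boundary, the wakeup is near 20 ms, as in the diagram. If it lands just before a tick, the wakeup can come after only about 10 ms, which is why some kernels and drivers add one extra tick when a minimum delay must be guaranteed.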

Linux also has ticks, dynamic ticks, and high-resolution timers, but the same rule applies: timer expiration means “the thread can be woken,” not “your code runs immediately.”

Timeout Expiration Does Not Mean Immediate Execution

When a timer expires, the kernel usually marks a waiting thread runnable or queues follow-up work through a callback, softirq, workqueue, or kernel thread.

Actual execution still depends on scheduling:

whether the CPU is running a higher-priority task
whether interrupts have been disabled for too long
whether the system is busy with softirqs or kernel threads
whether the target thread is still blocked on locks, I/O, or other resources
whether a multicore system needs migration or load balancing

So “timeout after 100 ms” usually means “the call may return once about 100 ms have passed,” not “the call returns at 100 ms.” Under high load, long critical sections, or poor real-time priority configuration, the actual return may be much later.

That delay is not caused by the timer alone. It is the combined result of timers, scheduling, and system load.
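The combined delay can be measured from user space. The following sketch assumes a system that provides clock_nanosleep with TIMER_ABSTIME (Linux does); the function names are invented here. It runs a periodic loop against absolute deadlines and records the worst gap between a deadline and the moment the thread actually ran again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

static int64_t ts_to_ns(struct timespec ts)
{
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run `iterations` periods and return the worst wakeup lateness in ns.
 * Assumes 0 < period_ns < 1000000000. */
static int64_t worst_wakeup_lateness_ns(int iterations, long period_ns)
{
    struct timespec next;
    int64_t worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < iterations; i++) {
        /* Advance the absolute deadline, so one late wakeup does not
         * shift every later period. */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec += 1;
            next.tv_nsec -= 1000000000L;
        }

        /* clock_nanosleep() returns the error number, not -1 with errno. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;

        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t late = ts_to_ns(now) - ts_to_ns(next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Running this once on an idle system and once under CPU load gives a rough picture of how much of the lateness comes from scheduling rather than from the timer itself.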

sleep and timeout Mean Different Things

A sleep voluntarily pauses the current thread for some duration. When the duration expires, the thread becomes runnable.

A timeout is often tied to a wait condition. It is not just sleeping; it is a race between “condition became true” and “time expired.”

For a device read with a timeout:

thread waits for data, at most 500ms
-> interrupt arrives after 100ms
-> data is readable
-> read returns early

Another path is:

thread waits for data, at most 500ms
-> no data arrives
-> timer expires
-> return timeout

Neither path means “sleep exactly 500 ms.” A timeout is the maximum wait boundary. Events may end the call earlier.
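Both paths can be sketched in user space with poll(). This is an illustration under assumptions, not a driver implementation: read_with_timeout and now_ms are hypothetical helpers, and the choice to report a timeout as -1 with errno set to ETIMEDOUT is made only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Read up to len bytes from fd, waiting at most timeout_ms for data.
 * Returns the read() result if the fd becomes ready in time, or -1 with
 * errno set to ETIMEDOUT if the deadline passes first. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int64_t deadline = now_ms() + timeout_ms;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int64_t remaining = deadline - now_ms();
        if (remaining < 0)
            remaining = 0;

        int rc = poll(&pfd, 1, (int)remaining);
        if (rc > 0)
            return read(fd, buf, len);   /* the event won the race */
        if (rc == 0) {
            errno = ETIMEDOUT;           /* the timer won the race */
            return -1;
        }
        if (errno != EINTR)
            return -1;                   /* real poll() failure */
        /* Interrupted by a signal: retry with the time that is left. */
    }
}
```

The deadline is computed once from the monotonic clock, so a retry after a signal does not restart the full 500 ms. The function returns as soon as data is readable, which is the early-return path described above.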

When designing protocol retries, sensor sampling, or device state waits, make clear whether the code is waiting for time itself or waiting for a state to appear within a time limit.

Timer Callbacks Should Not Do Heavy Work

When a timer expires, the system must run some handling logic. The details vary: some systems do part of it in interrupt context, while others continue in softirq, kernel thread, or workqueue context.

Regardless of the exact mechanism, timer callbacks should not do heavy work.

Reasons include:

the callback context may not be allowed to block
long execution can delay later timers and interrupt response
shared data requires careful concurrency handling
rearming timers inside callbacks can create races
high-frequency expiration amplifies system load

A more robust pattern is to make the timer do lightweight marking, wakeup, or work submission, and move expensive processing into a context that may block and be scheduled normally.
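A minimal sketch of that split, using C11 atomics, might look like this. The names on_timer_expired and take_pending_expirations are hypothetical and not tied to any kernel or RTOS API. A real system would also need a wakeup mechanism for the worker, which is omitted here.

```c
#include <stdatomic.h>

/* Expirations recorded by the callback and not yet handled by the worker. */
static atomic_uint pending_expirations;

/* Hypothetical timer callback: record the expiration and return at once.
 * No blocking calls, no locks, no long loops. */
static void on_timer_expired(void)
{
    atomic_fetch_add(&pending_expirations, 1);
    /* A real system would also wake the worker here, for example
     * through a semaphore or an event flag. */
}

/* Worker context: claim every pending expiration in one atomic step.
 * The caller then does the expensive work once per claimed expiration,
 * in a context that is allowed to block. */
static unsigned take_pending_expirations(void)
{
    return atomic_exchange(&pending_expirations, 0);
}
```

Because the counter is claimed with a single exchange, an expiration that arrives while the worker is busy is not lost; it is picked up on the next call.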

Power Management Changes Time Intuition

When a device enters a low-power state, normal CPU ticks may stop, some peripheral clocks may be gated, and only the RTC or a dedicated low-power timer may remain able to wake the system.

This creates several questions:

which clocks keep running during sleep
which timer can act as a wake source
whether time needs to be synchronized after wake
whether a timeout registered before deep sleep is still valid
whether deferred timed work runs in a burst after wake

IoT devices often run into these issues. The application may think “retry after 10 seconds,” while the system has entered deep sleep and wakeup depends on a different low-power clock source.

Low-power design must consider not only application-level timers, but also clock sources, power domains, and wake sources.
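On Linux, the question of which clock keeps running during sleep shows up in the clock API: CLOCK_MONOTONIC does not advance while the system is suspended, while CLOCK_BOOTTIME does. The sketch below is Linux-specific, and the function names are invented for this example. The difference between the two clocks approximates the total time spent suspended since boot.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

static int64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Approximate time (ns) this Linux system has spent suspended since boot. */
static int64_t suspended_ns_since_boot(void)
{
    /* Read CLOCK_MONOTONIC first, so the small gap between the two
     * reads cannot make the result negative. */
    int64_t mono = read_clock_ns(CLOCK_MONOTONIC);
    int64_t boot = read_clock_ns(CLOCK_BOOTTIME);
    return boot - mono;
}
```

A “retry after 10 seconds” deadline computed from CLOCK_MONOTONIC therefore does not count time spent in suspend. Whether that is the desired behavior depends on the application.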

How to Debug Timeout Problems

When you see inaccurate timeouts, oversleeping, periodic jitter, or occasional device wait failures, split the path into layers.

First, identify the time base. Is it wall-clock time, monotonic time, tick count, or a hardware counter?

Second, identify the wait semantics. Is it sleep, event wait, I/O timeout, lock timeout, or protocol retry?

Third, check whether precision requirements exceed system capability. Tick granularity, timer resolution, scheduling latency, and hardware clock sources must match the requirement.

Fourth, check whether the thread is scheduled promptly. Priority, CPU load, interrupt-disabled time, softirqs, and lock contention may delay execution.

Fifth, check whether low-power state changed the clock and wake path. Which counter keeps running during sleep, and which interrupt can wake the system?

These questions get closer to the root cause than merely checking whether the timer was started.
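For the third question, a quick first check on a POSIX system is to ask the clock for its reported resolution. This is a sketch with a hypothetical helper name, clock_resolution_ns.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Reported resolution of a clock in ns, or -1 if the query fails. */
static int64_t clock_resolution_ns(clockid_t id)
{
    struct timespec res;

    if (clock_getres(id, &res) != 0)
        return -1;
    return (int64_t)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Calling clock_resolution_ns(CLOCK_MONOTONIC) gives a lower bound on granularity only. The reported resolution says nothing about scheduling latency, so it should be read together with a measured wakeup delay rather than instead of one.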

What Matters in Practice

Timers, clocks, and the scheduler solve different problems.

Clocks provide a time base. Timers create expiration events. The scheduler decides when a thread runs. sleep, timeouts, periodic work, and low-power wakeups are built on top of these layers.

So do not interpret “set a 100 ms timer” as “the code will run exactly 100 ms later.” A more accurate statement is: after about 100 ms, the system can wake or process the event; actual execution still depends on the clock source, timer resolution, scheduling, load, and power state.

Separating these layers turns many seemingly random timeout issues into measurable and explainable system behavior.