Low-power bugs often look intermittent: the first I/O after idle fails, interrupts disappear after wakeup, /dev still exists but hardware does not respond, or power never goes down.
These problems often trace back to the power-management state machine inside the Linux driver.
Embedded Linux drivers commonly face two paths: runtime PM and system suspend/resume. Both save power, but they solve different problems.
The safest first model is this: runtime PM handles per-device idle power saving while the system is running; system suspend/resume handles whole-system sleep and wakeup. A driver must keep I/O, resources, wakeup sources, and state restore consistent in both paths.
runtime PM:
device idle -> runtime suspend -> next use -> runtime resume
system suspend:
system prepares sleep -> driver suspend -> wake event -> driver resume
Mixing the two concepts is a common way to create drivers that work normally but fail after low power.
runtime PM Is Device-Level Idle Management
runtime PM happens while the system is still running.
The CPU may be active and other devices may be working, but one device is unused and can independently gate its clocks, cut its power, or enter a low-power state.
Examples:
- power off a sensor when idle
- disable UART clocks when no fd is open
- power down camera sensor and MIPI when not streaming
- autosuspend an I2C controller after transfer
The core idea is not “the system slept.” It is “this device is currently unused and may lower power.”
Drivers usually manage usage references with APIs like:
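ret = pm_runtime_resume_and_get(dev); /* resume the device and take a reference */
if (ret < 0)
return ret; /* no reference is held on failure */
/* access hardware */
pm_runtime_mark_last_busy(dev); /* restart the autosuspend timer */
pm_runtime_put_autosuspend(dev);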
While the reference is held, the device should be accessible. After it is released, the device may enter low power once the autosuspend delay expires.
system suspend Is Whole-System Sleep
system suspend puts the whole system into a low-power state.
User space is frozen, and the kernel calls driver suspend callbacks in dependency order, so a device is suspended before the devices it depends on. After wakeup, it calls resume callbacks in the reverse order.
During system suspend, a driver usually needs to:
- stop new I/O
- wait for or cancel active transfers
- save hardware state
- configure wakeup sources
- disable interrupts or switch interrupt semantics
- disable clocks/regulators and select sleep pinctrl
During resume, it needs to:
- restore power, clocks, pinctrl, and reset
- reinitialize registers
- restore queues, interrupts, and DMA rings
- handle wake events that happened during sleep
system suspend cares about whether the whole system can sleep and wake consistently, not only whether one idle device can save power.
Usage Count Is the Hard Part of runtime PM
runtime PM revolves around usage references.
Before hardware access, get a reference. After use, put it. While the count is above zero, the device should not runtime suspend. When it reaches zero, autosuspend may lower power.
Common bugs include:
- forgetting pm_runtime_get before hardware access
- forgetting pm_runtime_put on error paths
- IRQ thread accessing hardware while the device is runtime suspended
- get on open but missing put on release, so power never drops
- put after read/write while async DMA is still running
- remove racing with runtime suspend
Wrong usage count causes two classes of bugs:
- put too early: device is powered down while in use, causing intermittent I/O failure
- missing put: device stays active forever, wasting power
runtime PM is more than adding two API calls. It requires a clear definition of who is using the device, and for how long.
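The balance rule can be sketched with a small user-space model. This is not kernel code: `model_dev`, `model_get()`, `model_put()`, and `model_read()` are hypothetical names invented for illustration, and the model assumes a failed get leaves no reference behind. The point it shows is that success and error paths share one put, so the count always returns to zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of runtime PM bookkeeping; not kernel code. */
struct model_dev {
    int usage;          /* outstanding users, like the runtime PM usage count */
    bool powered;       /* true while the modeled hardware is accessible */
    bool resume_fails;  /* test knob: make the next power-up fail */
};

/* Take a reference and power up. On failure no reference is kept. */
int model_get(struct model_dev *d)
{
    if (!d->powered) {
        if (d->resume_fails)
            return -EIO;
        d->powered = true;
    }
    d->usage++;
    return 0;
}

/* Drop a reference. The last put lets the device power down. */
void model_put(struct model_dev *d)
{
    assert(d->usage > 0);   /* an unbalanced put is a driver bug */
    if (--d->usage == 0)
        d->powered = false;
}

/* One I/O request: success and error paths share a single put. */
int model_read(struct model_dev *d, bool io_error)
{
    int ret = model_get(d);

    if (ret < 0)
        return ret;         /* nothing to release: the get failed */

    ret = io_error ? -ETIMEDOUT : 0;   /* stand-in for the real transfer */

    model_put(d);           /* reached on both success and I/O error */
    return ret;
}
```

In a real driver the same shape applies to every entry point that touches hardware, including IRQ threads, work items, and DMA completion. For asynchronous transfers the put belongs in the completion path, not right after the request is queued.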
autosuspend Is Delayed Power Down
Many devices should not power down immediately after every I/O.
If a sensor samples every 100 ms, or an I2C controller transfers frequently, immediate suspend/resume may increase latency and waste power.
autosuspend delays power down after the last use:
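/* probe: opt in to autosuspend with a 200 ms delay */
pm_runtime_use_autosuspend(dev);
pm_runtime_set_autosuspend_delay(dev, 200);
/* after each I/O: record the time of last use, then drop the reference */
pm_runtime_mark_last_busy(dev);
pm_runtime_put_autosuspend(dev);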
This balances power and responsiveness.
Too short a delay causes frequent power cycling.
Too long a delay keeps the device active and wastes power.
The autosuspend delay should be derived from access frequency, resume cost, and product power targets, not picked arbitrarily.
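The trade-off can be estimated before touching the driver. The sketch below is a hypothetical user-space model, not a kernel API: `model_power_cycles()` and `model_idle_powered_ms()` are invented names, accesses are assumed to be instantaneous, and a gap exactly equal to the delay is assumed not to suspend.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: given sorted access times in milliseconds and an
 * autosuspend delay, count how often the device power-cycles between
 * accesses. Accesses are treated as instantaneous.
 */
size_t model_power_cycles(const unsigned *access_ms, size_t n,
                          unsigned delay_ms)
{
    size_t cycles = 0;

    for (size_t i = 1; i < n; i++) {
        unsigned gap = access_ms[i] - access_ms[i - 1];

        if (gap > delay_ms)   /* idle gap outlasts the delay: suspend, then resume */
            cycles++;
    }
    return cycles;
}

/* Total time the device stays powered but idle between accesses. */
unsigned model_idle_powered_ms(const unsigned *access_ms, size_t n,
                               unsigned delay_ms)
{
    unsigned total = 0;

    for (size_t i = 1; i < n; i++) {
        unsigned gap = access_ms[i] - access_ms[i - 1];

        total += gap < delay_ms ? gap : delay_ms;
    }
    return total;
}
```

For accesses at 0, 100, 200, 300, and 1000 ms, a 200 ms delay gives one power cycle and 500 ms of powered idle time, while a 50 ms delay gives four power cycles and 200 ms of powered idle time. Which one is better depends on how expensive each resume is.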
runtime suspend Must Leave Hardware Safe
The runtime suspend callback should put the device into a recoverable low-power state.
Common actions:
- stop new transfers
- ensure hardware is not busy
- stop or wait for DMA
- disable interrupts or keep only required ones
- save small register state
- disable clocks and regulators
- select sleep pinctrl
It is not enough to “turn off the clock.”
If a transfer is still active, disabling the clock may hang the controller.
If pending interrupts are not masked or acknowledged before suspend, resume may lose events or trigger an IRQ storm.
If register state is lost, resume must reinitialize it.
The goal is that the next runtime resume can restore the device to usable state.
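A minimal sketch of that ordering, written as a user-space model. `model_hw` and `model_runtime_suspend()` are hypothetical names, and the single `ctrl_reg` stands in for whatever context the real hardware loses. In the kernel, a runtime suspend callback that returns -EBUSY or -EAGAIN leaves the device in the active state, which the model mirrors by touching nothing when a transfer is still running.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the hardware state a suspend callback must handle. */
struct model_hw {
    bool dma_active;      /* a transfer is still in flight */
    bool irq_enabled;
    bool clk_enabled;
    uint32_t ctrl_reg;    /* live register, lost when the clock is gated */
    uint32_t saved_ctrl;  /* copy kept in memory for the next resume */
};

int model_runtime_suspend(struct model_hw *hw)
{
    /* 1. Refuse while busy: the device stays active and nothing changes. */
    if (hw->dma_active)
        return -EBUSY;

    /* 2. Quiesce interrupts before the clock goes away. */
    hw->irq_enabled = false;

    /* 3. Save the context that low power will destroy. */
    hw->saved_ctrl = hw->ctrl_reg;

    /* 4. Only now gate the clock; model the register loss explicitly. */
    hw->ctrl_reg = 0;
    hw->clk_enabled = false;
    return 0;
}
```

Whether a real driver should refuse with -EBUSY or wait for the transfer to finish depends on how long transfers take and what the hardware allows; the model only shows the refusal path.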
runtime resume Must Restore Access Conditions
runtime resume is not just enabling one clock.
It must restore hardware prerequisites:
- enable regulators
- prepare/enable clocks
- select default pinctrl
- deassert reset
- reprogram registers
- restore FIFO, DMA, and queue state
- enable interrupts
The order must follow the datasheet.
Many “first access after idle fails” bugs happen because runtime resume restores only part of the state.
Drivers can share helper functions between probe and resume, but should keep the distinction clear: probe binds and initializes the device for the first time; resume restores it from low power.
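One way to keep resume complete is to treat it as an ordered list of steps with reverse-order unwinding on failure. The sketch below is a user-space model with hypothetical names (`model_step`, `model_res`, `model_runtime_resume()`). The step order simply follows the list above and is an assumption; the real order comes from the datasheet.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Resume steps in the order they are applied (assumed, not authoritative). */
enum model_step {
    STEP_REGULATOR,
    STEP_CLOCK,
    STEP_PINCTRL,
    STEP_RESET,
    STEP_REGS,
    STEP_IRQ,
    STEP_COUNT
};

struct model_res {
    bool on[STEP_COUNT];  /* which resources are currently restored */
    int fail_at;          /* test knob: step that fails, or -1 for none */
};

int model_runtime_resume(struct model_res *r)
{
    int step;

    for (step = 0; step < STEP_COUNT; step++) {
        if (step == r->fail_at)
            goto unwind;
        r->on[step] = true;
    }
    return 0;

unwind:
    /* Undo completed steps in reverse order so nothing is left half-on. */
    while (--step >= 0)
        r->on[step] = false;
    return -EIO;
}
```

A partial resume that returns success is the failure mode to avoid: the model either restores every step or none of them.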
wakeup source Decides Who Can Wake the System
During system suspend, not every interrupt can wake the system.
A device acting as wakeup source usually requires:
- Device Tree or ACPI declaring wake capability
- driver calling device_init_wakeup() or equivalent
- suspend path enabling wake IRQ or device wake mode
- power domain and interrupt path remaining available during sleep
An interrupt working at runtime does not mean it can wake from deep sleep.
Common failures:
- GPIO interrupt works while running but cannot wake the system
- wake IRQ is configured, but sleep pinctrl changes the pin incorrectly
- device power is off, so it cannot generate wake
- wake event happens, but resume does not read and clear the status
wakeup source behavior depends on hardware, power domains, interrupt controllers, pinctrl, and driver suspend configuration together.
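Because every link in that chain has to hold, it can help to write the conditions down as one checklist and find the first one that fails. The sketch below is a user-space model; `model_wake_cfg` and `model_first_wake_blocker()` are hypothetical names, and the fields are filled in by hand from what you observe on the board, not read from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum model_wake_blocker {
    WAKE_OK,
    WAKE_NOT_DECLARED,       /* firmware does not mark the device wake-capable */
    WAKE_NOT_ENABLED,        /* driver never enabled wakeup for the device */
    WAKE_IRQ_NOT_ARMED,      /* suspend path did not arm the wake IRQ */
    WAKE_DOMAIN_OFF,         /* power domain is cut during sleep */
    WAKE_PIN_MISCONFIGURED,  /* sleep pinctrl breaks the interrupt line */
};

/* Observations for one device, filled in manually (assumed inputs). */
struct model_wake_cfg {
    bool fw_declares_wake;
    bool driver_enabled_wake;
    bool wake_irq_armed;
    bool domain_stays_on;
    bool sleep_pin_keeps_irq;
};

/* Return the first broken link in the wake path, or WAKE_OK. */
enum model_wake_blocker model_first_wake_blocker(const struct model_wake_cfg *c)
{
    if (!c->fw_declares_wake)
        return WAKE_NOT_DECLARED;
    if (!c->driver_enabled_wake)
        return WAKE_NOT_ENABLED;
    if (!c->wake_irq_armed)
        return WAKE_IRQ_NOT_ARMED;
    if (!c->domain_stays_on)
        return WAKE_DOMAIN_OFF;
    if (!c->sleep_pin_keeps_irq)
        return WAKE_PIN_MISCONFIGURED;
    return WAKE_OK;
}
```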
system suspend and runtime PM Interact
A driver cannot assume the device is active when system suspend begins.
The device may already be runtime suspended. In that case, system suspend should not repeat operations on already-disabled resources, and should not access registers requiring clocks.
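After resume, the device's real power state should match what runtime PM believes: a device that was runtime suspended before sleep should not be left powered on with a zero usage count.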
Common strategies:
- ensure the device is in a suitable state before suspend
- take a light path for devices already runtime suspended
- restore only required system state during resume
- let the next runtime resume restore full access state
This is often platform- and dependency-specific. The important part is to track current PM state instead of blindly toggling resources in every callback.
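The light-path strategy can be sketched as a user-space model. `model_pm`, `model_system_suspend()`, and `model_system_resume()` are hypothetical names; a real driver would consult the runtime PM core (for example with `pm_runtime_suspended()`) or delegate to helpers such as `pm_runtime_force_suspend()` and `pm_runtime_force_resume()` where they fit the platform. The model counts clock toggles to show that an already-suspended device is left alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one device's PM bookkeeping. */
struct model_pm {
    bool runtime_suspended;   /* runtime PM state when system suspend starts */
    bool clk_enabled;
    bool suspended_by_system; /* did system suspend do the power-down? */
    int clk_toggles;          /* how often the clock was switched */
};

int model_system_suspend(struct model_pm *p)
{
    if (p->runtime_suspended) {
        /* Light path: resources are already off, so touch nothing. */
        p->suspended_by_system = false;
        return 0;
    }
    p->clk_enabled = false;
    p->clk_toggles++;
    p->suspended_by_system = true;
    return 0;
}

int model_system_resume(struct model_pm *p)
{
    /* Stay runtime suspended; the next use will resume the device. */
    if (!p->suspended_by_system)
        return 0;

    p->clk_enabled = true;
    p->clk_toggles++;
    p->suspended_by_system = false;
    return 0;
}
```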
Debug PM State, Not Only Logs
Low-power debugging is hard because logging changes behavior. Heavy logs may keep a device active; serial output can prevent deeper sleep.
Besides dmesg, inspect:
- /sys/bus/.../devices/.../power/runtime_status
- runtime_usage
- runtime_active_time
- runtime_suspended_time
- wakeup-related sysfs nodes
- ftrace power, IRQ, and device PM events
- interrupt counts and wake counts
- power meter and GPIO markers
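Reading these attributes from a small helper avoids adding kernel logs that change timing. The sketch below is an illustrative user-space reader: `read_pm_attr()` and `dump_pm_state()` are hypothetical names, and the device directory is supplied by the caller. Some attributes may be absent on a given kernel; `runtime_usage` in particular is, as far as I know, only exposed when CONFIG_PM_ADVANCED_DEBUG is enabled, so the helper reports a missing file without treating it as fatal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one attribute from <dev_dir>/power/<attr> into out.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
int read_pm_attr(const char *dev_dir, const char *attr, char *out, size_t len)
{
    char path[512];
    FILE *f;
    int n;

    if (len == 0)
        return -1;
    n = snprintf(path, sizeof(path), "%s/power/%s", dev_dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path would be truncated */

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}

/* Print the runtime PM attributes named in the list above. */
void dump_pm_state(const char *dev_dir)
{
    static const char *const attrs[] = {
        "runtime_status", "runtime_usage",
        "runtime_active_time", "runtime_suspended_time",
    };
    char val[64];

    for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
        if (read_pm_attr(dev_dir, attrs[i], val, sizeof(val)) == 0)
            printf("%-24s %s\n", attrs[i], val);
        else
            printf("%-24s (not available)\n", attrs[i]);
    }
}
```

Calling `dump_pm_state()` with the device's sysfs directory before and after the failing operation shows whether the status or the accumulated times moved when they should not have.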
Classify the problem first:
- power does not drop: who holds a runtime PM reference?
- device powers down while in use: who put too early?
- system cannot suspend: who blocks system suspend?
- system cannot wake: is it the wakeup source, IRQ, power domain, or pinctrl?
- device fails after wake: which part of the resume path is incomplete?
What to Check First
First, is runtime PM enabled? Check runtime status, usage count, and autosuspend delay.
Second, does every hardware access hold a PM reference? Check open/read/write/ioctl, IRQ thread, workqueue, and DMA completion.
Third, do error paths put references? Failure returns, timeouts, cancellation, and remove must not leak usage count.
Fourth, does runtime suspend wait for hardware idle? DMA, FIFO, and controller busy bits must be handled.
Fifth, does runtime resume restore all resources? pinctrl, clocks, regulators, reset, registers, and interrupts must be complete.
Sixth, does system suspend configure the wakeup source? Wake IRQ, sleep pinctrl, power domain, and device wake state must agree.
Seventh, does system resume handle sleep-time events? Wake status must be read and cleared; queues must be restored.
Eighth, do runtime PM and system PM step on each other? A device already runtime suspended should not have resources toggled blindly by system suspend/resume.
What to Remember in Practice
runtime PM and system suspend/resume are both power management, but their boundaries differ.
runtime PM manages device idle while the system runs: who uses the device, when it can power down, and how it resumes before the next use.
system suspend/resume manages whole-system sleep: how devices stop before sleep, which events can wake the system, and how state is restored after wake.
Driver PM is a state machine:
active
-> runtime suspended
-> runtime resumed
-> system suspended
-> system resumed
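The same state machine can be written as a transition table, which makes illegal transitions explicit. This is a user-space model with hypothetical names (`pm_state`, `pm_event`, `model_pm_step()`). It assumes system suspend remembers whether the device was active or runtime suspended beforehand, and that I/O during system suspend is invalid.

```c
#include <assert.h>

enum pm_state {
    ST_INVALID = -1,
    ST_ACTIVE,
    ST_RT_SUSPENDED,
    ST_SYS_SUSPENDED_FROM_ACTIVE,
    ST_SYS_SUSPENDED_FROM_RT,
    ST_COUNT
};

enum pm_event { EV_IDLE, EV_USE, EV_SYS_SUSPEND, EV_SYS_RESUME, EV_COUNT };

/* next_state[state][event]; ST_INVALID marks a transition that is a bug. */
static const int next_state[ST_COUNT][EV_COUNT] = {
    [ST_ACTIVE] = {
        [EV_IDLE]        = ST_RT_SUSPENDED,
        [EV_USE]         = ST_ACTIVE,
        [EV_SYS_SUSPEND] = ST_SYS_SUSPENDED_FROM_ACTIVE,
        [EV_SYS_RESUME]  = ST_INVALID,
    },
    [ST_RT_SUSPENDED] = {
        [EV_IDLE]        = ST_RT_SUSPENDED,
        [EV_USE]         = ST_ACTIVE,
        [EV_SYS_SUSPEND] = ST_SYS_SUSPENDED_FROM_RT,
        [EV_SYS_RESUME]  = ST_INVALID,
    },
    [ST_SYS_SUSPENDED_FROM_ACTIVE] = {
        [EV_IDLE]        = ST_INVALID,
        [EV_USE]         = ST_INVALID,   /* I/O must be stopped during sleep */
        [EV_SYS_SUSPEND] = ST_INVALID,
        [EV_SYS_RESUME]  = ST_ACTIVE,
    },
    [ST_SYS_SUSPENDED_FROM_RT] = {
        [EV_IDLE]        = ST_INVALID,
        [EV_USE]         = ST_INVALID,
        [EV_SYS_SUSPEND] = ST_INVALID,
        [EV_SYS_RESUME]  = ST_RT_SUSPENDED, /* return to the pre-sleep state */
    },
};

/* Apply one event; out-of-range inputs are treated as invalid. */
int model_pm_step(int state, int event)
{
    if (state < 0 || state >= ST_COUNT || event < 0 || event >= EV_COUNT)
        return ST_INVALID;
    return next_state[state][event];
}
```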
Every state transition must agree with resources, I/O, wakeup sources, and user-visible behavior. Otherwise the device becomes “normal most of the time, broken after low power,” which is one of the hardest classes of bugs to debug.