The first tool many driver developers reach for is printk. Probe does not run, print a line. Interrupts do not arrive, print a line. DMA does not move, print a line. User space cannot read data, print another line.
That works, but only up to a point:
- too many logs hide the real failure
- high-frequency logs slow the system down
- printing in interrupt or locked paths changes timing
- production images cannot keep verbose logs enabled
- an intermittent bug disappears after adding logs
- multi-instance devices are hard to distinguish
A better model is layered instrumentation:
dev_err/dev_warn/dev_info: record failures and key states
dev_dbg + dynamic debug: enable detailed logs only when needed
tracepoint/ftrace: inspect timing, call paths, and latency
rate limit: prevent logs from damaging the system
debugfs/sysfs: expose necessary state snapshots
Logs are good at answering “what happened”. Tracing is better at answering “when did it happen, which path ran, and where was the latency”.
printk Has Weak Device Context
Raw printk lacks context. With multiple boards, instances, and buses, a log line may not tell which device produced it.
Drivers should usually use device-aware logging:
dev_err(dev, "failed to enable clock: %d\n", ret);
dev_warn(dev, "using fallback configuration\n");
dev_info(dev, "firmware version %u\n", version);
dev_dbg(dev, "irq status=0x%x\n", status);
dev_* logs include device context, making it easier to connect the line to a sysfs device, driver instance, and hardware block.
A basic convention is:
dev_err: operation failed, feature unavailable, or state invaliddev_warn: system can continue but has degraded or unexpected statedev_info: low-frequency key state, such as version, mode, or successful initdev_dbg: detailed information for development and diagnosis
Do not print everything as an error. Once error level is abused, real failures are buried.
Log State Transitions
Useful logs are not about quantity. They should sit at state transitions.
Good places include:
- probe start and failure cause
- resource acquisition failures, such as clock, regulator, GPIO, IRQ, DMA
- hardware reset, power-up, and mode changes
- abnormal interrupt status
- DMA submission and completion errors
- runtime PM suspend/resume
- remove and error rollback
- important user-space open/read/write/ioctl errors
Poor long-term logs include:
- every polling iteration
- every interrupt
- every packet
- context-free “here 1” and “here 2”
- repeated success messages on hot paths
Logs should show where the state machine went wrong, not turn the system into a text stream.
dynamic debug Makes dev_dbg Controllable
dev_dbg is usually not always printed. dynamic debug can enable debug logs at runtime by file, function, module, or format string.
The common control file is:
/sys/kernel/debug/dynamic_debug/control
Examples:
echo 'module my_driver +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file drivers/foo/bar.c +p' > /sys/kernel/debug/dynamic_debug/control
The value is that developers can leave detailed dev_dbg calls in the driver while production stays quiet by default. Field debugging can enable one module, collect detail, and turn it off.
This is better than adding temporary printk because:
- no kernel rebuild is required
- one file or function can be targeted
- details can be collected in the field and disabled again
- debug logs do not need to be always on
The prerequisite is that the driver already uses dev_dbg for fine-grained detail instead of scattered raw printk.
Rate Limiting Prevents Log Damage
Some driver errors happen on hot paths: interrupt storms, bus errors, DMA timeouts, repeated resets.
Printing every occurrence creates new problems:
- ring buffer is filled
- CPU is consumed
- interrupt and workqueue handling slow down
- real-time behavior changes
- the first useful evidence is overwritten
- persistent logging stresses storage
Linux provides rate-limited logging helpers:
dev_err_ratelimited(dev, "rx timeout\n");
dev_warn_ratelimited(dev, "bad status=0x%x\n", status);
For repeated high-frequency failures, limit the print rate and keep counters for later inspection.
More logs are not always better. For high-frequency errors, the first event, final state, accumulated count, and trigger condition are often more useful than every repeated line.
Tracing Is Better for Timing
Many driver failures are about ordering:
- interrupt arrives before buffers are ready
- runtime suspend powers off just before user access
- workqueue processing is too slow
- DMA completion races with close
- probe deferral makes services start before devices exist
- lock contention causes intermittent timeout
Logs are weak here. Timestamps include print overhead, and too many logs change the timing.
ftrace, tracepoints, function graph tracing, and perf are better tools for paths and latency.
A practical sequence is:
use logs to confirm state
then use trace to inspect timing
then keep counters or debugfs/sysfs entries for recurring diagnosis
Tracing is friendlier to hot paths and can correlate scheduling, interrupts, softirqs, workqueues, and function calls.
tracepoints Are Observation Points
If an event needs long-term observability, a tracepoint may be better than a temporary log.
Logs are for humans. Tracepoints are for tools. They can carry structured fields, be enabled on demand, and integrate with kernel tracing.
Events that may deserve tracepoints include:
- request submission and completion
- DMA buffer lifecycle
- state machine transitions
- error retries
- queue depth changes
- power state changes
Not every driver needs custom tracepoints. Many subsystems already have useful ones, such as block, networking, scheduler, IRQ, workqueue, and dma_fence. Check existing subsystem tracing first.
Logs Change Timing
One counterintuitive fact: adding logs can make a bug disappear.
Reasons include:
- printing adds delay and changes race windows
- console output may take locks or block
- high-frequency logs change scheduling and cache behavior
- serial output is slow
- print paths may add memory allocation or lock contention
- log level and console configuration change actual cost
For intermittent races, timeouts, interrupt ordering, or DMA completion order, do not rely only on adding logs.
Better options are:
- lightweight counters
- tracing for event timing
- bounded ring buffers for recent events
- avoiding prints in hard IRQ and locked hot paths
- enabling dynamic debug precisely instead of globally
Logging is an observation tool, not a zero-cost record of truth.
Debugging Facilities Need Lifecycle Handling
Driver debug entries must match the device lifecycle:
- probe failure should not leave half-created debugfs nodes
- remove should make old nodes disappear safely
- register reads during runtime suspend should resume the device or fail clearly
- suspend/resume should not race with debug access
- error-path logs should show which cleanup step failed
If a debugfs file dumps registers, reading it must consider whether the device is powered, whether clocks are enabled, and whether register access is safe. Otherwise the debugging interface can create faults.
Instrumentation is part of the driver. It should not bypass resources and power management.
A Practical Debugging Order
For driver failures, use this sequence:
start with dev_err/dev_warn: is there a clear failure
-> inspect probe resources and return codes
-> enable dynamic debug for details
-> read dmesg timestamps and device context
-> use trace/ftrace for timing and call paths
-> use rate limits and counters for hot errors
-> move recurring diagnostic state into debugfs or a stable ABI
Useful entry points include:
dmesg -T
journalctl -k -b
cat /sys/kernel/debug/dynamic_debug/control
echo 'module my_driver +p' > /sys/kernel/debug/dynamic_debug/control
trace-cmd record -e irq -e workqueue
cat /sys/kernel/tracing/trace
Actual availability depends on kernel configuration, rootfs, and permissions. Production devices should still keep enough error logs and diagnostic entry points to locate critical failures.
Driver Debugging Is Observation Design
printk is useful, but it is not a debugging system.
dev_err/dev_warn/dev_info preserve key state. dev_dbg plus dynamic debug provides on-demand detail. trace and ftrace show timing and paths. Rate limiting prevents logs from damaging the system. debugfs and sysfs expose necessary state.
The closer a driver is to interrupts, DMA, power management, and concurrency, the less it can rely on simply printing more lines. Stable debugging comes from planned observation points, not from flooding the system with logs in the field.