Linux Drivers

10 Posts

Why Linux Driver probe Failure Paths Are More Bug-Prone Than Successful probe

7-minute read

Driver debugging often focuses on the successful probe path: acquire resources, map registers, request an interrupt, initialize the hardware, register a user-space interface, then print “probe ok”.

Field bugs often hide on other paths:

  • probe fails halfway and does not roll back cleanly
  • probe returns an error while IRQ or workqueue is still active
  • user space keeps an fd after remove
  • runtime suspend is powering down while the error path releases resources
  • DMA buffer is freed while the device is still writing
  • devm_ helpers are used, but the object lifetime is not what the driver expected

The hard part is not initializing hardware once. The hard part is that whenever a step fails, the device is removed, the module unloads, the system suspends, or user space still holds an fd, the driver must stop everything it has already started, in the right order.
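As a rough illustration of "stop everything already started, in the right order", here is a minimal user-space sketch of the goto-unwind pattern that many drivers use in probe. It is a model, not kernel code: the steps (clk, irq, reg), the trace buffer, and model_probe() are hypothetical names that stand in for real resource calls.

```c
#include <assert.h>
#include <string.h>

/* User-space model only: these steps stand in for real kernel calls
 * such as enabling a clock, requesting an IRQ, and registering an
 * interface. All names here are hypothetical. */
enum step { STEP_CLK, STEP_IRQ, STEP_REGISTER, STEP_COUNT };

static char trace[64];          /* records the order of actions */

static void note(const char *s)
{
    assert(strlen(trace) + strlen(s) < sizeof(trace));
    strcat(trace, s);
}

/* fail_at selects which step fails; STEP_COUNT means no failure. */
static int model_probe(enum step fail_at)
{
    trace[0] = '\0';

    if (fail_at == STEP_CLK)
        goto err;
    note("+clk ");

    if (fail_at == STEP_IRQ)
        goto err_clk;
    note("+irq ");

    if (fail_at == STEP_REGISTER)
        goto err_irq;
    note("+reg ");
    return 0;

    /* Unwind in reverse order of acquisition; each label releases
     * only what was acquired before the failing step. */
err_irq:
    note("-irq ");
err_clk:
    note("-clk ");
err:
    return -1;
}
```

Calling model_probe(STEP_REGISTER) leaves "+clk +irq -irq -clk " in trace: the interrupt is released before the clock it depends on, and nothing that was never acquired is released.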

Read More

Why Linux Driver Debugging Is More Than printk

6-minute read

The first tool many driver developers reach for is printk. Probe does not run, print a line. Interrupts do not arrive, print a line. DMA does not move, print a line. User space cannot read data, print another line.

That works, but only up to a point:

  • too many logs hide the real failure
  • high-frequency logs slow the system down
  • printing in interrupt or locked paths changes timing
  • production images cannot keep verbose logs enabled
  • an intermittent bug disappears after adding logs
  • multi-instance devices are hard to distinguish

A better model is layered instrumentation:

Read More

What Should sysfs, debugfs, and procfs Expose?

6-minute read

Linux drivers often expose information through text files in addition to /dev/xxx and ioctl:

/sys/...
/sys/kernel/debug/...
/proc/...

They all support cat and echo, so it is tempting to place state, configuration, debug knobs, and statistics wherever convenient. That convenience becomes interface debt: test scripts depend on debugfs, product applications parse procfs, sysfs formats cannot be changed, and field tools do not know which interface is stable.

A practical boundary is:

Read More

How runtime PM Differs From suspend/resume in Linux Drivers

7-minute read

Low-power bugs often look intermittent: the first I/O after idle fails, interrupts disappear after wakeup, /dev still exists but hardware does not respond, or power never goes down.

These problems often come from the Linux driver power-management state machine.

Embedded Linux drivers commonly face two paths: runtime PM and system suspend/resume. Both save power, but they solve different problems.

The safest first model is this: runtime PM handles per-device idle power saving while the system is running; system suspend/resume handles whole-system sleep and wakeup. A driver must keep I/O, resources, wakeup sources, and state restore consistent in both paths.
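The two paths can be sketched as a toy user-space model. This is only an illustration under simplifying assumptions, not the kernel's actual state machine: struct model_dev and the model_* functions are hypothetical, and the usage counter is only loosely analogous to a runtime PM usage count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one device with a usage counter. */
struct model_dev {
    int  usage;          /* in-flight users */
    bool powered;        /* is the hardware currently powered? */
    bool sys_suspended;  /* is the whole system asleep? */
};

/* Runtime path: the first user powers the device up. */
static int model_get(struct model_dev *d)
{
    if (d->sys_suspended)
        return -1;               /* no new I/O while the system sleeps */
    if (d->usage++ == 0)
        d->powered = true;
    return 0;
}

/* Runtime path: the last user lets the idle device power down. */
static void model_put(struct model_dev *d)
{
    assert(d->usage > 0);
    if (--d->usage == 0)
        d->powered = false;
}

/* System path: power goes down regardless of the usage count. */
static void model_system_suspend(struct model_dev *d)
{
    d->sys_suspended = true;
    d->powered = false;
}

/* System path: restore power only if a user was active before sleep. */
static void model_system_resume(struct model_dev *d)
{
    d->sys_suspended = false;
    d->powered = d->usage > 0;
}
```

The point of the model is that both paths write the same powered state, so a driver that tracks it in only one path will see exactly the intermittent failures described above.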

Read More

Why GPIO, pinctrl, clock, regulator, and reset Are Driver Lifecycle Resources

7-minute read

Many embedded Linux device bugs look like register-access bugs: probe runs, register mapping succeeds, but reads return invalid values; interrupts never arrive; an I2C device randomly NACKs; the first access after resume fails.

The problem is not always in register access.

Whether hardware works often depends on more basic resources first: pins must be muxed correctly, clocks must be enabled, power must be stable, reset must be released, and GPIO polarity must be correct.

Read More

Why Linux DMA mapping API Cannot Use Raw Pointers

7-minute read

A common DMA driver bug looks like this: the CPU prepared a buffer, but the device reads old data; the device finished DMA, but the CPU still sees old values; the driver works on one SoC but fails once the IOMMU is enabled.

The usual root cause is treating a CPU pointer as an address the device can use.

In Linux, user virtual addresses, kernel virtual addresses, physical addresses, and DMA addresses are different layers. A device should receive a DMA address, not an ordinary C pointer.

Read More

How Interrupts, Wait Queues, and poll Connect in Linux Drivers

7-minute read

An application waits on a device fd with poll() or epoll_wait(). The hardware has already raised an interrupt, but the application never wakes. Or the application wakes repeatedly, but read() returns no data.

This kind of bug usually does not live in one function. It comes from interrupts, buffers, wait queues, and poll semantics not agreeing with each other.

A hardware interrupt does not deliver data directly to an application. It only tells the CPU that a device event happened. The driver must turn that event into kernel-visible state and wake the processes waiting for it.
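A minimal user-space sketch of that chain, assuming a small FIFO as the driver's state. It is a model, not kernel code: struct model_fifo, the model_* functions, and the waiter_woken flag (which stands in for waking a wait queue) are all hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

#define FIFO_SIZE 4

/* Toy model, not kernel code: a FIFO filled by a pretend interrupt. */
struct model_fifo {
    unsigned char data[FIFO_SIZE];
    unsigned int head, tail;     /* head: next write, tail: next read */
    bool waiter_woken;           /* stands in for waking a wait queue */
};

static unsigned int fifo_count(const struct model_fifo *f)
{
    return f->head - f->tail;
}

/* "Interrupt": record the event as state first, then wake waiters. */
static void model_irq(struct model_fifo *f, unsigned char byte)
{
    if (fifo_count(f) == FIFO_SIZE)
        return;                  /* full: dropped in this sketch */
    f->data[f->head++ % FIFO_SIZE] = byte;
    f->waiter_woken = true;
}

/* "poll": report readiness from state, never from the wakeup itself. */
static short model_poll(const struct model_fifo *f)
{
    return fifo_count(f) > 0 ? POLLIN : 0;
}

/* "read": consume one byte; returns 1 on success, 0 if empty. */
static int model_read(struct model_fifo *f, unsigned char *out)
{
    assert(out);
    if (fifo_count(f) == 0)
        return 0;
    *out = f->data[f->tail++ % FIFO_SIZE];
    return 1;
}
```

Because model_poll() and model_read() both derive their answer from the same FIFO count, a wakeup can never report readiness that read then contradicts. The two symptoms above typically appear when those conditions are computed from different state.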

Read More

What char devices and file_operations Expose to User Space

8-minute read

In Linux driver debugging, what user space usually sees is a /dev/xxx node.

An application can open() it, read() from it, write() to it, or control it with ioctl(). It looks as if the device is a file.

But a char device is not “hardware handed directly to the application.” /dev/xxx is only a user-space entry point. What each call actually means is decided by the char device object the driver registers and by its file_operations.
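The dispatch idea can be sketched in user space with a cut-down operations table. This is a model, not the kernel's struct file_operations: struct model_fops, struct model_file, model_vfs_read(), and the demo_* callbacks are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not kernel code: a cut-down operations table. */
struct model_file;

struct model_fops {
    int  (*open)(struct model_file *f);
    long (*read)(struct model_file *f, char *buf, size_t len);
};

struct model_file {
    const struct model_fops *ops;  /* chosen when the node is opened */
    int opened;
};

/* Hypothetical driver callbacks: they alone decide what read() means. */
static int demo_open(struct model_file *f)
{
    f->opened = 1;
    return 0;
}

static long demo_read(struct model_file *f, char *buf, size_t len)
{
    static const char msg[] = "ok";
    size_t n = len < sizeof(msg) - 1 ? len : sizeof(msg) - 1;

    (void)f;
    assert(buf != NULL);
    memcpy(buf, msg, n);
    return (long)n;
}

static const struct model_fops demo_fops = {
    .open = demo_open,
    .read = demo_read,
};

/* What the "kernel" side does: dispatch through the table. */
static long model_vfs_read(struct model_file *f, char *buf, size_t len)
{
    if (!f->ops || !f->ops->read)
        return -1;               /* driver does not implement read */
    return f->ops->read(f, buf, len);
}
```

The caller never touches hardware: it only reaches whatever callback the driver placed in the table, and a missing callback turns the call into an error.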

Read More

Why platform driver probe Happens

8-minute read

platform_driver is common in embedded Linux. SoC internal peripherals such as UART, I2C controllers, SPI controllers, PWM, watchdogs, GPIO controllers, clock controllers, and interrupt controllers often use platform drivers.

When debugging this kind of driver, the most common problem is not a wrong register write. It is that probe never happens.

The Device Tree node exists, and the driver is built in, but why does probe not run?
The compatible string looks right, but why does it not match?
probe runs, but why are resources missing?
What does EPROBE_DEFER in the log mean?

Read More

Why the Linux Driver Model Separates device, driver, and bus

8-minute read

A common Linux driver problem looks like this: the driver is built into the kernel, the Device Tree node exists, but probe never runs. Or probe runs but cannot obtain resources. Or the module loads successfully, but no device node appears.

If you only stare at the driver C file, this kind of problem is easy to misread.

A Linux driver is not “a hardware library function called by an application.” It first has to enter the kernel device model. The kernel needs to know which devices exist, which drivers exist, and what rules match them. Only after a match succeeds does the driver get a chance to initialize the hardware.
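The match-then-probe ordering can be sketched as a small user-space model. It is not the kernel's implementation: struct model_device, struct model_driver, model_bus_match(), model_bind(), and the "vendor,demo-uart" string are hypothetical, and the exact-string match rule is a simplifying assumption for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not kernel code: names and fields are illustrative. */
struct model_device {
    const char *compatible;      /* e.g. taken from a Device Tree node */
    int bound;
};

struct model_driver {
    const char *const *match_table;   /* NULL-terminated list */
    int (*probe)(struct model_device *dev);
};

/* Bus rule in this sketch: exact string equality, nothing fuzzy. */
static int model_bus_match(const struct model_device *dev,
                           const struct model_driver *drv)
{
    for (const char *const *id = drv->match_table; *id; id++)
        if (strcmp(*id, dev->compatible) == 0)
            return 1;
    return 0;
}

/* Core step: probe is called only after the bus reports a match. */
static int model_bind(struct model_device *dev,
                      const struct model_driver *drv)
{
    assert(dev && drv && drv->probe);
    if (!model_bus_match(dev, drv))
        return -1;
    if (drv->probe(dev) != 0)
        return -1;
    dev->bound = 1;
    return 0;
}

static int demo_probe(struct model_device *dev)
{
    (void)dev;
    return 0;
}

static const char *const demo_ids[] = { "vendor,demo-uart", NULL };
static const struct model_driver demo_driver = {
    .match_table = demo_ids,
    .probe = demo_probe,
};
```

In this model a device whose string is "vendor,demo_uart" (underscore instead of hyphen) never reaches demo_probe(), which mirrors the "driver is built in, node exists, probe never runs" symptom.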

Read More