Drivers

Why Cache, Memory Barriers, and DMA Often Break Drivers

8 minutes

Some driver bugs feel almost random.

The CPU has written a descriptor, but the device reads old contents. The DMA completion interrupt has fired, but the driver still reads stale buffer data. Adding one log line makes the bug disappear. Changing the optimization level brings it back. The driver works on a single-core MCU, then fails intermittently on an SoC with caches.

These bugs are often not caused by “broken DMA” or “an aggressive compiler.” They happen because several different guarantees were mixed together.

Why the Linux DMA Mapping API Cannot Use Raw Pointers

7 minutes

A common DMA driver bug looks like this: the CPU prepared a buffer, but the device reads old data. Or the device finished DMA, but the CPU still sees old values. Or the driver works on one SoC but fails once the IOMMU is enabled.

The usual root cause is treating a CPU pointer as an address the device can use.

In Linux, user virtual addresses, kernel virtual addresses, physical addresses, and DMA addresses are different layers. A device should receive a DMA address, not an ordinary C pointer.
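The difference between a CPU pointer and a DMA address can be sketched with a toy user-space model. Everything here is hypothetical (`sim_pool`, `sim_dma_map`, `sim_device_read`, and the `SIM_IOVA_BASE` window are invented for illustration); in a real driver the mapping step is typically done with calls such as `dma_map_single()`, whose result is checked with `dma_mapping_error()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the "device" only understands addresses inside
 * its own window, which has nothing to do with CPU pointer values. */
#define SIM_IOVA_BASE 0x80000000u
#define SIM_POOL_SIZE 64u

typedef uint64_t sim_dma_addr_t;
#define SIM_DMA_ERROR ((sim_dma_addr_t)-1)

unsigned char sim_pool[SIM_POOL_SIZE]; /* memory the CPU reaches by pointer */

/* Translate a CPU pointer into the address the device must be given. */
sim_dma_addr_t sim_dma_map(const void *cpu_ptr)
{
    uintptr_t p = (uintptr_t)cpu_ptr;
    uintptr_t base = (uintptr_t)sim_pool;

    assert(cpu_ptr != NULL);
    if (p < base || p >= base + SIM_POOL_SIZE)
        return SIM_DMA_ERROR;            /* not mappable in this model */
    return SIM_IOVA_BASE + (sim_dma_addr_t)(p - base);
}

/* The device side: it accepts only DMA addresses, never C pointers. */
int sim_device_read(sim_dma_addr_t addr, unsigned char *out)
{
    assert(out != NULL);
    if (addr < SIM_IOVA_BASE || addr >= SIM_IOVA_BASE + SIM_POOL_SIZE)
        return -1;                       /* translation fault */
    *out = sim_pool[addr - SIM_IOVA_BASE];
    return 0;
}
```

In this model, handing the device the numeric value of `&sim_pool[3]` would normally fault, while the handle returned by `sim_dma_map(&sim_pool[3])` reads the intended byte.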

What Char Devices and file_operations Expose to User Space

8 minutes

In Linux driver debugging, user space often sees a /dev/xxx node.

An application can open() it, read() from it, write() to it, or control it with ioctl(). It looks as if the device is a file.

But a char device is not “hardware handed directly to the application.” /dev/xxx is only the user-space entry point. What each call actually means is decided by the char device object the driver registers and by its file_operations.
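The idea that the driver's table, not the node name, decides what a call means can be sketched in plain user-space C. This is only a model: `struct sim_file_ops`, `sim_vfs_read`, `sim_vfs_write`, and the `demo_*` names are hypothetical stand-ins, not the kernel's `struct file_operations` or its VFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver's operations table. */
struct sim_file_ops {
    long (*read)(char *buf, size_t len);
    long (*write)(const char *buf, size_t len);
};

static char demo_store[16];   /* pretend device state */
static size_t demo_len;

static long demo_read(char *buf, size_t len)
{
    size_t n = len < demo_len ? len : demo_len;
    memcpy(buf, demo_store, n);
    return (long)n;
}

static long demo_write(const char *buf, size_t len)
{
    size_t n = len < sizeof(demo_store) ? len : sizeof(demo_store);
    memcpy(demo_store, buf, n);
    demo_len = n;
    return (long)n;
}

/* Two "drivers": one supports read and write, one is write-only. */
const struct sim_file_ops demo_fops = { .read = demo_read, .write = demo_write };
const struct sim_file_ops wo_fops   = { .write = demo_write };

/* The dispatcher only forwards the call to whatever the table provides. */
long sim_vfs_read(const struct sim_file_ops *ops, char *buf, size_t len)
{
    assert(ops != NULL);
    if (!ops->read)
        return -EINVAL;       /* the driver gave this call no meaning */
    return ops->read(buf, len);
}

long sim_vfs_write(const struct sim_file_ops *ops, const char *buf, size_t len)
{
    assert(ops != NULL);
    if (!ops->write)
        return -EINVAL;
    return ops->write(buf, len);
}
```

The same `sim_vfs_read()` call succeeds against `demo_fops` and fails against `wo_fops`, which is the point: the behavior lives in the table the driver supplies.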

Why the Linux Driver Model Separates device, driver, and bus

8 minutes

A common Linux driver problem looks like this: the driver is built into the kernel, the Device Tree node exists, but probe never runs. Or probe runs but cannot obtain resources. Or the module loads successfully, but no device node appears.

If you only stare at the driver C file, this kind of problem is easy to misread.

A Linux driver is not “a hardware library function called by an application.” It first has to enter the kernel device model. The kernel needs to know which devices exist, which drivers exist, and what rules match them. Only after a match succeeds does the driver get a chance to initialize the hardware.
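A minimal sketch of that match-then-probe flow, written as a user-space model. The types `sim_device` and `sim_driver`, the functions `sim_bus_match` and `sim_bus_attach`, and the `acme,*` compatible strings are all hypothetical; the real kernel matching rules depend on the bus and are considerably richer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified device and driver records. */
struct sim_device {
    const char *compatible;   /* how the platform describes the device */
    int bound;                /* set to 1 once a driver has probed it */
};

struct sim_driver {
    const char *compatible;   /* what the driver claims to support */
    int (*probe)(struct sim_device *dev);
};

/* The bus rule: here, a plain string comparison. */
int sim_bus_match(const struct sim_device *dev, const struct sim_driver *drv)
{
    return strcmp(dev->compatible, drv->compatible) == 0;
}

/* Example probe: only runs after a successful match. */
int sim_uart_probe(struct sim_device *dev)
{
    dev->bound = 1;
    return 0;
}

/* Walk the known devices and probe those the driver matches.
 * Returns the number of devices that were successfully probed. */
int sim_bus_attach(struct sim_device *devs, size_t ndev,
                   const struct sim_driver *drv)
{
    int probed = 0;

    assert(devs != NULL && drv != NULL && drv->probe != NULL);
    for (size_t i = 0; i < ndev; i++) {
        if (!sim_bus_match(&devs[i], drv))
            continue;         /* no match: probe is never called */
        if (drv->probe(&devs[i]) == 0)
            probed++;
    }
    return probed;
}
```

With a device list of `acme,uart` and `acme,spi`, a driver that declares `acme,uart` probes exactly one device, and a driver that declares `acme,i2c` probes none even though its code is loaded.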

Why Device Tree and Board Description Affect Driver Probe

8 minutes

A common Linux device-debugging problem looks like this: the driver code did not change, the kernel boots, but the device never probes. The log may only say probe failed, or the driver may never be entered at all.

The problem is not always in C code.

On many embedded Linux platforms, whether a driver can find its device and obtain register ranges, interrupts, clocks, power supplies, GPIOs, and DMA channels depends first on whether the board description is correct. On ARM, RISC-V, and similar platforms, that description is usually a Device Tree.

Why DMA and Cache Can Make Data Inconsistent

8 minutes

One of the most confusing driver bugs is “the data is clearly in memory, but the other side cannot see it.”

The CPU prepares a transmit buffer, but the network device sends old data. DMA has written received data into memory, but the driver or application still reads stale values. Adding logs makes the issue disappear; changing the optimization level brings it back.

These bugs are often not caused by a failed DMA transfer or a wrong pointer. The problem is that visibility between the CPU cache, the DMA device, and memory was not handled correctly.
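Both directions of the problem can be reproduced with a toy model of a single write-back cache line. The structure and function names below (`sim_line`, `cpu_write`, `dev_read`, `cache_clean`, `cache_invalidate`) are hypothetical and ignore most of what real hardware does; in a Linux driver the corresponding steps are usually expressed through calls such as `dma_sync_single_for_device()` and `dma_sync_single_for_cpu()`.

```c
#include <assert.h>
#include <string.h>

#define LINE 8   /* toy cache line size in bytes */

/* One cache line plus the memory behind it. */
struct sim_line {
    unsigned char mem[LINE];    /* what the DMA device sees */
    unsigned char cache[LINE];  /* what the CPU sees while valid */
    int valid;
    int dirty;
};

/* Load the line from memory on a miss. */
static void fill_if_needed(struct sim_line *l)
{
    if (!l->valid) {
        memcpy(l->cache, l->mem, LINE);
        l->valid = 1;
    }
}

/* CPU accesses go through the cache (write-back, write-allocate). */
void cpu_write(struct sim_line *l, unsigned idx, unsigned char v)
{
    assert(idx < LINE);
    fill_if_needed(l);
    l->cache[idx] = v;
    l->dirty = 1;
}

unsigned char cpu_read(struct sim_line *l, unsigned idx)
{
    assert(idx < LINE);
    fill_if_needed(l);
    return l->cache[idx];
}

/* The device bypasses the cache and touches memory directly. */
unsigned char dev_read(const struct sim_line *l, unsigned idx)
{
    assert(idx < LINE);
    return l->mem[idx];
}

void dev_write(struct sim_line *l, unsigned idx, unsigned char v)
{
    assert(idx < LINE);
    l->mem[idx] = v;
}

/* Clean: push dirty CPU data out to memory before the device reads. */
void cache_clean(struct sim_line *l)
{
    if (l->valid && l->dirty) {
        memcpy(l->mem, l->cache, LINE);
        l->dirty = 0;
    }
}

/* Invalidate: drop the cached copy so the next CPU read refetches. */
void cache_invalidate(struct sim_line *l)
{
    l->valid = 0;
    l->dirty = 0;
}
```

In this model a `cpu_write()` is invisible to `dev_read()` until `cache_clean()` runs, and a `dev_write()` is invisible to `cpu_read()` until `cache_invalidate()` runs. The data is "in memory" in one sense each time, just not in the copy the other side looks at.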

What Separates Applications, the Kernel, and Drivers?

8 minutes

When application code calls read(), write(), or ioctl(), it can look like the program is directly operating a device. Reading a UART, writing to a network interface, controlling GPIO, or accessing a sensor may all appear to be simple function calls.

But that path is not the application touching hardware directly.

On systems with an operating system, applications, the kernel, and drivers are separated by several boundaries: permission boundaries, address-space boundaries, system-call boundaries, device abstraction boundaries, and blocking semantics. Many application-driver debugging problems come from mixing these boundaries together.

Why an Interrupt Is Not a Normal Function Call

10 minutes

Device software often creates a misleading mental model: code runs in source order, and the next thing only happens after the current function returns. That model is useful inside a single function, but it breaks down as soon as timers, UART input, network packets, GPIO events, and DMA completion enter the system.

A peripheral does not wait until the main loop reaches the right line. A UART byte arrives when it arrives. A network packet arrives when it arrives. A timer expires when it expires. If the CPU had to discover every event by polling, either response would be slow or the system would waste a large amount of time checking status bits.