A common DMA driver bug looks like this: the CPU prepared a buffer, but the device reads old data; the device finished DMA, but the CPU still sees old values; the driver works on one SoC but fails after enabling IOMMU.
The usual root cause is treating a CPU pointer as an address the device can use.
In Linux, user virtual addresses, kernel virtual addresses, physical addresses, and DMA addresses are different layers. A device should receive a DMA address, not an ordinary C pointer.
The safest first model is this: the Linux DMA mapping API converts CPU-side buffers into device-visible DMA addresses and transfers buffer ownership between CPU and device.
CPU buffer
-> DMA map / alloc
-> dma_addr_t
-> device DMA
-> sync / unmap
-> CPU handles result
Address translation, cache maintenance, IOMMU permissions, and direction semantics may all be handled by the DMA API. Bypassing it often creates drivers that only work by accident on some platforms.
CPU Pointers Are Not DMA Addresses
Drivers may encounter several address types:
user virtual address: pointer seen by application
kernel virtual address: pointer used by kernel code
physical address: location seen by the memory controller
DMA address: bus address seen by the device or IOMMU
The device does not understand user-space virtual addresses. Many devices cannot use kernel virtual addresses either. Even if physical and DMA addresses happen to have the same numeric value on one platform, a driver must not rely on that.
Linux uses dma_addr_t for addresses handed to devices. Drivers should obtain it through DMA APIs, not by casting pointers.
wrong assumption: device address = (unsigned long)buf
correct path: buf -> dma_map_* -> dma_addr_t
IOMMU, non-coherent caches, bounce buffers, and device address-width limits can all make DMA addresses differ from CPU-side addresses.
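The gap between a CPU pointer and a device-visible address can be illustrated with a small userspace model. This is not kernel code: `toy_iommu`, `toy_map`, `toy_unmap`, and `toy_dev_read` are hypothetical names, and the IOVA base and slot size are arbitrary assumptions. The sketch only shows that the device resolves addresses through its own translation table, so a raw pointer value means nothing to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace toy model, not kernel code. All names are hypothetical. */
#define TOY_SLOTS     4
#define TOY_IOVA_BASE 0x10000000u   /* arbitrary device-side base address */
#define TOY_SLOT_SIZE 0x1000u

typedef uint64_t toy_dma_addr_t;    /* stand-in for the kernel's dma_addr_t */
#define TOY_MAP_ERROR ((toy_dma_addr_t)-1)

struct toy_iommu {
    const uint8_t *cpu[TOY_SLOTS];  /* CPU pointer behind each slot, NULL if free */
    size_t len[TOY_SLOTS];
};

/* Install a translation and return the address the device must use. */
static toy_dma_addr_t toy_map(struct toy_iommu *mmu, const void *buf, size_t len)
{
    assert(mmu && buf);
    if (len == 0 || len > TOY_SLOT_SIZE)
        return TOY_MAP_ERROR;
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (!mmu->cpu[i]) {
            mmu->cpu[i] = buf;
            mmu->len[i] = len;
            return TOY_IOVA_BASE + (toy_dma_addr_t)i * TOY_SLOT_SIZE;
        }
    }
    return TOY_MAP_ERROR;           /* translation table is full */
}

/* Remove the translation; the device must not use the address afterwards. */
static void toy_unmap(struct toy_iommu *mmu, toy_dma_addr_t dma)
{
    assert(mmu);
    if (dma < TOY_IOVA_BASE)
        return;
    uint64_t slot = (dma - TOY_IOVA_BASE) / TOY_SLOT_SIZE;
    if (slot < TOY_SLOTS) {
        mmu->cpu[slot] = NULL;
        mmu->len[slot] = 0;
    }
}

/* Device-side byte read: 0 on success, -1 on a translation fault. */
static int toy_dev_read(const struct toy_iommu *mmu, toy_dma_addr_t dma, uint8_t *out)
{
    assert(mmu && out);
    if (dma < TOY_IOVA_BASE)
        return -1;
    uint64_t off = dma - TOY_IOVA_BASE;
    uint64_t slot = off / TOY_SLOT_SIZE;
    uint64_t in_slot = off % TOY_SLOT_SIZE;
    if (slot >= TOY_SLOTS || !mmu->cpu[slot] || in_slot >= mmu->len[slot])
        return -1;
    *out = mmu->cpu[slot][in_slot];
    return 0;
}
```

In this model, a read through the value returned by `toy_map()` succeeds, while a read through the buffer's pointer cast to an integer faults, and so does any read after `toy_unmap()`. A real IOMMU is far more involved, but the failure pattern has the same shape.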
Coherent DMA Fits Descriptors and Rings
One common DMA model is coherent DMA memory, usually allocated with dma_alloc_coherent():
void *cpu_addr;
dma_addr_t dma_addr;
cpu_addr = dma_alloc_coherent(dev, size, &dma_addr, GFP_KERNEL);
if (!cpu_addr)
	return -ENOMEM;	/* allocation can fail and must be checked */
It returns two addresses:
cpu_addr: used by the CPU
dma_addr: used by the device
Coherent means the platform keeps CPU and device views consistent, so the driver usually does not manually clean or invalidate caches around every transfer.
It fits:
- DMA descriptors
- ring descriptors
- control blocks
- small structures frequently shared by CPU and device
But coherent memory is not always ideal for large data buffers. It may be limited, and CPU access performance may differ from normal cached memory.
Streaming DMA Fits Phase-Based Transfers
The other common model is streaming DMA mapping.
The driver already has a CPU buffer, then maps it before handing it to the device:
dma_addr_t dma;
dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
if (dma_mapping_error(dev, dma))
	return -EIO;
/* write dma to device register or descriptor */
After transfer, unmap it:
dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
The core of streaming DMA is ownership transfer:
CPU prepares buffer
-> dma_map_single(..., DMA_TO_DEVICE)
-> device owns buffer
-> DMA completes
-> dma_unmap_single(...)
-> CPU owns buffer again
Map and unmap do more than “get address” and “release address.” They may perform cache sync, IOMMU mapping, bounce buffering, and permission management.
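The ownership hand-off above can be sketched as a tiny state machine. This is a userspace toy, not kernel code: `toy_buf`, `toy_map`, `toy_unmap`, and `toy_cpu_write` are hypothetical names, and the real DMA API does not track or enforce ownership this way. The sketch only makes the rule explicit: between map and unmap, the CPU must leave the buffer alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace toy model of streaming-DMA ownership. All names are hypothetical. */
enum toy_owner { TOY_OWNER_CPU, TOY_OWNER_DEVICE };

struct toy_buf {
    unsigned char data[64];
    enum toy_owner owner;
};

/* Map: the CPU gives the buffer up. Fails if it is already mapped. */
static int toy_map(struct toy_buf *b)
{
    assert(b);
    if (b->owner != TOY_OWNER_CPU)
        return -1;
    b->owner = TOY_OWNER_DEVICE;
    return 0;
}

/* Unmap: the device is done and the CPU takes the buffer back. */
static int toy_unmap(struct toy_buf *b)
{
    assert(b);
    if (b->owner != TOY_OWNER_DEVICE)
        return -1;
    b->owner = TOY_OWNER_CPU;
    return 0;
}

/* A CPU write is legal only while the CPU owns the buffer. */
static int toy_cpu_write(struct toy_buf *b, const void *src, size_t len)
{
    assert(b && src);
    if (b->owner != TOY_OWNER_CPU || len > sizeof(b->data))
        return -1;
    memcpy(b->data, src, len);
    return 0;
}
```

The kernel gives no such error return for a CPU write to a mapped buffer; the write simply may or may not reach the device. That silence is why the ownership rule has to live in the driver's design.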
Direction Matters
DMA API direction is critical:
DMA_TO_DEVICE: CPU writes, device reads
DMA_FROM_DEVICE: device writes, CPU reads later
DMA_BIDIRECTIONAL: both may read or write
Wrong direction can mean wrong cache synchronization.
Transmit direction:
CPU writes tx buffer
-> map/sync for device
-> device DMA reads it
Receive direction:
device DMA writes rx buffer
-> unmap/sync for CPU
-> CPU reads it
Do not use DMA_BIDIRECTIONAL everywhere just to avoid thinking. It may cost more and hides unclear interface semantics.
Direction is part of buffer ownership. The driver must know whether the CPU is preparing data or the device is producing data.
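Why the direction changes what data each side sees can be shown with a userspace toy in which the device works on its own copy of the buffer, loosely like a bounce buffer. This is not kernel code and not how any particular platform implements mapping: `toy_mapping`, `toy_map`, `toy_unmap`, and `toy_dev_write` are hypothetical names. The assumption is that map copies toward the device only when the direction says the device will read, and unmap copies back only when the direction says the device wrote.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace toy model of direction-dependent copying. Names are hypothetical. */
enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE, TOY_BIDIRECTIONAL };

struct toy_mapping {
    unsigned char *cpu;      /* buffer the CPU reads and writes */
    unsigned char dev[32];   /* separate copy that the device sees */
    size_t len;
    enum toy_dir dir;
};

/* Map: copy CPU data to the device view only if the device will read it. */
static int toy_map(struct toy_mapping *m, unsigned char *cpu, size_t len,
                   enum toy_dir dir)
{
    assert(m && cpu);
    if (len > sizeof(m->dev))
        return -1;
    m->cpu = cpu;
    m->len = len;
    m->dir = dir;
    memset(m->dev, 0, sizeof(m->dev));
    if (dir != TOY_FROM_DEVICE)
        memcpy(m->dev, cpu, len);
    return 0;
}

/* Unmap: copy the device view back only if the device may have written it. */
static void toy_unmap(struct toy_mapping *m)
{
    assert(m && m->cpu);
    if (m->dir != TOY_TO_DEVICE)
        memcpy(m->cpu, m->dev, m->len);
}

/* The device stores received data into its own view of the buffer. */
static void toy_dev_write(struct toy_mapping *m, const void *src, size_t len)
{
    assert(m && src);
    if (len > m->len)
        len = m->len;
    memcpy(m->dev, src, len);
}
```

With a receive buffer mapped as `TOY_TO_DEVICE` by mistake, the device's data never reaches the CPU buffer in this model, and the CPU keeps reading the old contents. Mapped as `TOY_FROM_DEVICE`, the same sequence delivers the new data.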
Sync Temporarily Transfers Ownership
Some streaming buffers are reused repeatedly without unmap/map every time.
Then sync APIs are used:
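/* TX buffer, mapped with DMA_TO_DEVICE: the CPU has refilled it */
dma_sync_single_for_device(dev, tx_dma, len, DMA_TO_DEVICE);
/* device reads it */
/* RX buffer, mapped with DMA_FROM_DEVICE: the device has finished writing */
dma_sync_single_for_cpu(dev, rx_dma, len, DMA_FROM_DEVICE);
/* CPU reads it */
/* the direction passed to sync must match the direction used at map time */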
The names for_device and for_cpu matter.
for_device: the device will access next; CPU writes must become visible to it
for_cpu: the CPU will access next; device writes must become visible to it
Sync is not an ordinary memory barrier. It may clean or invalidate caches and follow platform DMA-coherency rules.
If the CPU writes while the buffer belongs to the device, or reads before DMA completes, sync cannot fix the ownership bug.
Descriptors and Data Buffers May Need Different Memory
Many drivers use two kinds of memory:
DMA descriptors / ring: control information shared frequently by CPU and device
data buffer: actual payload
Descriptors often use coherent DMA because CPU and device frequently access status bits, lengths, addresses, and owner bits.
Data buffers often use streaming DMA because they may be large and usually belong to either CPU or device during a phase.
Mixing these models blindly causes bugs:
- descriptors as streaming buffers can miss sync points
- large data buffers as coherent memory can hurt performance or exhaust resources
- descriptors must contain DMA addresses, not CPU pointers
Before writing DMA code, separate control structures from data buffers.
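A descriptor that carries a DMA address rather than a CPU pointer might look like the userspace sketch below. The layout, the owner bit, and the names `toy_desc`, `toy_desc_fill`, and `toy_desc_done` are all hypothetical; real hardware defines its own descriptor format, and `toy_dma_addr_t` is only a stand-in for the kernel's dma_addr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor layout for illustration only. */
typedef uint64_t toy_dma_addr_t;          /* stand-in for dma_addr_t */

#define TOY_DESC_OWN_DEVICE (1u << 31)    /* hypothetical owner bit */

struct toy_desc {
    uint64_t buf_addr;   /* DMA address of the payload, never a CPU pointer */
    uint32_t buf_len;    /* payload length in bytes */
    uint32_t flags;      /* owner bit and status */
};

/* Fill a descriptor and hand it to the device by setting the owner bit. */
static void toy_desc_fill(struct toy_desc *d, toy_dma_addr_t dma, uint32_t len)
{
    assert(d);
    d->buf_addr = dma;
    d->buf_len = len;
    d->flags = TOY_DESC_OWN_DEVICE;   /* stored after address and length */
}

/* The device clears the owner bit when it has finished with the descriptor. */
static int toy_desc_done(const struct toy_desc *d)
{
    assert(d);
    return (d->flags & TOY_DESC_OWN_DEVICE) == 0;
}
```

Two things this sketch leaves out matter in a real driver: descriptor fields usually need explicit endianness conversion, and the owner bit typically needs a write barrier such as dma_wmb() before it so the device cannot observe it ahead of the address and length.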
IOMMU Breaks Physical-Address Assumptions
On platforms without an IOMMU, DMA addresses may look similar to physical addresses. Broken drivers can appear to work there.
When an IOMMU is enabled, the device often sees an IOVA rather than a CPU physical address. DMA APIs create the IOMMU mapping and return a usable dma_addr_t.
Consequences:
- the device can access only mapped ranges
- after unmap, the device must not continue accessing the buffer
- DMA address may differ completely from physical address
- DMA mask and address width matter
- mapping errors must be checked
Do not rely on physical addresses that happened to work on one board. Use DMA APIs and respect map/unmap lifetime.
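The DMA mask and address-width point comes down to simple arithmetic, sketched here in userspace C. `toy_dma_bit_mask` computes the same value as the kernel's DMA_BIT_MASK(n) macro; `toy_range_fits_mask` is a hypothetical helper written for illustration, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n), written as a function. */
static uint64_t toy_dma_bit_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Can a device with this mask reach every byte of [addr, addr + len)? */
static int toy_range_fits_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
    if (len == 0)
        return 0;
    uint64_t last = addr + len - 1;
    if (last < addr)          /* range wrapped past the end of 64 bits */
        return 0;
    return last <= mask;
}
```

A device limited to 32 address bits cannot reach a buffer whose last byte sits at or above 4 GiB. In a driver, the mask is normally declared once at probe time, for example with dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)), and the DMA API then takes care of returning addresses the device can reach or failing the mapping.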
mmap to User Space Adds Another Boundary
Some drivers mmap DMA buffers to user space to reduce copies.
This is not handing dma_addr_t to the application.
User space needs a process virtual address; the device needs a DMA address; the kernel must manage lifetime and cache attributes.
Typical issues include:
- user space still maps the buffer after the driver frees it
- who owns the buffer: CPU, device, or user space
- when user writes become visible to device
- when device writes become visible to user
- whether the mapping is cached
- how multiple mappings synchronize
DMA plus mmap needs a clear queue protocol, ownership markers, and synchronization points. Otherwise “zero copy” becomes hard-to-reproduce coherency bugs.
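One way to picture such a queue protocol is a three-party ownership state machine. The userspace sketch below is an assumption for illustration, not an existing kernel interface: the states, the events, and the names `toy_step` and `toy_user_may_touch` are hypothetical.

```c
#include <assert.h>

/* Userspace toy model of buffer ownership with mmap. Names are hypothetical. */
enum toy_state { TOY_WITH_USER, TOY_WITH_DRIVER, TOY_WITH_DEVICE };
enum toy_event { TOY_QUEUE, TOY_SUBMIT, TOY_COMPLETE, TOY_DEQUEUE };

/* Apply one event; returns 0 if the transition is legal, -1 otherwise. */
static int toy_step(enum toy_state *s, enum toy_event ev)
{
    assert(s);
    switch (ev) {
    case TOY_QUEUE:      /* user space hands the buffer to the driver */
        if (*s != TOY_WITH_USER)
            return -1;
        *s = TOY_WITH_DRIVER;
        return 0;
    case TOY_SUBMIT:     /* driver syncs for the device and starts DMA */
        if (*s != TOY_WITH_DRIVER)
            return -1;
        *s = TOY_WITH_DEVICE;
        return 0;
    case TOY_COMPLETE:   /* DMA finished; driver syncs for the CPU */
        if (*s != TOY_WITH_DEVICE)
            return -1;
        *s = TOY_WITH_DRIVER;
        return 0;
    case TOY_DEQUEUE:    /* driver returns the buffer to user space */
        if (*s != TOY_WITH_DRIVER)
            return -1;
        *s = TOY_WITH_USER;
        return 0;
    }
    return -1;
}

/* User space may read or write the mapping only while it owns the buffer. */
static int toy_user_may_touch(enum toy_state s)
{
    return s == TOY_WITH_USER;
}
```

The mapping stays valid in user space the whole time; what changes is who is allowed to touch the memory. The submit and complete transitions are where a real driver would place its synchronization points.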
What to Check First for Linux DMA Bugs
When DMA data is stale, intermittent, platform-dependent, or breaks under IOMMU, check the chain:
First, does the device receive a dma_addr_t? Do not write user pointers, kernel pointers, or physical addresses into device registers or descriptors.
Second, is the DMA mask correct? Can the device address the returned DMA range? Is dma_mapping_error() checked?
Third, is memory type correct? Are descriptors/rings coherent and data buffers streaming where appropriate?
Fourth, is direction correct? Does DMA_TO_DEVICE, DMA_FROM_DEVICE, or DMA_BIDIRECTIONAL match the real data flow?
Fifth, are map/unmap or sync points correct? Does CPU/device ownership overlap incorrectly?
Sixth, do descriptors contain DMA addresses rather than CPU addresses?
Seventh, is DMA completion ordered before CPU reads results?
Eighth, does mmap define ownership among user space, kernel, and device?
What to Remember in Practice
The Linux DMA mapping API is not ceremonial.
It defines the address and visibility boundary among CPU, device, cache, IOMMU, and memory.
Drivers must keep these separate:
- CPU uses void *
- device uses dma_addr_t
- coherent DMA fits shared control structures
- streaming DMA fits phase-based data transfers
- direction defines data flow
- map/unmap/sync defines ownership transfer
DMA bugs are dangerous because they often appear only on some platforms, cache states, loads, or after an upgrade. Using the DMA API correctly keeps the driver from depending on accidental address and cache behavior.