An application waits on a device fd with poll() or epoll_wait(). The hardware has already raised an interrupt, but the application never wakes. Or the application wakes repeatedly, but read() returns no data.
This kind of bug usually does not live in one function. It comes from interrupts, buffers, wait queues, and poll semantics not agreeing with each other.
A hardware interrupt does not deliver data directly to an application. It only tells the CPU that a device event happened. The driver must turn that event into kernel-visible state and wake the processes waiting for it.
The safest first model is this: the interrupt reports the event, the driver updates buffers and state, the wait queue handles sleep and wakeup, and poll reports current readability or writability to the user-space event loop.
hardware event
-> IRQ handler
-> read status / clear IRQ / put data into buffer
-> wake_up wait queue
-> poll/epoll returns readable
-> application read consumes data
If any part of this chain disagrees with the others, user-space behavior becomes confusing.
Interrupt Handlers Should Stay Short
After a hardware interrupt, the kernel enters the interrupt path.
A driver IRQ handler usually does only the urgent work:
- confirm whether this interrupt belongs to the device
- read minimal status
- clear or mask the interrupt source
- record the event
- wake later processing
Interrupt context must not sleep and should not do long work. Spending too much time in the IRQ path hurts system responsiveness because it delays other interrupts and scheduling.
Many drivers split work:
hard IRQ: confirm event, clear IRQ, save minimal state
threaded IRQ / workqueue / tasklet: move data, handle protocol, wake waiters
user space: read result through read/poll
The goal of IRQ handling is not to finish everything the application needs. It is to reliably move a hardware event into a state that later code can process.
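The split can be sketched as a user-space model. This is not kernel code: `fake_status_reg`, the `EV_*` bits, and the `model_*` functions are hypothetical stand-ins. The hard-IRQ part only reads, clears, and records status bits with an atomic OR. The deferred part takes everything recorded so far with an atomic exchange, so events that arrive between two deferred runs are merged instead of lost.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical status bits; real values come from the device datasheet. */
#define EV_RX_READY (1u << 0)
#define EV_ERROR    (1u << 1)

/* Stand-in for the device status register (an MMIO read in a real driver). */
static uint32_t fake_status_reg;

/* Events recorded by the hard-IRQ part, consumed by the deferred part. */
static atomic_uint pending_events;

/* Hard-IRQ part: confirm, clear, record. Returns 1 if the event was ours. */
static int model_hard_irq(void)
{
    uint32_t status = fake_status_reg & (EV_RX_READY | EV_ERROR);

    if (status == 0)
        return 0;                              /* not our interrupt */
    fake_status_reg &= ~status;                /* clear only what we saw */
    atomic_fetch_or(&pending_events, status);  /* record; no heavy work here */
    return 1;                                  /* ask for deferred processing */
}

/* Deferred part: atomically take every event recorded so far. */
static uint32_t model_deferred_work(void)
{
    return atomic_exchange(&pending_events, 0);
}
```

A bitmask records that an event type occurred, not how many times. If counts or payloads matter, the hard-IRQ part needs a counter or a queue instead. In a real driver the two halves would typically be the primary and thread functions passed to `request_threaded_irq()`, or an IRQ handler plus a work item.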
Buffers Separate Interrupts From Applications
Hardware events are asynchronous relative to application reads.
The device may produce data while the application is asleep or not calling read(). Without an intermediate buffer, IRQ context and application context are hard to coordinate.
Drivers commonly maintain a buffer or queue:
IRQ / bottom half
-> put data or event into ring buffer
-> update readable state
-> wake_up
application read
-> consume data from ring buffer
-> update readable state
This buffer defines important semantics:
- what happens when the buffer is full: drop old data, drop new data, or report overflow
- whether one event is one frame or a byte stream
- how multiple readers share data
- whether device errors enter the queue
- whether the queue is cleared when fd closes
If buffer state and poll state disagree, you get “poll says readable but read has no data,” or “data exists but epoll never wakes.”
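A minimal sketch of one possible buffer policy, assuming a fixed-size ring of 32-bit events. The names are hypothetical, and the policy shown (drop the newest event and count the drop) is only one of the options listed above; it keeps the oldest data and lets a reader report overflow. The sketch is single-threaded: in a driver, `ring_put` and `ring_get` would need a lock shared by the IRQ side and read(), or the kernel's `kfifo` used within its single-producer/single-consumer rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-size event ring; size must be a power of two. */
#define RING_SIZE 4u

struct event_ring {
    uint32_t slots[RING_SIZE];
    unsigned head;      /* next slot to write (producer side) */
    unsigned tail;      /* next slot to read (consumer side) */
    unsigned overflows; /* events dropped because the ring was full */
};

static unsigned ring_count(const struct event_ring *r)
{
    return r->head - r->tail;   /* unsigned wraparound keeps this correct */
}

/* Policy: drop the NEW event and count it, so readers can detect overflow. */
static bool ring_put(struct event_ring *r, uint32_t ev)
{
    if (ring_count(r) == RING_SIZE) {
        r->overflows++;
        return false;
    }
    r->slots[r->head % RING_SIZE] = ev;
    r->head++;
    return true;
}

/* Consume the oldest event; returns false when the ring is empty. */
static bool ring_get(struct event_ring *r, uint32_t *ev)
{
    if (ring_count(r) == 0)
        return false;
    *ev = r->slots[r->tail % RING_SIZE];
    r->tail++;
    return true;
}
```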
Wait Queues Sleep and Wake Processes
Drivers often use wait queues:
wait_queue_head_t wait;
They solve one problem: when no data exists, the application thread should sleep instead of spinning; when data arrives, the driver wakes it.
Simplified blocking read path:
application read
-> driver sees empty buffer
-> current process sleeps on wait queue
-> interrupt arrives, driver stores data
-> wake_up
-> read wakes and checks condition again
-> data is copied to user space
The key detail is that a wakeup does not guarantee that the condition holds. After waking, the driver must check the condition again.
Multiple processes may wait on the same queue, signals may interrupt sleep, and another reader may consume the data first. Correct code waits around a condition, not around the fact that a wakeup happened.
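POSIX condition variables follow the same rule (they may also wake spuriously), so a user-space model shows the pattern. In the sketch below, with hypothetical `model_*` names, a mutex plus condition variable stand in for the driver lock and wait queue. Two readers wait, the producer makes items available, and each reader rechecks `count` in a loop because a broadcast wakes both readers even when only one item is ready. In a driver, `wait_event_interruptible(wq, condition)` runs the same recheck loop and returns `-ERESTARTSYS` if a signal interrupts the sleep.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device state shared by the "IRQ" side and readers. */
struct model_dev {
    pthread_mutex_t lock;
    pthread_cond_t  data_ready;  /* plays the role of a wait queue */
    int             count;       /* items available to readers */
};

/* Reader: sleep until the condition holds, then consume one item. */
static void model_read_blocking(struct model_dev *d)
{
    pthread_mutex_lock(&d->lock);
    /* while, not if: the wakeup may be spurious, or another reader may
     * have taken the item before this thread reacquired the lock */
    while (d->count == 0)
        pthread_cond_wait(&d->data_ready, &d->lock);
    d->count--;
    pthread_mutex_unlock(&d->lock);
}

/* Producer: make the condition true under the lock, then wake everyone. */
static void model_deliver(struct model_dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->count++;
    pthread_mutex_unlock(&d->lock);
    pthread_cond_broadcast(&d->data_ready);
}

static void *reader_thread(void *arg)
{
    model_read_blocking(arg);
    return NULL;
}
```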
poll Declares Wait Conditions
.poll is often misunderstood as “the driver waits for an event.”
It really does two things:
- attaches the caller to wait queues that may wake it
- immediately returns current readiness: readable, writable, error, hangup, and so on
Simplified model:
application poll/epoll_wait
-> kernel calls driver .poll
-> driver calls poll_wait(file, &wait, wait_table)
-> driver checks buffer
-> data exists, return POLLIN
-> no data, return 0; the poll/epoll core (not the driver) sleeps until wake_up, then calls .poll again
So .poll must match real state.
If data exists but .poll does not return POLLIN, the application may sleep.
If no data exists but .poll returns POLLIN, the application may wake and read nothing, or spin.
The poll return value is a promise to the user-space event loop. It must reflect actual state, not an approximation.
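A sketch of the readiness half of .poll, modeled in user space with the `POLL*` constants from `<poll.h>`. `struct model_state` and `model_poll_mask` are hypothetical. Each bit is derived only from current state, so the function never promises data that is not there. In a driver, .poll would first call `poll_wait()` on every wait queue that can change this state, then compute the mask under the same lock that read() and write() use; in-kernel code returns a `__poll_t` built from constants such as `EPOLLIN | EPOLLRDNORM`.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical device state, sampled under the lock that read/write use. */
struct model_state {
    unsigned rx_bytes;  /* bytes waiting to be read */
    unsigned tx_space;  /* free space for writes */
    bool     hw_error;  /* device reported an error */
    bool     gone;      /* device removed / peer hung up */
};

/* Compute readiness only from real state; never report "maybe". */
static short model_poll_mask(const struct model_state *s)
{
    short mask = 0;

    if (s->rx_bytes > 0)
        mask |= POLLIN;
    if (s->tx_space > 0)
        mask |= POLLOUT;
    if (s->hw_error)
        mask |= POLLERR;
    if (s->gone)
        mask |= POLLHUP;
    return mask;
}
```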
Nonblocking read Must Agree With poll
Applications often do this:
epoll_wait returns readable
-> read(fd)
-> get data or EAGAIN
Even after epoll reports readable, read() may still fail with EAGAIN. With multiple readers, edge-triggered mode, or state races, this can be normal.
But if poll often reports readable and read immediately returns EAGAIN, check:
- whether .poll reports readiness too broadly
- whether buffer state is changed concurrently
- whether read and .poll protect state with the same lock
- whether wakeup happens before the state update
- whether edge-triggered applications drain all data
Blocking, nonblocking, poll, and read must be built around the same readiness condition.
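On the application side, edge-triggered epoll (`EPOLLET`) reports a transition, not a level, so the reader must keep calling read() until it fails with EAGAIN. The sketch below uses a pipe as a stand-in for a device fd; `set_nonblocking`, `drain_fd`, and `wait_and_drain` are hypothetical helper names. If a driver's read() returns EAGAIN while data remains, or new data arrives without a wake_up, this loop stops and waits for an edge that never comes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* The fd must be nonblocking, or the final read() would sleep instead of returning EAGAIN. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read until EAGAIN: with EPOLLET the same data will not be reported again. */
static long drain_fd(int fd)
{
    char buf[64];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;               /* EOF: writer closed */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* drained for now */
        return -1;                      /* real error */
    }
}

/* Wait once for an edge, then drain. Returns bytes read, 0 on timeout. */
static long wait_and_drain(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    if (n <= 0)
        return n;                       /* 0: timeout, -1: error */
    return drain_fd(ev.data.fd);
}
```

For example, after 200 bytes are written to the pipe, one `wait_and_drain(epfd, 1000)` call returns 200, and a second call with a zero timeout returns 0 because no new edge has occurred.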
wake_up Belongs After State Update
A common bug is waking in the wrong order.
Bad sequence:
wake_up
-> application wakes
-> read checks buffer
-> data is not stored yet
-> returns EAGAIN or sleeps again
Better sequence:
store data / update state
-> required locking or memory ordering
-> wake_up
First make the condition true, then wake waiters.
If IRQ context and process context share state, protect it with locks, atomics, or appropriate memory ordering. Otherwise different CPUs may observe inconsistent state.
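A sketch of "store first, then publish" using C11 release/acquire atomics in a hypothetical single-producer/single-consumer ring. The producer thread stands in for the IRQ side and the consumer for read(). The release store of `head` guarantees that a consumer whose acquire load sees the new index also sees the data written before it; the kernel's counterparts are `smp_store_release()` and `smp_load_acquire()`. This lock-free form holds only for exactly one producer and one consumer; multiple readers need a lock.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define Q_SIZE  8u      /* must be a power of two */
#define N_ITEMS 1000u

/* Hypothetical SPSC ring: the producer models the IRQ side, the consumer read(). */
struct spsc {
    unsigned    data[Q_SIZE];
    atomic_uint head;   /* written only by the producer */
    atomic_uint tail;   /* written only by the consumer */
};

static int spsc_push(struct spsc *q, unsigned v)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* acquire: the consumer has finished reading any slot it released */
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == Q_SIZE)
        return 0;                                /* full */
    q->data[head % Q_SIZE] = v;                  /* 1. store the data */
    atomic_store_explicit(&q->head, head + 1,
                          memory_order_release); /* 2. then publish it */
    /* a driver would call wake_up() here, after publishing */
    return 1;
}

static int spsc_pop(struct spsc *q, unsigned *v)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* acquire pairs with the producer's release: the data is visible */
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return 0;                                /* empty */
    *v = q->data[tail % Q_SIZE];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

static void *producer(void *arg)
{
    struct spsc *q = arg;

    for (unsigned i = 0; i < N_ITEMS; i++)
        while (!spsc_push(q, i))
            ;   /* model only: a real IRQ path must drop or count, never spin */
    return NULL;
}
```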
IRQ Storms and Lost Events Can Come From Incorrect Interrupt Clearing
Clearing interrupt status correctly is also critical.
For a level-triggered interrupt, if the driver does not clear the hardware interrupt source, the IRQ line may remain asserted after the handler returns, and the kernel may immediately enter the handler again. That becomes an IRQ storm.
If the driver clears too early or reads status too late, it may lose events.
Common problems include:
- not checking whether the interrupt belongs to this device
- clearing the wrong status bit
- mixing up write-one-to-clear and write-zero-to-clear
- clearing before reading FIFO and losing edge events
- missing a register read that the hardware requires before it deasserts the interrupt
- IRQ trigger type not matching hardware level/edge behavior
From user space, these bugs may look like repeated wakeups, no events, occasional data loss, or high CPU time in interrupts.
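A user-space model of acknowledging a write-one-to-clear (W1C) status register. The register struct, `MY_IRQ_BITS`, and the `MODEL_IRQ_*` values are hypothetical; in a kernel handler the return values would be `IRQ_NONE` and `IRQ_HANDLED`. The handler clears exactly the bits it read and owns. Writing all ones would also clear events that arrived after the read, and writing zeros to a W1C register clears nothing. Whether a given device uses W1C, write-zero-to-clear, or read-to-clear has to come from its datasheet; this sketch models W1C only.

```c
#include <stdint.h>

/* Hypothetical W1C (write-one-to-clear) status register model. */
struct fake_w1c_reg {
    uint32_t status;
};

static uint32_t reg_read(const struct fake_w1c_reg *r)
{
    return r->status;
}

/* W1C semantics: bits written as 1 are cleared; bits written as 0 are untouched. */
static void reg_write_w1c(struct fake_w1c_reg *r, uint32_t val)
{
    r->status &= ~val;
}

#define MY_IRQ_BITS 0x3u   /* hypothetical: bits owned by this driver */

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

/* Acknowledge exactly the bits that were read and belong to this device.
 * A bit set after the read is not in 'status', so it stays pending and
 * raises the interrupt again instead of being lost. */
static enum model_irqreturn model_irq_ack(struct fake_w1c_reg *r, uint32_t *handled)
{
    uint32_t status = reg_read(r) & MY_IRQ_BITS;

    if (status == 0)
        return MODEL_IRQ_NONE;    /* shared line: not our interrupt */
    reg_write_w1c(r, status);
    *handled = status;
    return MODEL_IRQ_HANDLED;
}
```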
Threaded IRQs Fit Work That May Sleep
If the work after an interrupt may sleep, do not put it in hard IRQ context. Examples include I2C/SPI access, slow regmap paths, waiting for hardware state, or taking locks that may sleep.
Linux provides threaded IRQs:
hard IRQ handler: confirm interrupt, return IRQ_WAKE_THREAD
threaded handler: process data in thread context, may sleep
This balances interrupt responsiveness with more complex processing.
But a threaded handler is not a place to put arbitrary logic without discipline. It still needs correct concurrency, device state, IRQ enable/disable, runtime PM, remove handling, and wakeup ordering.
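In the kernel, both handlers are registered with `request_threaded_irq()`, and the `IRQF_ONESHOT` flag keeps the line masked until the threaded handler returns. That matters for a level-triggered interrupt whose source can only be cleared over a slow bus: without it, the line would fire again while the thread sleeps. The user-space model below imitates that behavior. `struct model_irq` and the `model_*` functions are hypothetical, a pthread stands in for the IRQ thread, and `nanosleep()` stands in for a slow I2C transfer.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of one IRQ line handled with "oneshot" semantics. */
struct model_irq {
    atomic_bool masked;          /* line masked while the thread runs */
    atomic_bool source_active;   /* level interrupt still asserted */
    bool        thread_saw_mask; /* recorded for the test */
    int         processed;
};

/* Primary handler: must not sleep. Mask the line and defer the rest. */
static bool model_primary(struct model_irq *irq)
{
    if (!atomic_load(&irq->source_active))
        return false;                     /* like IRQ_NONE */
    atomic_store(&irq->masked, true);     /* what IRQF_ONESHOT does for you */
    return true;                          /* like IRQ_WAKE_THREAD */
}

/* Threaded handler: may sleep, e.g. for a slow bus transfer. */
static void *model_thread_fn(void *arg)
{
    struct model_irq *irq = arg;
    struct timespec slow_bus = { 0, 1000000 };  /* 1 ms stand-in for I2C */

    irq->thread_saw_mask = atomic_load(&irq->masked);
    nanosleep(&slow_bus, NULL);                 /* sleeping is allowed here */
    atomic_store(&irq->source_active, false);   /* device cleared via the bus */
    irq->processed++;
    atomic_store(&irq->masked, false);          /* unmask only when done */
    return NULL;
}
```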
What to Check When poll Does Not Wake Correctly
When poll/epoll does not wake, wakes too often, or read blocks unexpectedly, split the path:
First, check whether the hardware interrupt really happens. Look at IRQ counts, trigger type, GPIO/IRQ configuration, and device status registers.
Second, check whether the IRQ handler confirms and clears the interrupt correctly. Is the source checked? Is the right status bit cleared? Is there an IRQ storm?
Third, check whether data enters the driver buffer. Is there a queue between IRQ and application? What happens on overflow?
Fourth, check state-update and wake_up ordering. The condition should become true before wakeup.
Fifth, check whether wait queues match. Does blocking read sleep on the same wait queue that .poll registers with poll_wait(), and does the IRQ path wake that queue?
Sixth, check whether .poll returns accurate readiness. Return POLLIN only when data is actually readable, and POLLOUT only when write space exists.
Seventh, check whether nonblocking read and poll agree. Is EAGAIN a normal race, or a readiness-condition bug?
Eighth, check concurrency and teardown. Multiple readers, close, remove, suspend, and runtime PM must not leave waiters asleep forever or accessing freed state.
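The sixth and seventh checks can be exercised from user space. The hypothetical `probe_fd` helper below polls once and, if the fd is reported readable, performs one read to separate "never readable" from "readable but read() finds nothing". Against a real device, open the fd with O_NONBLOCK. A frequent `PROBE_READABLE_EAGAIN` result with a single reader suggests a .poll/read mismatch rather than a normal race.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical classification for the poll/read agreement check. */
enum probe_result {
    PROBE_TIMEOUT,          /* never became readable within the timeout */
    PROBE_DATA,             /* readable, and read() returned data */
    PROBE_READABLE_EAGAIN,  /* readable, but read() found nothing */
    PROBE_EOF_OR_ERROR,
};

/* fd should be O_NONBLOCK so the read after poll cannot block. */
static enum probe_result probe_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[256];

    int n = poll(&pfd, 1, timeout_ms);
    if (n == 0)
        return PROBE_TIMEOUT;
    if (n < 0 || !(pfd.revents & POLLIN))
        return PROBE_EOF_OR_ERROR;

    ssize_t r = read(fd, buf, sizeof(buf));
    if (r > 0)
        return PROBE_DATA;
    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return PROBE_READABLE_EAGAIN;
    return PROBE_EOF_OR_ERROR;
}
```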
What to Remember in Practice
Interrupts, wait queues, and poll are not isolated code blocks.
The interrupt brings a hardware event into the kernel. The driver turns it into buffer and state changes. The wait queue sleeps and wakes processes. poll reports current readiness to the user-space event loop.
A reliable driver keeps these conditions aligned:
- when data exists, buffer state reflects it
- .poll returns readable
- blocking read can be woken
- nonblocking read has clear semantics
- wake_up happens after the state update
- the interrupt is correctly confirmed and cleared
The application only sees whether epoll_wait returns. The driver has to preserve the causality between hardware events, kernel state, and user-visible readiness.