Many Linux services eventually become event loops: network sockets, pipes, timerfd, eventfd, device nodes, and signal notifications all enter a select, poll, or epoll loop.
This is not because epoll is a more advanced read(). It solves a more basic problem: one thread cannot block on many read() calls at the same time.
If a program has only one socket, blocking read() is natural. Once it must handle hundreds of connections, one control pipe, several timers, and a device fd, it cannot let the thread get stuck on any single object.
A useful first model is: I/O multiplexing does not read or write data for you. It tells you which file descriptors may currently be readable, writable, or in error. Real transfer still happens through later read(), write(), accept(), or recv().
many file descriptors
-> register with select/poll/epoll
-> thread blocks at one shared wait point
-> kernel finds ready fds
-> thread wakes
-> application handles events one by one
-> go back to waiting
That is the core of an event loop: many I/O waits are collected into one blocking point.
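A minimal sketch of one loop iteration, using Python's standard `select.poll` wrapper around poll(). The `run_once` helper and the `handlers` table are illustrative names, not part of any library, and the pipe and socket pair only stand in for real event sources.

```python
import os
import select
import socket

def run_once(poller, handlers, timeout_ms=1000):
    """Block at one shared wait point, then dispatch each ready fd."""
    handled = []
    for fd, events in poller.poll(timeout_ms):
        handled.append((fd, handlers[fd](fd, events)))
    return handled

# Two unrelated event sources: a pipe and a connected socket pair.
pipe_r, pipe_w = os.pipe()
sock_a, sock_b = socket.socketpair()

poller = select.poll()
poller.register(pipe_r, select.POLLIN)
poller.register(sock_a.fileno(), select.POLLIN)

# Readiness only selects the handler; the handler does the real read.
handlers = {
    pipe_r: lambda fd, ev: os.read(fd, 4096),
    sock_a.fileno(): lambda fd, ev: sock_a.recv(4096),
}

os.write(pipe_w, b"tick")
sock_b.send(b"hello")
results = dict(run_once(poller, handlers))
```

A real service would call `run_once` in a `while True` loop. The waiting and the reading stay separate: `poll` reports which fds are ready, and the handlers perform the transfer.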
Why Blocking read Is Not Enough
Suppose one thread handles two sockets.
If it calls blocking read() on socket A first and A has no data, the thread sleeps. Socket B may already have data, but the thread cannot process it.
read(A) blocks
-> B has data
-> thread still sleeps on A
One rough solution is one thread per connection. This works at small scale, but many connections create problems with thread count, stack memory, context switches, scheduling delay, and lock contention.
Another solution is setting all fds nonblocking and polling them repeatedly:
read(A) -> EAGAIN
read(B) -> EAGAIN
read(C) -> EAGAIN
repeat
That wastes CPU.
I/O multiplexing sits between these two extremes: the thread still sleeps when nothing is ready, but the sleep is not tied to one fd. The kernel waits for readiness across many fds and wakes the thread when any of them is ready.
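The two-socket case can be sketched with `select.select`. The socket pairs here are stand-ins for real connections A and B, and the names are chosen only for this example.

```python
import select
import socket

a_local, a_peer = socket.socketpair()
b_local, b_peer = socket.socketpair()

b_peer.send(b"data for B")   # only B has data; A stays silent

# A blocking a_local.recv() here would sleep indefinitely.
# select() sleeps on both sockets and wakes when either is ready.
readable, _, _ = select.select([a_local, b_local], [], [], 1.0)
ready_names = ["A" if s is a_local else "B" for s in readable]
payload = readable[0].recv(4096) if readable else b""
```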
Readiness Is Not Completion
select, poll, and epoll return readiness events.
Readiness means the kernel expects that class of I/O to make progress without blocking if the application performs it now. It is a snapshot taken at one moment, not a guarantee about the next call.
It does not mean:
- all data has been read
- the buffer contains a full protocol message
- the next read() must succeed
- one write() can write the entire output
- the connection has no error
For example, a TCP socket may be reported readable because data arrived, or because the peer closed the connection and read() will return 0. Writable does not mean unlimited writing; it only means there is some space in the send buffer.
So event loops must separate “fd is ready” from “business operation is complete.” Readiness only tells the application it can advance an I/O state machine.
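Two small cases make the distinction concrete. This is a sketch using local socket pairs rather than real TCP connections, and the newline-terminated message format is an assumption made only for the example.

```python
import select
import socket

def is_readable(sock):
    readable, _, _ = select.select([sock], [], [], 1.0)
    return bool(readable)

# Case 1: readable because the peer closed, not because data arrived.
local, peer = socket.socketpair()
peer.close()
closed_ready = is_readable(local)
closed_read = local.recv(4096)        # b"" signals end of stream

# Case 2: readable, but only part of a message has arrived.
local2, peer2 = socket.socketpair()
peer2.send(b"GET /sta")
partial_ready = is_readable(local2)
chunk = local2.recv(4096)
message_complete = chunk.endswith(b"\n")   # hypothetical framing rule
```

In both cases the fd was ready, and in neither case was a complete request available.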
Nonblocking fds Are the Basis of Event Loops
Event loops usually set fds to nonblocking mode.
The reason is direct: after using epoll or a similar mechanism to wait for readiness, handling one fd should not block the whole event loop. Otherwise one slow connection, partial protocol message, or abnormal device can stall all other events.
A typical read path is:
epoll_wait returns fd readable
-> read in a loop until EAGAIN
-> pass data to protocol parser
-> handle next event
EAGAIN is not a failure here. It means no more data is available right now, so the event loop returns to waiting.
The write path is similar. A response may not be fully written in one call, so the application stores the remaining output and continues when the fd becomes writable again.
This is why event-driven code becomes more complex: a request is no longer completed by one function call. It is advanced by a state machine across readiness events.
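A sketch of both paths on a nonblocking socket. In Python, EAGAIN surfaces as `BlockingIOError`. The helper names `drain` and `flush` and the 8 MiB payload are illustrative, and the payload size assumes the socket buffer is much smaller than that, which holds for default Linux settings.

```python
import socket

def drain(sock, inbox):
    """Read until EAGAIN. Returns False once the peer has closed."""
    while True:
        try:
            chunk = sock.recv(4096)
        except BlockingIOError:   # EAGAIN: nothing more for now
            return True
        if not chunk:             # orderly close by the peer
            return False
        inbox.extend(chunk)

def flush(sock, outbox):
    """Write what fits and keep the rest for the next writable event."""
    while outbox:
        try:
            sent = sock.send(outbox)
        except BlockingIOError:   # send buffer is full
            break
        del outbox[:sent]
    return not outbox             # True when nothing is pending

local, peer = socket.socketpair()
local.setblocking(False)
peer.setblocking(False)

peer.send(b"abc")
peer.send(b"def")
inbox = bytearray()
still_open = drain(local, inbox)

total = 8 * 1024 * 1024
outbox = bytearray(b"x" * total)  # assumed larger than the socket buffer
finished = flush(local, outbox)
```

`flush` returning False is the cue to wait for a writable event on that fd and call it again with the same `outbox`.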
select, poll, and epoll Differ in Interface and Scale
select, poll, and epoll can all wait for multiple fds, but their scale and interface model differ.
select uses fixed-size fd sets, limited to FD_SETSIZE descriptors (1024 with glibc). Each call copies the sets to the kernel, which overwrites them with the result, so the application must rebuild them before every call and scan them after return.
poll uses an array of fd/event entries and avoids the same fixed fd-set limit, but each call still passes the whole array to the kernel, and the application scans it after return.
epoll separates “which fds are interesting” from “wait for events.” The application registers fds with an epoll instance first, then epoll_wait returns ready events. This is better suited to many long-lived connections.
Simplified:
select/poll: bring a full fd list each time
epoll: register fds once, then wait for ready events
Not every program must use epoll. For small fd counts and simple logic, poll may be more direct. epoll shines with many fds and long-running event loops.
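The register-once model can be sketched with Python's `select.epoll` wrapper, which is available on Linux only. The four socket pairs and the `by_fd` lookup table are made up for the example.

```python
import select
import socket

pairs = [socket.socketpair() for _ in range(4)]
by_fd = {local.fileno(): idx for idx, (local, _) in enumerate(pairs)}

# Registration happens once; the kernel keeps the interest list.
ep = select.epoll()
for local, _ in pairs:
    ep.register(local.fileno(), select.EPOLLIN)

# Later waits pass no fd list and return only the ready fds.
pairs[2][1].send(b"ping")
first = sorted(by_fd[fd] for fd, _ in ep.poll(1.0))

# Connection 2 is still unread, so it is reported again with 0.
pairs[0][1].send(b"pong")
second = sorted(by_fd[fd] for fd, _ in ep.poll(1.0))
ep.close()
```

Note the unit difference in Python's wrappers: `epoll.poll` takes seconds, while `select.poll().poll` takes milliseconds.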
Level Trigger and Edge Trigger Are Common Pitfalls
epoll supports two notification modes: level-triggered, which is the default, and edge-triggered, which is selected per fd with EPOLLET.
Level trigger is like repeated notification: as long as the fd still satisfies the condition, epoll_wait may keep reporting it.
buffer still has data
-> each wait may report readable
Edge trigger reports state transitions. For example, it notifies when an fd changes from not readable to readable. If the application reads only part of the data and does not drain to EAGAIN, remaining data may not trigger another notification.
not readable -> readable: notification
application reads only part
still readable, but no new edge
may not notify again
That is why edge-triggered loops usually require nonblocking fds and reading or writing until EAGAIN on each event.
Edge trigger can reduce repeated notifications, but is easier to get wrong. An event loop that occasionally stops receiving data often has an edge-trigger drain bug.
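The difference is easy to observe. This sketch sends ten bytes once, reads only two bytes per wakeup, and counts how many times epoll reports the fd. The `count_reports` helper is hypothetical, and the partial read is the deliberate bug.

```python
import select
import socket

def count_reports(flags, waits=3):
    """Send once, read only part, count how often epoll reports the fd."""
    local, peer = socket.socketpair()
    local.setblocking(False)
    ep = select.epoll()
    ep.register(local.fileno(), select.EPOLLIN | flags)
    peer.send(b"0123456789")
    reports = 0
    for _ in range(waits):
        if ep.poll(0.1):
            reports += 1
            local.recv(2)    # deliberately does not drain to EAGAIN
    ep.close()
    local.close()
    peer.close()
    return reports

level_reports = count_reports(0)               # level-triggered default
edge_reports = count_reports(select.EPOLLET)   # edge-triggered
```

With level trigger every wait reports the leftover data. With edge trigger only the first wait does, and the remaining eight bytes sit in the buffer until the peer sends again.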
Event Loops Are Not Only for Networking
Linux exposes many objects as fds or fd-like event sources.
Common examples include:
- sockets
- pipes
- eventfd
- timerfd
- signalfd
- inotify fd
- character devices
- input devices
- netlink sockets
This lets one thread handle networking, internal wakeups, timers, signals, device events, and kernel notifications through one event loop.
For example, a device service may wait on:
business socket
control socket
timerfd for periodic work
eventfd for cross-thread wakeup
device node readable event
This unified event model is a major reason Linux services commonly use event loops.
Heavy Handlers Hurt the Whole Loop
Event loops often process events serially in one thread or a small number of threads. If one event handler runs too long, other ready fds must wait.
Common problems include:
- synchronous DNS in an event callback
- slow flash writes in the event-loop thread
- heavy computation inside a callback
- processing too much data from one connection in one turn
- synchronous log flushing
- holding locks too long
Symptoms include latency jitter, connection timeouts, missed heartbeats, and late device-event handling.
The basic discipline is: keep callbacks short, and move expensive work to worker threads, async I/O, state-machine slices, or queues.
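One common shape for moving work off the loop is a worker thread plus a wakeup fd. In this sketch a pipe stands in for an eventfd, `slow_job` is a hypothetical expensive task, and the loop is bounded so the example always terminates.

```python
import os
import queue
import select
import threading

wake_r, wake_w = os.pipe()     # stand-in for an eventfd
done = queue.Queue()

def slow_job(n):
    """Hypothetical expensive task that must not run on the loop thread."""
    return sum(i * i for i in range(n))

def worker(n):
    done.put(slow_job(n))      # publish the result first
    os.write(wake_w, b"\x01")  # then wake the loop

threading.Thread(target=worker, args=(100_000,), daemon=True).start()

poller = select.poll()
poller.register(wake_r, select.POLLIN)

results = []
for _ in range(3):             # bounded for the sketch
    for fd, _events in poller.poll(5000):
        if fd == wake_r:
            os.read(wake_r, 4096)          # clear the wakeup
            while not done.empty():
                results.append(done.get_nowait())
    if results:
        break
```

The loop thread only waits and collects results. The ordering in `worker` matters: the result is queued before the wakeup byte is written, so the loop never wakes to an empty queue.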
Backpressure Matters More Than Reading and Writing More
Event-driven services often make this mistake: if readable, keep reading; if writable, keep writing; ignore whether downstream can keep up.
If input is faster than processing, memory queues grow.
If output is faster than the peer can receive, pending send buffers grow.
If one slow connection occupies the loop, other connections slow down.
Event loops need backpressure:
- cap per-fd work per iteration
- pause reading when output buffers exceed thresholds
- stop accepting new work when downstream queues are full
- time out or close slow connections
- keep low-priority traffic from drowning high-priority events
I/O multiplexing solves waiting. It does not automatically solve flow control.
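A sketch of the second rule, pausing reads when pending output grows too large. The `Conn` class and both thresholds are assumptions for illustration; real values depend on the service.

```python
import select

HIGH_WATER = 64 * 1024   # hypothetical thresholds, tune per service
LOW_WATER = 16 * 1024

class Conn:
    """Tracks pending output and decides which events to wait for."""

    def __init__(self):
        self.outbox = bytearray()
        self.reading_paused = False

    def queue_output(self, data):
        self.outbox += data
        if len(self.outbox) >= HIGH_WATER:
            self.reading_paused = True     # stop producing more work

    def on_sent(self, nbytes):
        del self.outbox[:nbytes]
        if len(self.outbox) <= LOW_WATER:
            self.reading_paused = False    # peer caught up, resume reads

    def interest(self):
        mask = 0
        if not self.reading_paused:
            mask |= select.POLLIN
        if self.outbox:
            mask |= select.POLLOUT
        return mask

conn = Conn()
idle_mask = conn.interest()
conn.queue_output(b"x" * HIGH_WATER)
congested_mask = conn.interest()
conn.on_sent(HIGH_WATER - LOW_WATER)
recovered_mask = conn.interest()
```

The loop would pass `conn.interest()` to `poller.modify(fd, mask)` after each change. Using two thresholds instead of one keeps the connection from flapping between paused and resumed on every small write.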
Device fds Do Not Always Behave Like Network fds
Many drivers implement poll, allowing device fds to enter event loops.
But readiness semantics are defined by the driver. The driver must wake wait queues correctly when data is readable, space is writable, errors occur, or state changes.
If the driver implementation is incomplete, applications may see:
- poll never returns
- fd repeatedly reports readable but read() has no data
- edge events are lost
- device removal has unclear event semantics
- nonblocking read() and poll state disagree
So when using a device fd in an event loop, do not assume socket semantics. Check what the driver promises for read/write/poll, blocking, nonblocking, and error states.
How to Debug Event Loop Problems
When a service spins CPU, connections stall, latency jitters, events disappear, or a device fd never triggers, split the path into layers.
First, check whether fds are nonblocking. Can any event handler block again?
Second, check whether readiness is fully consumed. With edge trigger, does the code read or write until EAGAIN?
Third, check whether handlers are too heavy. Look for synchronous DNS, disk I/O, computation, long locks, or synchronous logging.
Fourth, check backpressure. Do input queues, output buffers, and downstream queues have limits?
Fifth, check event-source semantics. Sockets, pipes, timerfd, and device fds do not all mean the same thing by readable or writable.
Sixth, check error and close handling. Are EPOLLERR, EPOLLHUP, read returning 0, EINTR, and EAGAIN handled correctly?
Seventh, check cross-thread wakeups. Do eventfd, pipe wakeups, custom queues, and locks cooperate correctly?
These questions are closer to the cause than saying “epoll is buggy.”
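For the sixth check, a sketch of one way to order the event flags. The `classify` helper and its return labels are illustrative, not a standard API. Python retries system calls interrupted by signals on its own, so EINTR does not appear here; C code has to handle it explicitly.

```python
import select
import socket

def classify(events):
    """Errors first, then readable data, then hangup."""
    if events & select.EPOLLERR:
        return "error"
    if events & select.EPOLLIN:
        return "read"     # a read may still return b"" on peer close
    if events & select.EPOLLHUP:
        return "hangup"
    return "other"

local, peer = socket.socketpair()
local.setblocking(False)
ep = select.epoll()
ep.register(local.fileno(), select.EPOLLIN)

peer.send(b"bye")
peer.close()              # data and close are both pending

ready = ep.poll(1.0)
fd, events = ready[0]
action = classify(events)
first = local.recv(4096)  # the buffered data
second = local.recv(4096) # b"": end of stream, time to close
ep.close()
local.close()
```

Checking readable before hangup lets the loop consume data the peer sent just before closing instead of discarding it.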
What Matters in Practice
I/O multiplexing answers one question: how can one thread wait for many I/O objects?
It does not perform I/O for the application, guarantee complete protocol messages, handle backpressure automatically, or make slow callbacks fast. It collects readiness events from many fds into one event loop so the application can advance each state machine.
The important questions are not only how to call epoll_wait, but:
- are fds nonblocking
- are readiness events drained correctly
- are callbacks short
- is backpressure explicit
- are error and close paths complete
- is each fd’s readiness semantics understood
Once these boundaries are clear, select, poll, and epoll stop being API names and become a core model for Linux services and device-event handling.