Application code calls read(), write(), open(), or mmap() much like an ordinary function: pass a few arguments, receive a return value, check errno on failure.
But a system call is not a normal function call.
A normal function call stays inside the same process, privilege level, and address space. A system call moves the CPU from user space into kernel space and hands control to the kernel. The kernel does not receive “trusted arguments.” It receives a request from user space, and it has to check everything about that request: whether the file descriptor is valid, whether the pointer is accessible, whether the length is safe, whether the process has permission, and whether the call should block.
A more useful model is this: a system call is a controlled entry point for user space to request kernel services.
Application code
-> C library wrapper
-> system-call instruction
-> CPU enters kernel mode
-> kernel checks arguments and permissions
-> dispatch to file, process, network, memory, or driver path
-> return to user space
This path explains many runtime behaviors: why read() may sleep, why a successful write() does not always mean data is on persistent storage, why a user pointer cannot simply be stored by a driver, and why the same interface behaves differently for files, sockets, and device nodes.
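The first hops of that path can be observed from a high-level language. The sketch below is only an illustration, assuming a Linux or other POSIX system where Python's ctypes can load the process's own C library through CDLL(None); the helper name raw_close is hypothetical. It calls the C library wrappers directly and shows the convention described above: a return value, plus errno on failure.

```python
import ctypes
import errno
import os

# Load the C library already linked into this process (assumes a POSIX system).
libc = ctypes.CDLL(None, use_errno=True)


def raw_close(fd):
    """Call the C library's close() wrapper; return (result, errno value)."""
    ctypes.set_errno(0)
    result = libc.close(fd)
    return result, ctypes.get_errno()


# The C wrapper and Python's os module reach the same kernel service.
same_pid = libc.getpid() == os.getpid()

# -1 never names an open descriptor, so the kernel rejects the request:
# the wrapper returns -1 and reports the reason through errno.
result, err = raw_close(-1)
print(same_pid, result, errno.errorcode.get(err))
```

Python's os functions follow the same path but convert the -1/errno pair into an OSError exception, which is why the convention is easier to see through ctypes.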
A Function Call Stays in the Same Layer
An ordinary function call mostly changes the program counter, stack, and registers.
For example:
caller
-> save return address and some registers
-> jump to callee
-> execute in the same address space
-> return to caller
It does not suddenly grant access to hardware registers, and it does not bypass memory protection. Function arguments are normally treated as ordinary data inside the same process.
A system call is different. A user-space program cannot directly modify page tables, schedule threads, access arbitrary device registers, or operate on kernel objects. It can only submit a request to the kernel, and the kernel decides whether it is allowed, how it should run, and when it can return.
So read() may look like a function in source code, but at runtime it crosses a privilege boundary.
CPU State Changes When Entering the Kernel
Modern CPUs usually provide different privilege levels. User space runs ordinary applications. Kernel space runs operating-system core code. A system-call instruction switches from user mode to kernel mode through an entry path that the hardware defines and the kernel configures in advance.
This step usually changes several kinds of state:
- current privilege level
- instruction entry point
- stack or kernel execution context
- where some registers are saved
- control information needed for interrupts, exceptions, and return to user space
The details differ across architectures such as x86, ARM, and RISC-V. The common point is that user space cannot jump into arbitrary kernel addresses. It can only enter through agreed entry points, and the kernel cannot trust the incoming parameters.
This is one reason system calls are heavier than ordinary function calls. They are not just jumps; they cross a protection boundary.
Arguments Come From Untrusted User Space
System-call arguments are supplied by user space. The kernel must treat them as untrusted input.
For example:
read(fd, buf, len)
The kernel has to care about at least these questions:
- whether fd refers to an object opened by the current process
- whether that object permits reading
- whether buf is a writable user-space address
- whether len is reasonable
- whether touching those user pages may fault
- whether the call may be interrupted by a signal
The kernel cannot treat buf as an ordinary kernel pointer and keep it indefinitely. It is a user-space virtual address. It may be invalid, lack permission, be freed later, or be modified concurrently by another thread.
That is why system calls often copy data between user space and kernel space, or establish stricter lifetime rules through mappings and pinned pages.
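Some of these checks can be provoked from user space. The following is a minimal sketch, assuming Linux, a C library loadable through ctypes.CDLL(None), and that address 1 is not mapped in the process; try_read is a hypothetical helper, not part of any library. It issues read() three times on a pipe: once on a descriptor that does not permit reading, once with a bad buffer address, and once with valid arguments.

```python
import ctypes
import errno
import os

libc = ctypes.CDLL(None, use_errno=True)
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t


def try_read(fd, address, length):
    """Call read() through the C library; return (result, errno name or None)."""
    ctypes.set_errno(0)
    result = libc.read(fd, address, length)
    if result < 0:
        return result, errno.errorcode.get(ctypes.get_errno())
    return result, None


read_end, write_end = os.pipe()
os.write(write_end, b"hello")

buf = ctypes.create_string_buffer(16)

# The descriptor exists but was not opened for reading.
wrong_mode = try_read(write_end, ctypes.addressof(buf), 16)

# Address 1 is assumed to be unmapped, so the kernel's copy to user space
# faults and the request is rejected instead of crashing the kernel.
bad_pointer = try_read(read_end, 1, 16)

# Valid descriptor, valid buffer: the kernel copies the queued bytes.
ok = try_read(read_end, ctypes.addressof(buf), 16)

os.close(read_end)
os.close(write_end)
print(wrong_mode, bad_pointer, ok, buf.value)
```

The bad pointer is tried while the pipe holds data on purpose: with an empty pipe, a blocking read() would sleep before it ever touched the buffer.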
System Calls Dispatch to Kernel Objects
The application sees an integer fd; the kernel sees an object.
After read(fd, buf, len) enters the kernel, the kernel uses the current process’s file descriptor table to find the target object. That object may be a regular file, pipe, socket, character device, block device, or an interface in a pseudo filesystem.
The same read() can lead to very different paths:
- regular file: may hit the page cache or trigger block I/O
- pipe: may read from a kernel buffer or wait for a writer
- socket: may read from the network receive queue
- device node: may call into a driver implementation
- procfs/sysfs: may generate state text dynamically
The system-call number is only the entry. The real semantics come from the kernel object and its operation table.
Uniform interfaces do not imply uniform behavior.
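To make the dispatch visible, the sketch below asks the kernel what kind of object sits behind a descriptor and then issues the same read() against three of them. It assumes a POSIX system; kind_of is a hypothetical helper built on fstat(), and the payloads are arbitrary.

```python
import os
import socket
import stat
import tempfile


def kind_of(fd):
    """Name the kind of kernel object behind a descriptor, using fstat()."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "other"


# Regular file: write through the file object, then read the raw descriptor.
with tempfile.TemporaryFile() as f:
    f.write(b"file")
    f.flush()
    os.lseek(f.fileno(), 0, os.SEEK_SET)
    file_kind, file_data = kind_of(f.fileno()), os.read(f.fileno(), 16)

# Pipe: the data comes from a kernel buffer filled by the writer.
r, w = os.pipe()
os.write(w, b"pipe")
pipe_kind, pipe_data = kind_of(r), os.read(r, 16)
os.close(r)
os.close(w)

# Socket: the data comes from the socket's receive queue.
a, b = socket.socketpair()
b.sendall(b"sock")
sock_kind, sock_data = kind_of(a.fileno()), os.read(a.fileno(), 16)
a.close()
b.close()

print(file_kind, file_data, pipe_kind, pipe_data, sock_kind, sock_data)
```

The calling code is identical in all three cases; only the object the descriptor resolves to differs.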
Return Values Do Not Always Mean Hardware Completion
Many system calls return the kernel-level result, not proof that everything has physically completed.
A successful write() to a regular file may only mean data entered the kernel page cache. It may be written to storage later by writeback. A device driver’s write operation may also enqueue a request while real hardware completion arrives later through an interrupt or completion event.
Networking is similar. A successful send() usually means the kernel accepted the data into the send path. It does not mean the peer application has received it, and certainly not that the business operation has completed.
When debugging data loss, power-failure consistency, network timeouts, or device stalls, separate these states:
- user-space call returned successfully
- kernel accepted the request
- request entered a queue
- hardware completed it
- peer or persistent medium can actually see it
Collapsing all of them into “the call succeeded” leads to bad conclusions.
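For regular files, the gap between "kernel accepted the request" and "persistent medium can see it" is usually closed with fsync(). The sketch below is a simplified illustration, not a complete durability recipe: write_all and save_durably are hypothetical helper names, the file name is arbitrary, and a newly created file may also need an fsync() on its parent directory, which is omitted here. Whether the device itself honors the flush is outside the program's control.

```python
import os
import tempfile


def write_all(fd, data):
    """Loop until the kernel has accepted every byte; write() may be partial."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def save_durably(path, data):
    """Write data, then ask the kernel to flush it to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, data)  # state: kernel accepted the bytes (page cache)
        os.fsync(fd)         # state: kernel asked the device to persist them
    finally:
        os.close(fd)


path = os.path.join(tempfile.mkdtemp(), "record.bin")
save_durably(path, b"committed\n")
with open(path, "rb") as f:
    contents = f.read()
print(contents)
```

Reading the file back only proves the data is visible through the page cache. It says nothing about what would survive a power failure, which is exactly the distinction the list above draws.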
Blocking Makes System Calls Part of Scheduling
A system call does not always return immediately.
When an application reads from a socket, pipe, or device with no data available, the kernel may put the current thread to sleep and wake it later when data arrives. That involves the scheduler, wait queues, interrupts, timeouts, and signals.
application calls read
-> kernel finds no data available
-> current thread blocks
-> scheduler gives the CPU to another thread
-> interrupt or another thread makes data available
-> kernel wakes the waiting thread
-> read returns to user space
Nonblocking mode is different. If the object is not currently readable, the call fails immediately with EAGAIN instead of putting the thread to sleep. The application can then use poll, select, epoll, or its own event loop to decide when to retry.
This is another key difference from ordinary function calls: a system call may change thread state and give the CPU to other execution flows.
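The nonblocking case is easy to reproduce with a pipe. This sketch assumes a POSIX system and uses select() as the readiness check; the variable names are illustrative only.

```python
import os
import select

read_end, write_end = os.pipe()
os.set_blocking(read_end, False)  # switch the read end to nonblocking mode

# Nothing has been written yet, so the kernel rejects the read instead of
# putting the thread to sleep. Python surfaces EAGAIN as BlockingIOError.
try:
    os.read(read_end, 16)
    first_attempt = "data"
except BlockingIOError:
    first_attempt = "EAGAIN"

# Ask the kernel whether the descriptor is readable, with a zero timeout.
ready_before, _, _ = select.select([read_end], [], [], 0)

os.write(write_end, b"ping")

# The pipe now holds data, so select() reports the descriptor as readable.
ready_after, _, _ = select.select([read_end], [], [], 1.0)
data = os.read(read_end, 16) if ready_after else b""

os.close(read_end)
os.close(write_end)
print(first_attempt, ready_before, ready_after, data)
```

Without the set_blocking() call, the first read would not fail. The thread would sleep in the kernel until some writer supplied data.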
errno Points to a Boundary
When a system call fails, user space usually sees -1 and errno. These error codes are not just “the system broke.” They describe which boundary rejected the request.
Common examples:
- EACCES: insufficient permission
- EBADF: invalid file descriptor or wrong mode
- EFAULT: user address is not accessible
- EINTR: call was interrupted by a signal
- EAGAIN: operation cannot complete now in nonblocking mode
- ENOMEM: kernel could not allocate required resources
- ENODEV: device does not exist or is unavailable
Read an error code in the context of the system-call path. It tells you whether the failure is about permissions, handles, addresses, resources, state, or devices.
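One way to apply this while debugging is to translate the error code into the boundary it points at before reading anything else into it. The table below is a hypothetical grouping for illustration, covering only the codes listed above; BOUNDARY and boundary_of are not part of any library, and real triage would need many more codes.

```python
import errno
import os

# Hypothetical grouping for illustration; categories follow the text above.
BOUNDARY = {
    errno.EACCES: "permissions",
    errno.EBADF: "handles",
    errno.EFAULT: "addresses",
    errno.EINTR: "state",
    errno.EAGAIN: "state",
    errno.ENOMEM: "resources",
    errno.ENODEV: "devices",
}


def boundary_of(exc):
    """Map an OSError to the boundary that most likely rejected the request."""
    name = errno.errorcode.get(exc.errno, "unknown")
    return name, BOUNDARY.get(exc.errno, "unclassified")


r, w = os.pipe()
os.close(w)
os.close(r)
try:
    os.read(r, 16)  # r no longer names an open object in this process
    report = None
except OSError as exc:
    report = boundary_of(exc)
print(report)
```

The mapping is deliberately coarse. Its purpose is to decide which part of the system-call path to inspect first, not to replace the documentation for a specific call.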
What Matters in Practice
A system call is not a “lower-level function call.” It is the controlled entry point between user space and kernel space.
When writing applications, debugging drivers, or analyzing system behavior, keep these questions separate:
- whether the call crosses the user/kernel boundary
- whether arguments come from untrusted user space
- which kernel object sits behind an fd, pid, address, or path
- whether the call returns immediately, blocks, is interrupted by a signal, or only submits asynchronous work
- which layer succeeded when the return value says success
- whether an error code points to permissions, addresses, resources, state, or devices
The value of understanding system calls is not memorizing syscall numbers. It is knowing what happens to control, privilege, addresses, objects, and thread state after one line of application code enters the kernel.