Why a System Call Is Not a Normal Function Call

Application code calls read(), write(), open(), or mmap() in a way that looks very similar to an ordinary function call. Pass a few arguments, receive a return value, check errno on failure.

But a system call is not a normal function call.

A normal function call stays inside the same process, privilege level, and address space. A system call moves the CPU from user space into kernel space and hands control to the kernel. The kernel does not receive “trusted arguments.” It receives a request from user space, and it has to check every part of that request: whether the file descriptor is valid, whether the pointer is accessible, whether the length is safe, whether the process has permission, and whether the call should block.

A more useful model is this: a system call is a controlled entry point for user space to request kernel services.

Application code
-> C library wrapper
-> system-call instruction
-> CPU enters kernel mode
-> kernel checks arguments and permissions
-> dispatch to file, process, network, memory, or driver path
-> return to user space
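
The first hops of this path can be observed from a scripting language. The sketch below assumes a Unix-like system where ctypes.CDLL(None) exposes the C library already loaded into the process; libc_getpid is a hypothetical helper name used only for illustration. It calls the C library's getpid() wrapper directly and compares the answer with os.getpid(), which reaches the kernel through the same wrapper.

```python
import ctypes
import os

def libc_getpid():
    """Call the C library's getpid() wrapper directly through ctypes."""
    libc = ctypes.CDLL(None)            # symbols already loaded in this process
    libc.getpid.restype = ctypes.c_int
    return libc.getpid()

# Both routes end in the same system call, so the kernel gives the same answer.
print(libc_getpid() == os.getpid())
```

The application never sees the system-call instruction itself. It only sees the wrapper's return value after control has come back to user space.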

This path explains many runtime behaviors: why read() may sleep, why a successful write() does not always mean data is on persistent storage, why a user pointer cannot simply be stored by a driver, and why the same interface behaves differently for files, sockets, and device nodes.

A Function Call Stays in the Same Layer

An ordinary function call mostly changes the program counter, stack, and registers.

For example:

caller
-> save return address and some registers
-> jump to callee
-> execute in the same address space
-> return to caller

It does not suddenly grant access to hardware registers, and it does not bypass memory protection. Function arguments are normally treated as ordinary data inside the same process.

A system call is different. A user-space program cannot directly modify page tables, schedule threads, access arbitrary device registers, or operate on kernel objects. It can only submit a request to the kernel, and the kernel decides whether it is allowed, how it should run, and when it can return.

So read() may look like a function in source code, but at runtime it crosses a privilege boundary.

CPU State Changes When Entering the Kernel

Modern CPUs usually provide different privilege levels. User space runs ordinary applications. Kernel space runs operating-system core code. A system-call instruction (for example syscall on x86-64, svc on AArch64, or ecall on RISC-V) switches the CPU from user mode to kernel mode through an entry path that the hardware defines and the kernel configures.

This step usually changes several kinds of state:

current privilege level
instruction entry point
stack or kernel execution context
where some registers are saved
control information needed for interrupts, exceptions, and return to user space

The details differ across architectures such as x86, ARM, and RISC-V. The common point is that user space cannot jump into arbitrary kernel addresses. It can only enter through agreed entry points, and the kernel cannot trust the incoming parameters.

This is one reason system calls are heavier than ordinary function calls. They are not just jumps; they cross a protection boundary.

Arguments Come From Untrusted User Space

System-call arguments are supplied by user space. The kernel must treat them as untrusted input.

For example:

read(fd, buf, len)

The kernel has to care about at least these questions:

whether fd refers to an object opened by the current process
whether that object permits reading
whether buf is a writable user-space address
whether len is reasonable
whether touching those user pages may fault
whether the call may be interrupted by a signal

The kernel cannot treat buf as an ordinary kernel pointer and keep it indefinitely. It is a user-space virtual address. It may be invalid, lack permission, be freed later, or be modified concurrently by another thread.

That is why system calls often copy data between user space and kernel space, or establish stricter lifetime rules through mappings and pinned pages.
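
These checks can be provoked on purpose. The following is a minimal sketch, assuming a Linux-like system where ctypes.CDLL(None) exposes the C library's read(); raw_read_errno is a hypothetical helper, and going through ctypes is only a way to hand the kernel a deliberately bad pointer, which ordinary Python code cannot do.

```python
import ctypes
import errno
import os

libc = ctypes.CDLL(None, use_errno=True)
libc.read.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
libc.read.restype = ctypes.c_ssize_t

def raw_read_errno(fd, address, length):
    """Call read(fd, address, length); return the errno name, or None on success."""
    ctypes.set_errno(0)
    if libc.read(fd, address, length) >= 0:
        return None
    return errno.errorcode[ctypes.get_errno()]

r, w = os.pipe()
os.write(w, b"x")                      # one byte is now waiting in the pipe

print(raw_read_errno(-1, None, 1))     # fd does not refer to an open object
print(raw_read_errno(w, None, 1))      # write end of a pipe does not permit reading
print(raw_read_errno(r, 1, 1))         # address 1 is not a writable user page
os.close(r)
os.close(w)
```

In each case the kernel rejects the request and returns an error code instead of crashing the process or touching memory it should not touch.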

System Calls Dispatch to Kernel Objects

The application sees an integer fd; the kernel sees an object.

After read(fd, buf, len) enters the kernel, the kernel uses the current process’s file descriptor table to find the target object. That object may be a regular file, pipe, socket, character device, block device, or an interface in a pseudo filesystem.

The same read() can lead to very different paths:

regular file: may hit the page cache or trigger block I/O
pipe: may read from a kernel buffer or wait for a writer
socket: may read from the network receive queue
device node: may call into a driver implementation
procfs/sysfs: may generate state text dynamically

The system-call number is only the entry. The real semantics come from the kernel object and its operation table.

Uniform interfaces do not imply uniform behavior.
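
One way to see which kind of object sits behind a descriptor is to ask the kernel with fstat(). The sketch below assumes a Unix-like system with /dev/null available; describe_fd is a hypothetical helper name, and the labels it returns are chosen to match the list above.

```python
import os
import socket
import stat
import tempfile

def describe_fd(fd):
    """Name the kind of kernel object behind fd, based on fstat()."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "other"

r, w = os.pipe()
a, b = socket.socketpair()
with tempfile.TemporaryFile() as f, open("/dev/null", "rb") as null:
    # Four small integers, four different kernel objects.
    for fd in (f.fileno(), r, a.fileno(), null.fileno()):
        print(fd, describe_fd(fd))
os.close(r)
os.close(w)
a.close()
b.close()
```

The integers look interchangeable from user space, but a read() on each of them would follow a different kernel path.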

Return Values Do Not Always Mean Hardware Completion

Many system calls return the kernel-level result, not proof that everything has physically completed.

A successful write() to a regular file may only mean data entered the kernel page cache. It may be written to storage later by writeback. A device driver’s write operation may also enqueue a request while real hardware completion arrives later through an interrupt or completion event.
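
The gap between "write() returned" and "data is on storage" is what fsync() exists to close. A minimal sketch, assuming a POSIX-like system; write_durably and the file name record.bin are hypothetical names for illustration. It omits syncing the containing directory, which is generally also needed before a newly created file's directory entry can be considered durable.

```python
import os
import tempfile

def write_durably(path, data):
    """Write data to path, then ask the kernel to flush it to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            # write() may accept fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # At this point the data may exist only in the page cache.
        os.fsync(fd)   # returns after the kernel reports the flush as done
        return written
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    print(write_durably(os.path.join(d, "record.bin"), b"committed\n"))
```

Code that stops after the write loop has reached "kernel accepted the request" and nothing further.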

Networking is similar. A successful send() usually means the kernel accepted the data into the send path. It does not mean the peer application has received it, and certainly not that the business operation has completed.
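
The same gap can be shown with a local socket pair. This is a small sketch, assuming a Unix-like system where socket.socketpair() is available; send_without_reader and the payload are hypothetical and only illustrate the ordering.

```python
import socket

def send_without_reader(payload):
    """Send on one end of a socket pair before the other end ever reads."""
    a, b = socket.socketpair()
    try:
        # send() succeeds as soon as the kernel has queued the bytes.
        sent = a.send(payload)
        # Nothing has called recv() yet; the data sits in a kernel buffer.
        received = b.recv(64)   # only now does the peer application see it
        return sent, received
    finally:
        a.close()
        b.close()

print(send_without_reader(b"order-42"))
```

Between the send() and the recv(), the sender already holds a success result even though no peer code has looked at the data.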

When debugging data loss, power-failure consistency, network timeouts, or device stalls, separate these states:

user-space call returned successfully
kernel accepted the request
request entered a queue
hardware completed it
peer or persistent medium can actually see it

Collapsing all of them into “the call succeeded” leads to bad conclusions.

Blocking Makes System Calls Part of Scheduling

A system call does not always return immediately.

When an application reads from a socket, pipe, or device with no data available, the kernel may put the current thread to sleep and wake it later when data arrives. That involves the scheduler, wait queues, interrupts, timeouts, and signals.

application calls read
-> kernel finds no data available
-> current thread blocks
-> scheduler gives the CPU to another thread
-> interrupt or another thread makes data available
-> kernel wakes the waiting thread
-> read returns to user space
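
The sequence above can be reproduced with a pipe and two threads. A minimal sketch, assuming a POSIX-like system; blocking_read_demo is a hypothetical name, and the short sleep only makes it likely, not certain, that the reader is already blocked when the write happens.

```python
import os
import threading
import time

def blocking_read_demo(delay=0.1):
    """Start a reader on an empty pipe, then wake it by writing."""
    r, w = os.pipe()
    result = []

    def reader():
        # With the pipe empty, read() does not return; the kernel puts
        # this thread to sleep until a writer makes data available.
        result.append(os.read(r, 16))

    t = threading.Thread(target=reader)
    t.start()
    time.sleep(delay)        # give the reader time to block
    os.write(w, b"wake")     # data arrives; the kernel wakes the reader
    t.join()
    os.close(r)
    os.close(w)
    return result[0]

print(blocking_read_demo())
```

While the reader sleeps inside read(), the main thread keeps running, which is the scheduler handing the CPU to another execution flow.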

Nonblocking mode is different. If the object is not currently readable, the call fails immediately with EAGAIN (or EWOULDBLOCK) instead of sleeping. The application can then use poll, select, epoll, or its own event loop to decide when to retry.
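
A nonblocking read plus a readiness wait might look like the sketch below. It assumes a POSIX-like system; read_when_ready is a hypothetical helper, and select is used only because it is the simplest of the readiness interfaces to show.

```python
import os
import select

def read_when_ready(fd, size, timeout):
    """Try a nonblocking read; on EAGAIN, wait for readiness and retry."""
    while True:
        try:
            return os.read(fd, size)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing to read right now.
            ready, _, _ = select.select([fd], [], [], timeout)
            if not ready:
                return None   # timed out without data

r, w = os.pipe()
os.set_blocking(r, False)              # put the read end in nonblocking mode
print(read_when_ready(r, 16, 0.05))    # pipe is empty
os.write(w, b"data")
print(read_when_ready(r, 16, 0.05))    # data is available now
os.close(r)
os.close(w)
```

The waiting still happens in the kernel, but inside select() rather than inside read(), so the application decides what it waits on and for how long.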

This is another key difference from ordinary function calls: a system call may change thread state and give the CPU to other execution flows.

errno Points to a Boundary

When a system call fails, user space usually sees a return value of -1 with errno set to an error code. These error codes are not just “the system broke.” They describe which boundary rejected the request.

Common examples:

EACCES: insufficient permission
EBADF: file descriptor is not open, or is not open for the requested access
EFAULT: user address is not accessible
EINTR: call was interrupted by a signal
EAGAIN: operation would block, but the descriptor is in nonblocking mode
ENOMEM: kernel could not allocate required resources
ENODEV: device does not exist or is unavailable

Read an error code in the context of the system-call path. It tells you whether the failure is about permissions, handles, addresses, resources, state, or devices.
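
In code, that reading can be made explicit with a small lookup. The sketch below is an illustration only: BOUNDARY and explain are hypothetical names, and the grouping of each code into permissions, handles, addresses, resources, state, or devices is this example's own assumption based on the list above, not an official classification.

```python
import errno
import os

# Assumed grouping, following the categories named in the text.
BOUNDARY = {
    errno.EACCES: "permissions",
    errno.EBADF: "handles",
    errno.EFAULT: "addresses",
    errno.EINTR: "state",
    errno.EAGAIN: "state",
    errno.ENOMEM: "resources",
    errno.ENODEV: "devices",
}

def explain(exc):
    """Return 'NAME: boundary' for an OSError raised by a failed system call."""
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    return f"{name}: {BOUNDARY.get(exc.errno, 'unclassified')}"

try:
    os.read(-1, 1)        # -1 never refers to an open descriptor
except OSError as exc:
    print(explain(exc))
```

Logging the symbolic name together with the boundary it points to is usually more useful than logging the bare number.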

What Matters in Practice

A system call is not a “lower-level function call.” It is the controlled entry point between user space and kernel space.

When writing applications, debugging drivers, or analyzing system behavior, keep these questions separate:

whether the call crosses the user/kernel boundary
whether arguments come from untrusted user space
which kernel object sits behind an fd, pid, address, or path
whether the call returns immediately, blocks, is interrupted by a signal, or only submits asynchronous work
which layer succeeded when the return value says success
whether an error code points to permissions, addresses, resources, state, or devices

The value of understanding system calls is not memorizing syscall numbers. It is knowing what happens to control, privilege, addresses, objects, and thread state after one line of application code enters the kernel.