Operating Systems

22 Posts

Why a System Call Is Not a Normal Function Call

7 minute

Application code calls read(), write(), open(), or mmap() in a way that looks very similar to an ordinary function call. Pass a few arguments, receive a return value, check errno on failure.

But a system call is not a normal function call.

A normal function call stays inside the same process, privilege level, and address space. A system call moves the CPU from user space into kernel space and hands control to the kernel. The kernel does not receive “trusted arguments.” It receives a request from user space, and every part of that request has to be checked: whether the file descriptor is valid, whether the pointer is accessible, whether the length is safe, whether the process has permission, and whether the call should block.
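As a small illustration of the kernel treating arguments as a request rather than as trusted input, the sketch below hands the same write request to the kernel twice: once with an open descriptor and once after that descriptor has been closed. It assumes a POSIX-like system and uses Python's os.write(), which is a thin wrapper around the write system call; the helper name try_write is made up for this example.

```python
import errno
import os


def try_write(fd: int, data: bytes) -> str:
    """Ask the kernel to write data to fd and report how the request was judged."""
    try:
        written = os.write(fd, data)
        return f"ok: {written} bytes"
    except OSError as exc:
        # The kernel refused the request; errno says why.
        return f"rejected: {errno.errorcode[exc.errno]}"


# A pipe gives us a descriptor that is valid for writing.
read_end, write_end = os.pipe()
accepted = try_write(write_end, b"hello")

# After close(), the same integer no longer names an open file.
os.close(read_end)
os.close(write_end)
refused = try_write(write_end, b"hello")

print(accepted)
print(refused)
```

The Python call looks identical both times. The difference is decided entirely on the kernel side, which looks the descriptor up in the process's file table before doing any work.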

Read More

What Happens From Power-On to Application Start?

5 minute

When a device boots slowly, an application does not start, a driver does not load, or a network service fails, people often jump straight to application logs.

But the application is only the last part of the boot chain. After power-on, the CPU does not jump directly to business logic. It starts from a fixed entry point, initializes the minimal hardware environment, finds the next-stage image, loads an OS or RTOS, initializes memory, devices, and scheduling, and only then reaches the application.

Read More

RTOS vs Linux Is Not Just About Size

7 minute

When comparing an RTOS and Linux, people often start with an intuitive difference: an RTOS is small, Linux is large.

That is true, but too coarse. What affects engineering choices and debugging is not binary size alone. It is the different problems they are designed to solve by default.

An RTOS is common in resource-constrained device-side systems with clear response paths and fixed control cycles. Linux is common when resources are richer and the system needs process isolation, network stacks, filesystems, complex drivers, and application ecosystems.

Read More

Why Filesystems Fear Sudden Power Loss

7 minute

Many field failures end with the same sentence: the device lost power right after writing configuration, and after reboot the file was damaged.

The application clearly called write(), and it even returned success. The filesystem may not be completely broken, but a config file becomes empty, a log tail is garbage, a database rolls back, or an update package fails verification.

This is often misunderstood as “the filesystem is unreliable.” A more accurate view is that filesystems trade off performance, storage lifetime, and consistency, and applications must decide which guarantee they actually need: that write() has returned, that the data has reached persistent storage, or that a complete business update is either fully applied or not applied at all.
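The three levels can be made concrete with one common pattern for replacing a config file: write a temporary file, fsync it, rename it over the old file, then fsync the directory. The sketch below assumes a POSIX system and a filesystem on which rename within a directory is atomic; atomic_write is a hypothetical helper name, and how much fsync really guarantees still depends on the filesystem and the storage hardware.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so a reader sees the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)        # level 1: the write has returned
            tmp.flush()
            os.fsync(tmp.fileno())  # level 2: ask for the data to reach storage
        os.replace(tmp_path, path)  # level 3: switch to the new version in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry as well, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use atomic_write("device.conf", new_bytes) in place of opening the file and overwriting it. If power is lost before the rename, the old file is still intact; if it is lost afterwards, the new one is complete.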

Read More

Why DMA and Cache Can Make Data Inconsistent

8 minute

One of the most confusing driver bugs is “the data is clearly in memory, but the other side cannot see it.”

The CPU prepares a transmit buffer, but the network device sends old data. DMA has written received data into memory, but the driver or application still reads stale values. Adding logs makes the issue disappear; changing the compiler optimization level brings it back.

These bugs are often not caused by a failed DMA transfer or a wrong pointer. The problem is that visibility between the CPU cache, the DMA device, and main memory was not handled correctly.

Read More

Why Inter-Process Communication Is More Than Passing Data

8 minute

Process isolation solves an important problem: if one process corrupts its own memory, it usually does not directly corrupt another process's memory.

But isolation creates another problem: two processes cannot share ordinary variables like two threads can. A pointer inside one process usually means nothing inside another process. Once a system is split into multiple processes, IPC, or inter-process communication, becomes necessary.

IPC is often understood as “send data from A to B.” That is only part of it.

Read More

Why Virtual Memory Hides Real Memory From Programs

8 minute

When a program prints a pointer, it looks like it has obtained “a memory address.” Many wrong assumptions start there.

On systems such as Linux, the address seen by a user program is usually not a physical memory address. It is a virtual address. The same 0x400000 in two different processes can point to completely different physical pages. A pointer from one process is not directly meaningful in another process.
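A short experiment makes this visible. The sketch below assumes a POSIX system where os.fork() is available. It stores an integer, forks, and lets the child overwrite the value. Both processes then report the same virtual address but hold different contents, because after the fork the address is translated through each process's own page tables.

```python
import ctypes
import os

# One integer living at some virtual address in this process.
value = ctypes.c_int(1)
address = ctypes.addressof(value)

read_end, write_end = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: same virtual address, but the write lands in the child's own page.
    value.value = 2
    report = f"{ctypes.addressof(value)}:{value.value}"
    os.write(write_end, report.encode())
    os._exit(0)

# Parent: collect the child's view and compare it with our own.
os.close(write_end)
child_report = os.read(read_end, 64).decode()
os.close(read_end)
os.waitpid(pid, 0)

child_address, child_value = (int(part) for part in child_report.split(":"))
print(f"parent: {hex(address)} holds {value.value}")
print(f"child:  {hex(child_address)} holds {child_value}")
```

The printed addresses match while the values differ, which is why a raw pointer value cannot simply be handed to another process and dereferenced there.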

Read More

What Does a Context Switch Actually Switch?

8 minute

“Switch to another task” sounds light, as if the CPU simply moves from one piece of code to another.

The real operation is more concrete. While the CPU is running an execution flow, registers contain intermediate state, the stack contains the call chain, the program counter points to the next instruction, and the scheduler knows whether the flow is running, ready, or blocked. To run another thread or task, the system must save the current state and restore another one.

Read More

Locks, Deadlocks, and Priority Inversion

9 minute

Many concurrency bugs trigger the same first reaction: add a lock. That is only half right.

A lock can protect shared state, but it does not make concurrency problems disappear. It turns “multiple execution flows modify data at the same time” into rules about who enters first, who waits, and who releases. If those rules are poorly designed, the system may stop corrupting data and instead start hanging, stalling, timing out, or delaying high-priority work.
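One example of such a rule is a fixed lock order. In the sketch below, two threads transfer money in opposite directions between two accounts. If each thread locked its source account first, the two could each hold one lock and wait forever for the other. Always taking the lock of the lower account id first removes that cycle. The Account class and transfer function are invented for this illustration, not taken from any particular codebase.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(source: Account, target: Account, amount: int) -> None:
    if source is target:
        return  # nothing to move, and locking the same Lock twice would hang
    # Rule: always take the lock of the lower account id first,
    # regardless of the direction of the transfer.
    first, second = sorted((source, target), key=lambda acc: acc.account_id)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount


def run(source: Account, target: Account, rounds: int) -> None:
    for _ in range(rounds):
        transfer(source, target, 1)


a = Account(1, 1000)
b = Account(2, 1000)
threads = [
    threading.Thread(target=run, args=(a, b, 10_000)),
    threading.Thread(target=run, args=(b, a, 10_000)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(a.balance, b.balance)
```

The ordering rule addresses this one deadlock pattern only. It does nothing about long hold times or priority inversion, which need their own rules.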

Read More

Why the Scheduler Decides System Responsiveness

9 minute

Many performance problems first look like “the CPU is too slow”: a button responds late, UART handling misses data, network packets pile up, the UI stalls, or sensor data reaches the application thread late.

But CPU load is only one layer. What often decides responsiveness is when the code gets to run.

Code existing in memory does not mean it can run at any moment. It may not be ready yet. It may be waiting for a lock, queue, or I/O. It may be blocked behind a higher-priority task. It may have just been woken by an interrupt but not yet selected by the scheduler.

Read More