Why Page Cache Makes File I/O Different From Direct Disk Access

Linux devices often show behavior that looks odd at first: a large file reads much faster the second time; write() returns almost immediately; free reports very little free memory even though nothing is actually leaking.

The same mechanism is often behind all of these observations: Page Cache.

Page Cache is the kernel layer that caches file contents in memory. It prevents file reads and writes from touching much slower storage on every operation. On reads, the kernel can check the cache first. On writes, the kernel can update cached pages first and write them back to storage later.

A useful first model is: applications usually interact with the kernel Page Cache when reading and writing files; disk, flash, or block-device I/O may happen later, or may not happen at all if the cache hits.

application read/write
-> system call
-> filesystem
-> Page Cache
-> block device queue
-> storage medium

This path explains why file I/O return time, real storage completion time, memory usage, and power-failure risk are not the same thing.

Why the Second Read Is Faster

When a file is read for the first time, the data may not be in memory. The kernel has to use the filesystem to locate the data blocks, read them from the block device into memory pages, place those pages in Page Cache, and then copy data to user space.

first read
-> Page Cache miss
-> block I/O is issued
-> data enters Page Cache
-> data is copied to application

When the same region is read again, if the cached pages are still present, the kernel can return data directly from Page Cache.

second read
-> Page Cache hit
-> return directly from memory

That is why repeated file reads can be much faster than the storage device itself. The measured speed is the memory-cache path, not disk or flash speed.

Performance tests that do not distinguish cold cache from warm cache often mistake Page Cache hits for fast storage.
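A minimal sketch of separating the two cases, assuming Linux and Python's standard library. The helper names (timed_read, drop_file_cache, compare_cold_warm) are made up for illustration. POSIX_FADV_DONTNEED is only a hint, so the "cold" number is best effort and may still include cached pages.

```python
import os
import time


def timed_read(path, chunk=1 << 20):
    """Read the whole file unbuffered and return (bytes_read, seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    return total, time.perf_counter() - start


def drop_file_cache(path):
    """Ask the kernel to drop this file's cached pages. Returns True if the hint was sent."""
    if not hasattr(os, "posix_fadvise"):
        return False  # not available on this platform
    try:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.fsync(fd)  # dirty pages cannot be dropped, so write them back first
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
    except OSError:
        return False
    return True


def compare_cold_warm(path):
    """Time one read after a drop hint, then one immediately repeated read."""
    dropped = drop_file_cache(path)
    size, cold = timed_read(path)
    _, warm = timed_read(path)
    return {"bytes": size, "cold_s": cold, "warm_s": warm, "cache_dropped": dropped}
```

On a real device, run this against a file that is large compared with one read but small enough to stay in memory, and report cold_s and warm_s separately rather than averaging them.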

Why Writes Can Return Quickly

When writing a file, the kernel usually does not need to wait for data to reach persistent storage immediately.

The application calls write(), the kernel copies user data into Page Cache, marks the affected pages dirty, and then write() returns. Actual writeback to storage can be triggered later by background writeback, memory pressure, fsync(), or other synchronization mechanisms.

application write
-> data copied into Page Cache
-> pages marked dirty
-> write returns
-> background writeback happens later

This greatly improves performance. Small writes can be merged, scattered writes can be reordered and coalesced before they reach the device, and the application does not wait for slow storage on every operation.

The cost is that a successful write() does not mean data is persistent. It usually means the kernel accepted the write and updated the in-memory file state.

If the device suddenly loses power while dirty pages have not been written back, recent data may be lost.
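The gap between "write() returned" and "data is on storage" can be made visible by timing the two steps separately. This is a rough sketch, assuming a POSIX system; time_write_and_fsync is a hypothetical helper name, and the actual numbers depend heavily on the filesystem and device.

```python
import os
import time


def time_write_and_fsync(path, payload):
    """Return (write_seconds, fsync_seconds) for one buffered write of payload."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        written = 0
        while written < len(payload):      # write() may accept only part of the data
            written += os.write(fd, payload[written:])
        t1 = time.perf_counter()           # data is now in Page Cache, marked dirty
        os.fsync(fd)                       # block until this file's data is written back
        t2 = time.perf_counter()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1
```

On slow flash the second number is often much larger than the first. That difference is roughly the work that write() alone leaves pending.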

When Dirty Pages Are Written Back

Dirty pages do not stay in memory forever. The kernel writes them back under conditions such as:

  • dirty page count or ratio reaches a threshold
  • dirty pages remain dirty for too long
  • memory pressure requires page reclaim
  • the application calls fsync(), fdatasync(), or sync()
  • file close or filesystem policy triggers some writeback
  • mount options or block-device policy require more aggressive synchronization
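
The threshold and age conditions above are tunable on Linux through sysctls under /proc/sys/vm. The sketch below reads the ones that commonly exist; the knob names are standard Linux sysctls that the text above does not name, and read_dirty_knobs is a hypothetical helper. Missing or unreadable knobs are skipped.

```python
import os

VM_DIR = "/proc/sys/vm"
KNOBS = (
    "dirty_background_ratio",     # percentage at which background writeback starts
    "dirty_ratio",                # percentage at which writers are throttled
    "dirty_background_bytes",     # byte-based alternative to the ratio above
    "dirty_bytes",                # byte-based alternative to dirty_ratio
    "dirty_expire_centisecs",     # age after which dirty data is eligible for writeback
    "dirty_writeback_centisecs",  # wakeup interval of the background flusher
)


def read_dirty_knobs(vm_dir=VM_DIR):
    """Return {knob: int} for the writeback knobs that can be read under vm_dir."""
    values = {}
    for name in KNOBS:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # not Linux, no permission, or unexpected content
    return values
```

Calling read_dirty_knobs() on a target board is a quick way to see how much dirty data the kernel is allowed to accumulate before it starts writing or throttling.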

One important point: close() is not a reliable persistence guarantee. In many cases, closing a file releases the file descriptor but does not wait for all data to be safely stored.

If the business requirement is “after return, the data survives power loss,” ordinary write() and close() are not enough. Use fsync(), atomic replacement, directory synchronization, and application-level validation protocols where needed.
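
One common shape for this is write to a temporary file, fsync it, atomically rename it over the target, then fsync the directory. The sketch below assumes Linux and a temporary file on the same filesystem as the target. durable_replace is a hypothetical name, and the example leaves out application-level validation such as checksums.

```python
import os
import tempfile


def durable_replace(path, data):
    """Replace path with data so a crash leaves either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()               # user-space buffer -> Page Cache
            os.fsync(f.fileno())    # Page Cache -> storage
        os.replace(tmp_path, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a half-written temp file behind
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)            # persist the directory entry created by the rename
    finally:
        os.close(dir_fd)
```

Note that mkstemp creates the file with owner-only permissions, so a real implementation may need to restore the original mode before the rename.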

Page Cache Memory Is Not a Leak

Linux tries to use free memory for useful caching. Empty memory has no value, while cached file pages can speed up later I/O.

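So a small free figure in the output of free does not necessarily mean a process is leaking. Much of that memory may be Page Cache, which is reclaimable when needed. The available column is the better estimate of usable memory because it accounts for reclaimable cache.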

It is more useful to distinguish:

  • anonymous process memory, such as heap and stack
  • file page cache
  • kernel object caches such as slab
  • unreclaimable memory
  • reclaimable cache

Under memory pressure, clean Page Cache pages can usually be discarded. Dirty pages must be written back before they can be reclaimed.

That is why “lots of cache” is not necessarily dangerous, but “lots of dirty pages with slow writeback” can cause stalls, write amplification, or bursts of I/O.
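
To put numbers on these categories, /proc/meminfo can be parsed directly. This is a small sketch, assuming the usual "Name: value kB" layout; parse_meminfo and cache_summary are hypothetical helpers, and the selected fields are a subset chosen for this article.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}. Most fields are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Keep only the fields that separate free, cached, dirty, and slab memory."""
    keys = ("MemTotal", "MemFree", "MemAvailable", "Cached",
            "Dirty", "Writeback", "SReclaimable", "SUnreclaim")
    return {k: fields[k] for k in keys if k in fields}
```

On a Linux system, passing the contents of /proc/meminfo through both functions gives a compact view. A low MemFree with a high MemAvailable points to cache, while a large and persistent Dirty value points to writeback that is not keeping up.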

mmap Also Uses Page Cache

When a regular file is mapped with mmap, the application appears to access memory directly, but the file pages behind that virtual address range are usually managed by Page Cache.

When the application accesses a mapped page that is not loaded yet, the CPU triggers a page fault. The kernel reads the corresponding file page into Page Cache and installs a page-table mapping.

access mmap address
-> page fault
-> file page read into Page Cache
-> page-table mapping installed
-> application continues

Writing to a shared file mapping may also modify cached pages first and later synchronize them to the file. Visibility to other processes, writeback timing, and whether msync() is needed all depend on the mapping type and synchronization policy.

So mmap is not a universal way to bypass Page Cache. For regular files, it often maps cached file pages into the process address space.
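
A short sketch of a shared file mapping, assuming a POSIX system and Python's mmap module. patch_via_mmap is a hypothetical helper. In CPython on POSIX, flush() on the mapping calls msync(), which is the synchronization step mentioned above.

```python
import mmap


def patch_via_mmap(path, offset, new_bytes):
    """Overwrite bytes in an existing, non-empty file through a shared mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; ACCESS_WRITE gives a shared, writable mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            end = offset + len(new_bytes)
            if end > len(m):
                raise ValueError("patch extends past end of file")
            m[offset:end] = new_bytes  # modifies the cached file pages
            m.flush()                  # msync(): push the modified pages to the file
```

A normal read() of the same file afterwards returns the patched bytes, because both paths go through the same cached pages.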

Why Direct I/O Bypasses Cache

Some workloads do not want Page Cache, such as databases, virtual machine images, large streaming I/O, or applications with their own cache management. They may use Direct I/O so data moves as directly as possible between user buffers and block devices.

Reasons to bypass Page Cache include:

  • avoiding duplicate copies in application cache and kernel cache
  • preventing large streaming I/O from evicting useful cached pages
  • letting the application control caching and persistence policy
  • reducing some copies and cache pollution on certain paths

But Direct I/O has costs:

  • stricter alignment requirements for buffer address, file offset, and transfer length
  • small I/O may perform worse
  • the application must manage caching and readahead strategy
  • mixing it with ordinary buffered I/O can create consistency complexity

Direct I/O is not “more advanced I/O.” It simply moves some caching decisions from the kernel back to the application.
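
The alignment cost is easiest to see in code. Below is a hedged sketch of a Direct I/O read on Linux, assuming a 4096-byte alignment (real devices and filesystems may require something else). read_direct and _read_o_direct are hypothetical names. An anonymous mmap supplies a page-aligned buffer, and the function falls back to a buffered read when O_DIRECT is missing or rejected.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment for buffer, offset, and length


def _read_o_direct(path, block):
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        with mmap.mmap(-1, block) as buf:  # anonymous mapping: page-aligned memory
            chunks = []
            offset = 0
            while True:
                n = os.preadv(fd, [buf], offset)  # offset stays block-aligned
                chunks.append(buf[:n])
                offset += n
                if n < block:                     # short read: treated as end of file
                    break
            return b"".join(chunks)
    finally:
        os.close(fd)


def read_direct(path, block=BLOCK):
    """Return (data, used_direct). Falls back to buffered I/O if O_DIRECT fails."""
    if hasattr(os, "O_DIRECT") and hasattr(os, "preadv"):
        try:
            return _read_o_direct(path, block), True
        except OSError:
            pass  # the filesystem may not support O_DIRECT
    with open(path, "rb") as f:
        return f.read(), False
```

Even this small example has to manage its own buffer, offsets, and end-of-file handling, which is the work Page Cache normally does on the application's behalf.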

Readahead and Writeback Change Performance Curves

Page Cache is not purely passive. The kernel also performs readahead based on access patterns.

If an application reads a file sequentially, the kernel may prefetch later pages into cache. By the time the application calls read() again, the data may already be in memory.

The write path has a similar effect. Background writeback batches dirty pages to reduce the cost of frequent small writes. But this can make performance uneven: early writes are fast, then a later point stalls because dirty pages accumulated, writeback cannot keep up, or the storage device enters a slow erase/write path.

These stalls are especially visible on embedded flash, SD cards, eMMC, and low-end storage.

So file I/O performance cannot be judged only by the latency of a single write(). Dirty page buildup, background writeback, device queues, write amplification, and tail latency all matter.
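
One way to look past a single write() is to record the latency of every call and compare the median with the tail. This is a rough sketch under assumed chunk sizes and counts; write_latencies and summarize are hypothetical helpers, and the percentile is a simple index-based estimate.

```python
import os
import statistics
import time


def write_latencies(path, chunk_size=64 * 1024, count=200):
    """Write count chunks sequentially and return per-write() latencies in seconds."""
    chunk = b"\0" * chunk_size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, chunk)
            latencies.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return median, approximate 99th percentile, and maximum latency."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return {"median": statistics.median(ordered), "p99": p99, "max": ordered[-1]}
```

To see writeback stalls, the total amount written usually has to exceed what the kernel lets accumulate as dirty pages, so the defaults here are only a starting point for a real device.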

Page Cache and Application Cache Are Different Layers

Many applications have their own caches: database buffer pools, language runtime buffers, C stdio buffers, logging library buffers.

These are not the same layer as Page Cache.

application object cache
-> language / stdio buffer
-> system call
-> Page Cache
-> filesystem and block device

Calling fflush() may only flush C library buffers into the kernel. It does not mean data is persistent. fsync() asks the kernel to synchronize file data to storage. A database transaction commit still depends on whether the database uses the correct persistence primitives below it.

When debugging “we flushed but lost data after power failure,” first identify which layer was flushed.
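
The layers can be observed from Python, whose buffered file objects behave much like C stdio here. In this sketch (write_through_layers is a hypothetical helper), a small write sits in the user-space buffer until flush(), so the kernel still reports a size of 0. The example assumes the data is smaller than the buffer.

```python
import os


def write_through_layers(path, data):
    """Return the file size the kernel reports after each step."""
    sizes = {}
    with open(path, "wb") as f:  # buffered writer, similar to C stdio
        f.write(data)
        sizes["after_write"] = os.path.getsize(path)  # still in the user-space buffer
        f.flush()                                     # user-space buffer -> Page Cache
        sizes["after_flush"] = os.path.getsize(path)
        os.fsync(f.fileno())                          # Page Cache -> storage
        sizes["after_fsync"] = os.path.getsize(path)
    return sizes
```

The size does not change between flush() and fsync(), which is the point: nothing visible from inside the running system distinguishes "in Page Cache" from "on storage". Only the fsync() call itself provides that guarantee.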

How to Debug Page Cache Issues

When file reads or writes are unexpectedly fast or slow, data disappears after power loss, memory seems consumed, or periodic I/O stalls appear, split the path into layers.

First, identify whether the path uses buffered I/O, mmap, or Direct I/O.

Second, check whether read performance comes from warm cache. Cold-cache and repeated-read behavior must be separated.

Third, check whether writes only created dirty pages. Do you need fsync(), fdatasync(), directory sync, or an application-level commit protocol?

Fourth, separate reclaimable Page Cache from anonymous process memory and unreclaimable kernel memory.

Fifth, check whether dirty writeback keeps up. Look for bursty writes, slow flash, device queue congestion, or write amplification.

Sixth, check whether application-level caches duplicate kernel Page Cache. Databases, logging, and streaming workloads may need explicit cache policy.

These questions turn vague statements like “the filesystem is slow,” “memory is leaking,” or “writes are unreliable” into concrete paths.

What Matters in Practice

Page Cache trades memory for I/O performance.

It makes repeated reads faster, lets writes be merged and delayed, and turns otherwise idle memory into useful cache. But it changes important intuitions: read() may not touch storage, successful write() may not be persistent, less free memory may not be a leak, and regular-file mmap often still depends on cached pages.

For engineering decisions, separate four states:

  • the application handed data to the kernel
  • data exists only in Page Cache
  • dirty pages have been written back to storage
  • the business operation is complete and recoverable

Page Cache is not a minor filesystem detail. It is a primary path for understanding Linux file I/O, memory usage, performance jitter, and power-failure risk.