Many field failures end with the same sentence: the device lost power right after writing configuration, and after reboot the file was damaged.
The application clearly called write(), and it even returned success. The filesystem may not be completely broken, but a config file becomes empty, a log tail is garbage, a database rolls back, or an update package fails verification.
This is often misunderstood as “the filesystem is unreliable.” A more accurate view is that filesystems trade off performance, lifetime, and consistency, and that applications must also decide what they actually need: a successful write() return, data that is durable on storage, or a complete business update that is applied as a whole.
The safest first model is this: a file write passes through application buffering, kernel page cache, filesystem metadata, block device queues, and storage media. write() success means data reached one layer, not necessarily that it is safely and completely on nonvolatile storage with the intended business meaning.
application write
-> C library buffering
-> kernel page cache
-> filesystem data block and metadata allocation
-> block device / flash driver
-> storage media actually completes write
When power fails, the result depends on where the write path stopped.
write() Success Does Not Mean Durable Storage
Many applications treat successful write() as “the data is on storage.” That is not accurate enough.
For performance, operating systems often put writes into the in-memory page cache first and write them to storage later in batches. This merges small writes, reduces random I/O, and improves throughput.
So write() success often means:
- arguments were valid
- data was copied from user space to the kernel
- the kernel accepted the write
- the in-memory file state was updated
It does not necessarily mean:
- data reached the storage chip
- file metadata was synced
- directory entry was persisted
- device internal write cache completed
- the new content can be read after power loss
If an application needs data to survive power loss, it usually has to consider fsync() or fdatasync(), directory sync, atomic replacement, and the cache flush behavior of the storage device.
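The layers above can be made visible in a few lines. The following is a minimal sketch, assuming a POSIX-like system and a log-style file that is only appended to; the function name durable_append is hypothetical. It shows the difference between handing data to the library buffer, handing it to the kernel, and asking the kernel to push it to the device. Whether the device itself honors the flush is outside what this code can guarantee.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append a record and ask the OS to push it to storage before returning."""
    with open(path, "ab") as f:
        f.write(record)        # may only reach the user-space buffer
        f.flush()              # hand the buffered bytes to the kernel page cache
        os.fsync(f.fileno())   # ask the kernel to write file data and metadata to the device
    # For a newly created file, the parent directory entry may still need its own sync.
```

Without the flush() and fsync() calls, the function would still return normally and a later read would still see the data, because reads are served from the same in-memory state.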
File Data and Metadata Are Different
A file is not only content. It also has metadata.
Metadata includes:
- file size
- permissions
- timestamps
- data block locations
- inode information
- directory entries
File data and metadata may be written to storage at different times.
For example, creating and writing a new file involves:
create directory entry
allocate inode
allocate data blocks
write file content
update file size
update directory and inode metadata
Power can fail between any of these steps. Results may include:
- content was written, but directory entry was not persisted, so the file disappears
- directory entry exists, but file size is old
- file size updated, but some data blocks were not written
- old and new data are mixed
Power-loss consistency is not only “did the content get written.” The filesystem also needs data blocks, metadata, and directory structure to recover consistently.
Journaling Mainly Protects Metadata Consistency
Many general-purpose filesystems use journaling.
The basic idea is: before modifying filesystem structures, record the intended metadata updates in a journal. After power loss, the filesystem can replay or discard incomplete operations so the structure returns to a consistent state.
This prevents many severe problems:
- inode points to unallocated blocks
- free-space bitmap disagrees with actual use
- directory structure is corrupted
- filesystem requires long full-volume scan
But journaling usually prioritizes filesystem structural consistency. It does not necessarily guarantee application data semantics.
For example, if a configuration file is overwritten halfway, the filesystem may recover with a consistent structure, while the config content is still half-new, half-old, empty, or otherwise invalid.
So journaling is not an application-level transaction. It helps the filesystem recover; it does not automatically make your business update atomic.
Why Overwrite-in-Place Is Dangerous
Many applications save configuration by opening the original file and overwriting it:
open config
truncate to zero
write new content
close
If power fails after truncate and before the write completes, reboot may see an empty file or half a file.
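The window is easy to observe without cutting power. This small sketch, with a hypothetical helper name and a made-up config line, opens an existing file for overwrite and reports its size before any new bytes are written. It only shows the in-memory state at that moment; what actually reaches storage after a real power cut depends on the filesystem and timing.

```python
import os
import tempfile


def size_after_truncating_open(path: str) -> int:
    """Open path for overwrite and report its size before any new content is written."""
    f = open(path, "w")  # mode "w" truncates the file to zero length immediately
    try:
        return os.path.getsize(path)
    finally:
        f.close()


with tempfile.TemporaryDirectory() as d:
    cfg = os.path.join(d, "config")
    with open(cfg, "w") as f:
        f.write("mode=fast\n")
    print(size_after_truncating_open(cfg))  # 0
```

The old content is already gone at the point where the new content has not yet been written, which is exactly the state a reboot may find.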
A safer pattern is usually: write a temporary file, then atomically replace:
write config.tmp
fsync config.tmp
rename config.tmp -> config
fsync directory
The key is rename. Within one filesystem, rename usually provides atomic directory-entry replacement: after reboot, you should see either the old file or the new file, not a half-renamed state.
But that is not enough. The temporary file must be fsynced so its content is durable. After rename, the directory also needs to be synced so the directory entry replacement itself is durable.
Many “I used a temp file but still lost config” bugs miss file or directory sync.
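The four steps above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes a POSIX system where a directory can be opened and passed to fsync, the name atomic_write and the ".tmp" suffix are hypothetical, and the temporary file must live on the same filesystem as the target.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data: temp file, fsync, rename, then directory fsync."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical naming; same directory, same filesystem

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # make the new content durable before it becomes visible

    os.replace(tmp_path, path)  # rename over the old file

    dir_fd = os.open(directory, os.O_RDONLY)  # POSIX assumption: directories can be opened
    try:
        os.fsync(dir_fd)        # make the directory entry replacement durable
    finally:
        os.close(dir_fd)
```

A fixed temporary name keeps the sketch short; a real system with concurrent writers would need unique temporary names and cleanup of leftovers at startup.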
What fsync Actually Guarantees
fsync(fd) aims to synchronize file data and required metadata to storage so the file can recover to that state after power loss.
Engineering boundaries matter:
- fsync on a file does not necessarily sync the parent directory entry
- fdatasync may sync only data and required metadata
- the storage device may have its own write cache
- whether a flush really reaches the media depends on the hardware honoring it
- filesystem mount options affect ordering and journaling behavior
If a file is newly created or replaced by rename, parent directory persistence is also important. The content may be durable while the directory update is not.
So when you say “we called fsync,” ask:
- which fd was synced
- whether the directory was synced
- whether the storage device actually completed flush
- whether the business update spans multiple files
fsync is a persistence tool, not an automatic business transaction.
Flash Adds Erase and Wear Problems
Embedded devices often use flash. Flash is not ordinary memory. It usually cannot freely change a bit from 0 back to 1: programming works by page, and setting bits back to 1 requires erasing a whole block.
This creates consequences:
- small file updates may trigger larger erase/write operations
- write latency may be unstable
- erase cycles are limited, requiring wear leveling
- power may fail during erase, move, or writeback
- FTL or filesystem mapping tables may also need updates
Raw NAND/NOR, eMMC, SD card, UFS, SPI NOR, and SPI NAND have different power-loss behavior and reliability.
Some devices have internal controllers and FTLs that map logical blocks to physical flash. Some systems use flash-oriented filesystems such as JFFS2, UBIFS, or LittleFS. Each choice handles power-loss recovery, wear leveling, and write amplification differently.
So on embedded devices, “write a file” may involve erase, relocation, mapping-table updates, and bad-block management. Sudden power loss during these internal steps exposes whether the consistency design is sound.
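To get a feel for why small updates can be expensive, here is a deliberately crude upper-bound calculation. It assumes that an update forces every erase block it touches to be rewritten in full, which real FTLs and log-structured flash filesystems often avoid. The function name and the sizes (a 64-byte change, a 128 KiB erase block) are illustrative assumptions, not figures for any particular device.

```python
def worst_case_write_amplification(payload_bytes: int, erase_block_bytes: int) -> float:
    """Bytes physically rewritten divided by bytes the application changed,
    assuming every touched erase block is rewritten in full."""
    if payload_bytes <= 0 or erase_block_bytes <= 0:
        raise ValueError("sizes must be positive")
    blocks_touched = -(-payload_bytes // erase_block_bytes)  # ceiling division
    return blocks_touched * erase_block_bytes / payload_bytes


# Assumed numbers for illustration only.
print(worst_case_write_amplification(64, 128 * 1024))  # 2048.0
```

The larger this ratio, the longer the internal operation runs and the wider the window in which a power cut can interrupt it.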
Why Databases and Logs Still Implement Transactions
If the filesystem has a journal, why do databases, configuration systems, and update systems still implement their own transactions?
Because filesystem journals usually do not know application semantics.
They know filesystem structures such as inodes, directory entries, and block allocation. They do not know:
- these three files must update together
- configuration must satisfy a checksum
- database pages have version relationships
- the boot partition must not switch before the update package is complete
- log records must be replayable in business order
Reliable systems often add an application-level protocol:
- dual configuration copies and version numbers
- checksum or CRC
- temporary file plus atomic rename
- write-ahead log
- transaction commit marker
- A/B partition update
- recovery or rollback at startup
The filesystem keeps lower-level structure consistent. The application protocol makes business state decidable and recoverable.
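As one small instance of such a protocol, the sketch below frames each log record with its length and a CRC32, and replays only records that are complete and pass the check. The record layout and the names encode_record and replay are hypothetical. The idea is that a torn or garbage tail left by a power cut is treated as never committed, instead of being handed to the business logic.

```python
import struct
import zlib
from typing import List

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, then the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame a payload so that a reader can detect truncation or corruption."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> List[bytes]:
    """Return records in order, stopping at the first incomplete or corrupted one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or damaged tail: treat it as never committed
        records.append(payload)
        offset = start + length
    return records
```

Stopping at the first bad record keeps replay in business order; skipping over damage and continuing would be a different policy with different risks.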
What to Check After Power-Loss Corruption
When files disappear, configuration becomes empty, a database is damaged, or an update fails after power loss, check these layers.
First, does the application treat a successful write() return as proof that the data is durable? Is there fsync or equivalent persistence?
Second, does it overwrite the original file? Is there a truncate-then-write window?
Third, does it use a temporary file and atomic rename? Are both the temp file and directory synced?
Fourth, does business state span multiple files? Is there a commit marker, version number, or recovery logic?
Fifth, is the storage reliable? Does it have write cache, power-loss protection, hold-up time, and correct flush support?
Sixth, does the flash filesystem fit the workload? Do write frequency, erase-block size, wear leveling, and power-loss recovery match?
Seventh, does the test really simulate power loss? The timing of the power cut, the I/O load at that moment, and the hold-up capacitance all affect whether the failure reproduces.
These questions are closer to the root cause than “the filesystem is unreliable.”
What to Remember in Practice
Filesystems fear sudden power loss because a write is not completed in one instant.
One file update may cross:
- application buffering
- kernel page cache
- data block writes
- metadata updates
- filesystem journal
- block device queue
- storage device cache
- flash erase and writeback
write() success does not mean business data is safely durable. A journaling filesystem can protect filesystem structure without automatically protecting application data semantics.
If data must survive power loss, treat file update as a protocol: write a temp file, sync content, atomically replace, sync the directory, add checksums, keep old versions, and make startup recovery explicit.
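The "add checksums, keep old versions, make startup recovery explicit" part can be sketched as a two-slot configuration store. Everything here is an assumption made for illustration: the slot format (a hex CRC32 line followed by a JSON body), the function names, and the rule that the valid slot with the highest version wins. Each slot would still be written with the sync and rename steps described above.

```python
import json
import zlib
from typing import Optional


def encode_slot(version: int, settings: dict) -> bytes:
    """Serialize one config slot as: hex CRC32 of the body, newline, JSON body."""
    body = json.dumps({"version": version, "settings": settings}, sort_keys=True).encode()
    return b"%08x\n" % zlib.crc32(body) + body


def decode_slot(raw: bytes) -> Optional[dict]:
    """Return the decoded slot, or None if it is empty, torn, or fails its checksum."""
    header, sep, body = raw.partition(b"\n")
    if not sep:
        return None
    try:
        if int(header, 16) != zlib.crc32(body):
            return None
        return json.loads(body)
    except ValueError:
        return None


def load_newest(paths: list) -> Optional[dict]:
    """Startup recovery: pick the valid slot with the highest version, if any."""
    best = None
    for path in paths:
        try:
            with open(path, "rb") as f:
                slot = decode_slot(f.read())
        except FileNotFoundError:
            continue
        if slot is not None and (best is None or slot["version"] > best["version"]):
            best = slot
    return best
```

With two slots written alternately, a power cut during one update can damage at most the slot being written, and startup falls back to the other one.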