Embedded software often fails in places that do not look architecture-related at first.
A firmware image is flashed but never reaches main(). An exception handler is never entered. A Linux program reports Exec format error. A driver gives a buffer to DMA and then reads stale data. Unaligned access works on one chip and becomes a hard fault on another. The same assembly startup code fails completely on a different profile.
These are not just cases of “using ARM” or “using RISC-V.” More precisely, they happen at the boundaries between CPU architecture, concrete chips, operating systems, and toolchains.
For embedded software, the most useful first model is this: CPU architecture defines how software is executed by the processor, how controlled entry paths work, how address spaces are accessed, and how binary interfaces are expected to behave. A concrete chip, bootloader, RTOS, Linux kernel, and toolchain then turn those rules into a real system.
ISA / registers / privilege / exceptions / memory model
-> startup / interrupt / scheduler / syscall / driver / ABI
-> boot failure / fault / stale DMA data / wrong binary / porting bug
So learning ARM or RISC-V should not begin with a full instruction table. The first question is where architecture changes software judgment.
Architecture Is Not a Chip
Engineering discussions often mix several layers:
- ISA: instruction encoding, base registers, and which instructions a program can execute
- privileged architecture: exceptions, privilege levels, interrupt entry, control registers, and address-translation rules
- microarchitecture: pipeline, cache hierarchy, branch prediction, execution units, and concrete implementation choices
- SoC: CPU cores plus interrupt controllers, timers, memory controllers, peripherals, and security blocks
- board: SoC, storage, power, crystals, debug interfaces, and board-level wiring
- software platform: bootloader, RTOS, Linux, C library, toolchain, and ABI
ARM and RISC-V often refer to an ISA or architecture family, not to one chip. A Cortex-M MCU, a Cortex-A Linux SoC, and an AArch64 server processor should not be collapsed into one startup, interrupt, or memory model. RISC-V has the same issue: the base ISA, standard extensions, privileged architecture, SBI, PLIC/CLINT or AIA, vendor peripherals, and concrete SoCs are different layers.
This boundary matters. When a problem appears, ask first: is this defined by the architecture, implemented by this chip, promised by the OS, or required by the toolchain ABI? If the problem is attributed to the wrong layer, debugging quickly drifts away from the real cause.
Startup Begins at an Architectural Entry Point
After power-on or reset, a CPU does not know where your main() function is. It starts fetching instructions from an entry point defined by architecture and chip conventions.
On an MCU, that entry may involve a vector table, reset handler, and flash mapping address. On a more complex SoC, the first code may be boot ROM, followed by SPL, a bootloader, a kernel, or an RTOS. Different architectures and chips have their own conventions for reset state, exception-vector location, initial privilege level, address mapping, and boot parameters.
That directly affects several failures:
- the image link address does not match the load address
- the vector table is at the wrong location, so exceptions or interrupts jump to unintended addresses
- C code runs before the stack is set up
- BSS is not cleared or initialized data is not copied
- the bootloader jumps without satisfying the next stage’s entry contract
Boot failure is therefore not only an application problem. Reset entry, linker scripts, image format, boot parameters, exception vectors, and early memory state can all be where the failure actually begins.
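As one concrete illustration, the first two words of a Cortex-M image are the initial stack pointer and the reset vector, so a raw binary can be sanity-checked on the host before it is flashed. The sketch below is a minimal example under stated assumptions: the flash and SRAM ranges are hypothetical placeholders, and check_vector_table is an illustrative name, not part of any vendor tool.

```python
import struct

# Hypothetical memory map for illustration only; take the real values from the
# chip's reference manual and the project's linker script.
FLASH_BASE, FLASH_SIZE = 0x0800_0000, 512 * 1024
SRAM_BASE, SRAM_SIZE = 0x2000_0000, 128 * 1024


def check_vector_table(image: bytes) -> list[str]:
    """Return a list of problems found in the first two vector-table words."""
    if len(image) < 8:
        return ["image shorter than two vector-table entries"]
    # Word 0 is the initial stack pointer, word 1 is the reset vector.
    initial_sp, reset_vector = struct.unpack_from("<II", image, 0)
    problems = []
    # The initial SP may legitimately point one past the end of SRAM.
    if not (SRAM_BASE < initial_sp <= SRAM_BASE + SRAM_SIZE):
        problems.append(f"initial SP {initial_sp:#010x} is outside SRAM")
    # The ARM procedure call standard expects 8-byte stack alignment
    # at public interfaces.
    if initial_sp % 8:
        problems.append(f"initial SP {initial_sp:#010x} is not 8-byte aligned")
    # Cortex-M executes Thumb code only, so bit 0 of the vector must be set.
    if reset_vector & 1 == 0:
        problems.append(f"reset vector {reset_vector:#010x} has Thumb bit clear")
    handler = reset_vector & ~1
    if not (FLASH_BASE <= handler < FLASH_BASE + FLASH_SIZE):
        problems.append(f"reset handler {handler:#010x} is outside flash")
    return problems
```

An image linked for address 0 but loaded at the assumed flash base would typically show up here as a reset handler outside flash, which is the link-address versus load-address mismatch from the list above.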
Exceptions and Interrupts Are Not Function Calls
An interrupt handler may look like a C function, but it is not called normally by the current code. Hardware events, CPU exception entry, interrupt controllers, and the operating system decide when it runs.
The CPU architecture usually defines the basic exception-entry behavior: how current execution state is saved, where the exception cause is recorded, how the return address is represented, which privilege state the processor enters, and which registers software must continue saving. The interrupt number, priority, masking, routing, and nesting are often handled by the interrupt controller and SoC. An RTOS or Linux then decides how ISR work, deferred handling, thread wakeups, and scheduling happen.
That is why interrupt problems must be split by layer:
- did the CPU enter the exception path?
- did the interrupt controller deliver the event to the target CPU?
- is the vector table or exception table correct?
- did the ISR acknowledge and clear the hardware event?
- did the OS schedule the follow-up work in the right context?
If an ISR is treated only as a callback automatically invoked by the system, nested interrupts, exception-return failures, long interrupt-disabled sections, and delayed task wakeups become hard to explain.
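The first question in the list, whether the CPU entered the exception path, usually comes down to reading a cause register. As a hedged example, RISC-V machine mode records the cause in mcause: the top bit distinguishes interrupts from exceptions and the remaining bits hold a code. The sketch below decodes a subset of the standard codes; decode_mcause is a hypothetical helper name, and platform-specific codes are deliberately left undecoded.

```python
# Subset of the standard RISC-V machine-mode cause codes.
INTERRUPTS = {
    3: "machine software interrupt",
    7: "machine timer interrupt",
    11: "machine external interrupt",
}
EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    11: "environment call from M-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_mcause(mcause: int, xlen: int = 32) -> tuple[bool, int, str]:
    """Split an mcause value into (is_interrupt, code, description)."""
    is_interrupt = bool((mcause >> (xlen - 1)) & 1)   # top bit of the register
    code = mcause & ((1 << (xlen - 1)) - 1)           # all remaining bits
    table = INTERRUPTS if is_interrupt else EXCEPTIONS
    return is_interrupt, code, table.get(code, "reserved or platform-specific")
```

A value decoded as "machine external interrupt" only says the CPU took the trap. Which device raised it is a question for the interrupt controller, which is the next layer in the list.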
Privilege Decides Who Can Do What
Modern CPUs usually do not give all code the same permissions. Applications, kernels, exception handlers, virtualization layers, secure worlds, or machine modes may run at different privilege levels.
This is not abstract theory. It affects:
- whether user-space code can directly access peripheral registers
- why system calls must enter the kernel through controlled paths
- why RTOS or bare-metal code often manipulates hardware directly
- why Linux drivers run inside the kernel boundary
- why normal code may fault when reading or writing some control registers
- why secure boot, TEE, hypervisors, or SBI may appear in the boot path
ARM and RISC-V use different names, but the common idea is that the CPU uses privilege boundaries to limit what code can do. Crossing that boundary is no longer a normal function jump. It is a controlled path involving exceptions, traps, returns, and state restoration.
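RISC-V gives a compact example of why normal code may fault on a control register. By convention, bits 9:8 of a CSR address encode the lowest privilege level allowed to access it, and bits 11:10 equal to 0b11 mark it read-only. The sketch below models only that address-encoding rule. The function name csr_access is hypothetical, and real hardware applies further checks (for example counter-enable registers) that are not modeled here.

```python
# Privilege level encodings used by RISC-V: U=0, S=1, M=3.
U_MODE, S_MODE, M_MODE = 0, 1, 3


def csr_access(csr_addr: int, current_priv: int, write: bool) -> str:
    """Apply the CSR address-encoding convention to one access attempt."""
    min_priv = (csr_addr >> 8) & 0b11                 # lowest level allowed
    read_only = ((csr_addr >> 10) & 0b11) == 0b11     # top two bits both set
    if current_priv < min_priv:
        return "illegal instruction: privilege too low"
    if write and read_only:
        return "illegal instruction: CSR is read-only"
    return "allowed by address encoding"


# mstatus (0x300) is a machine-level read/write CSR.
user_reads_mstatus = csr_access(0x300, U_MODE, write=False)
# cycle (0xC00) is a user-level read-only counter.
machine_writes_cycle = csr_access(0xC00, M_MODE, write=True)
```

In this model user_reads_mstatus reports a privilege violation and machine_writes_cycle reports a write to a read-only register. Both would surface in a real system as a trap rather than as a silently ignored instruction.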
An Address Is Not Always Physical Memory
In small bare-metal MCU programs, an address often feels close to a real hardware address. Registers live at fixed addresses, arrays live in SRAM, and flash lives in another range. Even there, aliases, boot remapping, MPU permissions, and bus-access limits may exist.
On systems with an MMU, address meaning changes significantly. A user program sees virtual addresses. The CPU translates them through page tables and TLBs into physical memory. The kernel has its own virtual address space. A DMA device may see a bus address or I/O virtual address instead.
This changes driver and debugging judgment:
- a user pointer cannot be handed directly to DMA hardware
- a kernel virtual address is not necessarily a physical address
- a physical address is not necessarily the DMA address a device can access
- page-table mappings and permissions decide whether an access succeeds or raises a fault
- MPU/MMU configuration can change cache attributes and access permissions
When discussing an address, first ask which system model is involved: bare metal, RTOS, an MCU with MPU, or Linux with MMU. The same hexadecimal value can mean different things in each model.
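To make the virtual-versus-physical distinction concrete, here is a deliberately simplified model of address translation. It assumes a single-level table and 4 KiB pages, which is not how any real MMU lays out its tables. The names Mapping, PageFault, and translate are illustrative only.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this toy model


class PageFault(Exception):
    """Raised when the toy MMU refuses an access."""


@dataclass(frozen=True)
class Mapping:
    phys_page: int   # physical page number
    writable: bool
    user: bool       # accessible from unprivileged code


def translate(page_table: dict[int, Mapping], vaddr: int,
              write: bool, user_mode: bool) -> int:
    """Translate a virtual address or raise PageFault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(f"no mapping for {vaddr:#x}")
    if user_mode and not entry.user:
        raise PageFault(f"user access to privileged page at {vaddr:#x}")
    if write and not entry.writable:
        raise PageFault(f"write to read-only page at {vaddr:#x}")
    return entry.phys_page * PAGE_SIZE + offset


# Virtual page 0x10 maps to physical page 0x80000, read-only for user code.
table = {0x10: Mapping(phys_page=0x80000, writable=False, user=True)}
phys = translate(table, 0x10123, write=False, user_mode=True)
```

Here phys is 0x80000123 while the program only ever saw 0x10123. A DMA engine sits outside this translation, which is why neither value can be assumed to be the address the device should be given.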
Cache and Memory Ordering Change Visibility
A fast CPU does not mean a peripheral, another CPU core, or DMA sees the same memory state immediately.
A cache may let the CPU read a stale value from a cache line, or hold CPU writes in a cache line before memory is updated. Out-of-order execution, write buffers, and bus transactions may also mean that source-code order is not the order observed by other agents. These issues may be invisible on simple single-core bare-metal systems, but they become concrete with DMA, multicore systems, Linux drivers, and high-performance peripherals.
Common results include:
- the CPU writes a DMA buffer, but the device reads old data
- the device writes memory, but the CPU still reads an old cache line
- one core sets a flag, while another core sees the flag but not the data
- volatile prevents some compiler optimizations but does not provide full synchronization semantics
- locks, atomics, memory barriers, and cache maintenance are mixed up or omitted
An architecture article does not need to list every barrier instruction, but it must make one point clear: data consistency is not guaranteed merely because code wrote something. Who can see the data, when they can see it, and in what order they can see it depends on the combined guarantees of the CPU, cache, bus, devices, and synchronization mechanism.
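The first two DMA cases can be reproduced in a toy model. The sketch below assumes a single word of memory behind a write-back cache and a device that reads and writes RAM directly, with no hardware coherency. ToyCachedMemory and its method names are invented for illustration; real systems expose these operations through architecture instructions or OS DMA-mapping APIs.

```python
class ToyCachedMemory:
    """One word of RAM behind a write-back cache, plus a non-coherent device."""

    def __init__(self, initial: int) -> None:
        self.ram = initial     # what the DMA device reads and writes
        self.line = None       # cached copy, or None if nothing is cached
        self.dirty = False

    def cpu_write(self, value: int) -> None:
        self.line, self.dirty = value, True        # stays in the cache for now

    def cpu_read(self) -> int:
        if self.line is None:
            self.line = self.ram                   # miss: fill from RAM
        return self.line

    def clean(self) -> None:
        if self.dirty:
            self.ram, self.dirty = self.line, False    # write back to RAM

    def invalidate(self) -> None:
        self.line, self.dirty = None, False        # drop the cached copy

    def device_read(self) -> int:
        return self.ram

    def device_write(self, value: int) -> None:
        self.ram = value


mem = ToyCachedMemory(0)
mem.cpu_write(42)
stale_for_device = mem.device_read()   # 0: the write is still in the cache
mem.clean()
fresh_for_device = mem.device_read()   # 42: visible after the clean
mem.device_write(99)
stale_for_cpu = mem.cpu_read()         # 42: the CPU hits its old cache line
mem.invalidate()
fresh_for_cpu = mem.cpu_read()         # 99: visible after the invalidate
```

The direction matters: data going to the device needs a clean before the transfer, and data coming from the device needs an invalidate before the CPU reads it. The model leaves out ordering between cores, which needs barriers or atomics rather than cache maintenance.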
ABI Decides Whether a Binary Can Run Correctly
Source code that builds is not necessarily a binary that can run on the target. ABI is where architecture knowledge reaches the toolchain and runtime.
An ABI defines:
- which registers or stack locations carry function arguments and return values
- which registers are caller-saved and callee-saved
- how the stack is aligned
- whether floating-point arguments use hard-float or soft-float conventions
- the ELF machine, endianness, and ABI attributes
- the dynamic-linker path and C library compatibility
These problems often do not look like architecture problems. A program may print “not found” because the dynamic linker path is wrong. An executable may exist but report “Exec format error” because the ELF machine does not match. A library may link but crash at runtime because ABI or floating-point calling conventions differ.
Cross-compilation debugging therefore cannot stop at whether the compiler command succeeded. file, readelf, objdump, the dynamic linker, the C library inside the rootfs, and target kernel support all belong to the same path.
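Part of what file and readelf report comes straight from the first bytes of the ELF header, and reading them by hand shows how little is needed to explain a mismatch. The following is a minimal sketch, not a replacement for those tools: describe_elf is a hypothetical helper, the machine table is a small subset, and ABI details such as the floating-point convention or the interpreter path live elsewhere in the file and are not decoded here.

```python
import struct

# A few e_machine values from the ELF specification and processor supplements.
EM_NAMES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}


def describe_elf(header: bytes) -> dict[str, str]:
    """Decode class, endianness, and machine from the first 20 bytes of a file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid EI_CLASS or EI_DATA")
    byte_order = "<" if ei_data == 1 else ">"
    # e_type is at offset 16 and e_machine at offset 18 in both ELF32 and ELF64.
    _e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": "ELF32" if ei_class == 1 else "ELF64",
        "endianness": "little" if ei_data == 1 else "big",
        "machine": EM_NAMES.get(e_machine, f"unknown ({e_machine})"),
    }
```

One way to use it is to open the binary in "rb" mode, pass the first 20 bytes to describe_elf, and compare the result with what the target kernel and rootfs expect. A matching machine field is necessary but not sufficient, since the dynamic linker and C library still have to agree.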
Architecture Knowledge Should End in Evidence
CPU architecture is not about memorizing terms. Its practical value is that it lets debugging be split into layers.
For boot failure, inspect reset entry, link address, vector table, image load address, and early serial output. For a fault, inspect exception cause, return address, stack, accessed address, and current privilege level. For a missing interrupt, inspect CPU exception entry, interrupt-controller state, masks, pending bits, and the ISR acknowledgement path. For stale DMA data, inspect buffer address type, cache attributes, mapping APIs, and synchronization direction. For a program that cannot run, inspect ELF machine, ABI, dynamic linker, and rootfs.
These judgments apply across ARM, RISC-V, and other architectures. The registers and instructions change, but the engineering layers remain stable: architecture defines entry points and boundaries, the SoC provides concrete hardware, the operating system organizes runtime paths, and the toolchain emits binaries that follow an ABI.
Once those layers are clear, the specific differences between ARM and RISC-V become much easier to understand without mistaking reference tables for understanding.