ARM Cortex-A Bare Metal: Comprehensive Analysis of Architecture and Assembly Instructions
Learning Stage: ARM Cortex-A Bare Metal Development
Core Objective: Break free from the operating system to master CPU instruction execution, register management, stack mechanisms, and program flow control
Keywords: ARM architecture, RISC, assembly instructions, loops, function calls, stack operations, bare metal development
I. ARM Architecture: More Than Just a CPU, It's a Complete Ecosystem
1. System Layering and the Essence of Bare Metal Development
A typical ARM system's software/hardware layers:
```text
Application Layer (APP) → System Calls (SYS) → Kernel → Hardware (SoC)
```
- Core of Bare Metal Development: With no kernel (and therefore no system-call layer), the program runs directly on the SoC hardware and must manage registers, the stack, and peripherals itself.
- SoC Composition: CPU (Cortex-A core) + peripherals (GPIO, UART, PWM, etc.), the hardware core of embedded systems.
2. Key Processing Unit Comparison (Clarifying Development Positioning)
| Name | Meaning | Core Features | Application Scenarios |
|---|---|---|---|
| CPU | Central Processing Unit | Strong general-purpose computing | Core scenarios for all embedded systems |
| MCU | Microcontroller | Rich peripherals, low power | Microcontroller development (Cortex-M series) |
| MPU | Microprocessor | High performance, requires external peripherals | Bare metal/Linux development (Cortex-A series) |
| SoC | System on Chip | CPU + peripherals integrated | Mass-produced embedded products |
| DSP | Digital Signal Processor | Focused on signal processing (filtering, encoding/decoding) | Audio/video processing |
This article focuses on the Cortex-A series, which balances performance and scalability and serves as the foundation for both bare metal and Linux development.
3. RISC Architecture: The Core Advantage of ARM Instruction Sets
ARM employs RISC (Reduced Instruction Set Computing) design, contrasting sharply with x86's CISC (Complex Instruction Set Computing), with core advantages:
- Fixed instruction length (32-bit in ARM state), which simplifies instruction fetch, decoding, and pipelining;
- Load/Store architecture: only load/store instructions (LDR/STR and their multiple-register forms LDM/STM) access memory; all other instructions operate solely on registers;
- Sixteen visible registers (r0-r15), of which r0-r12 are general-purpose, reducing memory access frequency;
- Low-power design, suitable for embedded mobile devices and industrial control scenarios.
II. ARM Register System: The "Foundation" of Program Execution
Registers hold the CPU's working data and state. Bare metal development requires mastery of their functions and usage rules.
1. General-Purpose Registers (r0-r15)
| Register | Core Role | Key Notes |
|---|---|---|
| r0-r3 | Function parameter passing, return value storage | Can be modified by called functions without protection |
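| r4-r11 | General data storage | Called functions must protect (push/pop) any they modify to avoid data conflicts |
| r12 (IP) | Intra-procedure-call scratch register | May be corrupted across a call; do not rely on it surviving a BL |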
| r13 (SP) | Stack Pointer | Points to current stack top; must be manually initialized in bare metal |
| r14 (LR) | Link Register | Stores function return address; nested calls require stack protection |
| r15 (PC) | Program Counter | In ARM state, reading it returns the current instruction's address + 8 (the instruction being fetched); writing to it causes a branch, so modify it only deliberately |
2. Status Registers (CPSR/SPSR)
- CPSR (Current Program Status Register): A single register visible in every mode that stores the core state information:
  - Condition flags (N/Z/C/V): N (negative), Z (zero), C (carry), V (overflow);
  - Interrupt disable bits (I/F): Setting I masks IRQ and setting F masks FIQ;
  - Mode bits (M[4:0]): Specify the CPU's current mode (e.g., User, SVC, IRQ).
- SPSR (Saved Program Status Register): Exists only in exception modes (e.g., FIQ, IRQ, SVC, Abort, Undefined); User and System modes have none. The CPSR is copied into it on exception entry and restored from it on return.
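The field layout described above can be illustrated with a small host-side C sketch. It assumes the classic ARMv7-A bit positions (N/Z/C/V in bits 31 to 28, I in bit 7, F in bit 6, mode in bits 4 to 0); the type `cpsr_fields` and both function names are hypothetical helpers introduced only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the CPSR fields discussed in the text. */
typedef struct {
    int n, z, c, v;       /* condition flags, bits 31..28 */
    int irq_disabled;     /* I bit (bit 7): 1 = IRQ masked */
    int fiq_disabled;     /* F bit (bit 6): 1 = FIQ masked */
    unsigned mode;        /* M[4:0], bits 4..0 */
} cpsr_fields;

/* Split a raw 32-bit CPSR value into its fields. */
cpsr_fields cpsr_decode(uint32_t cpsr) {
    cpsr_fields f;
    f.n = (int)((cpsr >> 31) & 1u);
    f.z = (int)((cpsr >> 30) & 1u);
    f.c = (int)((cpsr >> 29) & 1u);
    f.v = (int)((cpsr >> 28) & 1u);
    f.irq_disabled = (int)((cpsr >> 7) & 1u);
    f.fiq_disabled = (int)((cpsr >> 6) & 1u);
    f.mode = cpsr & 0x1Fu;
    return f;
}

/* Map the mode field to a name; only the seven classic modes are listed. */
const char *cpsr_mode_name(unsigned mode) {
    switch (mode) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "other";
    }
}
```

For example, decoding the value 0x600000D3 yields Z and C set, both interrupts masked, and mode 0x13 (SVC).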
III. Core Assembly Instructions: From Data Operations to Flow Control
Assembly instructions are the "language" of bare metal development. Mastery of data transfer, arithmetic logic, bit operations, and conditional execution is essential. Below are detailed explanations with practical code.
1. Data Transfer and Shift Instructions (MOV/LDR)
These instructions move data between registers and between registers and memory, and are the most commonly used instruction category:
```armasm
; 1. Immediate/register transfer (MOV)
    mov r1, #0x08 ; r1 = 8 (immediate transfer, must comply with 12-bit rule)
    mov r3, r1 ; r3 = r1 (register copy)
    mov r4, r1, lsl #2 ; r1 left-shifted by 2 (×4) and stored in r4 (r4=32)
    mov r4, r4, lsr #2 ; r4 right-shifted by 2 (÷4) and stored in r4 (r4=8)
    mov r4, r1, ror #4 ; r1 rotated right by 4 (8→0x80000000)
; 2. Loading a value that is not a legal immediate (LDR pseudo-instruction)
    ldr r0, =0x1FB0 ; 0x1FB0 violates 12-bit rule; use LDR to load large immediate
```
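Key Notes:
- 12-bit immediate rule: The 12-bit field encodes an 8-bit value plus a 4-bit rotation, so a constant is legal only if rotating it by some even number of bits leaves the upper 24 bits all 0 and the data in the low 8 bits;
- Shift operations include LSL (logical left shift), LSR (logical right shift), and ROR (rotate right); immediate shift amounts are 0-31 for LSL, 1-32 for LSR, and 1-31 for ROR.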
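The immediate rule is easy to check mechanically. The following host-side C sketch tries every even rotation and reports whether a constant fits the 8-bit-plus-rotation encoding; `ror32` and `arm_imm_encodable` are hypothetical helper names, not part of any toolchain.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is reduced modulo 32). */
uint32_t ror32(uint32_t v, unsigned n) {
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/*
 * Return 1 if value can be written as an 8-bit constant rotated right
 * by an even amount (0, 2, ..., 30), i.e. it is a legal ARM-state
 * data-processing immediate; return 0 otherwise.
 */
int arm_imm_encodable(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot: rotate right by (32 - rot). */
        uint32_t imm8 = ror32(value, (32u - rot) & 31u);
        if (imm8 <= 0xFFu) {
            return 1;
        }
    }
    return 0;
}
```

With this check, 0x08 and 1000 (0xFA rotated) are accepted, while 0x1FB0 and 0x40001000 are rejected, which is why the text loads them with the LDR pseudo-instruction. Note that real assemblers may still accept some rejected values for MOV by substituting MVN.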
2. Arithmetic Logic Instructions (ADD/SUB/CMP)
Implement data operations and comparisons. The S suffix updates CPSR flags, enabling conditional execution:
```armasm
; 1. Arithmetic operations (ADD/SUB)
    mov r0, #0xA0 ; r0 = 160
    mov r1, #0x08 ; r1 = 8
    add r5, r0, #1 ; r5 = 160 + 1 = 161 (immediate operation)
    add r6, r0, r1 ; r6 = 160 + 8 = 168 (register operation)
    add r6, r0, r1, lsl #2 ; r6 = 160 + (8×4) = 192 (shifted operation)
    sub r3, r0, r1 ; r3 = 160 - 8 = 152
    adds r3, r0, r1 ; With S suffix, updates CPSR flags (N/Z/C/V)
; 2. Compare instruction (CMP) → Essentially "subtract without storing result, only updates CPSR"
    mov r0, #200
    mov r1, #100
    cmp r0, r1 ; Computes r0 - r1 like SUBS but discards the result (r0 unchanged, only flags updated)
    movge r3, r0 ; If r0≥r1 (N=V), r3 = r0
    movlt r3, r1 ; If r0<r1 (N≠V), r3 = r1
```
Core Rules:
- Direct operations on two immediates (e.g., `add r0, #3, #2`) are illegal; constant arithmetic is evaluated by the assembler or compiler before any instruction is emitted;
- CMP is the foundation of conditional execution; subsequent conditional instructions (movge/movlt) rely on the CPSR flags it updates.
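To make the link between CMP and the GE/LT conditions concrete, here is a host-side C sketch that computes the N/Z/C/V flags a CMP would produce and then evaluates the two conditions used above. The type `nzcv` and the function names are hypothetical; the flag formulas are the standard ones for a 32-bit subtraction.

```c
#include <stdint.h>

/* Hypothetical holder for the four condition flags. */
typedef struct { int n, z, c, v; } nzcv;

/* Flags produced by "cmp rn, op2", i.e. by computing rn - op2. */
nzcv cmp_flags(uint32_t rn, uint32_t op2) {
    uint32_t result = rn - op2;   /* wraps modulo 2^32, like the ALU */
    nzcv f;
    f.n = (int)((result >> 31) & 1u);              /* sign bit of result */
    f.z = (result == 0u);                          /* result is zero */
    f.c = (rn >= op2);                             /* carry = no borrow */
    f.v = (int)((((rn ^ op2) & (rn ^ result)) >> 31) & 1u); /* signed overflow */
    return f;
}

/* GE (signed >=) holds when N equals V; LT (signed <) when they differ. */
int cond_ge(nzcv f) { return f.n == f.v; }
int cond_lt(nzcv f) { return f.n != f.v; }
```

Comparing 200 with 100 gives N=0 and V=0, so GE holds and `movge` executes. Comparing 0x80000000 (the most negative int) with 1 overflows, giving N=0 and V=1, so LT still holds, which is why the condition tests N against V rather than N alone.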
3. Bit Operation Instructions (BIC/ORR)
These instructions set or clear individual register bits and are the core instructions for hardware peripheral configuration:
```armasm
; 1. Clear specified bits (BIC: Rd = Rn & ~Operand2)
    mov r0, #0xFFFFFFFF ; r0 = all 1s (the assembler encodes this as MVN r0, #0)
    bic r1, r0, #(1 << 15) ; Clear bit 15 (r1 = 0xFFFF7FFF)
    mov r2, #1 ; r2 = 1, the bit to be shifted into position
    bic r1, r0, r2, lsl #15 ; Use register shift for flexible clearing (same result)
; 2. Set specified bits (ORR: Rd = Rn | Operand2)
    orr r4, r1, #(1 << 15) ; Set bit 15 (r4 = 0xFFFFFFFF)
```
Application Scenarios: GPIO pin direction configuration, peripheral register bit control, modifying target bits without affecting others.
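The same clear-then-set pattern is what a C driver performs on a peripheral register. The sketch below models BIC and ORR as plain functions and adds a hypothetical `update_field` helper that rewrites one bit field while leaving the rest untouched; it operates on ordinary values rather than real hardware addresses, and `width` is assumed to be between 1 and 31.

```c
#include <stdint.h>

/* BIC: clear the bits of rn that are set in op2. */
uint32_t bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

/* ORR: set the bits of rn that are set in op2. */
uint32_t orr(uint32_t rn, uint32_t op2) { return rn | op2; }

/*
 * Replace a bit field of `width` bits starting at bit `shift`
 * with `value`, leaving all other bits unchanged (BIC, then ORR).
 */
uint32_t update_field(uint32_t reg, unsigned shift, unsigned width,
                      uint32_t value) {
    uint32_t mask = ((1u << width) - 1u) << shift;
    reg = bic(reg, mask);                      /* clear the old field */
    return orr(reg, (value << shift) & mask);  /* insert the new field */
}
```

For instance, `update_field(reg, 4, 2, 0x1)` would set a two-bit mode field at bits 5:4 to 01, the kind of operation used for a GPIO direction setting.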
IV. Program Flow Control: Loop and Branch Instructions
Bare metal program flow control relies on loops and jumps. Every loop has three elements: a termination condition, an iteration step, and a loop body. Below are implementations of two core loop structures.
1. while Loop (Check First, Execute After)
Calculate the sum of 1 to 1000. Because the condition is checked before each pass, the body may run zero times, which suits loops whose iteration count is not known in advance:
```armasm
    mov r0, #1 ; Loop variable i = 1
    mov r1, #0 ; Sum sum = 0
loop_label
    cmp r0, #1000 ; Check i ≤ 1000?
    bgt loop_finish ; If i > 1000, exit loop
    add r1, r1, r0 ; sum += i (loop body)
    add r0, r0, #1 ; i++ (iteration step)
    b loop_label ; Jump back to loop start
loop_finish
    b loop_finish ; Infinite loop halt (bare metal programs lack exit mechanisms)
```
2. do-while Loop (Execute First, Check After)
This version also calculates the sum of 1 to 1000, but always executes the loop body at least once:
```armasm
    mov r0, #1 ; Loop variable i = 1
    mov r1, #0 ; Sum sum = 0
do_loop
    add r1, r1, r0 ; sum += i (execute loop body first)
    add r0, r0, #1 ; i++
    cmp r0, #1000 ; Check i ≤ 1000?
    ble do_loop ; If condition met, continue loop
loop_finish
    b loop_finish
```
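For comparison, the two assembly loops correspond to the following C functions. This is only a reference sketch: the function names and the `limit` parameter are introduced here for illustration, with `limit` playing the role of the constant 1000.

```c
#include <stdint.h>

/* Check first, execute after: mirrors the cmp/bgt at the top of the loop. */
uint32_t sum_while(uint32_t limit) {
    uint32_t i = 1;      /* r0 */
    uint32_t sum = 0;    /* r1 */
    while (i <= limit) {
        sum += i;
        i++;
    }
    return sum;
}

/* Execute first, check after: mirrors the cmp/ble at the bottom of the loop. */
uint32_t sum_do_while(uint32_t limit) {
    uint32_t i = 1;
    uint32_t sum = 0;
    do {
        sum += i;
        i++;
    } while (i <= limit);
    return sum;
}
```

Both return 500500 for a limit of 1000. They differ only when the condition is false on entry: with a limit of 0, `sum_while` returns 0 while `sum_do_while` returns 1, because its body has already run once.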
3. Jump Instruction Comparison (B/BL/BX)
| Instruction | Core Function | Application Scenarios | Key Differences |
|---|---|---|---|
| B | Jump (unconditional, or conditional with a suffix such as bgt/ble) | Loops, general branches | Does not save return address |
| BL | Jump with return address | Function calls | Automatically saves return address in LR |
| BX | Jump with state switch | Function returns, state switches | Supports ARM/Thumb mode switch; bx lr = function return |
V. Function Calls and Stack Mechanism: The Core Safeguard of Bare-Metal Development
The key to function calls lies in "saving the context" and "restoring the context," using the stack to avoid register data conflicts and return address loss.
1. ARM Stack Model: Full Descending Stack (FD)
ARM code conventionally uses a Full Descending Stack (STMFD/LDMFD), which is also the model the ARM procedure call standard (AAPCS) expects. The core rules are:
- Full Stack: Before pushing, the stack pointer (SP) points to valid data. During a push, the stack pointer is decremented first (SP -= 4), then the data is stored.
- Descending Stack: The stack pointer grows from high to low addresses (e.g., stack base at 0x40001000, with the stack top gradually decreasing).
- Stack Initialization: Use `ldr sp, =0x40001000` (0x40001000 is not a legal immediate, so it must be loaded with LDR). The stack size must be configured in the project (e.g., 0x1000 = 4KB).
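The push and pop order of a full descending stack can be simulated on a host machine. The sketch below is a model, not target code: it reuses the base address 0x40001000 from the text, uses a deliberately tiny 64-byte stack, and all names (`fd_stack`, `fd_push`, `fd_pop`) are hypothetical.

```c
#include <stdint.h>

#define STACK_BASE 0x40001000u  /* initial SP, taken from the text */
#define STACK_SIZE 0x40u        /* 64 bytes; deliberately tiny for the sketch */

typedef struct {
    uint32_t words[STACK_SIZE / 4]; /* backing store for [BASE-SIZE, BASE) */
    uint32_t sp;                    /* simulated stack pointer (byte address) */
} fd_stack;

/* Empty stack: SP sits at the base, one word above the first usable slot. */
void fd_init(fd_stack *s) { s->sp = STACK_BASE; }

/* Push: decrement SP first, then store. Returns 0 if the stack is full. */
int fd_push(fd_stack *s, uint32_t value) {
    if (s->sp <= STACK_BASE - STACK_SIZE) {
        return 0;
    }
    s->sp -= 4;
    s->words[(s->sp - (STACK_BASE - STACK_SIZE)) / 4] = value;
    return 1;
}

/* Pop: load from SP first, then increment. Returns 0 if the stack is empty. */
int fd_pop(fd_stack *s, uint32_t *out) {
    if (s->sp >= STACK_BASE) {
        return 0;
    }
    *out = s->words[(s->sp - (STACK_BASE - STACK_SIZE)) / 4];
    s->sp += 4;
    return 1;
}
```

After one push the simulated SP is 0x40000FFC and points at the value just stored, which is exactly the "full" property: SP always addresses valid data once something has been pushed.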
2. Complete Function Call Flow (Including Nested Calls)
(1) Basic Function Definition and Return
```armasm
; Function with no parameters and no return value
asm_fun0
    mov r0, #10 ; Internal logic: r0=10, r1=20
    mov r1, #20
    bx lr ; Function return (LR stores the return address)
```
(2) Nested Function Calls (Requires LR Protection)
```armasm
; Function1: Returns the maximum of r0 and r1 in r0, nested call to asm_fun0
asm_twoNumMax
    stmfd sp!, {r4, lr} ; Push r4 and LR (the BL below overwrites LR; two registers keep SP 8-byte aligned)
    cmp r0, r1 ; Compare input parameters r0 and r1
    movge r4, r0 ; Keep the maximum in callee-saved r4 so the nested call cannot clobber it
    movlt r4, r1
    bl asm_fun0 ; Call asm_fun0 (LR is automatically set to the address of the next instruction)
    mov r0, r4 ; Return value goes in r0 (asm_fun0 overwrote r0 and r1)
    ldmfd sp!, {r4, lr} ; Restore r4 and LR
    bx lr ; Function return
; Main function: Initialize stack pointer, call nested function
asm_main
    ldr sp, =0x40001000 ; Initialize stack pointer (full descending stack)
    mov r0, #50 ; Parameter 1: a=50
    mov r1, #20 ; Parameter 2: b=20
    stmfd sp!, {r4-r12, lr} ; Save main function context (r0-r3 are left out so the result in r0 survives)
    bl asm_twoNumMax ; Call nested function (r0 = 50 on return)
    ldmfd sp!, {r4-r12, lr} ; Restore main function context
finish
    b finish ; Infinite loop to halt
    end
```
3. Core Rules of Function Calls
- Parameter Passing: The first 4 parameters are passed via r0-r3; any further parameters are passed on the stack.
- Return Value: Passed through r0.
- Context Saving: A function must push any of r4-r11 that it modifies, and must also push LR if it makes a nested call, to avoid data loss.
- Stack Alignment: Ensure 8-byte alignment when calling C functions; add the `preserve8` directive.
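Whether a push keeps the stack 8-byte aligned depends only on how many registers are in the list, since each register occupies 4 bytes. The following host-side C sketch makes that check explicit; it assumes SP was 8-byte aligned before the push, and the function names and the bit-mask representation (bit n set means rN is in the list, bit 14 is lr) are hypothetical conventions chosen for illustration.

```c
#include <stdint.h>

/* Number of bytes an STMFD with this register list pushes (4 per register). */
unsigned reglist_bytes(uint16_t mask) {
    unsigned count = 0;
    while (mask) {
        count += mask & 1u;  /* count one register per set bit */
        mask >>= 1;
    }
    return count * 4u;
}

/* 1 if pushing this list leaves an 8-byte-aligned SP 8-byte aligned. */
int reglist_keeps_alignment(uint16_t mask) {
    return reglist_bytes(mask) % 8u == 0;
}
```

A list such as {r0-r12, lr} (mask 0x5FFF) pushes 56 bytes and preserves alignment, whereas {lr} alone (mask 0x4000) pushes 4 bytes and does not; pairing LR with one more register, for example {r4, lr}, is a common way to keep the count even.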
VI. Bare-Metal Development Pitfall Guide (Critical Practical Reminders)
- Immediate Value Legality: A 12-bit immediate must be an 8-bit value rotated right by an even number of bits, i.e. some even rotation must leave the upper 24 bits all 0. Illegal immediates (e.g., 0x1FB0) must be loaded via `ldr r0, =0x1FB0`.
- Stack Initialization: Avoid `mov sp, #0x40001000` (illegal immediate); use `ldr sp, =0x40001000` instead.
- Conditional Instruction Dependence: Conditional execution instructions (movge/movlt) test whatever flags are currently in the CPSR, so they must follow a CMP instruction or an arithmetic/logic instruction with an `S` suffix; otherwise, they act on stale flags.
- Function Return Convention: Use `bx lr` for regular functions; functions that make nested calls must restore LR before returning.
- Bare-Metal Program Termination: Without an OS, programs cannot exit; use an infinite loop (`b finish`), low-power mode, or wait for interrupt.
VII. Mixed C and Assembly Calls (Practical Extensions)
Bare-metal development often requires mixing C and assembly. Core rules:
1. Assembly Calling C Functions
```armasm
; 1. Import C function (Keil environment)
    import c_add
; 2. Stack alignment directive (declares that the stack stays 8-byte aligned; avoids build errors when linking with C)
    preserve8
; 3. Call flow
asm_call_c
    ldr sp, =0x40001000 ; Initialize stack
    stmfd sp!, {r4-r12, lr} ; Save context (10 registers = 40 bytes, so SP stays 8-byte aligned)
    mov r0, #1 ; C parameter 1: a=1
    mov r1, #2 ; C parameter 2: b=2
    bl c_add ; Call C function (result is returned in r0)
    ldmfd sp!, {r4-r12, lr} ; Restore context (r0 is not restored, so the result is kept)
    bx lr
```
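The assembly above imports `c_add` but the article does not show its definition. A minimal sketch of what the C side could look like follows, assuming `c_add` simply adds two `int` arguments; the real function in the author's project may differ.

```c
/*
 * Hypothetical C-side definition of the imported function.
 * Under the calling convention described above, a arrives in r0,
 * b arrives in r1, and the return value is placed in r0.
 */
int c_add(int a, int b) {
    return a + b;
}
```

With the arguments set up in the assembly (r0 = 1, r1 = 2), this definition would leave 3 in r0 when control returns from the `bl c_add` instruction.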
2. C Calling Assembly Functions
```armasm
; Assembly side: Export function
    export asm_fun1
asm_fun1
    add r0, r0, r1 ; Implements a+b, stores result in r0
    bx lr
```
```c
// C side: Declare function
extern int asm_fun1(int a, int b);
// Call
int main(void) {
    int res = asm_fun1(10, 20); // res = 30
    while(1);
}
```
Summary
The core of ARM Cortex-A bare-metal development lies in register operations + stack management + instruction execution:
- Architecture Level: Understand RISC advantages and SoC composition, clarifying bare-metal positioning.
- Instruction Level: Master data transfer, arithmetic/logic, bit operations, and branch instructions for basic functionality.
- Flow Control Level: Grasp loops and function calls, recognizing the stack's role in context protection.
- Practical Level: Avoid pitfalls like illegal immediates and stack misalignment, enabling mixed C/assembly calls.