DAY56 ARM Cortex-A Bare Metal

ARM Cortex-A Bare Metal: Comprehensive Analysis of Architecture and Assembly Instructions

Learning Stage: ARM Cortex-A Bare Metal Development

Core Objective: Break free from the operating system to master CPU instruction execution, register management, stack mechanisms, and program flow control

Keywords: ARM architecture, RISC, assembly instructions, loops, function calls, stack operations, bare metal development

I. ARM Architecture: More Than Just a CPU, It's a Complete Ecosystem

1. System Layering and the Essence of Bare Metal Development

A typical ARM system's software/hardware layers:

Application Layer (APP) → System Calls (SYS) → Kernel → Hardware (SoC)  
  • Core of Bare Metal Development: Remove the operating-system layers (system calls and kernel); the program runs directly on the SoC hardware and must manage registers, the stack, and peripherals itself.
  • SoC Composition: CPU (Cortex-A core) + peripherals (GPIO, UART, PWM, etc.), the hardware core of embedded systems.

2. Key Processing Unit Comparison (Clarifying Development Positioning)

| Name | Meaning | Core Features | Application Scenarios |
| --- | --- | --- | --- |
| CPU | Central Processing Unit | Strong general-purpose computing | Core of every embedded system |
| MCU | Microcontroller | Rich peripherals, low power | Microcontroller development (Cortex-M series) |
| MPU | Microprocessor | High performance, requires external peripherals | Bare metal/Linux development (Cortex-A series) |
| SoC | System on Chip | CPU + peripherals integrated | Mass-produced embedded products |
| DSP | Digital Signal Processor | Focused on signal processing (filtering, encoding/decoding) | Audio/video processing |

This article focuses on the Cortex-A series, balancing performance and scalability, serving as the foundation for both bare metal and Linux development.

3. RISC Architecture: The Core Advantage of ARM Instruction Sets

ARM employs RISC (Reduced Instruction Set Computing) design, contrasting sharply with x86's CISC (Complex Instruction Set Computing), with core advantages:

  • Fixed instruction length (32-bit in ARM state), high execution efficiency;
  • Load/Store architecture: only load/store instructions (LDR/STR and their multiple-register forms LDM/STM) access memory; all other instructions operate solely on registers;
  • Abundant general-purpose registers (r0-r15), reducing memory access frequency;
  • Low-power design, suitable for embedded mobile devices and industrial control scenarios.

II. ARM Register System: The "Foundation" of Program Execution

Registers hold the CPU's working data and state. Bare metal development requires mastery of their functions and usage rules.

1. General-Purpose Registers (r0-r15)

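| Register | Core Role | Key Notes |
| --- | --- | --- |
| r0-r3 | Function parameter passing, return value storage | Caller-saved: a called function may modify them without saving them |
| r4-r11 | General data storage | Callee-saved: a called function must protect (push/pop) any it modifies |
| r12 (IP) | Scratch register | Caller-saved: not guaranteed to survive a function call |
| r13 (SP) | Stack Pointer | Points to current stack top; must be manually initialized in bare metal |
| r14 (LR) | Link Register | Stores function return address; nested calls require stack protection |
| r15 (PC) | Program Counter | Points to the instruction being fetched (in ARM state it reads as the executing instruction's address + 8); writing to it causes a branch, so modify it only deliberately |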

2. Status Registers (CPSR/SPSR)

  • CPSR (Current Program Status Register): Shared across all modes, stores core state information:
    • Condition flags (N/Z/C/V): N (negative), Z (zero), C (carry), V (overflow);
    • Interrupt disable bits (I/F): Setting I to 1 masks IRQ and setting F to 1 masks FIQ; clearing a bit enables that interrupt;
    • Mode bits (M[4:0]): Specify CPU current mode (e.g., User, SVC, IRQ).
  • SPSR (Saved Program Status Register): Exists only in exception modes (FIQ, IRQ, SVC, Abort, Undefined), not in User or System mode; it backs up CPSR when an exception is taken so CPSR can be restored upon return.
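The field layout above can be sketched as a small C decoder. This is an illustrative helper rather than code from the article: the `cpsr_fields` type and both function names are hypothetical, and the bit positions (flags in bits 31..28, I in bit 7, F in bit 6, mode in bits 4..0) follow the classic ARMv7-A CPSR layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a 32-bit CPSR value (hypothetical helper type). */
typedef struct {
    bool n, z, c, v;     /* condition flags, bits 31..28 */
    bool irq_masked;     /* I bit (bit 7): 1 means IRQ is disabled */
    bool fiq_masked;     /* F bit (bit 6): 1 means FIQ is disabled */
    uint32_t mode;       /* M[4:0], bits 4..0 */
} cpsr_fields;

/* Split a raw CPSR value into its flag, mask, and mode fields. */
cpsr_fields cpsr_decode(uint32_t cpsr)
{
    cpsr_fields f;
    f.n = (cpsr >> 31) & 1u;
    f.z = (cpsr >> 30) & 1u;
    f.c = (cpsr >> 29) & 1u;
    f.v = (cpsr >> 28) & 1u;
    f.irq_masked = (cpsr >> 7) & 1u;
    f.fiq_masked = (cpsr >> 6) & 1u;
    f.mode = cpsr & 0x1Fu;
    return f;
}

/* Map the mode field to a readable name. */
const char *cpsr_mode_name(uint32_t mode)
{
    switch (mode) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "other";
    }
}
```

For instance, decoding 0x600000D3 gives Z and C set, both interrupts masked, and mode 0x13 (SVC).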

III. Core Assembly Instructions: From Data Operations to Flow Control

Assembly instructions are the "language" of bare metal development. Mastery of data transfer, arithmetic logic, bit operations, and conditional execution is essential. Below are detailed explanations with practical code.

1. Data Transfer and Shift Instructions (MOV/LDR)

These instructions move data between registers and between registers and memory; they are the most commonly used instruction category:

```armasm
; 1. Immediate/register transfer (MOV)
        mov     r1, #0x08           ; r1 = 8 (immediate transfer, must comply with 12-bit rule)
        mov     r3, r1              ; r3 = r1 (register copy)
        mov     r4, r1, lsl #2      ; r1 left-shifted by 2 (×4) and stored in r4 (r4 = 32)
        mov     r4, r4, lsr #2      ; r4 right-shifted by 2 (÷4) and stored in r4 (r4 = 8)
        mov     r4, r1, ror #4      ; r1 rotated right by 4 (0x00000008 → 0x80000000)

; 2. Loading an immediate that MOV cannot encode (LDR pseudo-instruction)
        ldr     r0, =0x1FB0         ; 0x1FB0 violates 12-bit rule; use LDR to load it
```

Key Notes:

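  • 12-bit immediate rule: The encoding stores an 8-bit value plus a 4-bit rotation, so a constant is legal only if it equals some 8-bit value (0-255) rotated right by an even number of bits (0, 2, ..., 30);
  • Shift operations include LSL (logical left shift), LSR (logical right shift), and ROR (rotate right); with an immediate shift amount, LSL accepts 0-31, LSR 1-32, and ROR 1-31.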
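The immediate rule is easy to get wrong by eye, so here is a minimal C sketch of the check. The function names `ror32` and `arm_imm_encodable` are made up for illustration; the logic just tries every even rotation and tests whether the value fits in 8 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is reduced modulo 32). */
uint32_t ror32(uint32_t value, unsigned n)
{
    n &= 31u;
    return n ? (value >> n) | (value << (32u - n)) : value;
}

/* True if value equals some 8-bit constant rotated right by an even amount. */
bool arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a right-rotation by rot: rotating right by (32 - rot)
           is the same as rotating left by rot. */
        if (ror32(value, 32u - rot) <= 0xFFu)
            return true;
    }
    return false;
}
```

With this check, 0x08 and 1000 (0xFA rotated right by 30) are encodable, while 0x1FB0 and 0x40001000 are not, which is why the article loads them with `ldr rX, =value`. The same `ror32` helper reproduces the `ror #4` example: `ror32(8, 4)` yields 0x80000000.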

2. Arithmetic Logic Instructions (ADD/SUB/CMP)

Implement data operations and comparisons. The S suffix updates CPSR flags, enabling conditional execution:

```armasm
; 1. Arithmetic operations (ADD/SUB)
        mov     r0, #0xA0           ; r0 = 160
        mov     r1, #0x08           ; r1 = 8
        add     r5, r0, #1          ; r5 = 160 + 1 = 161 (immediate operation)
        add     r6, r0, r1          ; r6 = 160 + 8 = 168 (register operation)
        add     r6, r0, r1, lsl #2  ; r6 = 160 + (8×4) = 192 (shifted operation)
        sub     r3, r0, r1          ; r3 = 160 - 8 = 152
        adds    r3, r0, r1          ; With S suffix, updates CPSR flags (N/Z/C/V)

; 2. Compare instruction (CMP) → essentially "subtract without storing result, only updates CPSR"
        mov     r0, #200
        mov     r1, #100
        cmp     r0, r1              ; Computes r0 - r1 like SUBS but discards the result (r0 unchanged, only flags updated)
        movge   r3, r0              ; If r0≥r1 (N=V), r3 = r0
        movlt   r3, r1              ; If r0<r1 (N≠V), r3 = r1
```

Core Rules:

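  • An instruction cannot take two immediates as its source operands (e.g., add r0, #3, #2 is illegal): the first source operand must be a register. Constant arithmetic is instead written as an expression that the assembler evaluates at build time (e.g., mov r0, #(3 + 2));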
  • CMP is the foundation of conditional execution; subsequent conditional instructions (movge/movlt) rely on its updated CPSR flags.
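To make the flag dependence concrete, the following C sketch models what CMP does to N/Z/C/V and how the GE and LT conditions read those flags. The type and function names are hypothetical; the flag formulas are the standard ones for a 32-bit subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv_flags;

/* Flags produced by "cmp a, b", i.e. a - b with the result discarded. */
nzcv_flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    nzcv_flags f;
    f.n = (r >> 31) & 1u;                    /* result is negative as a signed value */
    f.z = (r == 0);                          /* operands were equal */
    f.c = (a >= b);                          /* for subtraction, C = 1 means no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* Condition codes used by movge / movlt (signed comparison). */
bool cond_ge(nzcv_flags f) { return f.n == f.v; }
bool cond_lt(nzcv_flags f) { return f.n != f.v; }
```

Comparing 200 with 100 satisfies GE, and 100 with 200 satisfies LT. The overflow case shows why the test is N=V rather than just N: comparing 0x80000000 (the most negative value) with 1 leaves N clear but sets V, so LT still holds.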

3. Bit Operation Instructions (BIC/ORR)

These instructions modify individual register bits precisely and are the core instructions for hardware peripheral configuration:

```armasm
; 1. Clear specified bits (BIC: Rd = Rn & ~Operand2)
        mov     r0, #0xFFFFFFFF     ; r0 = all 1s (the assembler encodes this as mvn r0, #0)
        bic     r1, r0, #(1 << 15)  ; Clear bit 15 (r1 = 0xFFFF7FFF)
        bic     r1, r0, r2, lsl #15 ; Same idea with the mask taken from a register: clears the bits set in (r2 << 15); r2 must be loaded first (r2 = 1 clears bit 15)

; 2. Set specified bits (ORR: Rd = Rn | Operand2)
        orr     r4, r1, #(1 << 15)  ; Set bit 15 (r4 = 0xFFFFFFFF)
```

Application Scenarios: GPIO pin direction configuration, peripheral register bit control, modifying target bits without affecting others.
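The same read-modify-write pattern looks like this in C. It is a sketch under stated assumptions: `bic`, `orr`, `reg_set_bit`, and `reg_clear_bit` are illustrative names, and the register pointer stands for any 32-bit memory-mapped register (no real peripheral address is implied).

```c
#include <stdint.h>

/* BIC: clear the bits selected by mask, leave the others untouched. */
uint32_t bic(uint32_t value, uint32_t mask) { return value & ~mask; }

/* ORR: set the bits selected by mask, leave the others untouched. */
uint32_t orr(uint32_t value, uint32_t mask) { return value | mask; }

/* Read-modify-write on a register; volatile keeps the compiler from
   caching or dropping the hardware access. */
void reg_set_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg = orr(*reg, 1u << bit);
}

void reg_clear_bit(volatile uint32_t *reg, unsigned bit)
{
    *reg = bic(*reg, 1u << bit);
}
```

Clearing bit 15 of 0xFFFFFFFF gives 0xFFFF7FFF, and setting it again restores 0xFFFFFFFF, matching the assembly above. On a host machine an ordinary `volatile uint32_t` variable can stand in for the register when experimenting.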


IV. Program Flow Control: Loop and Branch Instructions

Bare metal program flow control relies on loops and jumps. Clarify the "three elements of loops" (termination condition, iteration step, loop body). Below are implementations of two core loop structures.

1. while Loop (Check First, Execute After)

Sum the integers from 1 to 1000. Because the condition is checked before the body runs, this form suits loops whose iteration count is not known in advance (the body may execute zero times):

```armasm
        mov     r0, #1              ; Loop variable i = 1
        mov     r1, #0              ; Sum sum = 0
loop_label
        cmp     r0, #1000           ; Check i ≤ 1000?
        bgt     loop_finish         ; If i > 1000, exit loop
        add     r1, r1, r0          ; sum += i (loop body)
        add     r0, r0, #1          ; i++ (iteration step)
        b       loop_label          ; Jump back to loop start
loop_finish
        b       loop_finish         ; Infinite loop halt (bare metal programs lack exit mechanisms)
```

2. do-while Loop (Execute First, Check After)

The same sum of 1 to 1000, but the loop body executes at least once because the condition is checked afterwards:

```armasm
        mov     r0, #1              ; Loop variable i = 1
        mov     r1, #0              ; Sum sum = 0
do_loop
        add     r1, r1, r0          ; sum += i (execute loop body first)
        add     r0, r0, #1          ; i++
        cmp     r0, #1000           ; Check i ≤ 1000?
        ble     do_loop             ; If condition met, continue loop
do_finish
        b       do_finish           ; Infinite loop halt
```
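For comparison, the two assembly loops correspond to the following C forms. This is only a reference sketch; the function names and the `limit` parameter are introduced here so the difference between the two structures can be checked on a host machine.

```c
#include <stdint.h>

/* while form: the condition is checked before the body runs. */
uint32_t sum_while(uint32_t limit)
{
    uint32_t i = 1, sum = 0;
    while (i <= limit) {
        sum += i;   /* loop body */
        i++;        /* iteration step */
    }
    return sum;
}

/* do-while form: the body runs once before the first check. */
uint32_t sum_do_while(uint32_t limit)
{
    uint32_t i = 1, sum = 0;
    do {
        sum += i;
        i++;
    } while (i <= limit);
    return sum;
}
```

Both return 500500 for a limit of 1000. They differ only at the boundary: with a limit of 0 the while form returns 0, whereas the do-while form has already added 1 before the check.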

3. Jump Instruction Comparison (B/BL/BX)

| Instruction | Core Function | Application Scenarios | Key Differences |
| --- | --- | --- | --- |
| B | Jump (unconditional, or conditional with a suffix such as bgt/ble) | Loops, general branches | Does not save return address |
| BL | Jump with return address | Function calls | Automatically saves return address in LR |
| BX | Jump with state switch | Function returns, state switches | Supports ARM/Thumb mode switch; bx lr = function return |

V. Function Calls and Stack Mechanism: The Core Safeguard of Bare-Metal Development

The key to function calls lies in "saving the context" and "restoring the context." Both rely on the stack to avoid register data conflicts and return address loss.

1. ARM Stack Model: Full Descending Stack (FD)

ARM code conventionally uses a Full Descending Stack (STMFD/LDMFD); this is a software convention that the procedure call standard and toolchains follow, with the following core rules:

  • Full Stack: Before pushing, the stack pointer (SP) points to valid data. During a push, the stack pointer is decremented first (SP -= 4), then the data is stored.
  • Descending Stack: The stack pointer grows from high to low addresses (e.g., stack base at 0x40001000, with the stack top gradually decreasing).
  • Stack Initialization : Use ldr sp, =0x40001000 (illegal immediates require LDR loading). Stack size must be configured in the project (e.g., 0x1000 = 4KB).
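The full-descending rules can be modeled in a few lines of C. This is a host-side simulation, not target code: the `fd_stack` type and helper names are hypothetical, the base address and 4 KB size are the example values from the text, and there are deliberately no overflow checks.

```c
#include <stdint.h>

#define STACK_BASE  0x40001000u   /* initial SP, as in the example above */
#define STACK_WORDS 1024u         /* 0x1000 bytes = 4 KB of stack */

typedef struct {
    uint32_t sp;                  /* simulated stack pointer (byte address) */
    uint32_t mem[STACK_WORDS];    /* backs addresses [STACK_BASE - 0x1000, STACK_BASE) */
} fd_stack;

/* Translate a simulated byte address into a slot of the backing array. */
uint32_t *fd_slot(fd_stack *s, uint32_t addr)
{
    return &s->mem[(addr - (STACK_BASE - 4u * STACK_WORDS)) / 4u];
}

void fd_init(fd_stack *s) { s->sp = STACK_BASE; }

/* Push: decrement SP first, then store (full + descending). */
void fd_push(fd_stack *s, uint32_t value)
{
    s->sp -= 4u;
    *fd_slot(s, s->sp) = value;
}

/* Pop: load from SP first, then increment. */
uint32_t fd_pop(fd_stack *s)
{
    uint32_t value = *fd_slot(s, s->sp);
    s->sp += 4u;
    return value;
}
```

After two pushes SP has moved from 0x40001000 down to 0x40000FF8 and points at the most recently pushed word. A multi-register `stmfd sp!, {r4, lr}` places the lowest-numbered register at the lowest address, so it behaves like `fd_push(lr)` followed by `fd_push(r4)`.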

2. Complete Function Call Flow (Including Nested Calls)

(1) Basic Function Definition and Return
```armasm
; Function with no parameters and no return value
asm_fun0
        mov     r0, #10             ; Internal logic: r0=10, r1=20
        mov     r1, #20
        bx      lr                  ; Function return (LR stores the return address)
```
(2) Nested Function Calls (Requires LR Protection)
```armasm
; Function1: Returns the maximum of r0 and r1 in r0, nested call to asm_fun0
asm_twoNumMax
        stmfd   sp!, {r4, lr}       ; Nested call ahead: save LR, plus r4 (callee-saved) to hold the result
        cmp     r0, r1              ; Compare input parameters r0 and r1
        movge   r4, r0              ; Store the maximum in r4
        movlt   r4, r1
        bl      asm_fun0            ; Call asm_fun0 (BL sets LR to the address of the next instruction; asm_fun0 overwrites r0 and r1)
        mov     r0, r4              ; Return value goes in r0
        ldmfd   sp!, {r4, lr}       ; Restore r4 and LR
        bx      lr                  ; Function return

; Main function: Initialize stack pointer, call nested function
asm_main
        ldr     sp, =0x40001000     ; Initialize stack pointer (full descending stack)
        mov     r0, #50             ; Parameter 1: a=50
        mov     r1, #20             ; Parameter 2: b=20
        stmfd   sp!, {r4-r12, lr}   ; Save main function context; r0-r3 are left out because restoring r0 would overwrite the return value
        bl      asm_twoNumMax       ; Call nested function (r0 = 50 on return)
        ldmfd   sp!, {r4-r12, lr}   ; Restore main function context
finish
        b       finish              ; Infinite loop to halt
        end
```

3. Core Rules of Function Calls

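  • Parameter Passing: The first 4 parameters are passed via r0-r3; any further parameters are passed on the stack.
  • Return Value: Passed through r0.
  • Context Saving: A function must push any of r4-r11 that it modifies and pop them before returning; a function that calls another function must also push LR, because BL overwrites it.
  • Stack Alignment: Keep SP 8-byte aligned whenever a C function is called (push an even number of registers). The preserve8 directive only declares that the file maintains this alignment; it does not align the stack for you.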
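The alignment rule reduces to simple arithmetic on the register count, sketched below in C. Both function names are illustrative, and the starting SP is the example value 0x40001000 used throughout.

```c
#include <stdbool.h>
#include <stdint.h>

/* SP after "stmfd sp!, {...}" with reg_count registers in the list. */
uint32_t sp_after_push(uint32_t sp, unsigned reg_count)
{
    return sp - 4u * reg_count;   /* each register occupies one 4-byte word */
}

/* True if SP satisfies the 8-byte alignment expected at a call into C. */
bool sp_is_8byte_aligned(uint32_t sp)
{
    return (sp & 7u) == 0;
}
```

Starting from an aligned SP, pushing an even number of registers (2, 10, 14) keeps it aligned, while an odd count such as a lone `{lr}` leaves SP at a multiple of 4 only. A common fix is to push one extra register to make the count even.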

VI. Bare-Metal Development Pitfall Guide (Critical Practical Reminders)

  1. Immediate Value Legality: A legal immediate is an 8-bit value rotated right by an even number of bits. Constants that do not fit this form (e.g., 0x1FB0) must be loaded via ldr r0, =0x1FB0.
  2. Stack Initialization: Avoid mov sp, #0x40001000 (illegal immediate); use ldr sp, =0x40001000 instead.
  3. Conditional Instruction Dependence: Conditional execution instructions (movge/movlt) depend on a preceding CMP or an arithmetic/logic instruction with the S suffix; without one, they act on stale flags left by some earlier instruction.
  4. Function Return Convention: Use bx lr for regular functions; nested functions must restore LR before returning.
  5. Bare-Metal Program Termination: Without an OS, programs cannot exit; end with an infinite loop (b finish), enter a low-power mode, or wait for an interrupt.

VII. Mixed C and Assembly Calls (Practical Extensions)

Bare-metal development often requires mixing C and assembly. Core rules:

1. Assembly Calling C Functions

```armasm
; 1. Import C function (Keil environment)
        import  c_add
; 2. Declare that this file keeps the stack 8-byte aligned (avoids a build error when linking with C code)
        preserve8
; 3. Call flow
asm_call_c
        ldr     sp, =0x40001000     ; Initialize stack
        stmfd   sp!, {r4-r12, lr}   ; Save context (10 words, so SP stays 8-byte aligned)
        mov     r0, #1              ; C parameter 1: a=1
        mov     r1, #2              ; C parameter 2: b=2
        bl      c_add               ; Call C function (result returned in r0)
        ldmfd   sp!, {r4-r12, lr}   ; Restore context; r0 is not restored, so it still holds the result
        bx      lr
```
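The C side of this call is not shown above. A minimal sketch might look like the following; the function name comes from the `import c_add` line, while the body and the `int` parameter types are assumptions made for illustration.

```c
/* Hypothetical C implementation of the imported c_add. */
int c_add(int a, int b)
{
    /* Under the ARM calling convention, a arrives in r0 and b in r1,
       and the sum is handed back to the assembly caller in r0. */
    return a + b;
}
```

With the assembly above passing 1 and 2, the caller would find 3 in r0 after `bl c_add` returns.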

2. C Calling Assembly Functions

```armasm
; Assembly side: Export function
        export  asm_fun1
asm_fun1
        add     r0, r0, r1          ; Implements a+b, stores result in r0
        bx      lr
```

```c
// C side: Declare function
extern int asm_fun1(int a, int b);

// Call
int main(void) {
    int res = asm_fun1(10, 20); // res = 30
    (void)res;                  // res is otherwise unused in this demo
    while (1) {
    }
}
```

Summary

The core of ARM Cortex-A bare-metal development lies in register operations + stack management + instruction execution:

  • Architecture Level: Understand RISC advantages and SoC composition, clarifying bare-metal positioning.
  • Instruction Level: Master data transfer, arithmetic/logic, bit operations, and branch instructions for basic functionality.
  • Flow Control Level: Grasp loops and function calls, recognizing the stack's role in context protection.
  • Practical Level: Avoid pitfalls like illegal immediates and stack misalignment, enabling mixed C/assembly calls.