打打基础 | 从翻转链表到寄存器、汇编与内存

我作为软件工程师在美国工作了三年，期间接触和浸泡过不少的技术栈，罗列一番的话有 AWS cloud, frontend (React, TypeScript), backend (Django, Springboot, ECS, GraphQL), JVM (Java, Scala, Kotlin), data pipelines (Spark, Snowflake, Prefect, DBT), database (Clickhouse, MySQL, Postgres, DuckDB), devOps (Gitlab CI, Jenkins, Bazel, linter), Observability (Grafana, Splunk, Prometheus), Docker, Kubernetes等一大串内容，做做准备的话可以把自己营造成多面手的人设。但是我明白，绝大多数时候里我都只是在使用各类application罢了，虽然自诩代码品味不错，追求让人一目了然而简洁易懂的程序，但总归还是在浅水里扑腾。

趁着辞职后的充电期，我开始系统性地学习计算机的各类基础内容。虽然缺乏这些知识并没有怎么影响我的日常工作，但我仍然发自内心地对计算机这个事物感到好奇，希望能够对它有更多的了解。我决定写些文章来记录分享我在阅读大名鼎鼎的教材 Computer Systems: A Programmer's Perspective 期间的学习感悟，并尽可能地用高质量的例子来巩固所学。

这篇文章会用程序员们喜闻乐见的翻转链表作为例子，介绍寄存器register, 汇编assembly, 以及栈帧stack frame等内容，希望能够给大家带来一种「串联起来了」的愉悦。由于我比较偏好阅读英文的学习材料（我没有读过计算机领域的中文教材），对于许多技术名词的中文翻译感到别扭（比如把pipeline叫做流水线，stack frame叫做栈帧），所以干脆用英文进行写作。

Reversing a linked-list is like the "Hello World" from programmers. In this article I'm gonna use it as an example (written below in C) to showcase how C code gets translated into assembly, with a specific focus on stack frames, register usage, memory layout, and the power of the popular debugging tool gdb.

c 复制代码

#include <stdio.h>

typedef struct Node {
    int val;
    int id;
    struct Node* next;
} Node;

// Reverses a singly linked list and returns the new head
Node* reverse(Node* head) {
    Node* prev = NULL;
    Node* curr = head;
    Node* next;

    while (curr != NULL) {
        next = curr->next;
        curr->next = prev;
        prev = curr;
        curr = next;
    }

    return prev;
}

// Prints the list
void print_list(Node* head) {
    while (head != NULL) {
        printf("Node ID: %d, Value: %d\n", head->id, head->val);
        head = head->next;
    }
}

int main() {
    // Declare four nodes on the stack and manually connect them
    Node node_4 = {40, 4, NULL};
    Node node_3 = {30, 3, &node_4};
    Node node_2 = {20, 2, &node_3};
    Node node_1 = {10, 1, &node_2};

    printf("Original list:\n");
    print_list(&node_1);

    // Reverse the list
    Node* new_head = reverse(&node_1);

    printf("\nReversed list:\n");
    print_list(new_head);
    return 0;
}

The code above should be pretty straightforward. Before we dig into this piece of code, let's refresh our mind with some basic knowledge on CPU registers, memory and assembly.

Registers and Memory in a 64-bit World

When writing low-level programs like C, the compiler transforms our code into assembly instructions that run on the CPU. Each CPU has a small set of registers, which are fast-access storage units. On a 64-bit machine, these registers are 64 bits wide --- i.e., they can hold a number as large as 2⁶⁴.

Computer memory is essentially a giant array of bytes. Every byte in memory has a unique address (think of it like array index), which is also a 64-bit number on modern architectures. So, each register can natually point to a particular memory address.

Notice that the values in the registers don't always have to be memory address. They can also simply be numbers like -5, 42, etc. The CPU doesn't care about the meaning---it just sees binary values. It's up to us (and the compiler) to interpret them correctly. This dual interpretation --- value or address --- is what makes assembly programming flexible yet confusing to beginners.

Essentially all data inside a machine is just bits. When we see two bytes 0x48 0x69, we can either intepret them as string "Hi", or as decimal number 18537, and the computer itself wouldn't know the difference. Note that the prefix '0x' means this a hex number (0x10 is 16 in decimal). You'll run into hex numbers all the time in the low-level world, because memory addresses are 64-bit and decimal is just too clumsy - you would definitely prefer using 0x401256to 010 000 000 001 001 001 010 110 to denote an address. Hex is compact and maps neatly to binary.

There are 16 general-purpose registers in x86-64 systems:

复制代码

%rax, %rbx, %rcx, %rdx,
%rsi, %rdi, %rbp, %rsp,
%r8 through %r15

Some registers have conventional purposes:

%rsp: stack pointer, always points to the top of the stack
%rax: return value from a function
%rdi, %rsi, %rdx, %rcx, %r8, %r9: used to pass the first 6 function arguments

Common Assembly Instructions

Assembly instructions manipulate the data between registers and memory. These instructions are where we begin to see how higher-level constructs like linked lists are implemented under the hood.

Here are a few of the most common instructions you'll run into all the time. These might seem overwhelming for now, but don't worry, we will see them work in action later.

mov: Copy value from source to destination (register-to-register, memory-to-register, etc.). For example, say register %rax is now 0x402000
- mov $0x28, %rax updates register %rax to value 0x28.
- mov $0x28, (%rax) puts integer 0x28 into memory starting at address 0x402000
lea: Load effective address. Usually used to perform simple arthmetic operations.
- Say %rax is 0x402000, and %rbx is 0x2. Instruction lea 0x1(%rax, %rbx, 4), %rdx sets %rdx = 0x402000 + 2 * 4 + 0x1 = 0x402009.
call: Push return address onto stack and jump to the function
ret: Pop return address and jump back to where the function call was made
push: Decrease %rsp and store a value at the new top of the stack
pop: Opposite of push: Load value from top of stack and increase %rsp
sub, add: Perform arithmetic (often used to grow/shrink stack frame)
cmp: Compare two values
jmp, je, jne, jg, etc.: Control flow via unconditional or conditional jumps. Usually used together with cmp instruction, for example, cmp $0x3, %rax
- je equal: %rax == 3
- jne not equak: %rax != 3
- jg greater: %rax > 3

Dissecting the main Function

I use the following code to compile the C file and then disassemble it:

bash 复制代码

gcc -g -no-pie -Og -o linked_list linked_list.c
objdump -d --no-show-raw-insn linked_list > linked_list.s

Let's look at the disassembled version of our C program's main function. I've added lots of annotations to help reading through. At each line, the leftmost hex number is the address of the instruction.

assembly 复制代码

00000000004011c5 <main>:
  4011c5:	endbr64                    # Control-flow enforcement. Let's ignore it
  4011c9:	push   %rbx              	 # Register rbx will be modified in the process. We push it onto the stack so that we can retrieve later 
  4011ca:	sub    $0x50,%rsp          # Move the top of the stack to allocate 80 bytes on stack(0x50 is 80 in decimal) 
  4011ce:	mov    %fs:0x28,%rax       
  4011d7:	mov    %rax,0x48(%rsp)     # Put special value in the stack to detect stack buffer overflows. We can ignore for now.
  4011dc:	xor    %eax,%eax           # Clear %eax

# === Initialize 4 linked list nodes ===
  4011de:	movl   $0x28,(%rsp)        # node4.val = 40
  4011e5:	movl   $0x4,0x4(%rsp)      # node4.id = 4
  4011ed:	movq   $0x0,0x8(%rsp)      # node4.next = NULL

  4011f6:	movl   $0x1e,0x10(%rsp)    # node3.val = 30
  4011fe:	movl   $0x3,0x14(%rsp)     # node3.id = 3
  401206:	mov    %rsp,%rax           # address of node4
  401209:	mov    %rax,0x18(%rsp)     # node3.next = &node4

  40120e:	movl   $0x14,0x20(%rsp)    # node2.val = 20
  401216:	movl   $0x2,0x24(%rsp)     # node2.id = 2
  40121e:	lea    0x10(%rsp),%rax     # address of node3
  401223:	mov    %rax,0x28(%rsp)     # node2.next = &node3

  401228:	movl   $0xa,0x30(%rsp)     # node1.val = 10
  401230:	movl   $0x1,0x34(%rsp)     # node1.id = 1
  401238:	lea    0x20(%rsp),%rax     # address of node2
  40123d:	mov    %rax,0x38(%rsp)     # node1.next = &node2

# === Print Original List ===
  401242:	lea    0xdd3(%rip),%rdi    # Load "Original list:" format string
  401249:	call   401060 <puts@plt>   # Print string to our screen

  40124e:	lea    0x30(%rsp),%rbx     # Set rbp to point to node1
  401253:	mov    %rbx,%rdi           # Set pointer to node1 as the first argument to function call `print_list` so that we're ready for the next line
  401256:	call   401195 <print_list>

# === Reverse the List ===
  40125b:	mov    %rbx,%rdi           # first arg: head node
  40125e:	call   401176 <reverse>    # returns new head
  401263:	mov    %rax,%rbx           # save new head

# === Print Reversed List ===
  401266:	lea    0xdbe(%rip),%rdi    # Load "Reversed list:" string
  40126d:	call   401060 <puts@plt>

  401272:	mov    %rbx,%rdi
  401275:	call   401195 <print_list>

# === Stack Canary Check ===
  40127a:	mov    0x48(%rsp),%rax
  40127f:	sub    %fs:0x28,%rax
  401288:	jne    401295 <main+0xd0>   # if canary changed → stack overflow → crash

# === Return Normally ===
  40128a:	mov    $0x0,%eax           # 0 is the success return code
  40128f:	add    $0x50,%rsp          # Clean up stack frame
  401293:	pop    %rbx                # Retrieve the original value of register rbx
  401294:	ret                        # Return 0
  401295:	call   401070 <__stack_chk_fail@plt>  # stack corrupted → crash

A very important concept is stack frame - a block of memory to use whenever a function is called. It stores everything that the function needs while it runs, such as return address (where to jump back after the function finishes), local variables, function arguments (if passed via the stack) and sometimes, saved registers (like %rbp, %rbx) to restore later. It's created at the start of a function and destroyed at the end.

The register %rsp is the top of the stack frame, and is extremely important since it marks the boundary of the stack frame. When we say the stack frame is destroyed, we don't mean physically destroying the hardware. Instead, it means the previous chunk of memory is no longered considered part of the current stack frame, and other programs are now allowed to write new values to this memory space to overwrite the existing ones.

Now let's run GDB (GNU Debugger) to execute the program. add a breakpoint to address 401256 (you can find this number in the disassembled code above), right before calling print_list function.

复制代码

$ gdb linked_list
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
(gdb) break *0x401256
(gdb) run
Starting program: /home/ubuntu/linked_list 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Now let's examine the stack frame. The GDB command to examine memory (a very powerful tool!) is x/<count><format><size> <address>.

codes for [format]: x = hex, d = decimal, u = unsigned, c = char, s = string
codes for [size]: b = byte (1 byte), h = halfword (2 bytes), w = word (4 bytes), g = giant (8 bytes)

For example, the following code examines 20 words (4 bytes each) at the current stack pointer:

bash 复制代码

(gdb) x/20wx $rsp
0x7fffffffe350: 0x00000028      0x00000004      0x00000000      0x00000000
0x7fffffffe360: 0x0000001e      0x00000003      0xffffe350      0x00007fff
0x7fffffffe370: 0x00000014      0x00000002      0xffffe360      0x00007fff
0x7fffffffe380: 0x0000000a      0x00000001      0xffffe370      0x00007fff
0x7fffffffe390: 0x00000000      0x00000000      0xfbeca800      0x9629f2d4

The first 4 lines correspond to the 4 linked-list nodes we just created, and we can clearly see each node's value, id and the pointer to next node. Each node consumes 16bytes (two integers, 4 bytes for each, and a 8-byte pointer), and that's why their memory addresses all differ by 0x10.

Here's how the stack looks during execution.

复制代码

|------------------------| <- at 0x50(%rsp), 0x7fffffffe3a0
| stack overflow check   | 
|------------------------| 
|                        | 
|------------------------| <- at 0x40(%rsp), 0x7fffffffe390
| node1 (next)           | 
|------------------------| <- at 0x38(%rsp), 0x7fffffffe368
| node1 (val, id)        | 
|------------------------| <- at 0x30(%rsp), 0x7fffffffe370
| node2 (next)           | 
|------------------------| <- at 0x28(%rsp), 0x7fffffffe368
| node2 (val, id)        | 
|------------------------| <- at 0x20(%rsp), 0x7fffffffe370
| node3 (next)           | 
|------------------------| <- at 0x18(%rsp), 0x7fffffffe368
| node3 (val, id)        | 
|------------------------| <- at 0x10(%rsp), 0x7fffffffe360
| node4 (next)           | 
|------------------------| <- at 0x8(%rsp), 0x7fffffffe358
| node4 (val, id)        | 
|------------------------| <- %rsp, 0x7fffffffe350

The stack frame contains exactly 0x50 bytes of memory, as allocated by the instruction 4011ca: sub $0x50,%rsp above. You may notice that the 8 bytes at address 0x40(%rsp) are simply unused, and you may wonder why did we bother to allocate space for it. On x86-64, the stack must be aligned to 16 bytes before a call instruction, and we need those "0x00" bytes for padding.

Dissecting the print_list Function

Now let's resume from we set the breakpoint, and dig into the print_list function.

assembly 复制代码

0000000000401195 <print_list>:
  401195:	endbr64

# === Function Prologue ===
  401199:	push   %rbx            # Save %rbx
  40119a:	mov    %rdi,%rbx       # Move first arg (head pointer) into %rbx

# === Loop Condition ===
  40119d:	jmp    4011be          # Jump to address 4011be for loop condition check

# === Loop Body ===
  40119f:	mov    (%rbx),%ecx     # Load val into %ecx
  4011a1:	mov    0x4(%rbx),%edx  # Load id into %edx
  4011a4:	lea    0xe59(%rip),%rsi # Load the address of format string "Node ID: %d, Value: %d\n" into %rsi
  4011ab:	mov    $0x1,%edi       # First arg to printf (fd = stdout)
  4011b0:	mov    $0x0,%eax       # Clear %eax before variadic call
  4011b5:	call   401080 <__printf_chk@plt>  # secure version of function printf

  4011ba:	mov    0x8(%rbx),%rbx  # Move to next node (follow .next)

# === Loop Condition ===
  4011be:	test   %rbx,%rbx       # Check if current node is NULL
  4011c1:	jne    40119f          # If not NULL, jump to address 40119f to continue loop

# === Function Epilogue ===
  4011c3:	pop    %rbx            # Restore %rbx
  4011c4:	ret                    # Return

We can add a second breakpoint to the function to examine the stack

bash 复制代码

(gdb) break *0x4011ba
Breakpoint 2 at 0x4011ba: file linked_list.c, line 30.
(gdb) continue
Continuing.
Node ID: 1, Value: 10

Breakpoint 2, print_list (head=head@entry=0x7fffffffe380) at linked_list.c:30
30              head = head->next;
(gdb) x/20wx $rsp
0x7fffffffe340: 0xffffe380      0x00007fff      0x0040125b      0x00000000
0x7fffffffe350: 0x00000028      0x00000004      0x00000000      0x00000000
0x7fffffffe360: 0x0000001e      0x00000003      0xffffe350      0x00007fff

We notice that the stack pointer register %rsp has been changed from 0x7fffffffe350 to 0x7fffffffe340, even though the print_list function doesn't allocate any space to the stack frame. So, why did the stack frame grow by 0x10 (16bytes)?

In the main function, when we execute instruction 401256: call 401195 <print_list>, we push the address of next instruction 40125b: mov %rbx,%rdi onto the stack, so that when function call print_list finishes, we can return to the correct location of main function and continue executing. This return addreses takes 8 bytes.
Inside the print_list function , we run 401199: push %rbx to push the value of register %rbx onto the stack. That takes another 8 bytes.

So, even though register %rsp isn't directly called, it was modified by instructions call and push. Notice that instructions ret and pop are counterparts of them, and will do the opposite - pop data out of the stack.

Dissecting the reverse Function

Now let's move on to the last function reverse.

assembly 复制代码

0000000000401176 <reverse>:
  401176:	endbr64

# === Function Prologue ===
  40117a:	mov    $0x0,%eax       # Set return register %rax = NULL (this will track the new head)

# === Loop Condition (initial jump) ===
  40117f:	jmp    40118f          # Jump to loop condition before first iteration

# === Loop Body ===
  401181:	mov    0x8(%rdi),%rdx  # Load next node: %rdx = current->next
  401185:	mov    %rax,0x8(%rdi)  # Reverse the link: current->next = previous
  401189:	mov    %rdi,%rax       # Update previous: previous = current
  40118c:	mov    %rdx,%rdi       # Move to next: current = next

# === Loop Condition ===
  40118f:	test   %rdi,%rdi       # Check if current (now in %rdi) is NULL
  401192:	jne    401181          # If not, continue the loop

# === Function Epilogue ===
  401194:	ret                    # Return; %rax holds new head of reversed list

This piece of assembly should be rather straightforward now.

One thing interesting here is that in my C function reverse , I declare 3 local variables prev, curr and next, but in the assembly code there isn't any stack space allocated for them. It's because the compiler tries to optimize away unnecessary memory operations, and it's smart enough to realize that we don't actually need any additional memory - simply using registers to update pointers is enough. And this process is just beautiful.

Thanks

I just started learning fundamentals about machines recently, although I've been a developer for many years. This article is my personal attempt to internalize my learnings and share them with others. If you're just getting into the low-level world, I hope you're just as excited as I'm.

One last note: I'm using my M2 Macbook Air for daily task and it doesn't work well with GDB and all those x86-64 stuff. I ended up getting an AWS ECS Ubuntu t2.micro machine.

bash 复制代码

$ uname -a
Linux ip-172-31-18-227 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0