learning_gem5 part1_04 理解gem5统计信息与输出文件

除了模拟脚本自行输出的信息外,运行gem5后会在m5out目录下生成三个文件:

config.ini:列出为模拟创建的所有SimObject及其参数值

config.json:内容与config.ini相同,但采用json格式

stats.txt:以文本形式记录所有已注册的gem5统计信息

config.ini详解

此文件是模拟配置的权威记录。所有SimObject的参数(无论通过配置文件设置还是使用默认值)都会在此显示。以下示例截取自运行simple-config-chapter中的simple.py后生成的config.ini:

复制代码
[root]
type=Root
children=system
eventq_index=0
full_system=false
sim_quantum=0
time_sync_enable=false
time_sync_period=100000000000
time_sync_spin_threshold=100000000

[system]
type=System
children=clk_domain cpu dvfs_handler mem_ctrl membus
boot_osflags=a
cache_line_size=64
clk_domain=system.clk_domain
default_p_state=UNDEFINED
eventq_index=0
exit_on_work_items=false
init_param=0
kernel=
kernel_addr_check=true
kernel_extras=
kvm_vm=Null
load_addr_mask=18446744073709551615
load_offset=0
mem_mode=timing

...

[system.membus]
type=CoherentXBar
children=snoop_filter
clk_domain=system.clk_domain
default_p_state=UNDEFINED
eventq_index=0
forward_latency=4
frontend_latency=3
p_state_clk_gate_bins=20
p_state_clk_gate_max=1000000000000
p_state_clk_gate_min=1000
point_of_coherency=true
point_of_unification=true
power_model=
response_latency=2
snoop_filter=system.membus.snoop_filter
snoop_response_latency=4
system=system
use_default_range=false
width=16
master=system.cpu.interrupts.pio system.cpu.interrupts.int_slave system.mem_ctrl.port
slave=system.cpu.icache_port system.cpu.dcache_port system.cpu.interrupts.int_master system.system_port

[system.membus.snoop_filter]
type=SnoopFilter
eventq_index=0
lookup_latency=1
max_capacity=8388608
system=system

可以看到每个SimObject的描述均以方括号包围的配置命名开头(如[system.membus]),随后逐项显示参数值(包括未在配置中显式设置的参数)。例如配置中设置了时钟域为1GHz(对应1000个时钟单位),但未设置缓存行大小(系统中实际为64字节)。

该文件是验证模拟参数是否按预期设置的重要工具。由于gem5存在多种设置默认值及覆盖默认值的机制,建议始终将检查config.ini作为验证配置参数是否正确传递到SimObject实例的标准流程。

stats.txt详解

gem5拥有灵活的统计信息系统(详见gem5统计文档)。每个SimObject实例都拥有独立统计项,在模拟结束或执行统计转储命令时,所有统计信息将写入此文件。

首先文件会记录常规运行统计:

复制代码
---------- Begin Simulation Statistics ----------
simSeconds                                   0.000057                       # Number of seconds simulated (Second)
simTicks                                     57467000                       # Number of ticks simulated (Tick)
finalTick                                    57467000                       # Number of ticks from beginning of simulation (restored from checkpoints and never reset) (Tick)
simFreq                                  1000000000000                       # The number of ticks per simulated second ((Tick/Second))
hostSeconds                                      0.03                       # Real time elapsed on the host (Second)
hostTickRate                               2295882330                       # The number of ticks simulated per host second (ticks/s) ((Tick/Second))
hostMemory                                     665792                       # Number of bytes of host memory used (Byte)
simInsts                                         6225                       # Number of instructions simulated (Count)
simOps                                          11204                       # Number of ops (including micro ops) simulated (Count)
hostInstRate                                   247382                       # Simulator instruction rate (inst/s) ((Count/Second))
hostOpRate                                     445086                       # Simulator op (including micro ops) rate (op/s) ((Count/Second))

---------- Begin Simulation Statistics ----------
simSeconds                                   0.000490                       # Number of seconds simulated (Second)
simTicks                                    490394000                       # Number of ticks simulated (Tick)
finalTick                                   490394000                       # Number of ticks from beginning of simulation (restored from checkpoints and never reset) (Tick)
simFreq                                  1000000000000                       # The number of ticks per simulated second ((Tick/Second))
hostSeconds                                      0.03                       # Real time elapsed on the host (Second)
hostTickRate                              15979964060                       # The number of ticks simulated per host second (ticks/s) ((Tick/Second))
hostMemory                                     657488                       # Number of bytes of host memory used (Byte)
simInsts                                         6225                       # Number of instructions simulated (Count)
simOps                                          11204                       # Number of ops (including micro ops) simulated (Count)
hostInstRate                                   202054                       # Simulator instruction rate (inst/s) ((Count/Second))
hostOpRate                                     363571                       # Simulator op (including micro ops) rate (op/s) ((Count/Second))

统计内容以"---------- Begin Simulation Statistics ----------"起始(若模拟过程中存在多次统计转储,文件会出现多个该标记)。每项统计包含名称(首列)、数值(第二列)以及带#前缀的描述说明(末列)和单位。

多数统计项通过描述即可理解其含义,其中关键统计包括:

sim_seconds:总模拟时间

sim_insts:CPU提交指令总数

host_inst_rate:反映gem5本身运行性能

随后文件会输出各SimObject的统计信息,例如包含系统调用次数的CPU统计、缓存系统与转译缓冲器统计等:

复制代码
system.clk_domain.clock                          1000                       # Clock period in ticks (Tick)
system.clk_domain.voltage_domain.voltage            1                       # Voltage in Volts (Volt)
system.cpu.numCycles                            57467                       # Number of cpu cycles simulated (Cycle)
system.cpu.numWorkItemsStarted                      0                       # Number of work items this cpu started (Count)
system.cpu.numWorkItemsCompleted                    0                       # Number of work items this cpu completed (Count)
system.cpu.dcache.demandHits::cpu.data           1941                       # number of demand (read+write) hits (Count)
system.cpu.dcache.demandHits::total              1941                       # number of demand (read+write) hits (Count)
system.cpu.dcache.overallHits::cpu.data          1941                       # number of overall hits (Count)
system.cpu.dcache.overallHits::total             1941                       # number of overall hits (Count)
system.cpu.dcache.demandMisses::cpu.data          133                       # number of demand (read+write) misses (Count)
system.cpu.dcache.demandMisses::total             133                       # number of demand (read+write) misses (Count)
system.cpu.dcache.overallMisses::cpu.data          133                       # number of overall misses (Count)
system.cpu.dcache.overallMisses::total            133                       # number of overall misses (Count)
system.cpu.dcache.demandMissLatency::cpu.data     14301000                       # number of demand (read+write) miss ticks (Tick)
system.cpu.dcache.demandMissLatency::total     14301000                       # number of demand (read+write) miss ticks (Tick)
system.cpu.dcache.overallMissLatency::cpu.data     14301000                       # number of overall miss ticks (Tick)
system.cpu.dcache.overallMissLatency::total     14301000                       # number of overall miss ticks (Tick)
system.cpu.dcache.demandAccesses::cpu.data         2074                       # number of demand (read+write) accesses (Count)
system.cpu.dcache.demandAccesses::total          2074                       # number of demand (read+write) accesses (Count)
system.cpu.dcache.overallAccesses::cpu.data         2074                       # number of overall (read+write) accesses (Count)
system.cpu.dcache.overallAccesses::total         2074                       # number of overall (read+write) accesses (Count)
system.cpu.dcache.demandMissRate::cpu.data     0.064127                       # miss rate for demand accesses (Ratio)
system.cpu.dcache.demandMissRate::total      0.064127                       # miss rate for demand accesses (Ratio)
system.cpu.dcache.overallMissRate::cpu.data     0.064127                       # miss rate for overall accesses (Ratio)
system.cpu.dcache.overallMissRate::total     0.064127                       # miss rate for overall accesses (Ratio)
system.cpu.dcache.demandAvgMissLatency::cpu.data 107526.315789                       # average overall miss latency ((Cycle/Count))
system.cpu.dcache.demandAvgMissLatency::total 107526.315789                       # average overall miss latency ((Cycle/Count))
system.cpu.dcache.overallAvgMissLatency::cpu.data 107526.315789                       # average overall miss latency ((Cycle/Count))
system.cpu.dcache.overallAvgMissLatency::total 107526.315789                       # average overall miss latency ((Cycle/Count))
...
system.cpu.mmu.dtb.rdAccesses                    1123                       # TLB accesses on read requests (Count)
system.cpu.mmu.dtb.wrAccesses                     953                       # TLB accesses on write requests (Count)
system.cpu.mmu.dtb.rdMisses                        11                       # TLB misses on read requests (Count)
system.cpu.mmu.dtb.wrMisses                         9                       # TLB misses on write requests (Count)
system.cpu.mmu.dtb.walker.power_state.pwrStateResidencyTicks::UNDEFINED     57467000                       # Cumulative time (in ticks) in various power states (Tick)
system.cpu.mmu.itb.rdAccesses                       0                       # TLB accesses on read requests (Count)
system.cpu.mmu.itb.wrAccesses                    7940                       # TLB accesses on write requests (Count)
system.cpu.mmu.itb.rdMisses                         0                       # TLB misses on read requests (Count)
system.cpu.mmu.itb.wrMisses                        37                       # TLB misses on write requests (Count)
system.cpu.mmu.itb.walker.power_state.pwrStateResidencyTicks::UNDEFINED     57467000                       # Cumulative time (in ticks) in various power states (Tick)
system.cpu.power_state.pwrStateResidencyTicks::ON     57467000                       # Cumulative time (in ticks) in various power states (Tick)
system.cpu.thread_0.numInsts                        0                       # Number of Instructions committed (Count)
system.cpu.thread_0.numOps                          0                       # Number of Ops committed (Count)
system.cpu.thread_0.numMemRefs                      0                       # Number of Memory References (Count)
system.cpu.workload.numSyscalls                    11                       # Number of system calls (Count)

在文件后续部分还会呈现内存控制器统计,包括各组件读取的字节数及其平均带宽使用情况等数据。

复制代码
system.mem_ctrl.bytesReadWrQ                        0                       # Total number of bytes read from write queue (Byte)
system.mem_ctrl.bytesReadSys                    23168                       # Total read bytes from the system interface side (Byte)
system.mem_ctrl.bytesWrittenSys                     0                       # Total written bytes from the system interface side (Byte)
system.mem_ctrl.avgRdBWSys               403153113.96105593                       # Average system read bandwidth in Byte/s ((Byte/Second))
system.mem_ctrl.avgWrBWSys                 0.00000000                       # Average system write bandwidth in Byte/s ((Byte/Second))
system.mem_ctrl.totGap                       57336000                       # Total gap between requests (Tick)
system.mem_ctrl.avgGap                      158386.74                       # Average gap between requests ((Tick/Count))
system.mem_ctrl.requestorReadBytes::cpu.inst        14656                       # Per-requestor bytes read from memory (Byte)
system.mem_ctrl.requestorReadBytes::cpu.data         8512                       # Per-requestor bytes read from memory (Byte)
system.mem_ctrl.requestorReadRate::cpu.inst 255033323.472601681948                       # Per-requestor bytes read from memory rate ((Byte/Second))
system.mem_ctrl.requestorReadRate::cpu.data 148119790.488454252481                       # Per-requestor bytes read from memory rate ((Byte/Second))
system.mem_ctrl.requestorReadAccesses::cpu.inst          229                       # Per-requestor read serviced memory accesses (Count)
system.mem_ctrl.requestorReadAccesses::cpu.data          133                       # Per-requestor read serviced memory accesses (Count)
system.mem_ctrl.requestorReadTotalLat::cpu.inst      6234000                       # Per-requestor read total memory access latency (Tick)
system.mem_ctrl.requestorReadTotalLat::cpu.data      4141000                       # Per-requestor read total memory access latency (Tick)
system.mem_ctrl.requestorReadAvgLat::cpu.inst     27222.71                       # Per-requestor read average memory access latency ((Tick/Count))
system.mem_ctrl.requestorReadAvgLat::cpu.data     31135.34                       # Per-requestor read average memory access latency ((Tick/Count))
system.mem_ctrl.dram.bytesRead::cpu.inst        14656                       # Number of bytes read from this memory (Byte)
system.mem_ctrl.dram.bytesRead::cpu.data         8512                       # Number of bytes read from this memory (Byte)
system.mem_ctrl.dram.bytesRead::total           23168                       # Number of bytes read from this memory (Byte)
system.mem_ctrl.dram.bytesInstRead::cpu.inst        14656                       # Number of instructions bytes read from this memory (Byte)
system.mem_ctrl.dram.bytesInstRead::total        14656                       # Number of instructions bytes read from this memory (Byte)
system.mem_ctrl.dram.numReads::cpu.inst           229                       # Number of read requests responded to by this memory (Count)
system.mem_ctrl.dram.numReads::cpu.data           133                       # Number of read requests responded to by this memory (Count)
system.mem_ctrl.dram.numReads::total              362                       # Number of read requests responded to by this memory (Count)
system.mem_ctrl.dram.bwRead::cpu.inst       255033323                       # Total read bandwidth from this memory ((Byte/Second))
system.mem_ctrl.dram.bwRead::cpu.data       148119790                       # Total read bandwidth from this memory ((Byte/Second))
system.mem_ctrl.dram.bwRead::total          403153114                       # Total read bandwidth from this memory ((Byte/Second))
system.mem_ctrl.dram.bwInstRead::cpu.inst    255033323                       # Instruction read bandwidth from this memory ((Byte/Second))
system.mem_ctrl.dram.bwInstRead::total      255033323                       # Instruction read bandwidth from this memory ((Byte/Second))
system.mem_ctrl.dram.bwTotal::cpu.inst      255033323                       # Total bandwidth to/from this memory ((Byte/Second))
system.mem_ctrl.dram.bwTotal::cpu.data      148119790                       # Total bandwidth to/from this memory ((Byte/Second))
system.mem_ctrl.dram.bwTotal::total         403153114                       # Total bandwidth to/from this memory ((Byte/Second))
system.mem_ctrl.dram.readBursts                   362                       # Number of DRAM read bursts (Count)
system.mem_ctrl.dram.writeBursts                    0                       # Number of DRAM write bursts (Count)
相关推荐
Eloudy1 天前
learning_gem5 part1_02 创建简单的配置脚本
gem5
Eloudy1 天前
全文 -- GPU-Initiated Networking for NCCL
gpu·arch
HyperAI超神经1 天前
【TVM 教程】优化大语言模型
人工智能·语言模型·自然语言处理·cpu·gpu·编程语言·tvm
Eloudy1 天前
learning_gem5 part1_01 构建 gem5
gem5
Felven2 天前
天数智芯MR50推理卡测试
gpu·推理·mr50·天数
杰克逊的日记2 天前
slurm部署
cpu·gpu·作业管理
Eloudy2 天前
AMD Instinct MI300 系列 GPU 技术规格说明
gpu
Eloudy6 天前
GPU-Initiated Networking (GIN)及其核心硬件基础 SCI
ic·gpu
web像素之境6 天前
实时光线追踪加速硬件结构(详细版)
游戏·gpu·计算机图形学