qemu源码解析一

基于qemu9.0.0

简介

QEMU是一个开源的虚拟化软件,它能够模拟各种硬件设备,支持多种虚拟化技术,如TCG、Xen、KVM等

TCG 是 QEMU 中的一个组件,它可以将高级语言编写的代码(例如 C 代码)转换为可在虚拟机中执行的低级代码(例如 x86 机器指令)。TCG 生成的代码通常比直接使用 CPU 指令更简单、更小,但执行速度可能稍慢。同时,TCG不仅可以将高级语言代码转换为低级代码,还可以执行其他优化,例如常量折叠和死代码消除。

Xen 是一种开源虚拟化技术,它直接嵌入到 Linux 内核中。这意味着 Xen 可以直接访问硬件资源,从而提供高性能的虚拟化。然而,Xen 的配置和管理可能比较复杂。Xen 支持准虚拟化,这允许客户机操作系统直接访问某些硬件资源,从而提高性能。

KVM 是 QEMU 中最常使用的一种虚拟化技术,它利用 Linux 内核提供的虚拟化功能。KVM 的优势在于其因为它提供了良好的性能和广泛的操作系统支持。但是要注意一点的是:KVM 依赖于 Linux 内核提供的虚拟化功能,因此它仅适用于 Linux 主机操作系统在 QEMU 中,KVM 的初始化过程主要包括以下步骤:

**加载虚拟机监控器模块:**首先,需要加载 KVM 模块,以便在内核中启用虚拟化功能。这一步通常在系统启动时完成。

**创建虚拟机:**接下来,使用 QEMU 命令或 API 创建一个新的虚拟机实例。在创建过程中,需要指定虚拟机的配置参数,例如内存大小、CPU 数量等。

**分配资源:**在虚拟机创建后,需要为其分配所需的资源,包括 CPU、内存和设备。这些资源由物理硬件提供,并通过虚拟化技术映射到虚拟机上。

**启动虚拟机:**一旦资源分配完成,就可以启动虚拟机了。这时,KVM 将接管虚拟机的执行,并将其与物理硬件隔离。

**执行客户机操作系统:**客户机操作系统现在可以在虚拟机中执行,就好像它直接运行在物理硬件上一样。

相关功能的源码在

target/$(arch)/kvm.c(tcg/)

QEMU 可以模拟几百个设备:

QEMU 所有支持的机器类型QEMU 可以模拟的设备QEMU 在设备模拟上采取了前端和后端分离的设计模式:

前端:

QEMU 虚拟机管理器:负责管理虚拟机实例和提供用户界面。

ARM 虚拟化扩展 (VE):在 ARM 处理器上提供虚拟化支持。

后端:

ARM CPU 模型:模拟 ARM 处理器,包括指令集、寄存器和内存管理单元 (MMU)。

ARM 虚拟 I/O 设备模型:模拟 ARM 架构中的通用虚拟 I/O 设备,例如:virtio-blk:模 拟虚拟块设备

virtio-net:模拟虚拟网络接口

virtio-serial:模拟虚拟串行端口

检查可以支持的后端的方法(字符和网络):

QEMU 初始化过程分析

select_machine函数(选择机器类型)

/system/vl.c

此函数用于选择要运行的机器类型。它从命令行选项或默认值中获取机器类型,然后返回所选机器的 MachineClass 结构。

cpp 复制代码
static MachineClass *select_machine(QDict *qdict, Error **errp)
{
    const char *machine_type = qdict_get_try_str(qdict, "type");
    GSList *machines = object_class_get_list(TYPE_MACHINE, false);
    MachineClass *machine_class;
    Error *local_err = NULL;

    if (machine_type) {
        machine_class = find_machine(machine_type, machines);
        qdict_del(qdict, "type");
        if (!machine_class) {
            error_setg(&local_err, "unsupported machine type");
        }
    } else {
        machine_class = find_default_machine(machines);
        if (!machine_class) {
            error_setg(&local_err, "No machine specified, and there is no default");
        }
    }

    g_slist_free(machines);
    if (local_err) {
        error_append_hint(&local_err, "Use -machine help to list supported machines\n");
        error_propagate(errp, local_err);
    }
    return machine_class;
}

cpu_exec_init_all(初始化所有 CPU 的执行引擎)

io_mem_init函数

此函数初始化 I/O 内存区域。,可见其调用了**memory_region_init_io()**函数

cpp 复制代码
static void io_mem_init(void)
{
    memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                          NULL, UINT64_MAX);
}

memory_region_init_io函数

  1. 调用**memory_region_init 函数**初始化内存区域的公共部分。
  2. 设置内存区域的操作集。如果未指定操作集,则使用 unassigned_mem_ops 默认操作集。
  3. 设置内存区域的不透明数据指针。
  4. 将内存区域标记为终止区域。这意味着当内存区域被销毁时,它将自动从其父区域中删除。

/system/memory.c

cpp 复制代码
void memory_region_init_io(MemoryRegion *mr,
                           Object *owner,
                           const MemoryRegionOps *ops,
                           void *opaque,
                           const char *name,
                           uint64_t size)
{
    memory_region_init(mr, owner, name, size);
    mr->ops = ops ? ops : &unassigned_mem_ops;
    mr->opaque = opaque;
    mr->terminates = true;
}

memory_region_init函数

用于初始化 MemoryRegion 结构,调用了**object_initialize **函数和memory_region_do_init函数

object_initialize 函数用于初始化一个对象。它执行以下操作:

  1. 分配对象的内存。
  2. 设置对象的类型。
  3. 设置对象的父对象(如果存在)。
  4. 调用对象的 init 函数(如果存在)。

/system/memory.c

cpp 复制代码
void memory_region_init(MemoryRegion *mr,
                        Object *owner,
                        const char *name,
                        uint64_t size)
{
    object_initialize(mr, sizeof(*mr), TYPE_MEMORY_REGION);
    memory_region_do_init(mr, owner, name, size);
}

memory_region_do_init函数

它执行以下操作:

  1. 设置内存区域的大小。如果大小为 UINT64_MAX,则将其设置为 INT128_MAX
  2. 设置内存区域的名称。
  3. 设置内存区域的所有者对象。
  4. 设置内存区域的设备状态对象(如果所有者对象是设备)。
  5. 设置内存区域的 RAM 块(如果存在)。
  6. 如果内存区域有名称,则将其添加到其所有者的子对象列表中。
cpp 复制代码
static void memory_region_do_init(MemoryRegion *mr,
                                  Object *owner,
                                  const char *name,
                                  uint64_t size)
{
    mr->size = int128_make64(size);
    if (size == UINT64_MAX) {
        mr->size = int128_2_64();
    }
    mr->name = g_strdup(name);
    mr->owner = owner;
    mr->dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
    mr->ram_block = NULL;

    if (name) {
        char *escaped_name = memory_region_escape_name(name);
        char *name_array = g_strdup_printf("%s[*]", escaped_name);

        if (!owner) {
            owner = container_get(qdev_get_machine(), "/unattached");
        }

        object_property_add_child(owner, name_array, OBJECT(mr));
        object_unref(OBJECT(mr));
        g_free(name_array);
        g_free(escaped_name);
    }
}

补充:

MemoryRegion 是 QEMU 中表示内存区域的抽象数据结构。它提供了一个统一的接口来访问和操作不同的类型的内存,例如物理内存、I/O 内存和设备内存。可以将 MemoryRegion 想象成一个计算机中的内存块。它有一个名称、大小和地址。你可以通过 MemoryRegion 的接口来读取和写入内存块中的数据,也可以设置回调函数来处理对内存块的访问。

/include/exec/memory.h

memory_map_init函数

  1. 分配内存:

    分配内存用于系统内存和 I/O 空间。

  2. 初始化内存区域:

    使用 memory_region_init 函数初始化系统内存区域。使用 memory_region_init_io 函数初始化 I/O 空间区域。

  3. 初始化地址空间:

    使用 address_space_init 函数初始化用于访问系统内存和 I/O 空间的地址空间。

/system/physmem.c

cpp 复制代码
static void memory_map_init(void)
{
    system_memory = g_malloc(sizeof(*system_memory));

    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
    address_space_init(&address_space_memory, system_memory, "memory");

    system_io = g_malloc(sizeof(*system_io));
    memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io",
                          65536);
    address_space_init(&address_space_io, system_io, "I/O");
}

通俗的讲:QEMU 是一个城市,而内存映射是城市的地图。memory_map_init 函数负责创建这个地图,它定义了城市中不同区域(内存和 I/O 空间)的位置和大小。system_memoryio_memory 是两个容器,分别代表城市中的住宅区(内存)和商业区(I/O 空间)。address_space_ioaddress_space_memory 是两张地图,分别显示如何到达住宅区和商业区。

page_size_init(初始化页大小)

configure_accelerator(配置加速器)

accel_init_machine函数

  • 将加速器与虚拟机关联起来
  • 调用加速器的 init_machine 函数来进行特定于加速器的初始化
  • 设置加速器的兼容性属性

/accl/accl-system.c

cpp 复制代码
int accel_init_machine(AccelState *accel, MachineState *ms)
{
    AccelClass *acc = ACCEL_GET_CLASS(accel);
    int ret;
    ms->accelerator = accel;
    *(acc->allowed) = true;
    ret = acc->init_machine(ms);
    if (ret < 0) {
        ms->accelerator = NULL;
        *(acc->allowed) = false;
        object_unref(OBJECT(accel));
    } else {
        object_set_accelerator_compat_props(acc->compat_props);
    }
    return ret;
}

machine_run_board_init函数(初始化机器)

machine_run_board_init 函数负责初始化虚拟机的硬件平台。

  • 检查虚拟机的内存大小是否有效
  • 创建默认的内存后端(如果需要)
  • 完成 NUMA 配置
  • 创建虚拟机的 RAM
  • 检查 CPU 类型是否受支持
  • 初始化加速器接口
  • 调用虚拟机类的 init 函数
  • 推进虚拟机生命周期到 PHASE_MACHINE_INITIALIZED 阶段

/hw/core/machine.c

cpp 复制代码
void machine_run_board_init(MachineState *machine, const char *mem_path, Error **errp)
{
    ERRP_GUARD();
    MachineClass *machine_class = MACHINE_GET_CLASS(machine);

    /* This checkpoint is required by replay to separate prior clock
       reading from the other reads, because timer polling functions query
       clock values from the log. */
    replay_checkpoint(CHECKPOINT_INIT);

    if (!xen_enabled()) {
        /* On 32-bit hosts, QEMU is limited by virtual address space */
        if (machine->ram_size > (2047 << 20) && HOST_LONG_BITS == 32) {
            error_setg(errp, "at most 2047 MB RAM can be simulated");
            return;
        }
    }

    if (machine->memdev) {
        ram_addr_t backend_size = object_property_get_uint(OBJECT(machine->memdev),
                                                           "size",  &error_abort);
        if (backend_size != machine->ram_size) {
            error_setg(errp, "Machine memory size does not match the size of the memory backend");
            return;
        }
    } else if (machine_class->default_ram_id && machine->ram_size &&
               numa_uses_legacy_mem()) {
        if (object_property_find(object_get_objects_root(),
                                 machine_class->default_ram_id)) {
            error_setg(errp, "object's id '%s' is reserved for the default"
                " RAM backend, it can't be used for any other purposes",
                machine_class->default_ram_id);
            error_append_hint(errp,
                "Change the object's 'id' to something else or disable"
                " automatic creation of the default RAM backend by setting"
                " 'memory-backend=%s' with '-machine'.\n",
                machine_class->default_ram_id);
            return;
        }
        if (!create_default_memdev(current_machine, mem_path, errp)) {
            return;
        }
    }

    if (machine->numa_state) {
        numa_complete_configuration(machine);
        if (machine->numa_state->num_nodes) {
            machine_numa_finish_cpu_init(machine);
            if (machine_class->cpu_cluster_has_numa_boundary) {
                validate_cpu_cluster_to_numa_boundary(machine);
            }
        }
    }

    if (!machine->ram && machine->memdev) {
        machine->ram = machine_consume_memdev(machine, machine->memdev);
    }

    /* Check if the CPU type is supported */
    if (machine->cpu_type && !is_cpu_type_supported(machine, errp)) {
        return;
    }

    if (machine->cgs) {
        /*
         * With confidential guests, the host can't see the real
         * contents of RAM, so there's no point in it trying to merge
         * areas.
         */
        machine_set_mem_merge(OBJECT(machine), false, &error_abort);

        /*
         * Virtio devices can't count on directly accessing guest
         * memory, so they need iommu_platform=on to use normal DMA
         * mechanisms.  That requires also disabling legacy virtio
         * support for those virtio pci devices which allow it.
         */
        object_register_sugar_prop(TYPE_VIRTIO_PCI, "disable-legacy",
                                   "on", true);
        object_register_sugar_prop(TYPE_VIRTIO_DEVICE, "iommu_platform",
                                   "on", false);
    }

    accel_init_interfaces(ACCEL_GET_CLASS(machine->accelerator));
    machine_class->init(machine);
    phase_advance(PHASE_MACHINE_INITIALIZED);
}

pc_init1函数

该函数初始化 PC 特定的设置,包括创建 CPU 和内存。

  1. 内存分配和 ROM/BIOS 加载

    • 为 RAM 分配内存并从 ROM/BIOS 加载固件。
    • 如果启用了 Xen,则使用 Xen 特定的内存设置。
  2. PCI 总线初始化(如果启用)

    • 创建 PCI 主桥设备并将其连接到系统内存、I/O 和 PCI 内存。
    • 设置 PCI 总线大小和 PCI 孔位 64 位地址空间大小。
    • 将 PCI 设备映射到中断请求 (IRQ)。
  3. ISA 总线初始化(如果 PCI 未启用)

    • 创建 ISA 总线并将其连接到系统内存和 I/O。
    • 注册 ISA 总线输入 IRQ。
  4. 基本设备初始化

    • 初始化基本 PC 硬件,包括:
      • 实时时钟 (RTC)
      • 可编程中断控制器 (PIC)
      • 串口和并口
      • 超级 I/O 设备
  5. 网络设备初始化

    • 根据机器类型初始化网络设备。
  6. IDE 设备初始化(如果 ISA 总线启用)

    • 初始化 IDE 控制器和设备。
  7. ACPI 初始化(如果启用)

    • 创建 ACPI 设备并将其连接到 SMBus 和 SMI 中断。
  8. NV DIMM 初始化(如果启用)

    • 初始化 NV DIMM ACPI 状态,使其与系统 I/O 和固件配置表 (FW_CFG) 交互。
  9. 其他设备初始化

    • 初始化 VGA 控制器。
    • 根据配置设置虚拟机端口 (VMP)。

/hw/i386/pc_piix.c

cpp 复制代码
/* PC hardware initialisation */
static void pc_init1(MachineState *machine, const char *pci_type)
{
    PCMachineState *pcms = PC_MACHINE(machine);
    PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
    X86MachineState *x86ms = X86_MACHINE(machine);
    MemoryRegion *system_memory = get_system_memory();
    MemoryRegion *system_io = get_system_io();
    Object *phb = NULL;
    ISABus *isa_bus;
    Object *piix4_pm = NULL;
    qemu_irq smi_irq;
    GSIState *gsi_state;
    MemoryRegion *ram_memory;
    MemoryRegion *pci_memory = NULL;
    MemoryRegion *rom_memory = system_memory;
    ram_addr_t lowmem;
    uint64_t hole64_size = 0;

    /*
     * Calculate ram split, for memory below and above 4G.  It's a bit
     * complicated for backward compatibility reasons ...
     *
     *  - Traditional split is 3.5G (lowmem = 0xe0000000).  This is the
     *    default value for max_ram_below_4g now.
     *
     *  - Then, to gigabyte align the memory, we move the split to 3G
     *    (lowmem = 0xc0000000).  But only in case we have to split in
     *    the first place, i.e. ram_size is larger than (traditional)
     *    lowmem.  And for new machine types (gigabyte_align = true)
     *    only, for live migration compatibility reasons.
     *
     *  - Next the max-ram-below-4g option was added, which allowed to
     *    reduce lowmem to a smaller value, to allow a larger PCI I/O
     *    window below 4G.  qemu doesn't enforce gigabyte alignment here,
     *    but prints a warning.
     *
     *  - Finally max-ram-below-4g got updated to also allow raising lowmem,
     *    so legacy non-PAE guests can get as much memory as possible in
     *    the 32bit address space below 4G.
     *
     *  - Note that Xen has its own ram setup code in xen_ram_init(),
     *    called via xen_hvm_init_pc().
     *
     * Examples:
     *    qemu -M pc-1.7 -m 4G    (old default)    -> 3584M low,  512M high
     *    qemu -M pc -m 4G        (new default)    -> 3072M low, 1024M high
     *    qemu -M pc,max-ram-below-4g=2G -m 4G     -> 2048M low, 2048M high
     *    qemu -M pc,max-ram-below-4g=4G -m 3968M  -> 3968M low (=4G-128M)
     */
    if (xen_enabled()) {
        xen_hvm_init_pc(pcms, &ram_memory);
    } else {
        ram_memory = machine->ram;
        if (!pcms->max_ram_below_4g) {
            pcms->max_ram_below_4g = 0xe0000000; /* default: 3.5G */
        }
        lowmem = pcms->max_ram_below_4g;
        if (machine->ram_size >= pcms->max_ram_below_4g) {
            if (pcmc->gigabyte_align) {
                if (lowmem > 0xc0000000) {
                    lowmem = 0xc0000000;
                }
                if (lowmem & (1 * GiB - 1)) {
                    warn_report("Large machine and max_ram_below_4g "
                                "(%" PRIu64 ") not a multiple of 1G; "
                                "possible bad performance.",
                                pcms->max_ram_below_4g);
                }
            }
        }

        if (machine->ram_size >= lowmem) {
            x86ms->above_4g_mem_size = machine->ram_size - lowmem;
            x86ms->below_4g_mem_size = lowmem;
        } else {
            x86ms->above_4g_mem_size = 0;
            x86ms->below_4g_mem_size = machine->ram_size;
        }
    }

    pc_machine_init_sgx_epc(pcms);
    x86_cpus_init(x86ms, pcmc->default_cpu_version);

    if (kvm_enabled()) {
        kvmclock_create(pcmc->kvmclock_create_always);
    }

    if (pcmc->pci_enabled) {
        pci_memory = g_new(MemoryRegion, 1);
        memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
        rom_memory = pci_memory;

        phb = OBJECT(qdev_new(TYPE_I440FX_PCI_HOST_BRIDGE));
        object_property_add_child(OBJECT(machine), "i440fx", phb);
        object_property_set_link(phb, PCI_HOST_PROP_RAM_MEM,
                                 OBJECT(ram_memory), &error_fatal);
        object_property_set_link(phb, PCI_HOST_PROP_PCI_MEM,
                                 OBJECT(pci_memory), &error_fatal);
        object_property_set_link(phb, PCI_HOST_PROP_SYSTEM_MEM,
                                 OBJECT(system_memory), &error_fatal);
        object_property_set_link(phb, PCI_HOST_PROP_IO_MEM,
                                 OBJECT(system_io), &error_fatal);
        object_property_set_uint(phb, PCI_HOST_BELOW_4G_MEM_SIZE,
                                 x86ms->below_4g_mem_size, &error_fatal);
        object_property_set_uint(phb, PCI_HOST_ABOVE_4G_MEM_SIZE,
                                 x86ms->above_4g_mem_size, &error_fatal);
        object_property_set_str(phb, I440FX_HOST_PROP_PCI_TYPE, pci_type,
                                &error_fatal);
        sysbus_realize_and_unref(SYS_BUS_DEVICE(phb), &error_fatal);

        pcms->pcibus = PCI_BUS(qdev_get_child_bus(DEVICE(phb), "pci.0"));
        pci_bus_map_irqs(pcms->pcibus,
                         xen_enabled() ? xen_pci_slot_get_pirq
                                       : pc_pci_slot_get_pirq);

        hole64_size = object_property_get_uint(phb,
                                               PCI_HOST_PROP_PCI_HOLE64_SIZE,
                                               &error_abort);
    }

    /* allocate ram and load rom/bios */
    if (!xen_enabled()) {
        pc_memory_init(pcms, system_memory, rom_memory, hole64_size);
    } else {
        assert(machine->ram_size == x86ms->below_4g_mem_size +
                                    x86ms->above_4g_mem_size);

        pc_system_flash_cleanup_unused(pcms);
        if (machine->kernel_filename != NULL) {
            /* For xen HVM direct kernel boot, load linux here */
            xen_load_linux(pcms);
        }
    }

    gsi_state = pc_gsi_create(&x86ms->gsi, pcmc->pci_enabled);

    if (pcmc->pci_enabled) {
        PCIDevice *pci_dev;
        DeviceState *dev;
        size_t i;

        pci_dev = pci_new_multifunction(-1, pcms->south_bridge);
        object_property_set_bool(OBJECT(pci_dev), "has-usb",
                                 machine_usb(machine), &error_abort);
        object_property_set_bool(OBJECT(pci_dev), "has-acpi",
                                 x86_machine_is_acpi_enabled(x86ms),
                                 &error_abort);
        object_property_set_bool(OBJECT(pci_dev), "has-pic", false,
                                 &error_abort);
        object_property_set_bool(OBJECT(pci_dev), "has-pit", false,
                                 &error_abort);
        qdev_prop_set_uint32(DEVICE(pci_dev), "smb_io_base", 0xb100);
        object_property_set_bool(OBJECT(pci_dev), "smm-enabled",
                                 x86_machine_is_smm_enabled(x86ms),
                                 &error_abort);
        dev = DEVICE(pci_dev);
        for (i = 0; i < ISA_NUM_IRQS; i++) {
            qdev_connect_gpio_out_named(dev, "isa-irqs", i, x86ms->gsi[i]);
        }
        pci_realize_and_unref(pci_dev, pcms->pcibus, &error_fatal);

        if (xen_enabled()) {
            pci_device_set_intx_routing_notifier(
                        pci_dev, piix_intx_routing_notifier_xen);

            /*
             * Xen supports additional interrupt routes from the PCI devices to
             * the IOAPIC: the four pins of each PCI device on the bus are also
             * connected to the IOAPIC directly.
             * These additional routes can be discovered through ACPI.
             */
            pci_bus_irqs(pcms->pcibus, xen_intx_set_irq, pci_dev,
                         XEN_IOAPIC_NUM_PIRQS);
        }

        isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(pci_dev), "isa.0"));
        x86ms->rtc = ISA_DEVICE(object_resolve_path_component(OBJECT(pci_dev),
                                                              "rtc"));
        piix4_pm = object_resolve_path_component(OBJECT(pci_dev), "pm");
        dev = DEVICE(object_resolve_path_component(OBJECT(pci_dev), "ide"));
        pci_ide_create_devs(PCI_DEVICE(dev));
        pcms->idebus[0] = qdev_get_child_bus(dev, "ide.0");
        pcms->idebus[1] = qdev_get_child_bus(dev, "ide.1");
    } else {
        isa_bus = isa_bus_new(NULL, system_memory, system_io,
                              &error_abort);
        isa_bus_register_input_irqs(isa_bus, x86ms->gsi);

        x86ms->rtc = isa_new(TYPE_MC146818_RTC);
        qdev_prop_set_int32(DEVICE(x86ms->rtc), "base_year", 2000);
        isa_realize_and_unref(x86ms->rtc, isa_bus, &error_fatal);

        i8257_dma_init(OBJECT(machine), isa_bus, 0);
        pcms->hpet_enabled = false;
    }

    if (x86ms->pic == ON_OFF_AUTO_ON || x86ms->pic == ON_OFF_AUTO_AUTO) {
        pc_i8259_create(isa_bus, gsi_state->i8259_irq);
    }

    if (phb) {
        ioapic_init_gsi(gsi_state, phb);
    }

    if (tcg_enabled()) {
        x86_register_ferr_irq(x86ms->gsi[13]);
    }

    pc_vga_init(isa_bus, pcmc->pci_enabled ? pcms->pcibus : NULL);

    assert(pcms->vmport != ON_OFF_AUTO__MAX);
    if (pcms->vmport == ON_OFF_AUTO_AUTO) {
        pcms->vmport = xen_enabled() ? ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
    }

    /* init basic PC hardware */
    pc_basic_device_init(pcms, isa_bus, x86ms->gsi, x86ms->rtc, true,
                         0x4);

    pc_nic_init(pcmc, isa_bus, pcms->pcibus);

#ifdef CONFIG_IDE_ISA
    if (!pcmc->pci_enabled) {
        DriveInfo *hd[MAX_IDE_BUS * MAX_IDE_DEVS];
        int i;

        ide_drive_get(hd, ARRAY_SIZE(hd));
        for (i = 0; i < MAX_IDE_BUS; i++) {
            ISADevice *dev;
            char busname[] = "ide.0";
            dev = isa_ide_init(isa_bus, ide_iobase[i], ide_iobase2[i],
                               ide_irq[i],
                               hd[MAX_IDE_DEVS * i], hd[MAX_IDE_DEVS * i + 1]);
            /*
             * The ide bus name is ide.0 for the first bus and ide.1 for the
             * second one.
             */
            busname[4] = '0' + i;
            pcms->idebus[i] = qdev_get_child_bus(DEVICE(dev), busname);
        }
    }
#endif

    if (piix4_pm) {
        smi_irq = qemu_allocate_irq(pc_acpi_smi_interrupt, first_cpu, 0);

        qdev_connect_gpio_out_named(DEVICE(piix4_pm), "smi-irq", 0, smi_irq);
        pcms->smbus = I2C_BUS(qdev_get_child_bus(DEVICE(piix4_pm), "i2c"));
        /* TODO: Populate SPD eeprom data.  */
        smbus_eeprom_init(pcms->smbus, 8, NULL, 0);

        object_property_add_link(OBJECT(machine), PC_MACHINE_ACPI_DEVICE_PROP,
                                 TYPE_HOTPLUG_HANDLER,
                                 (Object **)&x86ms->acpi_dev,
                                 object_property_allow_set_link,
                                 OBJ_PROP_LINK_STRONG);
        object_property_set_link(OBJECT(machine), PC_MACHINE_ACPI_DEVICE_PROP,
                                 piix4_pm, &error_abort);
    }

    if (machine->nvdimms_state->is_enabled) {
        nvdimm_init_acpi_state(machine->nvdimms_state, system_io,
                               x86_nvdimm_acpi_dsmio,
                               x86ms->fw_cfg, OBJECT(pcms));
    }
}
创建和初始化CPU
  1. pc_init1:初始化 PC 特定的设置,包括创建 CPU 和内存。
  2. x86_cpus_init:根据配置创建和初始化多个 CPU。
  3. x86_cpu_new:创建一个新的 X86CPU 设备。
  4. qdev_realize:经过 QOM 的 object_property 机制,最后调用到 device_set_realized
  5. device_set_realized:标记设备已实现,并调用设备的 realize 函数。
  6. x86_cpu_realizefn:X86CPU 设备的 realize 函数,负责初始化 CPU 的寄存器、内存映射和中断。
x86_cpus_init函数
  1. 设置默认 CPU 版

  2. 计算 CPU APIC ID 限制( 计算 CPU APIC ID 的最大值,以确保所有 CPU APIC ID 都小于此限制**)**

  3. 检查 APIC ID 255 或更高( 如果启用了 KVM 并且 APIC ID 限制大于 255,则检查是否启用了内核中的 lapic 和 X2APIC 用户空间 API**)**

  4. 设置 KVM 最大 APIC ID( 如果启用了 KVM,则设置 KVM 的最大 APIC ID**)**

  5. 设置 APIC 最大 APIC ID( 如果内核中没有 irqchip,则设置 APIC 的最大 APIC ID**)**

  6. 获取可能的 CPU 架构 ID 列表( 获取机器类支持的可能 CPU 架构 ID 列表**)**

  7. 创建 CPU( 对于每个 CPU,创建并初始化一个新的 CPU**)**

/hw/i386/x86.c

cpp 复制代码
void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
{
    int i;
    const CPUArchIdList *possible_cpus;
    MachineState *ms = MACHINE(x86ms);
    MachineClass *mc = MACHINE_GET_CLASS(x86ms);

    x86_cpu_set_default_version(default_cpu_version);

    /*
     * Calculates the limit to CPU APIC ID values
     *
     * Limit for the APIC ID value, so that all
     * CPU APIC IDs are < x86ms->apic_id_limit.
     *
     * This is used for FW_CFG_MAX_CPUS. See comments on fw_cfg_arch_create().
     */
    x86ms->apic_id_limit = x86_cpu_apic_id_from_index(x86ms,
                                                      ms->smp.max_cpus - 1) + 1;

    /*
     * Can we support APIC ID 255 or higher?  With KVM, that requires
     * both in-kernel lapic and X2APIC userspace API.
     *
     * kvm_enabled() must go first to ensure that kvm_* references are
     * not emitted for the linker to consume (kvm_enabled() is
     * a literal `0` in configurations where kvm_* aren't defined)
     */
    if (kvm_enabled() && x86ms->apic_id_limit > 255 &&
        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
        error_report("current -smp configuration requires kernel "
                     "irqchip and X2APIC API support.");
        exit(EXIT_FAILURE);
    }

    if (kvm_enabled()) {
        kvm_set_max_apic_id(x86ms->apic_id_limit);
    }

    if (!kvm_irqchip_in_kernel()) {
        apic_set_max_apic_id(x86ms->apic_id_limit);
    }

    possible_cpus = mc->possible_cpu_arch_ids(ms);
    for (i = 0; i < ms->smp.cpus; i++) {
        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
    }
}
x86_cpu_new函数
  1. 创建 CPU 对象

  2. 设置 APIC ID

  3. 实现 CPU

  4. 清理( 取消引用 CPU 对象**)**

/hw/i386/x86.c

cpp 复制代码
void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
{
    Object *cpu = object_new(MACHINE(x86ms)->cpu_type);

    if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
        goto out;
    }
    qdev_realize(DEVICE(cpu), NULL, errp);

out:
    object_unref(cpu);
}
qdev_realize函数

该函数负责实现设备

/hw/i386/x86.c

cpp 复制代码
bool qdev_realize(DeviceState *dev, BusState *bus, Error **errp)
{
    assert(!dev->realized && !dev->parent_bus);

    if (bus) {
        if (!qdev_set_parent_bus(dev, bus, errp)) {
            return false;
        }
    } else {
        assert(!DEVICE_GET_CLASS(dev)->bus_type);
    }

    return object_property_set_bool(OBJECT(dev), "realized", true, errp);
}
device_set_realized函数
  • 设置设备的已实现标志
  • 调用设备类的 realize 函数(如果存在)
  • 调用设备监听器的 realize 函数
  • 设置设备的规范路径
  • 注册设备的 VM 状态(如果存在)
  • 实现设备的子总线
  • 如果设备是热插拔的,则复位设备并将其插入父总线
  • 设置设备的挂起已删除事件标志
  • 调用设备的热插拔处理程序(如果存在)
  • 释放与设备关联的内存
  • 取消实现设备的子总线
  • 取消注册设备的 VM 状态(如果存在)
  • 设置设备的规范路径为 NULL
  • 调用设备类的 unrealize 函数(如果存在)
  • 调用设备监听器的 unrealize 函数
  • 设置设备的已实现标志为 false

/hw/core/qdev.c

cpp 复制代码
static void device_set_realized(Object *obj, bool value, Error **errp)
{
    DeviceState *dev = DEVICE(obj);
    DeviceClass *dc = DEVICE_GET_CLASS(dev);
    HotplugHandler *hotplug_ctrl;
    BusState *bus;
    NamedClockList *ncl;
    Error *local_err = NULL;
    bool unattached_parent = false;
    static int unattached_count;

    if (dev->hotplugged && !dc->hotpluggable) {
        error_setg(errp, QERR_DEVICE_NO_HOTPLUG, object_get_typename(obj));
        return;
    }

    if (value && !dev->realized) {
        if (!check_only_migratable(obj, errp)) {
            goto fail;
        }

        if (!obj->parent) {
            gchar *name = g_strdup_printf("device[%d]", unattached_count++);

            object_property_add_child(container_get(qdev_get_machine(),
                                                    "/unattached"),
                                      name, obj);
            unattached_parent = true;
            g_free(name);
        }

        hotplug_ctrl = qdev_get_hotplug_handler(dev);
        if (hotplug_ctrl) {
            hotplug_handler_pre_plug(hotplug_ctrl, dev, &local_err);
            if (local_err != NULL) {
                goto fail;
            }
        }

        if (dc->realize) {
            dc->realize(dev, &local_err);
            if (local_err != NULL) {
                goto fail;
            }
        }

        DEVICE_LISTENER_CALL(realize, Forward, dev);

        /*
         * always free/re-initialize here since the value cannot be cleaned up
         * in device_unrealize due to its usage later on in the unplug path
         */
        g_free(dev->canonical_path);
        dev->canonical_path = object_get_canonical_path(OBJECT(dev));
        QLIST_FOREACH(ncl, &dev->clocks, node) {
            if (ncl->alias) {
                continue;
            } else {
                clock_setup_canonical_path(ncl->clock);
            }
        }

        if (qdev_get_vmsd(dev)) {
            if (vmstate_register_with_alias_id(VMSTATE_IF(dev),
                                               VMSTATE_INSTANCE_ID_ANY,
                                               qdev_get_vmsd(dev), dev,
                                               dev->instance_id_alias,
                                               dev->alias_required_for_version,
                                               &local_err) < 0) {
                goto post_realize_fail;
            }
        }

        /*
         * Clear the reset state, in case the object was previously unrealized
         * with a dirty state.
         */
        resettable_state_clear(&dev->reset);

        QLIST_FOREACH(bus, &dev->child_bus, sibling) {
            if (!qbus_realize(bus, errp)) {
                goto child_realize_fail;
            }
        }
        if (dev->hotplugged) {
            /*
             * Reset the device, as well as its subtree which, at this point,
             * should be realized too.
             */
            resettable_assert_reset(OBJECT(dev), RESET_TYPE_COLD);
            resettable_change_parent(OBJECT(dev), OBJECT(dev->parent_bus),
                                     NULL);
            resettable_release_reset(OBJECT(dev), RESET_TYPE_COLD);
        }
        dev->pending_deleted_event = false;

        if (hotplug_ctrl) {
            hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
            if (local_err != NULL) {
                goto child_realize_fail;
            }
       }

       qatomic_store_release(&dev->realized, value);

    } else if (!value && dev->realized) {

        /*
         * Change the value so that any concurrent users are aware
         * that the device is going to be unrealized
         *
         * TODO: change .realized property to enum that states
         * each phase of the device realization/unrealization
         */

        qatomic_set(&dev->realized, value);
        /*
         * Ensure that concurrent users see this update prior to
         * any other changes done by unrealize.
         */
        smp_wmb();

        QLIST_FOREACH(bus, &dev->child_bus, sibling) {
            qbus_unrealize(bus);
        }
        if (qdev_get_vmsd(dev)) {
            vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
        }
        if (dc->unrealize) {
            dc->unrealize(dev);
        }
        dev->pending_deleted_event = true;
        DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
    }

    assert(local_err == NULL);
    return;

child_realize_fail:
    QLIST_FOREACH(bus, &dev->child_bus, sibling) {
        qbus_unrealize(bus);
    }

    if (qdev_get_vmsd(dev)) {
        vmstate_unregister(VMSTATE_IF(dev), qdev_get_vmsd(dev), dev);
    }

post_realize_fail:
    g_free(dev->canonical_path);
    dev->canonical_path = NULL;
    if (dc->unrealize) {
        dc->unrealize(dev);
    }

fail:
    error_propagate(errp, local_err);
    if (unattached_parent) {
        /*
         * Beware, this doesn't just revert
         * object_property_add_child(), it also runs bus_remove()!
         */
        object_unparent(OBJECT(dev));
        unattached_count--;
    }
}
x86_cpu_realizefn函数

该函数负责实现 x86 CPU。其主要功能包括:

* 初始化 CPU 状态,包括 APIC ID、Hyper-V 增强功能、CPU 特性等。

* 调用框架实现函数,执行 CPU 特定的初始化。

* 检查主机 CPUID 要求,确保加速器支持请求的特性。

* 设置微码版本、MWAIT 扩展信息、物理位数等 CPU 参数。

* 初始化缓存信息。

* 创建 APIC(仅限 KVM)。

* 初始化机器检查异常 (MCE)。

* 初始化 VCPU。

* 警告超线程问题(如果存在)。

* 实现 APIC(仅限 KVM)。

* 重置 CPU。

* 调用 CPU 类父类的实现函数。

* 释放与 CPU 关联的内存。

/target/i386/cpu.c

cpp 复制代码
static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
{
    CPUState *cs = CPU(dev);
    X86CPU *cpu = X86_CPU(dev);
    X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
    CPUX86State *env = &cpu->env;
    Error *local_err = NULL;
    static bool ht_warned;
    unsigned requested_lbr_fmt;

#if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
    /* Use pc-relative instructions in system-mode */
    cs->tcg_cflags |= CF_PCREL;
#endif

    if (cpu->apic_id == UNASSIGNED_APIC_ID) {
        error_setg(errp, "apic-id property was not initialized properly");
        return;
    }

    /*
     * Process Hyper-V enlightenments.
     * Note: this currently has to happen before the expansion of CPU features.
     */
    x86_cpu_hyperv_realize(cpu);

    x86_cpu_expand_features(cpu, &local_err);
    if (local_err) {
        goto out;
    }

    /*
     * Override env->features[FEAT_PERF_CAPABILITIES].LBR_FMT
     * with user-provided setting.
     */
    if (cpu->lbr_fmt != ~PERF_CAP_LBR_FMT) {
        if ((cpu->lbr_fmt & PERF_CAP_LBR_FMT) != cpu->lbr_fmt) {
            error_setg(errp, "invalid lbr-fmt");
            return;
        }
        env->features[FEAT_PERF_CAPABILITIES] &= ~PERF_CAP_LBR_FMT;
        env->features[FEAT_PERF_CAPABILITIES] |= cpu->lbr_fmt;
    }

    /*
     * vPMU LBR is supported when 1) KVM is enabled 2) Option pmu=on and
     * 3)vPMU LBR format matches that of host setting.
     */
    requested_lbr_fmt =
        env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_LBR_FMT;
    if (requested_lbr_fmt && kvm_enabled()) {
        uint64_t host_perf_cap =
            x86_cpu_get_supported_feature_word(FEAT_PERF_CAPABILITIES, false);
        unsigned host_lbr_fmt = host_perf_cap & PERF_CAP_LBR_FMT;

        if (!cpu->enable_pmu) {
            error_setg(errp, "vPMU: LBR is unsupported without pmu=on");
            return;
        }
        if (requested_lbr_fmt != host_lbr_fmt) {
            error_setg(errp, "vPMU: the lbr-fmt value (0x%x) does not match "
                        "the host value (0x%x).",
                        requested_lbr_fmt, host_lbr_fmt);
            return;
        }
    }

    x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);

    if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
        error_setg(&local_err,
                   accel_uses_host_cpuid() ?
                       "Host doesn't support requested features" :
                       "TCG doesn't support requested features");
        goto out;
    }

    /* On AMD CPUs, some CPUID[8000_0001].EDX bits must match the bits on
     * CPUID[1].EDX.
     */
    if (IS_AMD_CPU(env)) {
        env->features[FEAT_8000_0001_EDX] &= ~CPUID_EXT2_AMD_ALIASES;
        env->features[FEAT_8000_0001_EDX] |= (env->features[FEAT_1_EDX]
           & CPUID_EXT2_AMD_ALIASES);
    }

    x86_cpu_set_sgxlepubkeyhash(env);

    /*
     * note: the call to the framework needs to happen after feature expansion,
     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
     * These may be set by the accel-specific code,
     * and the results are subsequently checked / assumed in this function.
     */
    cpu_exec_realizefn(cs, &local_err);
    if (local_err != NULL) {
        error_propagate(errp, local_err);
        return;
    }

    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
        goto out;
    }

    if (cpu->ucode_rev == 0) {
        /*
         * The default is the same as KVM's. Note that this check
         * needs to happen after the evenual setting of ucode_rev in
         * accel-specific code in cpu_exec_realizefn.
         */
        if (IS_AMD_CPU(env)) {
            cpu->ucode_rev = 0x01000065;
        } else {
            cpu->ucode_rev = 0x100000000ULL;
        }
    }

    /*
     * mwait extended info: needed for Core compatibility
     * We always wake on interrupt even if host does not have the capability.
     *
     * requires the accel-specific code in cpu_exec_realizefn to
     * have already acquired the CPUID data into cpu->mwait.
     */
    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;

    /* For 64bit systems think about the number of physical bits to present.
     * ideally this should be the same as the host; anything other than matching
     * the host can cause incorrect guest behaviour.
     * QEMU used to pick the magic value of 40 bits that corresponds to
     * consumer AMD devices but nothing else.
     *
     * Note that this code assumes features expansion has already been done
     * (as it checks for CPUID_EXT2_LM), and also assumes that potential
     * phys_bits adjustments to match the host have been already done in
     * accel-specific code in cpu_exec_realizefn.
     */
    if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
        if (cpu->phys_bits &&
            (cpu->phys_bits > TARGET_PHYS_ADDR_SPACE_BITS ||
            cpu->phys_bits < 32)) {
            error_setg(errp, "phys-bits should be between 32 and %u "
                             " (but is %u)",
                             TARGET_PHYS_ADDR_SPACE_BITS, cpu->phys_bits);
            return;
        }
        /*
         * 0 means it was not explicitly set by the user (or by machine
         * compat_props or by the host code in host-cpu.c).
         * In this case, the default is the value used by TCG (40).
         */
        if (cpu->phys_bits == 0) {
            cpu->phys_bits = TCG_PHYS_ADDR_BITS;
        }
    } else {
        /* For 32 bit systems don't use the user set value, but keep
         * phys_bits consistent with what we tell the guest.
         */
        if (cpu->phys_bits != 0) {
            error_setg(errp, "phys-bits is not user-configurable in 32 bit");
            return;
        }

        if (env->features[FEAT_1_EDX] & (CPUID_PSE36 | CPUID_PAE)) {
            cpu->phys_bits = 36;
        } else {
            cpu->phys_bits = 32;
        }
    }

    /* Cache information initialization */
    if (!cpu->legacy_cache) {
        const CPUCaches *cache_info =
            x86_cpu_get_versioned_cache_info(cpu, xcc->model);

        if (!xcc->model || !cache_info) {
            g_autofree char *name = x86_cpu_class_get_model_name(xcc);
            error_setg(errp,
                       "CPU model '%s' doesn't support legacy-cache=off", name);
            return;
        }
        env->cache_info_cpuid2 = env->cache_info_cpuid4 = env->cache_info_amd =
            *cache_info;
    } else {
        /* Build legacy cache information */
        env->cache_info_cpuid2.l1d_cache = &legacy_l1d_cache;
        env->cache_info_cpuid2.l1i_cache = &legacy_l1i_cache;
        env->cache_info_cpuid2.l2_cache = &legacy_l2_cache_cpuid2;
        env->cache_info_cpuid2.l3_cache = &legacy_l3_cache;

        env->cache_info_cpuid4.l1d_cache = &legacy_l1d_cache;
        env->cache_info_cpuid4.l1i_cache = &legacy_l1i_cache;
        env->cache_info_cpuid4.l2_cache = &legacy_l2_cache;
        env->cache_info_cpuid4.l3_cache = &legacy_l3_cache;

        env->cache_info_amd.l1d_cache = &legacy_l1d_cache_amd;
        env->cache_info_amd.l1i_cache = &legacy_l1i_cache_amd;
        env->cache_info_amd.l2_cache = &legacy_l2_cache_amd;
        env->cache_info_amd.l3_cache = &legacy_l3_cache;
    }

#ifndef CONFIG_USER_ONLY
    MachineState *ms = MACHINE(qdev_get_machine());
    qemu_register_reset(x86_cpu_machine_reset_cb, cpu);

    if (cpu->env.features[FEAT_1_EDX] & CPUID_APIC || ms->smp.cpus > 1) {
        x86_cpu_apic_create(cpu, &local_err);
        if (local_err != NULL) {
            goto out;
        }
    }
#endif

    mce_init(cpu);

    qemu_init_vcpu(cs);

    /*
     * Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
     * fixes this issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
     * based on inputs (sockets,cores,threads), it is still better to give
     * users a warning.
     *
     * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
     * cs->nr_threads hasn't be populated yet and the checking is incorrect.
     */
    if (IS_AMD_CPU(env) &&
        !(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
        cs->nr_threads > 1 && !ht_warned) {
            warn_report("This family of AMD CPU doesn't support "
                        "hyperthreading(%d)",
                        cs->nr_threads);
            error_printf("Please configure -smp options properly"
                         " or try enabling topoext feature.\n");
            ht_warned = true;
    }

#ifndef CONFIG_USER_ONLY
    x86_cpu_apic_realize(cpu, &local_err);
    if (local_err != NULL) {
        goto out;
    }
#endif /* !CONFIG_USER_ONLY */
    cpu_reset(cs);

    xcc->parent_realize(dev, &local_err);

out:
    if (local_err != NULL) {
        error_propagate(errp, local_err);
        return;
    }
}
初始化 PC 的内存和固件
  1. 初始化内存并将其添加到系统中。
  2. 加载 BIOS 映像。
  3. 将 BIOS 映像添加到 ROM 列表中。
  4. 将 ROM 列表插入到系统中。
  5. 将 BIOS 的最后 128KB 映射到 ISA 空间。
  6. 将所有 BIOS 映射到内存顶部。
  7. 创建可选 ROM 区域。
  8. 创建 FWCfgState 并初始化参数。
  9. 使用 FWCfgState 初始化全局 fw_cfg。
  10. 如果指定了内核,则加载内核。
  11. 添加 ROM 镜像。
相关推荐
努力的小雨1 天前
微服务架构——不可或缺的注册中心
分布式·源码分析
血染河山10 天前
Paimon lookup store 实现
源码分析·data-lake
ywang_wnlo17 天前
【Kenel】基于 QEMU 的 Linux 内核编译和安装
linux·qemu·kernel
ywang_wnlo21 天前
【Kernel】基于 QEMU 的 Linux 内核编译和安装
linux·qemu·kernel
liangsheng_g1 个月前
基于Kafka2.1解读Producer原理
java·kafka·源码分析
努力的小雨1 个月前
深入解析Spring AI框架:在Java应用中实现智能化交互的关键
源码分析
努力的小雨1 个月前
深入探索Spring AI:源码分析流式回答
源码分析
努力的小雨1 个月前
深度解析Spring AI:请求与响应机制的核心逻辑
源码分析·ai智能
plmm烟酒僧1 个月前
qemu模拟arm64环境-构建6.1内核以及debian12
linux·debian·qemu·虚拟机·香橙派·aarch64