Contents
2.4 Driver-side implementation of virtio_mmio (Linux kernel)
2.4.2 The virtio-mmio probe function
2.5 A virtio driver built on virtio-mmio
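1. MMIO
Like the physical address space of a physical machine, the IPA (intermediate physical address) space of a virtual machine contains regions for accessing memory and regions for accessing peripherals, as shown in Figure 1:

Figure 1. The IPA address space
A virtual machine can use the peripheral region to access both real physical peripherals and virtual peripherals. A virtual peripheral is emulated entirely in software by the hypervisor, as shown in Figure 2:

Figure 2. Stage 2 mappings for virtual and assigned peripherals
An assigned peripheral is a real physical device that has been assigned to the virtual machine and mapped into its IPA space. The mapping lets software running inside the virtual machine interact with the physical device directly.
A virtual peripheral is one that the hypervisor emulates in software. Its Stage 2 table entries are marked as faulting. Software in the virtual machine believes it is talking directly to a physical peripheral, but every access triggers a Stage 2 fault, and the hypervisor emulates the peripheral access in its exception handler.
Figure 3 shows how an MMIO access is trapped and emulated:

Figure 3. Example of emulating an MMIO access
(1) A VCPU in the virtual machine tries to read received data from the UART peripheral, for example with an LDR from a UART receive register.
(2) The access is blocked at Stage 2 translation, which raises a Stage 2 fault that is routed to EL2.
The abort populates ESR_EL2 with information about the exception, including the number of bytes accessed, the target register, and whether the access was a load or a store.
It also populates HPFAR_EL2 with the IPA of the aborted access (at page granularity; the offset within the page comes from FAR_EL2).
(3) The hypervisor uses the information in ESR_EL2 and HPFAR_EL2 to identify which peripheral register was accessed, which is enough to emulate the operation. It then returns the result to the VCPU with an ERET.
(4) After returning to the virtual machine, the VCPU resumes execution at the instruction following the LDR.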
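Step (3) can be sketched in C as follows. This is a minimal illustration, not code from any particular hypervisor: the struct and function names are hypothetical, and it assumes the abort carries a valid instruction syndrome (ISV = 1), which is the case for simple single-register loads and stores. The bit positions follow the Armv8-A encoding of ESR_EL2 for data aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of a trapped guest data access (hypothetical struct). */
struct mmio_fault {
    uint64_t ipa;   /* faulting intermediate physical address */
    unsigned size;  /* access width in bytes: 1, 2, 4 or 8 */
    unsigned reg;   /* index of the guest transfer register (31 is the zero register) */
    bool is_write;  /* true for a store, false for a load */
};

#define ESR_EC(esr)   (((esr) >> 26) & 0x3f) /* exception class */
#define ESR_ISV(esr)  (((esr) >> 24) & 0x1)  /* instruction syndrome valid */
#define ESR_SAS(esr)  (((esr) >> 22) & 0x3)  /* log2 of the access size */
#define ESR_SRT(esr)  (((esr) >> 16) & 0x1f) /* transfer register number */
#define ESR_WNR(esr)  (((esr) >> 6) & 0x1)   /* write-not-read */
#define EC_DABT_LOWER 0x24                   /* data abort from a lower EL */

/* Returns false if the syndrome does not describe an access we can emulate. */
static bool decode_mmio_fault(uint64_t esr, uint64_t hpfar, uint64_t far,
                              struct mmio_fault *out)
{
    assert(out != NULL);
    if (ESR_EC(esr) != EC_DABT_LOWER || !ESR_ISV(esr))
        return false;
    /* HPFAR_EL2[43:4] holds IPA[51:12]; FAR_EL2 supplies the low 12 bits. */
    out->ipa = ((hpfar & ~0xfULL) << 8) | (far & 0xfffULL);
    out->size = 1u << ESR_SAS(esr);
    out->reg = (unsigned)ESR_SRT(esr);
    out->is_write = ESR_WNR(esr) != 0;
    return true;
}
```

With the decoded address, width, direction and register number, the hypervisor has everything it needs to call a device model, write the result into the guest's register, advance the guest PC past the trapped instruction, and ERET.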
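2. virtio-mmio
virtio-mmio is one of the transports of the virtio framework, a paravirtualized device communication protocol. It uses MMIO to expose a virtio device's control registers in the virtual machine's physical address space. The virtqueues themselves live in guest memory, and the driver tells the device where they are by writing their addresses into those registers. The virtual machine interacts with the device by reading and writing these memory addresses.
A virtio-mmio implementation has two halves: the device (for example, one emulated by QEMU/KVM) and the driver (for example, the virtio-mmio driver in Linux). At its core it defines a memory-mapped register layout and an interrupt mechanism so that the two sides can communicate through shared memory and MMIO.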
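2.1 Defining the device resources
dts
virtio_mmio@a003a00 {
    dma-coherent;
    interrupts = <42>;
    reg = <0x00 0xa003a00 0x00 0x200>;
    compatible = "virtio,mmio";
};
This device tree node describes a virtio-mmio device that uses interrupt 42 and occupies 0x200 bytes starting at physical address 0xa003a00. The guest kernel matches the string in compatible = "virtio,mmio" to bind the virtio-mmio driver to the device.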
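The four cells in the reg property can be turned into a base address and a size as sketched below. The sketch assumes the parent node declares #address-cells = 2 and #size-cells = 2 (inferred from the four cells, not stated in the node itself) and that the cells have already been converted to host byte order. The helper and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine up to two 32-bit cells, most significant cell first. */
static uint64_t dt_read_cells(const uint32_t *cells, size_t ncells)
{
    uint64_t v = 0;

    assert(ncells <= 2); /* more cells would not fit in 64 bits */
    for (size_t i = 0; i < ncells; i++)
        v = (v << 32) | cells[i];
    return v;
}

struct mmio_region {
    uint64_t base;
    uint64_t size;
};

/* Assumes #address-cells = 2 and #size-cells = 2 in the parent node. */
static struct mmio_region parse_reg(const uint32_t reg[4])
{
    struct mmio_region r;

    r.base = dt_read_cells(&reg[0], 2);
    r.size = dt_read_cells(&reg[2], 2);
    return r;
}
```

For the node above, the cells {0x00, 0xa003a00, 0x00, 0x200} yield a base of 0xa003a00 and a size of 0x200.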
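2.2 Register layout (defined by the specification)
A virtio-mmio device occupies a range of MMIO space in which a set of registers is defined at 4-byte-aligned offsets. The main registers are listed below (based on the Virtio 1.2 specification [1]; registers marked "legacy" exist only in the version 1 interface, and registers marked "modern" only in version 2):
| Offset | Name | Description |
|---|---|---|
| 0x000 | MagicValue | Fixed value 0x74726976 (the string "virt" in little-endian byte order) |
| 0x004 | Version | Version number (1 = legacy, 2 = modern) |
| 0x008 | DeviceID | Device type ID (for example 1 = network, 2 = block) |
| 0x00c | VendorID | Vendor ID |
| 0x010 | HostFeatures | Feature bits offered by the device, 32 bits at a time |
| 0x014 | HostFeaturesSel | Selects which 32-bit word of the device features is visible (0 = bits 0-31, 1 = bits 32-63) |
| 0x020 | GuestFeatures | Feature bits accepted by the driver, 32 bits at a time |
| 0x024 | GuestFeaturesSel | Selects which 32-bit word of the driver features is written |
| 0x028 | GuestPageSize | Guest page size used to interpret QueuePFN (legacy) |
| 0x030 | QueueSel | Selects the queue to configure |
| 0x034 | QueueNumMax | Maximum size of the selected queue |
| 0x038 | QueueNum | Queue size set by the driver |
| 0x03c | QueueAlign | Queue alignment requirement, typically 4 KiB (legacy) |
| 0x040 | QueuePFN | Physical page frame number of the queue (legacy) |
| 0x044 | QueueReady | Queue-ready flag (modern) |
| 0x050 | QueueNotify | Writing a queue index notifies the device that the queue has new buffers (kick) |
| 0x060 | InterruptStatus | Interrupt status bits |
| 0x064 | InterruptACK | Interrupt acknowledge |
| 0x070 | Status | Device status |
| 0x080 | QueueDescLow | 64-bit address of the queue's descriptor area, low 32 bits (modern) |
| 0x084 | QueueDescHigh | 64-bit address of the queue's descriptor area, high 32 bits (modern) |
| 0x090 | QueueAvailLow | Address of the available ring, low 32 bits (modern) |
| 0x094 | QueueAvailHigh | Address of the available ring, high 32 bits (modern) |
| 0x0a0 | QueueUsedLow | Address of the used ring, low 32 bits (modern) |
| 0x0a4 | QueueUsedHigh | Address of the used ring, high 32 bits (modern) |
| 0x0fc | ConfigGeneration | Incremented whenever the configuration space changes (modern) |
| 0x100 | Config | Device-specific configuration space |
Together these registers form the control interface of a virtio-mmio device. The driver reads and writes them to detect the device, configure its queues, handle interrupts, and so on.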
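As an illustration of how the selector registers work, the sketch below reads a 64-bit feature set through the 32-bit HostFeatures window. The two offsets use the names from Linux's virtio_mmio.h header; the mmio_ops accessor struct and the fake device are assumptions made only so the example is self-contained (a real driver would use readl() and writel() on an ioremapped base).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_MMIO_DEVICE_FEATURES     0x010
#define VIRTIO_MMIO_DEVICE_FEATURES_SEL 0x014

/* Hypothetical register accessors standing in for readl()/writel(). */
struct mmio_ops {
    uint32_t (*read32)(void *dev, uint32_t offset);
    void (*write32)(void *dev, uint32_t offset, uint32_t value);
};

/* Read all 64 feature bits: select a word, then read the window. */
static uint64_t read_device_features(const struct mmio_ops *ops, void *dev)
{
    uint64_t features;

    assert(ops != NULL && ops->read32 != NULL && ops->write32 != NULL);
    ops->write32(dev, VIRTIO_MMIO_DEVICE_FEATURES_SEL, 1); /* bits 32-63 */
    features = (uint64_t)ops->read32(dev, VIRTIO_MMIO_DEVICE_FEATURES) << 32;
    ops->write32(dev, VIRTIO_MMIO_DEVICE_FEATURES_SEL, 0); /* bits 0-31 */
    features |= ops->read32(dev, VIRTIO_MMIO_DEVICE_FEATURES);
    return features;
}

/* A tiny fake device, used only to exercise the function above. */
struct fake_dev {
    uint64_t features;
    uint32_t sel;
};

static uint32_t fake_read32(void *dev, uint32_t offset)
{
    struct fake_dev *d = dev;

    if (offset == VIRTIO_MMIO_DEVICE_FEATURES)
        return (uint32_t)(d->features >> (32 * d->sel));
    return 0;
}

static void fake_write32(void *dev, uint32_t offset, uint32_t value)
{
    struct fake_dev *d = dev;

    if (offset == VIRTIO_MMIO_DEVICE_FEATURES_SEL)
        d->sel = value ? 1 : 0;
}
```

The same select-then-access pattern applies to the driver feature registers and, through QueueSel, to all per-queue registers.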
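2.3 Device-side implementation
2.3.1 Trapping the MMIO address range
To emulate a virtual device for the virtual machine, the hypervisor has to intercept the virtual machine's accesses to it. It does so by removing the Stage 2 mapping for the device's address range, so that any guest read or write to those addresses is taken over and handled by the hypervisor:
c
#define VIRTIO_DEV_BASE 0xa003a00
#define VIRTIO_DEV_SIZE 0x200

/* Make [ipa, ipa + size) fault at Stage 2 and route the faults to the given callbacks. */
void create_mmio_trap(struct vm *vm, u64 ipa, u64 size,
                      int (*vmmio_read)(struct vcpu *, u64, u64 *, struct vmmio_access *),
                      int (*vmmio_write)(struct vcpu *, u64, u64, struct vmmio_access *))
{
    u64 *vttbr = vm->vttbr;

    if (page_walk(vttbr, ipa, 0)) {
        /* if ipa has been mapped, unmap it */
        page_unmap(vttbr, ipa, size);
    }
    vmmio_handler_register(vm, ipa, size, vmmio_read, vmmio_write);
    /* Drop any stale Stage 2 translations for the range. */
    flush_tlb();
}

/* Install the trap for the virtio-mmio register window. */
void create_dev_mmio_trap(struct vm *vm)
{
    create_mmio_trap(vm, VIRTIO_DEV_BASE, VIRTIO_DEV_SIZE, virtio_dev_read, virtio_dev_write);
}
The key point is that, when the virtual machine's Stage 2 mappings are created, the range described in the device tree (0x200 bytes starting at 0xa003a00) is either left unmapped or explicitly unmapped. Any guest access to that range then raises a Stage 2 fault, and the hypervisor performs the corresponding emulation.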
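Once the fault reaches the hypervisor, it has to find which registered range the faulting IPA belongs to and call the right callback with a device-relative offset. The sketch below shows one simple way to do that with a fixed-size table. All names are hypothetical, and the callback signatures are simplified compared with the ones used in this article (no vcpu or access-descriptor arguments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*mmio_read_fn)(uint64_t offset, uint64_t *val);
typedef int (*mmio_write_fn)(uint64_t offset, uint64_t val);

struct mmio_handler {
    uint64_t base;
    uint64_t size;
    mmio_read_fn read;
    mmio_write_fn write;
};

#define MAX_MMIO_HANDLERS 8
static struct mmio_handler handlers[MAX_MMIO_HANDLERS];
static size_t nr_handlers;

/* Returns 0 on success, -1 if the table is full. */
static int mmio_handler_register(uint64_t base, uint64_t size,
                                 mmio_read_fn rd, mmio_write_fn wr)
{
    if (nr_handlers == MAX_MMIO_HANDLERS)
        return -1;
    handlers[nr_handlers++] = (struct mmio_handler){ base, size, rd, wr };
    return 0;
}

/* Returns the callback's result, or -1 if no handler claims the address. */
static int mmio_dispatch(uint64_t ipa, int is_write, uint64_t *val)
{
    assert(val != NULL);
    for (size_t i = 0; i < nr_handlers; i++) {
        const struct mmio_handler *h = &handlers[i];

        if (ipa >= h->base && ipa - h->base < h->size) {
            uint64_t offset = ipa - h->base; /* register offset within the device */
            return is_write ? h->write(offset, *val) : h->read(offset, val);
        }
    }
    return -1;
}

/* Demo callbacks: reads return the offset, writes remember the last value. */
static uint64_t demo_last_write;

static int demo_read(uint64_t offset, uint64_t *val)
{
    *val = offset;
    return 0;
}

static int demo_write(uint64_t offset, uint64_t val)
{
    (void)offset;
    demo_last_write = val;
    return 0;
}
```

A linear scan is enough for a handful of devices; a hypervisor with many emulated regions might prefer a sorted array or an interval tree.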
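2.3.2 Implementing the register read and write callbacks
c
vmmio_handler_register(vm, ipa, size, vmmio_read, vmmio_write);
vmmio_handler_register registers the read and write callbacks that run when the virtual machine takes an MMIO trap. When the virtual machine reads an address in the 0x200-byte range starting at 0xa003a00, the vmmio_read callback is invoked; when it writes to that range, the vmmio_write callback is invoked. A simple implementation of the two callbacks looks like this: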
c
static int virtio_dev_read(struct vcpu *vcpu, u64 offset, u64 *val, struct vmmio_access *vmmio)
{
    switch (offset) {
    case VIRTIO_MMIO_MAGIC_VALUE:
        /* "virt" in little-endian byte order: 0x74726976 */
        *val = ('v' | 'i' << 8 | 'r' << 16 | 't' << 24);
        break;
    case VIRTIO_MMIO_VERSION:
        *val = 0x1; /* legacy interface */
        break;
    case VIRTIO_MMIO_DEVICE_ID:
        *val = 46; /* tells the driver what type of device this is (1: virtio net, 2: virtio block) */
        break;
    case VIRTIO_MMIO_VENDOR_ID:
        *val = 0x554D4551; /* "QEMU" */
        break;
    case VIRTIO_MMIO_DEVICE_FEATURES:
        /* Only feature bit 0 is offered; the upper 32 bits read as zero. */
        if (virtio_state.device_features_sel == 0) {
            *val = 1;
        } else {
            *val = 0;
        }
        break;
    case VIRTIO_MMIO_STATUS:
        *val = virtio_state.status;
        break;
    default:
        abort("Unhandled MMIO read offset 0x%llx\n", (unsigned long long)offset);
        *val = -1;
        break;
    }
    return 0;
}

static int virtio_dev_write(struct vcpu *vcpu, u64 offset, u64 val, struct vmmio_access *vmmio)
{
    switch (offset) {
    case VIRTIO_MMIO_DEVICE_FEATURES_SEL:
        virtio_state.device_features_sel = val;
        break;
    case VIRTIO_MMIO_DRIVER_FEATURES:
        /* Two 32-bit words of driver features; ignore out-of-range selectors. */
        if (virtio_state.driver_features_sel < 2)
            virtio_state.driver_features[virtio_state.driver_features_sel] = val;
        break;
    case VIRTIO_MMIO_DRIVER_FEATURES_SEL:
        virtio_state.driver_features_sel = val;
        break;
    case VIRTIO_MMIO_GUEST_PAGE_SIZE:
        virtio_state.guest_page_size = val;
        LOG_INFO("Guest page size set to %llu (legacy mode)\n", (unsigned long long)val);
        break;
    case VIRTIO_MMIO_QUEUE_NUM:
        virtio_state.queue_num = val;
        break;
    case VIRTIO_MMIO_STATUS:
        virtio_state.status = val;
        if (val & 0x4) { /* DRIVER_OK is bit 2; 0x8 would be FEATURES_OK */
            LOG_INFO("Driver OK - device ready\n");
        }
        break;
    default:
        abort("Unhandled MMIO write offset 0x%llx\n", (unsigned long long)offset);
        break;
    }
    return 0;
}
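The Status register is a bit field that the driver fills in step by step during initialization, so a device model usually needs to tell the individual bits apart. The sketch below lists the bit values defined by the virtio specification (the macro names follow Linux's virtio_config.h) and classifies a Status write. The enum and the classify_status_write helper are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 0x01 /* guest has noticed the device */
#define VIRTIO_CONFIG_S_DRIVER      0x02 /* guest has a driver for it */
#define VIRTIO_CONFIG_S_DRIVER_OK   0x04 /* driver is set up and ready */
#define VIRTIO_CONFIG_S_FEATURES_OK 0x08 /* feature negotiation is complete */
#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40 /* device hit an unrecoverable error */
#define VIRTIO_CONFIG_S_FAILED      0x80 /* driver gave up on the device */

enum status_event {
    STATUS_NONE,          /* intermediate step, nothing to do */
    STATUS_RESET,         /* driver wrote 0: reset the device */
    STATUS_DRIVER_READY,  /* DRIVER_OK was just set */
    STATUS_DRIVER_FAILED, /* driver reported FAILED */
};

/* Classify a write of new_status given the previously stored value. */
static enum status_event classify_status_write(uint32_t old, uint32_t new_status)
{
    assert((old & ~0xffu) == 0); /* stored status is an 8-bit field */
    new_status &= 0xff;
    if (new_status == 0)
        return STATUS_RESET;
    if (new_status & VIRTIO_CONFIG_S_FAILED)
        return STATUS_DRIVER_FAILED;
    if ((new_status & VIRTIO_CONFIG_S_DRIVER_OK) && !(old & VIRTIO_CONFIG_S_DRIVER_OK))
        return STATUS_DRIVER_READY;
    return STATUS_NONE;
}
```

A modern driver typically writes the sequence 0x01, 0x03, 0x0b, 0x0f; a legacy driver skips FEATURES_OK and ends at 0x07. In both cases the device becomes live only when DRIVER_OK (0x04) appears.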
QEMU implements the read and write callbacks of its virtio-mmio device as follows:
c
// hw/virtio/virtio-mmio.c
static uint64_t virtio_mmio_read(void *opaque, hwaddr offset, unsigned size)
{
VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
trace_virtio_mmio_read(offset);
if (!vdev) {
/* If no backend is present, we treat most registers as
* read-as-zero, except for the magic number, version and
* vendor ID. This is not strictly sanctioned by the virtio
* spec, but it allows us to provide transports with no backend
* plugged in which don't confuse Linux's virtio code: the
* probe won't complain about the bad magic number, but the
* device ID of zero means no backend will claim it.
*/
switch (offset) {
case VIRTIO_MMIO_MAGIC_VALUE:
return VIRT_MAGIC;
case VIRTIO_MMIO_VERSION:
if (proxy->legacy) {
return VIRT_VERSION_LEGACY;
} else {
return VIRT_VERSION;
}
case VIRTIO_MMIO_VENDOR_ID:
return VIRT_VENDOR;
default:
return 0;
}
}
if (offset >= VIRTIO_MMIO_CONFIG) {
offset -= VIRTIO_MMIO_CONFIG;
if (proxy->legacy) {
switch (size) {
case 1:
return virtio_config_readb(vdev, offset);
case 2:
return virtio_config_readw(vdev, offset);
case 4:
return virtio_config_readl(vdev, offset);
default:
abort();
}
} else {
switch (size) {
case 1:
return virtio_config_modern_readb(vdev, offset);
case 2:
return virtio_config_modern_readw(vdev, offset);
case 4:
return virtio_config_modern_readl(vdev, offset);
default:
abort();
}
}
}
if (size != 4) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: wrong size access to register!\n",
__func__);
return 0;
}
switch (offset) {
case VIRTIO_MMIO_MAGIC_VALUE:
return VIRT_MAGIC;
case VIRTIO_MMIO_VERSION:
if (proxy->legacy) {
return VIRT_VERSION_LEGACY;
} else {
return VIRT_VERSION;
}
case VIRTIO_MMIO_DEVICE_ID:
return vdev->device_id;
case VIRTIO_MMIO_VENDOR_ID:
return VIRT_VENDOR;
case VIRTIO_MMIO_DEVICE_FEATURES:
if (proxy->legacy) {
if (proxy->host_features_sel) {
return 0;
} else {
return vdev->host_features;
}
} else {
VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
return (vdev->host_features & ~vdc->legacy_features)
>> (32 * proxy->host_features_sel);
}
case VIRTIO_MMIO_QUEUE_NUM_MAX:
if (!virtio_queue_get_num(vdev, vdev->queue_sel)) {
return 0;
}
return VIRTQUEUE_MAX_SIZE;
case VIRTIO_MMIO_QUEUE_PFN:
if (!proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: read from legacy register (0x%"
HWADDR_PRIx ") in non-legacy mode\n",
__func__, offset);
return 0;
}
return virtio_queue_get_addr(vdev, vdev->queue_sel)
>> proxy->guest_page_shift;
case VIRTIO_MMIO_QUEUE_READY:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: read from non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return 0;
}
return proxy->vqs[vdev->queue_sel].enabled;
case VIRTIO_MMIO_INTERRUPT_STATUS:
return qatomic_read(&vdev->isr);
case VIRTIO_MMIO_STATUS:
return vdev->status;
case VIRTIO_MMIO_CONFIG_GENERATION:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: read from non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return 0;
}
return vdev->generation;
case VIRTIO_MMIO_SHM_LEN_LOW:
case VIRTIO_MMIO_SHM_LEN_HIGH:
/*
* VIRTIO_MMIO_SHM_SEL is unimplemented
* according to the linux driver, if region length is -1
* the shared memory doesn't exist
*/
return -1;
case VIRTIO_MMIO_DEVICE_FEATURES_SEL:
case VIRTIO_MMIO_DRIVER_FEATURES:
case VIRTIO_MMIO_DRIVER_FEATURES_SEL:
case VIRTIO_MMIO_GUEST_PAGE_SIZE:
case VIRTIO_MMIO_QUEUE_SEL:
case VIRTIO_MMIO_QUEUE_NUM:
case VIRTIO_MMIO_QUEUE_ALIGN:
case VIRTIO_MMIO_QUEUE_NOTIFY:
case VIRTIO_MMIO_INTERRUPT_ACK:
case VIRTIO_MMIO_QUEUE_DESC_LOW:
case VIRTIO_MMIO_QUEUE_DESC_HIGH:
case VIRTIO_MMIO_QUEUE_AVAIL_LOW:
case VIRTIO_MMIO_QUEUE_AVAIL_HIGH:
case VIRTIO_MMIO_QUEUE_USED_LOW:
case VIRTIO_MMIO_QUEUE_USED_HIGH:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: read of write-only register (0x%" HWADDR_PRIx ")\n",
__func__, offset);
return 0;
default:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: bad register offset (0x%" HWADDR_PRIx ")\n",
__func__, offset);
return 0;
}
return 0;
}
static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value,
unsigned size)
{
VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
trace_virtio_mmio_write_offset(offset, value);
if (!vdev) {
/* If no backend is present, we just make all registers
* write-ignored. This allows us to provide transports with
* no backend plugged in.
*/
return;
}
if (offset >= VIRTIO_MMIO_CONFIG) {
offset -= VIRTIO_MMIO_CONFIG;
if (proxy->legacy) {
switch (size) {
case 1:
virtio_config_writeb(vdev, offset, value);
break;
case 2:
virtio_config_writew(vdev, offset, value);
break;
case 4:
virtio_config_writel(vdev, offset, value);
break;
default:
abort();
}
return;
} else {
switch (size) {
case 1:
virtio_config_modern_writeb(vdev, offset, value);
break;
case 2:
virtio_config_modern_writew(vdev, offset, value);
break;
case 4:
virtio_config_modern_writel(vdev, offset, value);
break;
default:
abort();
}
return;
}
}
if (size != 4) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: wrong size access to register!\n",
__func__);
return;
}
switch (offset) {
case VIRTIO_MMIO_DEVICE_FEATURES_SEL:
if (value) {
proxy->host_features_sel = 1;
} else {
proxy->host_features_sel = 0;
}
break;
case VIRTIO_MMIO_DRIVER_FEATURES:
if (proxy->legacy) {
if (proxy->guest_features_sel) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: attempt to write guest features with "
"guest_features_sel > 0 in legacy mode\n",
__func__);
} else {
virtio_set_features(vdev, value);
}
} else {
proxy->guest_features[proxy->guest_features_sel] = value;
}
break;
case VIRTIO_MMIO_DRIVER_FEATURES_SEL:
if (value) {
proxy->guest_features_sel = 1;
} else {
proxy->guest_features_sel = 0;
}
break;
case VIRTIO_MMIO_GUEST_PAGE_SIZE:
if (!proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to legacy register (0x%"
HWADDR_PRIx ") in non-legacy mode\n",
__func__, offset);
return;
}
proxy->guest_page_shift = ctz32(value);
if (proxy->guest_page_shift > 31) {
proxy->guest_page_shift = 0;
}
trace_virtio_mmio_guest_page(value, proxy->guest_page_shift);
break;
case VIRTIO_MMIO_QUEUE_SEL:
if (value < VIRTIO_QUEUE_MAX) {
vdev->queue_sel = value;
}
break;
case VIRTIO_MMIO_QUEUE_NUM:
trace_virtio_mmio_queue_write(value, VIRTQUEUE_MAX_SIZE);
virtio_queue_set_num(vdev, vdev->queue_sel, value);
if (proxy->legacy) {
virtio_queue_update_rings(vdev, vdev->queue_sel);
} else {
proxy->vqs[vdev->queue_sel].num = value;
}
break;
case VIRTIO_MMIO_QUEUE_ALIGN:
if (!proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to legacy register (0x%"
HWADDR_PRIx ") in non-legacy mode\n",
__func__, offset);
return;
}
virtio_queue_set_align(vdev, vdev->queue_sel, value);
break;
case VIRTIO_MMIO_QUEUE_PFN:
if (!proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to legacy register (0x%"
HWADDR_PRIx ") in non-legacy mode\n",
__func__, offset);
return;
}
if (value == 0) {
virtio_mmio_soft_reset(proxy);
} else {
virtio_queue_set_addr(vdev, vdev->queue_sel,
value << proxy->guest_page_shift);
}
break;
case VIRTIO_MMIO_QUEUE_READY:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
if (value) {
virtio_queue_set_num(vdev, vdev->queue_sel,
proxy->vqs[vdev->queue_sel].num);
virtio_queue_set_rings(vdev, vdev->queue_sel,
((uint64_t)proxy->vqs[vdev->queue_sel].desc[1]) << 32 |
proxy->vqs[vdev->queue_sel].desc[0],
((uint64_t)proxy->vqs[vdev->queue_sel].avail[1]) << 32 |
proxy->vqs[vdev->queue_sel].avail[0],
((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
proxy->vqs[vdev->queue_sel].used[0]);
proxy->vqs[vdev->queue_sel].enabled = 1;
} else {
proxy->vqs[vdev->queue_sel].enabled = 0;
}
break;
case VIRTIO_MMIO_QUEUE_NOTIFY:
if (value < VIRTIO_QUEUE_MAX) {
virtio_queue_notify(vdev, value);
}
break;
case VIRTIO_MMIO_INTERRUPT_ACK:
qatomic_and(&vdev->isr, ~value);
virtio_update_irq(vdev);
break;
case VIRTIO_MMIO_STATUS:
if (!(value & VIRTIO_CONFIG_S_DRIVER_OK)) {
virtio_mmio_stop_ioeventfd(proxy);
}
if (!proxy->legacy && (value & VIRTIO_CONFIG_S_FEATURES_OK)) {
virtio_set_features(vdev,
((uint64_t)proxy->guest_features[1]) << 32 |
proxy->guest_features[0]);
}
virtio_set_status(vdev, value & 0xff);
if (value & VIRTIO_CONFIG_S_DRIVER_OK) {
virtio_mmio_start_ioeventfd(proxy);
}
if (vdev->status == 0) {
virtio_mmio_soft_reset(proxy);
}
break;
case VIRTIO_MMIO_QUEUE_DESC_LOW:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
proxy->vqs[vdev->queue_sel].desc[0] = value;
break;
case VIRTIO_MMIO_QUEUE_DESC_HIGH:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
proxy->vqs[vdev->queue_sel].desc[1] = value;
break;
case VIRTIO_MMIO_QUEUE_AVAIL_LOW:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
proxy->vqs[vdev->queue_sel].avail[0] = value;
break;
case VIRTIO_MMIO_QUEUE_AVAIL_HIGH:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
proxy->vqs[vdev->queue_sel].avail[1] = value;
break;
case VIRTIO_MMIO_QUEUE_USED_LOW:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
proxy->vqs[vdev->queue_sel].used[0] = value;
break;
case VIRTIO_MMIO_QUEUE_USED_HIGH:
if (proxy->legacy) {
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to non-legacy register (0x%"
HWADDR_PRIx ") in legacy mode\n",
__func__, offset);
return;
}
proxy->vqs[vdev->queue_sel].used[1] = value;
break;
case VIRTIO_MMIO_MAGIC_VALUE:
case VIRTIO_MMIO_VERSION:
case VIRTIO_MMIO_DEVICE_ID:
case VIRTIO_MMIO_VENDOR_ID:
case VIRTIO_MMIO_DEVICE_FEATURES:
case VIRTIO_MMIO_QUEUE_NUM_MAX:
case VIRTIO_MMIO_INTERRUPT_STATUS:
case VIRTIO_MMIO_CONFIG_GENERATION:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: write to read-only register (0x%" HWADDR_PRIx ")\n",
__func__, offset);
break;
default:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: bad register offset (0x%" HWADDR_PRIx ")\n",
__func__, offset);
}
}
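Two different ways of passing ring addresses appear in the QEMU code above: the modern interface writes each 64-bit address as a low and a high 32-bit half, while the legacy interface writes a page frame number that is scaled by the guest page size. The arithmetic can be sketched as below. The function names are hypothetical; the zero-page-size fallback mirrors the `guest_page_shift > 31` check in the QEMU listing.

```c
#include <assert.h>
#include <stdint.h>

/* Modern interface: combine the Low/High register pair into one address. */
static uint64_t ring_addr_from_halves(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Legacy interface: GuestPageSize is a power of two; its log2 is the PFN shift. */
static unsigned page_shift_from_size(uint32_t page_size)
{
    unsigned shift = 0;

    if (page_size == 0)
        return 0; /* no set bit: fall back to a shift of 0 */
    while (!(page_size & 1)) { /* count trailing zero bits */
        page_size >>= 1;
        shift++;
    }
    return shift;
}

/* Legacy interface: QueuePFN times the guest page size gives the ring address. */
static uint64_t ring_addr_from_pfn(uint32_t pfn, unsigned page_shift)
{
    assert(page_shift < 32);
    return (uint64_t)pfn << page_shift;
}
```

For example, with a 4 KiB guest page size the shift is 12, so a QueuePFN of 0x80001 places the ring at guest physical address 0x80001000.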
2.4 Driver-side implementation of virtio_mmio (Linux kernel)
2.4.1 Driver probe
Consider the following code:
c
// drivers/virtio/virtio_mmio.c
static const struct of_device_id virtio_mmio_match[] = {
{ .compatible = "virtio,mmio", },
{},
};
MODULE_DEVICE_TABLE(of, virtio_mmio_match);
static struct platform_driver virtio_mmio_driver = {
.probe = virtio_mmio_probe,
.remove = virtio_mmio_remove,
.driver = {
.name = "virtio-mmio",
.of_match_table = virtio_mmio_match,
.acpi_match_table = ACPI_PTR(virtio_mmio_acpi_match),
#ifdef CONFIG_PM_SLEEP
.pm = &virtio_mmio_pm_ops,
#endif
},
};
static int __init virtio_mmio_init(void)
{
return platform_driver_register(&virtio_mmio_driver);
}
static void __exit virtio_mmio_exit(void)
{
platform_driver_unregister(&virtio_mmio_driver);
vm_unregister_cmdline_devices();
}
module_init(virtio_mmio_init);
module_exit(virtio_mmio_exit);
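The compatible string in virtio_mmio_match is "virtio,mmio", which matches the compatible property of the device tree node shown earlier. At boot the kernel creates a platform device for that node, the platform bus binds it to virtio_mmio_driver, and the driver's probe function is called.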
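The matching step can be pictured as a string comparison against a sentinel-terminated table. The sketch below is a simplified stand-in for what the kernel's device tree matching does (the real code also considers node name and type and ranks multiple compatible strings); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct match_entry {
    const char *compatible;
};

/* Sentinel-terminated, in the same spirit as the driver's of_device_id table. */
static const struct match_entry virtio_mmio_match_sketch[] = {
    { "virtio,mmio" },
    { NULL },
};

/* Returns the first table entry equal to one of the node's compatible strings, or NULL. */
static const struct match_entry *
match_compatible(const struct match_entry *table,
                 const char *const *node_compat, size_t n)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++)
        for (const struct match_entry *e = table; e->compatible != NULL; e++)
            if (strcmp(e->compatible, node_compat[i]) == 0)
                return e;
    return NULL;
}
```

A node whose compatible list contains "virtio,mmio" matches the table; a node that lists only, say, UART compatibles does not.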
2.4.2 The virtio-mmio probe function
c
static int virtio_mmio_probe(struct platform_device *pdev)
{
struct virtio_mmio_device *vm_dev;
struct resource *mem;
unsigned long magic;
int rc;
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!mem)
return -EINVAL;
if (!devm_request_mem_region(&pdev->dev, mem->start,
resource_size(mem), pdev->name))
return -EBUSY;
vm_dev = devm_kzalloc(&pdev->dev, sizeof(*vm_dev), GFP_KERNEL);
if (!vm_dev)
return -ENOMEM;
vm_dev->vdev.dev.parent = &pdev->dev;
vm_dev->vdev.dev.release = virtio_mmio_release_dev;
vm_dev->vdev.config = &virtio_mmio_config_ops;
vm_dev->pdev = pdev;
INIT_LIST_HEAD(&vm_dev->virtqueues);
spin_lock_init(&vm_dev->lock);
vm_dev->base = devm_ioremap(&pdev->dev, mem->start, resource_size(mem));
if (vm_dev->base == NULL)
return -EFAULT;
/* Check magic value */
magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) {
dev_warn(&pdev->dev, "Wrong magic value 0x%08lx!\n", magic);
return -ENODEV;
}
/* Check device version */
vm_dev->version = readl(vm_dev->base + VIRTIO_MMIO_VERSION);
if (vm_dev->version < 1 || vm_dev->version > 2) {
dev_err(&pdev->dev, "Version %ld not supported!\n",
vm_dev->version);
return -ENXIO;
}
vm_dev->vdev.id.device = readl(vm_dev->base + VIRTIO_MMIO_DEVICE_ID);
if (vm_dev->vdev.id.device == 0) {
/*
* virtio-mmio device with an ID 0 is a (dummy) placeholder
* with no function. End probing now with no error reported.
*/
return -ENODEV;
}
vm_dev->vdev.id.vendor = readl(vm_dev->base + VIRTIO_MMIO_VENDOR_ID);
if (vm_dev->version == 1) {
writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_GUEST_PAGE_SIZE);
rc = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
/*
* In the legacy case, ensure our coherently-allocated virtio
* ring will be at an address expressable as a 32-bit PFN.
*/
if (!rc)
dma_set_coherent_mask(&pdev->dev,
DMA_BIT_MASK(32 + PAGE_SHIFT));
} else {
rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
}
if (rc)
rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (rc)
dev_warn(&pdev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n");
platform_set_drvdata(pdev, vm_dev);
rc = register_virtio_device(&vm_dev->vdev);
if (rc)
put_device(&vm_dev->vdev.dev);
return rc;
}
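The function uses devm_ioremap to map the device's physical address range into the kernel virtual address space, which yields the register base vm_dev->base. It then accesses the device registers at their offsets from that base. Each of these accesses traps to the hypervisor and ends up in the registered read or write callback.
Finally it calls register_virtio_device to register the initialized virtio_device on the virtio bus.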
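The identity checks at the start of the probe function can be condensed into the following sketch. It replaces readl() with a read from a plain array so that it runs in user space; reg_read and probe_identity are hypothetical names, while the offsets, the accepted version range and the error codes follow the probe function shown above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_MMIO_MAGIC_VALUE 0x000
#define VIRTIO_MMIO_VERSION     0x004
#define VIRTIO_MMIO_DEVICE_ID   0x008

/* Stand-in for readl(): read a 32-bit register from an in-memory array. */
static uint32_t reg_read(const uint32_t *base, uint32_t offset)
{
    return base[offset / 4];
}

/* Returns 0 and stores the device ID, or a negative errno value. */
static int probe_identity(const uint32_t *base, uint32_t *device_id)
{
    uint32_t version;

    assert(base != NULL && device_id != NULL);
    if (reg_read(base, VIRTIO_MMIO_MAGIC_VALUE) != 0x74726976) /* "virt" */
        return -ENODEV;
    version = reg_read(base, VIRTIO_MMIO_VERSION);
    if (version < 1 || version > 2)
        return -ENXIO;
    *device_id = reg_read(base, VIRTIO_MMIO_DEVICE_ID);
    if (*device_id == 0) /* placeholder device with no function */
        return -ENODEV;
    return 0;
}
```

Each of these three reads is exactly the kind of access that the device-side read callback has to answer correctly, otherwise the guest driver gives up before any queue is configured.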
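2.5 A virtio driver built on virtio-mmio
The initialization described so far has given the virtual machine a virtual peripheral. How does the virtual machine actually use it? In other words, now that there is a virtio device, how is the matching virtio driver implemented? Taking virtio_blk as an example:
c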
// /usr/include/linux/virtio_ids.h
#define VIRTIO_ID_NET 1 /* virtio net */
#define VIRTIO_ID_BLOCK 2 /* virtio block */
#define VIRTIO_ID_CONSOLE 3 /* virtio console */
// drivers/block/virtio_blk.c
static int virtblk_probe(struct virtio_device *vdev)
{
...
}
static void virtblk_remove(struct virtio_device *vdev)
{
...
}
static const struct virtio_device_id id_table[] = {
{ VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID },
{ 0 },
};
static struct virtio_driver virtio_blk = {
.feature_table = features,
.feature_table_size = ARRAY_SIZE(features),
.feature_table_legacy = features_legacy,
.feature_table_size_legacy = ARRAY_SIZE(features_legacy),
.driver.name = KBUILD_MODNAME,
.driver.owner = THIS_MODULE,
.id_table = id_table,
.probe = virtblk_probe,
.remove = virtblk_remove,
.config_changed = virtblk_config_changed,
#ifdef CONFIG_PM_SLEEP
.freeze = virtblk_freeze,
.restore = virtblk_restore,
#endif
};
static int __init init(void)
{
int error;
virtblk_wq = alloc_workqueue("virtio-blk", 0, 0);
if (!virtblk_wq)
return -ENOMEM;
major = register_blkdev(0, "virtblk");
if (major < 0) {
error = major;
goto out_destroy_workqueue;
}
error = register_virtio_driver(&virtio_blk);
if (error)
goto out_unregister_blkdev;
return 0;
out_unregister_blkdev:
unregister_blkdev(major, "virtblk");
out_destroy_workqueue:
destroy_workqueue(virtblk_wq);
return error;
}
static void __exit fini(void)
{
unregister_virtio_driver(&virtio_blk);
unregister_blkdev(major, "virtblk");
destroy_workqueue(virtblk_wq);
}
module_init(init);
module_exit(fini);
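A virtio device and a virtio driver (here virtio_blk) are matched through the device ID, in this case the VIRTIO_ID_BLOCK macro. When the virtio device is initialized, its device_id is read from VIRTIO_MMIO_DEVICE_ID; a value equal to VIRTIO_ID_BLOCK identifies it as a virtio block device. When the virtio_blk driver is initialized, the device_id is compared against the driver's id_table to pair the driver with the device. For the detailed matching flow, refer to virtio-mmio.
Once the virtio device and the virtio driver have been matched, the driver's probe function can initialize the device, configure the I/O queues and related parameters, and register the device as a block device of the Linux guest for the system to use.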
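The ID comparison can be sketched as follows. The rule mirrors the one the Linux virtio bus applies (an entry matches if its device field equals the device's ID or is the wildcard, and likewise for the vendor), but the struct and function names here are simplified stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_ID_NET     1
#define VIRTIO_ID_BLOCK   2
#define VIRTIO_DEV_ANY_ID 0xffffffff /* wildcard */

struct virtio_id {
    uint32_t device;
    uint32_t vendor;
};

/* One table entry matches if both fields are equal or wildcards. */
static bool id_match(const struct virtio_id *dev, const struct virtio_id *id)
{
    if (id->device != dev->device && id->device != VIRTIO_DEV_ANY_ID)
        return false;
    return id->vendor == VIRTIO_DEV_ANY_ID || id->vendor == dev->vendor;
}

/* Walk a table terminated by an entry whose device field is 0. */
static bool table_match(const struct virtio_id *dev, const struct virtio_id *table)
{
    assert(dev != NULL && table != NULL);
    for (size_t i = 0; table[i].device != 0; i++)
        if (id_match(dev, &table[i]))
            return true;
    return false;
}
```

With a table of { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, a device reporting ID 2 matches regardless of vendor, while a device reporting ID 1 or any other value does not, so the block driver's probe function is never called for it.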
References
[1] Virtual I/O Device (VIRTIO) Version 1.2, Committee Specification 01. https://mediawiki.hyhsystem.cn/images/a/a5/Virtio-v1.2-cs01.pdf