Note: This document is a compilation of material related to "Linux PCI Drivers".
Quoted English passages were machine-translated without proofreading.
Quoted Chinese passages were lightly rearranged.
If anything looks wrong, please consult the original sources.
How To Write Linux PCI Drivers
Translation:
Yanteng Si [email protected]
1. How To Write Linux PCI Drivers
- Authors:
- Martin Mares <[email protected]>
- Grant Grundler <[email protected]>
The world of PCI is vast and full of (mostly unpleasant) surprises. Since each CPU architecture implements different chip-sets and PCI devices have different requirements (erm, "features"), the result is the PCI support in the Linux kernel is not as trivial as one would wish. This short document tries to introduce all potential driver authors to the Linux APIs for PCI device drivers.
A more complete resource is the third edition of "Linux Device Drivers" by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman. LDD3 is available for free (under a Creative Commons License) from: Linux Device Drivers, Third Edition.
However, keep in mind that all documents are subject to bit rot. Refer to the source code if things are not working as described here.
Please send questions/comments/patches about the Linux PCI API to the "Linux PCI" <[email protected]> mailing list.
1.1. Structure of PCI drivers
PCI drivers "discover" PCI devices in a system via pci_register_driver(). Actually, it's the other way around. When the PCI generic code discovers a new device, the driver with a matching "description" will be notified. Details on this below.
pci_register_driver() leaves most of the work of probing for devices to the PCI layer and supports online insertion/removal of devices (thus supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver). The pci_register_driver() call requires passing in a table of function pointers and thus dictates the high level structure of the driver.
Once the driver has probed a PCI device and taken ownership of it, the driver generally needs to perform the following initialization:

- Enable the device
- Request MMIO/IOP resources
- Set the DMA mask size (for both coherent and streaming DMA)
- Allocate and initialize shared control data (pci_allocate_coherent())
- Access device configuration space (if needed)
- Register IRQ handler (request_irq())
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
- Enable DMA/processing engines
When done using the device, and perhaps the module needs to be unloaded, the driver needs to take the following steps:

- Disable the device from generating IRQs
- Release the IRQ (free_irq())
- Stop all DMA activity
- Release DMA buffers (both coherent and streaming)
- Unregister from other subsystems (e.g. scsi or netdev)
- Release MMIO/IOP resources
- Disable the device

Most of these topics are covered in the following sections. For the rest, look at LDD3 or <linux/pci.h>.
If the PCI subsystem is not configured (CONFIG_PCI is not set), most of the PCI functions described below are defined as inline functions that are either completely empty or just return an appropriate error code, to avoid lots of ifdefs in the drivers.
1.2. Calling pci_register_driver()
PCI device drivers call pci_register_driver() during their initialization with a pointer to a structure describing the driver (struct pci_driver):
```c
struct pci_driver {
        const char *name;
        const struct pci_device_id *id_table;
        int (*probe)(struct pci_dev *dev, const struct pci_device_id *id);
        void (*remove)(struct pci_dev *dev);
        int (*suspend)(struct pci_dev *dev, pm_message_t state);
        int (*resume)(struct pci_dev *dev);
        void (*shutdown)(struct pci_dev *dev);
        int (*sriov_configure)(struct pci_dev *dev, int num_vfs);
        int (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count);
        u32 (*sriov_get_vf_total_msix)(struct pci_dev *pf);
        const struct pci_error_handlers *err_handler;
        const struct attribute_group **groups;
        const struct attribute_group **dev_groups;
        struct device_driver driver;
        struct pci_dynids dynids;
        bool driver_managed_dma;
};
```
Members:

- name: Name of the driver.
- id_table: Pointer to the table of device IDs the driver is interested in. Most drivers should export this table using MODULE_DEVICE_TABLE(pci, ...).
- probe: This probing function gets called when a PCI device matching the ID table and not "owned" by another driver is found (during execution of pci_register_driver() for already existing devices, or later when a new device gets inserted). It is passed a struct pci_dev * for each device whose entry in the ID table matches the device. It returns zero when the driver chooses to take "ownership" of the device, or an error code (negative number) otherwise.
- remove: Called whenever a device being handled by this driver is removed (either during deregistration of the driver or when it is manually pulled out of a hot-pluggable slot).
- suspend: Put device into low power state.
- resume: Wake device from low power state.
- shutdown: Hooked into the reboot notifier list (kernel/sys.c). Intended to stop any idling DMA operations.
- sriov_configure: Optional driver callback that allows configuring the number of virtual functions (VFs) via the sysfs file "sriov_numvfs".
- sriov_set_msix_vec_count: PF driver callback to change the number of MSI-X vectors of a VF. Triggered via the sysfs file "sriov_vf_msix_count". This will change the MSI-X Table Size in the VF's Message Control register.
- sriov_get_vf_total_msix: PF driver callback to get the total number of MSI-X vectors available for distribution to the VFs.
- err_handler: See PCI Error Recovery.
- groups: Sysfs attribute groups.
- dev_groups: Attributes attached to the device; they will be created once it is bound to the driver.
- driver: Driver model structure.
- dynids: List of dynamically added device IDs.
- driver_managed_dma: The device driver does not use the kernel DMA API for DMA. Most device drivers need not care about this flag, as long as all DMA is handled through the kernel DMA API. Some special drivers (for example VFIO drivers) know how to manage their own DMA, so they set this flag so that the IOMMU layer will allow them to set up and manage their own I/O address space.
The ID table is an array of struct pci_device_id entries ending with an all-zero entry. Definitions with static const are generally preferred.
```c
struct pci_device_id {
        __u32 vendor, device;
        __u32 subvendor, subdevice;
        __u32 class, class_mask;
        kernel_ulong_t driver_data;
        __u32 override_only;
};
```
Members:

- vendor: Vendor ID to match (or PCI_ANY_ID).
- device: Device ID to match (or PCI_ANY_ID).
- subvendor: Subsystem vendor ID to match (or PCI_ANY_ID).
- subdevice: Subsystem device ID to match (or PCI_ANY_ID).
- class: Device class, subclass, and "interface" to match.
- class_mask: Limits which sub-fields of the class field are compared.
- driver_data: Data private to the driver. Most drivers don't need to use the driver_data field. Best practice is to use driver_data as an index into a static list of equivalent device types, rather than as a pointer.
- override_only: Match only when dev->driver_override is this driver.
Most drivers only need PCI_DEVICE() or PCI_DEVICE_CLASS() to set up a pci_device_id table.
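As a sketch of how the pieces fit together (the vendor/device values and names here are hypothetical placeholders, not IDs for real hardware):

```c
/* Hypothetical IDs for illustration; a real driver uses the values
 * documented for its hardware. */
#define EXAMPLE_VENDOR_ID 0x1234
#define EXAMPLE_DEVICE_ID 0x5678

static const struct pci_device_id example_ids[] = {
        { PCI_DEVICE(EXAMPLE_VENDOR_ID, EXAMPLE_DEVICE_ID) },
        { }     /* all-zero entry terminates the table */
};
MODULE_DEVICE_TABLE(pci, example_ids);
```

Exporting the table with MODULE_DEVICE_TABLE(pci, ...) is what lets userspace tools load the right module automatically when a matching device appears.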
New PCI IDs may be added to a device driver's pci_ids table at runtime, as shown below:
```bash
echo "vendor device subvendor subdevice class class_mask driver_data" > \
/sys/bus/pci/drivers/{driver}/new_id
```
All fields are passed in as hexadecimal values (no leading 0x). The vendor and device fields are mandatory; the others are optional. Users need pass only as many optional fields as necessary:

- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
- class and classmask fields default to 0
- driver_data defaults to 0UL
- override_only field defaults to 0
Note that driver_data must match the value used by any one of the pci_device_id entries defined in the driver. This makes the driver_data field mandatory if all the pci_device_id entries have a non-zero driver_data value.
Once added, the driver probe routine will be invoked for any unclaimed PCI devices listed in its (newly updated) pci_ids list.
When the driver exits, it just calls pci_unregister_driver() and the PCI layer automatically calls the remove hook for all devices handled by the driver.
1.2.1. "Attributes" for driver functions/data
Please mark the initialization and cleanup functions where appropriate (the corresponding macros are defined in <linux/init.h>):

| Attribute | Description |
|---|---|
| __init | Initialization code. Thrown away after the driver initializes. |
| __exit | Exit code. Ignored for non-modular drivers. |
Tips on when/where to use the above attributes:

- The module_init()/module_exit() functions (and all initialization functions called only from these) should be marked __init/__exit.
- Do not mark the struct pci_driver.
- Do NOT mark a function if you are not sure which mark to use. Better to not mark the function than to mark it wrong.
1.3. How to find PCI devices manually
PCI drivers should have a really good reason for not using the pci_register_driver() interface to search for PCI devices. The main reason a PCI device is controlled by multiple drivers is that the one PCI device implements several different HW services. E.g. a combined serial/parallel port/floppy controller.
A manual search may be performed using the following constructs:

Searching by vendor and device ID:

```c
struct pci_dev *dev = NULL;
while (dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev))
        configure_device(dev);
```

Searching by class ID (iterate in a similar way):

```c
pci_get_class(CLASS_ID, dev)
```

Searching by both vendor/device and subsystem vendor/device ID:

```c
pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)
```

You can use the constant PCI_ANY_ID as a wildcard replacement for VENDOR_ID or DEVICE_ID. This allows searching for any device from a specific vendor, for example.
These functions are hotplug-safe. They increment the reference count on the pci_dev they return. You must eventually (possibly at module unload) decrement the reference count on these devices by calling pci_dev_put().
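If the driver keeps the pointer around instead of iterating to the end of the list, it must drop the reference itself. A short sketch (VENDOR_ID/DEVICE_ID and configure_device() are placeholders):

```c
/* Grab the first matching device; pci_get_device() takes a reference. */
struct pci_dev *pdev = pci_get_device(VENDOR_ID, DEVICE_ID, NULL);

if (pdev) {
        configure_device(pdev);
        /* ... use the device ... */
        pci_dev_put(pdev);      /* drop the reference when done */
}
```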
1.4. Device Initialization Steps
As noted in the introduction, most PCI drivers need the following steps for device initialization:

- Enable the device
- Request MMIO/IOP resources
- Set the DMA mask size (for both coherent and streaming DMA)
- Allocate and initialize shared control data (pci_allocate_coherent())
- Access device configuration space (if needed)
- Register IRQ handler (request_irq())
- Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
- Enable DMA/processing engines

The driver can access PCI config space registers at any time. (Well, almost. When running BIST, config space can go away... but that will just result in a PCI Bus Master Abort and config reads will return garbage.)
1.4.1. Enable the PCI device
Before touching any device registers, the driver needs to enable the PCI device by calling pci_enable_device(). This will:

- wake up the device if it was in a suspended state
- allocate the device's I/O and memory regions (if the BIOS did not)
- allocate an IRQ (if the BIOS did not)

Note: pci_enable_device() can fail! Check the return value.
Warning: OS BUG: we don't check resource allocations before enabling those resources. The sequence would make more sense if we called pci_request_resources() before calling pci_enable_device(). Currently, device drivers can't detect the bug when two devices have been allocated the same range. This is not a common problem and unlikely to get fixed soon.
This has been discussed before but not changed as of 2.6.19:
https://lore.kernel.org/r/[email protected]/
pci_set_master() will enable DMA by setting the bus master bit in the PCI_COMMAND register. It also fixes the latency timer value if it was set to something bogus by the BIOS. pci_clear_master() will disable DMA by clearing the bus master bit.
If the PCI device can use the PCI Memory-Write-Invalidate transaction, call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval and also ensures that the cache line size register is set correctly. Check the return value of pci_set_mwi(), as not all architectures or chip-sets support Memory-Write-Invalidate. Alternatively, if Mem-Wr-Inval would be nice to have but is not required, call pci_try_set_mwi() to have the system do its best effort at enabling Mem-Wr-Inval.
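The opening of a probe routine typically strings these calls together; a hedged sketch (function and driver names are illustrative):

```c
static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int err;

        err = pci_enable_device(pdev);  /* can fail: always check */
        if (err)
                return err;

        pci_set_master(pdev);           /* enable bus mastering for DMA */

        /* Mem-Wr-Inval is nice to have but not required here,
         * so a best-effort attempt is enough. */
        pci_try_set_mwi(pdev);

        /* ... request regions, set DMA masks, etc. ... */
        return 0;
}
```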
1.4.2. Request MMIO/IOP resources
Memory (MMIO) and I/O port addresses should NOT be read directly from the PCI device config space. Use the values in the pci_dev structure, as the PCI "bus address" might have been remapped to a "host physical" address by the arch/chip-set specific kernel support.
See the io_mapping functions for how to access device registers or device memory.
The device driver needs to call pci_request_region() to verify that no other device is already using the same address resource. Conversely, drivers should call pci_release_region() after calling pci_disable_device(). The idea is to prevent two devices colliding on the same address range.
Tip: See the OS BUG comment above. Currently (2.6.19), the driver can only determine MMIO and IO Port resource availability after calling pci_enable_device().
Generic flavors of pci_request_region() are request_mem_region() (for MMIO ranges) and request_region() (for IO Port ranges). Use these for address resources that are not described by "normal" PCI BARs.
Also see pci_request_selected_regions() below.
1.4.3. Set the DMA mask size
Note: If anything below doesn't make sense, please refer to "Dynamic DMA mapping using the generic device". This section is just a reminder that drivers need to indicate the DMA capabilities of the device; it is not an authoritative source for DMA interfaces.
While all drivers should explicitly indicate the DMA capability (e.g. 32 or 64 bit) of the PCI bus master, devices with more than 32-bit bus master capability for streaming data need the driver to "register" this capability by calling dma_set_mask() with appropriate parameters. In general this allows more efficient DMA on systems where System RAM exists above the 4G physical address.
Drivers for all PCI-X and PCIe compliant devices must call dma_set_mask(), as they are 64-bit DMA devices.
Similarly, drivers must also "register" this capability if the device can directly address "coherent memory" in System RAM above the 4G physical address, by calling dma_set_coherent_mask(). Again, this includes drivers for all PCI-X and PCIe compliant devices. Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are 64-bit DMA capable for payload ("streaming") data but not control ("coherent") data.
1.4.4. Setup shared control data
Once the DMA masks are set, the driver can allocate "coherent" (a.k.a. shared) memory. See "Dynamic DMA mapping using the generic device" for a full description of the DMA APIs. This section is just a reminder that it needs to be done before enabling DMA on the device.
1.4.5. Initialize device registers
Some drivers will need specific "capability" fields programmed or other "vendor specific" registers initialized or reset. E.g. clearing pending interrupts.
1.4.6. Register IRQ handler
While calling request_irq() is the last step described here, it is often just another intermediate step in initializing a device. This step can often be deferred until the device is opened for use.
All interrupt handlers for IRQ lines should be registered with IRQF_SHARED and use the devid to map IRQs to devices (remember that all PCI IRQ lines can be shared).
request_irq() will associate an interrupt handler and device handle with an interrupt number. Historically, interrupt numbers represented IRQ lines which run from the PCI device to the interrupt controller. With MSI and MSI-X (more below) the interrupt number is a CPU "vector".
request_irq() also enables the interrupt. Make sure the device is quiesced and does not have any interrupts pending before registering the interrupt handler.
MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts" which deliver interrupts to the CPU via a DMA write to a Local APIC. The fundamental difference between MSI and MSI-X is how multiple "vectors" get allocated. MSI requires a contiguous block of vectors, while MSI-X can allocate several individual ones.
MSI capability can be enabled by calling pci_alloc_irq_vectors() with the PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This causes the PCI support to program CPU vector data into the PCI device capability registers. Many architectures, chip-sets, or BIOSes do NOT support MSI or MSI-X, and a call to pci_alloc_irq_vectors with only the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always specify PCI_IRQ_INTX as well.
Drivers that have different interrupt handlers for MSI/MSI-X and legacy INTx should choose the right one based on the msi_enabled and msix_enabled flags in the pci_dev structure after calling pci_alloc_irq_vectors.
There are (at least) two really good reasons for using MSI:

- MSI is an exclusive interrupt vector by definition. This means the interrupt handler doesn't have to verify that its device caused the interrupt.
- MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed to be visible to the host CPU(s) when the MSI is delivered. This is important for both data coherency and avoiding stale control data. This guarantee allows the driver to omit MMIO reads to flush the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples of MSI/MSI-X usage.
1.5. PCI device shutdown
When a PCI device driver is being unloaded, most of the following steps need to be performed:

- Disable the device from generating IRQs
- Release the IRQ (free_irq())
- Stop all DMA activity
- Release DMA buffers (both streaming and coherent)
- Unregister from other subsystems (e.g. scsi or netdev)
- Disable the device from responding to MMIO/IO Port addresses
- Release MMIO/IO Port resources

1.5.1. Stop IRQs on the device
How to do this is chip/device specific. If it is not done, it opens the possibility of a "screaming interrupt" if (and only if) the IRQ is shared with another device.
When the shared IRQ handler is "unhooked", the remaining devices using the same IRQ line will still need the IRQ enabled. Thus if the "unhooked" device asserts the IRQ line, the system will respond assuming it was one of the remaining devices that asserted the IRQ line. Since none of the other devices will handle the IRQ, the system will "hang" until it decides the IRQ isn't going to get handled and masks the IRQ (100,000 iterations later). Once the shared IRQ is masked, the remaining devices will stop functioning properly. Not a nice situation.
This is another reason to use MSI or MSI-X if it is available. MSI and MSI-X are defined to be exclusive interrupts and thus are not susceptible to the "screaming interrupt" problem.
1.5.2. Release the IRQ
Once the device is quiesced (no more IRQs), one can call free_irq(). This function will return control once any pending IRQs are handled, "unhook" the driver's IRQ handler from that IRQ, and finally release the IRQ if no one else is using it.
1.5.3. Stop all DMA activity
It is extremely important to stop all DMA operations BEFORE attempting to deallocate DMA control data. Failure to do so can result in memory corruption, hangs, and on some chip-sets a hard crash.
Stopping DMA after stopping the IRQs also avoids races where the IRQ handler might restart DMA engines.
While this step sounds obvious and trivial, several "mature" drivers didn't get this step right in the past.
1.5.4. Release DMA buffers
Once DMA is stopped, clean up streaming DMA first. I.e. unmap data buffers and return the buffers to their "upstream" owners, if there are any.
Then clean up "coherent" buffers which contain the control data.
See Documentation/core-api/dma-api.rst for details on unmapping interfaces.
1.5.5. Unregister from other subsystems
Most low level PCI device drivers support some other subsystem like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your driver isn't losing resources from that other subsystem. If this happens, typically the symptom is an Oops (panic) when the subsystem attempts to call into a driver that has been unloaded.
1.5.6. Disable device from responding to MMIO/IO Port addresses
io_unmap() MMIO or IO Port resources and then call pci_disable_device(). This is the symmetric opposite of pci_enable_device(). Do not access device registers after calling pci_disable_device().
1.5.7. Release MMIO/IO Port resources
Call pci_release_region() to mark the MMIO or IO Port range as available. Failure to do so usually results in the inability to reload the driver.
1.6. How to access PCI config space
You can use pci_(read|write)_config_(byte|word|dword) to access the config space of a device represented by struct pci_dev *. All these functions return 0 when successful, or an error code (PCIBIOS_...) which can be translated to a text string by pcibios_strerror. Most drivers expect that accesses to valid PCI devices don't fail.
If you don't have a pci_dev structure available, you can call pci_bus_(read|write)_config_(byte|word|dword) to access a given device and function on that bus.
If you access fields in the standard portion of the config header, please use the symbolic names of locations and bits declared in <linux/pci.h>.
If you need to access Extended PCI Capability registers, just call pci_find_capability() for the particular capability and it will find the corresponding register block for you.
1.7. Other interesting functions

| Function | Description |
|---|---|
| pci_get_domain_bus_and_slot() | Find the pci_dev corresponding to the given domain, bus and slot/function number. If the device is found, its reference count is increased. |
| pci_set_power_state() | Set PCI Power Management state (0=D0 ... 3=D3) |
| pci_find_capability() | Find specified capability in the device's capability list |
| pci_resource_start() | Returns the bus start address for a given PCI region |
| pci_resource_end() | Returns the bus end address for a given PCI region |
| pci_resource_len() | Returns the byte length of a PCI region |
| pci_set_drvdata() | Set private driver data pointer for a pci_dev |
| pci_get_drvdata() | Return private driver data pointer for a pci_dev |
| pci_set_mwi() | Enable Memory-Write-Invalidate transactions |
| pci_clear_mwi() | Disable Memory-Write-Invalidate transactions |
1.8. Miscellaneous hints
When displaying PCI device names to the user (for example when a driver wants to tell the user what card it has found), please use pci_name(pci_dev).
Always refer to PCI devices by a pointer to the pci_dev structure. All PCI layer functions use this identification and it's the only reasonable one. Do not use bus/slot/function numbers except for very special purposes; on systems with multiple primary buses their semantics can be quite complex.
Do not try to turn on Fast Back to Back writes in your driver. All devices on the bus need to be capable of doing it, so this is something which needs to be handled by platform and generic code, not individual drivers.
1.9. Vendor and device identifications
Do not add new device or vendor IDs to <linux/pci_ids.h> unless they are shared across multiple drivers. You can add private definitions in your driver if they're helpful, or just use plain hex constants.
The device IDs are arbitrary hex numbers (vendor controlled) and normally used only in one place, the pci_device_id table.
Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/. There is a mirror of the pci.ids file at https://github.com/pciutils/pciids.
1.10. Obsolete functions
There are several functions you might come across when trying to port an old driver to the new PCI interface. They are no longer present in the kernel as they aren't compatible with hotplug or PCI domains or having sane locking.

| Obsolete function | Replacement |
|---|---|
| pci_find_device() | pci_get_device() |
| pci_find_subsys() | pci_get_subsys() |
| pci_find_slot() | pci_get_domain_bus_and_slot() |
| pci_get_slot() | pci_get_domain_bus_and_slot() |

The alternative is the traditional PCI device driver that walks PCI device lists. This is still possible but discouraged.
1.11. MMIO Space and "Write Posting"
Converting a driver from using I/O Port space to using MMIO space often requires some additional changes. Specifically, "write posting" needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2) already do this. I/O Port space guarantees write transactions reach the PCI device before the CPU can continue. Writes to MMIO space allow the CPU to continue before the transaction reaches the PCI device. HW weenies call this "Write Posting" because the write completion is "posted" to the CPU before the transaction has reached its destination.
Thus, timing-sensitive code should add readl() where the CPU is expected to wait before doing other work. The classic "bit banging" sequence works fine for I/O Port space:

```c
for (i = 8; --i; val >>= 1) {
        outb(val & 1, ioport_reg);      /* write bit */
        udelay(10);
}
```

The same sequence for MMIO space should be:

```c
for (i = 8; --i; val >>= 1) {
        writeb(val & 1, mmio_reg);      /* write bit */
        readb(safe_mmio_reg);           /* flush posted write */
        udelay(10);
}
```

It is important that safe_mmio_reg not have any side effects that interfere with the correct operation of the device.
Another case to watch out for is when resetting a PCI device. Use PCI Configuration space reads to flush the writel(). This will gracefully handle the PCI master abort on all platforms if the PCI device is expected to not respond to a readl(). Most x86 platforms will allow MMIO reads to master abort (a.k.a. "soft fail") and return garbage (e.g. ~0). But many RISC platforms will crash (a.k.a. "hard fail").
Chapter 12. PCI Drivers
While Chapter 9 introduced the lowest levels of hardware control, this chapter provides an overview of the higher-level bus architectures. A bus is made up of both an electrical interface and a programming interface. In this chapter, we deal with the programming interface.
This chapter covers a number of bus architectures. However, the primary focus is on the kernel functions that access Peripheral Component Interconnect (PCI) peripherals, because these days the PCI bus is the most commonly used peripheral bus on desktops and bigger computers. The bus is the one that is best supported by the kernel. ISA is still common for electronic hobbyists and is described later, although it is pretty much a bare-metal kind of bus, and there isn't much to say in addition to what is covered in Chapter 9 and Chapter 10.
The PCI Interface
Although many computer users think of PCI as a way of laying out electrical wires, it is actually a complete set of specifications defining how different parts of a computer should interact.
The PCI specification covers most issues related to computer interfaces. We are not going to cover it all here; in this section, we are mainly concerned with how a PCI driver can find its hardware and gain access to it. The probing techniques discussed in Chapter 12 and Chapter 10 can be used with PCI devices, but the specification offers an alternative that is preferable to probing.
The PCI architecture was designed as a replacement for the ISA standard, with three main goals: to get better performance when transferring data between the computer and its peripherals, to be as platform independent as possible, and to simplify adding and removing peripherals to the system.
The PCI bus achieves better performance by using a higher clock rate than ISA; its clock runs at 25 or 33 MHz (its actual rate being a factor of the system clock), and 66-MHz and even 133-MHz implementations have recently been deployed as well. Moreover, it is equipped with a 32-bit data bus, and a 64-bit extension has been included in the specification. Platform independence is often a goal in the design of a computer bus, and it's an especially important feature of PCI, because the PC world has always been dominated by processor-specific interface standards. PCI is currently used extensively on IA-32, Alpha, PowerPC, SPARC64, and IA-64 systems, and some other platforms as well.
What is most relevant to the driver writer, however, is PCI's support for autodetection of interface boards. PCI devices are jumperless (unlike most older peripherals) and are automatically configured at boot time. Then, the device driver must be able to access configuration information in the device in order to complete initialization. This happens without the need to perform any probing.
PCI Addressing
Each PCI peripheral is identified by a bus number, a device number, and a function number. The PCI specification permits a single system to host up to 256 buses, but because 256 buses are not sufficient for many large systems, Linux now supports PCI domains. Each PCI domain can host up to 256 buses. Each bus hosts up to 32 devices, and each device can be a multifunction board (such as an audio device with an accompanying CD-ROM drive) with a maximum of eight functions. Therefore, each function can be identified at hardware level by a 16-bit address, or key. Device drivers written for Linux, though, don't need to deal with those binary addresses, because they use a specific data structure, called pci_dev, to act on the devices.
Most recent workstations feature at least two PCI buses. Plugging more than one bus in a single system is accomplished by means of bridges , special-purpose PCI peripherals whose task is joining two buses. The overall layout of a PCI system is a tree where each bus is connected to an upper-layer bus, up to bus 0 at the root of the tree. The CardBus PC-card system is also connected to the PCI system via bridges. A typical PCI system is represented in Figure 12-1, where the various bridges are highlighted.

Figure 12-1. Layout of a typical PCI system
The 16-bit hardware addresses associated with PCI peripherals, although mostly hidden in the struct pci_dev object, are still visible occasionally, especially when lists of devices are being used. One such situation is the output of lspci (part of the pciutils package, available with most distributions) and the layout of information in /proc/pci and /proc/bus/pci. The sysfs representation of PCI devices also shows this addressing scheme, with the addition of the PCI domain information.[1] When the hardware address is displayed, it can be shown as two values (an 8-bit bus number and an 8-bit device and function number), as three values (bus, device, and function), or as four values (domain, bus, device, and function); all the values are usually displayed in hexadecimal.
For example, /proc/bus/pci/devices uses a single 16-bit field (to ease parsing and sorting), while /proc/bus/busnumber splits the address into three fields. The following shows how those addresses appear, showing only the beginning of the output lines:
$ lspci | cut -d: -f1-3
0000:00:00.0 Host bridge
0000:00:00.1 RAM memory
0000:00:00.2 RAM memory
0000:00:02.0 USB Controller
0000:00:04.0 Multimedia audio controller
0000:00:06.0 Bridge
0000:00:07.0 ISA bridge
0000:00:09.0 USB Controller
0000:00:09.1 USB Controller
0000:00:09.2 USB Controller
0000:00:0c.0 CardBus bridge
0000:00:0f.0 IDE interface
0000:00:10.0 Ethernet controller
0000:00:12.0 Network controller
0000:00:13.0 FireWire (IEEE 1394)
0000:00:14.0 VGA compatible controller
$ cat /proc/bus/pci/devices | cut -f1
0000
0001
0002
0010
0020
0030
0038
0048
0049
004a
0060
0078
0080
0090
0098
00a0
$ tree /sys/bus/pci/devices/
/sys/bus/pci/devices/
|-- 0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
|-- 0000:00:00.1 -> ../../../devices/pci0000:00/0000:00:00.1
|-- 0000:00:00.2 -> ../../../devices/pci0000:00/0000:00:00.2
|-- 0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
|-- 0000:00:04.0 -> ../../../devices/pci0000:00/0000:00:04.0
|-- 0000:00:06.0 -> ../../../devices/pci0000:00/0000:00:06.0
|-- 0000:00:07.0 -> ../../../devices/pci0000:00/0000:00:07.0
|-- 0000:00:09.0 -> ../../../devices/pci0000:00/0000:00:09.0
|-- 0000:00:09.1 -> ../../../devices/pci0000:00/0000:00:09.1
|-- 0000:00:09.2 -> ../../../devices/pci0000:00/0000:00:09.2
|-- 0000:00:0c.0 -> ../../../devices/pci0000:00/0000:00:0c.0
|-- 0000:00:0f.0 -> ../../../devices/pci0000:00/0000:00:0f.0
|-- 0000:00:10.0 -> ../../../devices/pci0000:00/0000:00:10.0
|-- 0000:00:12.0 -> ../../../devices/pci0000:00/0000:00:12.0
|-- 0000:00:13.0 -> ../../../devices/pci0000:00/0000:00:13.0
`-- 0000:00:14.0 -> ../../../devices/pci0000:00/0000:00:14.0
All three lists of devices are sorted in the same order, since lspci uses the /proc files as its source of information. Taking the VGA video controller as an example, 0x00a0 means 0000:00:14.0 when split into domain (16 bits), bus (8 bits), device (5 bits) and function (3 bits).
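The decoding described above is plain bit arithmetic. The snippet below is a self-contained sketch: the two macros mirror the kernel's PCI_SLOT()/PCI_FUNC() helpers from <linux/pci.h>, and the two functions split the 16-bit value that /proc/bus/pci/devices prints into its bus and devfn parts.

```c
#include <stdint.h>

/* Local copies of the kernel's PCI_SLOT()/PCI_FUNC() helpers: an
 * 8-bit devfn packs a 5-bit device number and a 3-bit function. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Split the 16-bit field from /proc/bus/pci/devices into bus + devfn. */
static inline uint8_t pci_addr_bus(uint16_t addr)   { return addr >> 8; }
static inline uint8_t pci_addr_devfn(uint16_t addr) { return addr & 0xff; }

/* Example from the listings above: 0x00a0 -> bus 0x00, devfn 0xa0,
 * i.e. device 0x14, function 0 -> "0000:00:14.0". */
```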
The hardware circuitry of each peripheral board answers queries pertaining to three address spaces: memory locations, I/O ports, and configuration registers. The first two address spaces are shared by all the devices on the same PCI bus (i.e., when you access a memory location, all the devices on that PCI bus see the bus cycle at the same time). The configuration space, on the other hand, exploits geographical addressing . Configuration queries address only one slot at a time, so they never collide.
As far as the driver is concerned, memory and I/O regions are accessed in the usual ways via inb , readb , and so forth. Configuration transactions, on the other hand, are performed by calling specific kernel functions to access configuration registers. With regard to interrupts, every PCI slot has four interrupt pins, and each device function can use one of them without being concerned about how those pins are routed to the CPU. Such routing is the responsibility of the computer platform and is implemented outside of the PCI bus. Since the PCI specification requires interrupt lines to be shareable, even a processor with a limited number of IRQ lines, such as the x86, can host many PCI interface boards (each with four interrupt pins).
The I/O space in a PCI bus uses a 32-bit address bus (leading to 4 GB of I/O ports), while the memory space can be accessed with either 32-bit or 64-bit addresses. 64-bit addresses are available on more recent platforms. Addresses are supposed to be unique to one device, but software may erroneously configure two devices to the same address, making it impossible to access either one. But this problem never occurs unless a driver is willingly playing with registers it shouldn't touch. The good news is that every memory and I/O address region offered by the interface board can be remapped by means of configuration transactions. That is, the firmware initializes PCI hardware at system boot, mapping each region to a different address to avoid collisions.[2] The addresses to which these regions are currently mapped can be read from the configuration space, so the Linux driver can access its devices without probing. After reading the configuration registers, the driver can safely access its hardware.
The PCI configuration space consists of 256 bytes for each device function (except for PCI Express devices, which have 4 KB of configuration space for each function), and the layout of the configuration registers is standardized. Four bytes of the configuration space hold a unique function ID, so the driver can identify its device by looking for the specific ID for that peripheral.[3] In summary, each device board is geographically addressed to retrieve its configuration registers; the information in those registers can then be used to perform normal I/O access, without the need for further geographic addressing.
It should be clear from this description that the main innovation of the PCI interface standard over ISA is the configuration address space. Therefore, in addition to the usual driver code, a PCI driver needs the ability to access the configuration space, in order to save itself from risky probing tasks.
For the remainder of this chapter, we use the word device to refer to a device function, because each function in a multifunction board acts as an independent entity. When we refer to a device, we mean the tuple "domain number, bus number, device number, and function number."
Boot Time
To see how PCI works, we start from system boot, since that's when the devices are configured.
When power is applied to a PCI device, the hardware remains inactive. In other words, the device responds only to configuration transactions. At power on, the device has no memory and no I/O ports mapped in the computer's address space; every other device-specific feature, such as interrupt reporting, is disabled as well.
Fortunately, every PCI motherboard is equipped with PCI-aware firmware, called the BIOS, NVRAM, or PROM, depending on the platform. The firmware offers access to the device configuration address space by reading and writing registers in the PCI controller.
At system boot, the firmware (or the Linux kernel, if so configured) performs configuration transactions with every PCI peripheral in order to allocate a safe place for each address region it offers. By the time a device driver accesses the device, its memory and I/O regions have already been mapped into the processor's address space. The driver can change this default assignment, but it never needs to do that.
As suggested, the user can look at the PCI device list and the devices' configuration registers by reading /proc/bus/pci/devices and /proc/bus/pci/*/*. The former is a text file with (hexadecimal) device information, and the latter are binary files that report a snapshot of the configuration registers of each device, one file per device. The individual PCI device directories in the sysfs tree can be found in /sys/bus/pci/devices. A PCI device directory contains a number of different files:
$ tree /sys/bus/pci/devices/0000:00:10.0
/sys/bus/pci/devices/0000:00:10.0
|-- class
|-- config
|-- detach_state
|-- device
|-- irq
|-- power
| `-- state
|-- resource
|-- subsystem_device
|-- subsystem_vendor
`-- vendor
The file config is a binary file that allows the raw PCI config information to be read from the device (just like /proc/bus/pci/*/* provides.) The files vendor, device, subsystem_device, subsystem_vendor, and class all refer to the specific values of this PCI device (all PCI devices provide this information.) The file irq shows the current IRQ assigned to this PCI device, and the file resource shows the current memory resources allocated by this device.
Configuration Registers and Initialization
In this section, we look at the configuration registers that PCI devices contain. All PCI devices feature at least a 256-byte address space. The first 64 bytes are standardized, while the rest are device dependent. Figure 12-2 shows the layout of the device-independent configuration space.

Figure 12-2. The standardized PCI configuration registers
As the figure shows, some of the PCI configuration registers are required and some are optional. Every PCI device must contain meaningful values in the required registers, whereas the contents of the optional registers depend on the actual capabilities of the peripheral. The optional fields are not used unless the contents of the required fields indicate that they are valid. Thus, the required fields assert the board's capabilities, including whether the other fields are usable.
It's interesting to note that the PCI registers are always little-endian. Although the standard is designed to be architecture independent, the PCI designers sometimes show a slight bias toward the PC environment. The driver writer should be careful about byte ordering when accessing multibyte configuration registers; code that works on the PC might not work on other platforms. The Linux developers have taken care of the byte-ordering problem (see the next section, Section 12.1.8), but the issue must be kept in mind. If you ever need to convert data from host order to PCI order or vice versa, you can resort to the functions defined in <asm/byteorder.h> , introduced in Chapter 11, knowing that PCI byte order is little-endian.
Describing all the configuration items is beyond the scope of this book. Usually, the technical documentation released with each device describes the supported registers. What we're interested in is how a driver can look for its device and how it can access the device's configuration space.
Three or five PCI registers identify a device: vendorID, deviceID, and class are the three that are always used. Every PCI manufacturer assigns proper values to these read-only registers, and the driver can use them to look for the device. Additionally, the fields subsystem vendorID and subsystem deviceID are sometimes set by the vendor to further differentiate similar devices.
三个或五个 PCI 寄存器用于识别设备:vendorID、deviceID 和 class 是始终使用的三个寄存器。每个 PCI 制造商都为这些只读寄存器分配适当的值,驱动程序可以使用它们来查找设备。此外,subsystem vendorID 和 subsystem deviceID 字段有时由供应商设置,以进一步区分类似的设备。
Let's look at these registers in more detail:
让我们更详细地看看这些寄存器:
-
vendorID
- This 16-bit register identifies a hardware manufacturer. For instance, every Intel device is marked with the same vendor number,
0x8086
. There is a global registry of such numbers, maintained by the PCI Special Interest Group, and manufacturers must apply to have a unique number assigned to them. - 这个 16 位寄存器用于识别硬件制造商。例如,所有英特尔设备都标有相同的供应商编号
0x8086
。有一个全球性的此类编号注册表,由 PCI 特殊兴趣小组维护,制造商必须申请分配一个唯一的编号。
-
deviceID
- This is another 16-bit register, selected by the manufacturer; no official registration is required for the device ID. This ID is usually paired with the vendor ID to make a unique 32-bit identifier for a hardware device. We use the word signature to refer to the vendor and device ID pair. A device driver usually relies on the signature to identify its device; you can find what value to look for in the hardware manual for the target device.
- 这是另一个 16 位寄存器,由制造商选择;设备 ID 不需要官方注册。此 ID 通常与供应商 ID 配对,形成一个硬件设备的唯一 32 位标识符。我们用"签名"一词来指代供应商 ID 和设备 ID 的组合。设备驱动程序通常依赖签名来识别其设备;您可以在目标设备的硬件手册中找到要查找的值。
-
class
- Every peripheral device belongs to a class . The
class
register is a 16-bit value whose top 8 bits identify the "base class" (or group ). For example, "ethernet" and "token ring" are two classes belonging to the "network" group, while the "serial" and "parallel" classes belong to the "communication" group. Some drivers can support several similar devices, each of them featuring a different signature but all belonging to the same class; these drivers can rely on theclass
register to identify their peripherals, as shown later. - 每个外设都属于一个"类别"。
class
寄存器是一个 16 位的值,其最高 8 位标识"基础类别"(或"组")。例如,"以太网"和"令牌环"是属于"网络"组的两个类别,而"串行"和"并行"类别属于"通信"组。有些驱动程序可以支持几种类似的设备,每种设备都有不同的签名,但都属于同一个类别;这些驱动程序可以依赖class
寄存器来识别其外设,稍后会展示。
-
subsystem vendorID
subsystem deviceID
- These fields can be used for further identification of a device. If the chip is a generic interface chip to a local (onboard) bus, it is often used in several completely different roles, and the driver must identify the actual device it is talking with. The subsystem identifiers are used to this end.
- 这些字段可用于进一步识别设备。如果芯片是用于本地(板载)总线的通用接口芯片,它通常用于几种完全不同的角色,驱动程序必须识别它正在通信的实际设备。子系统标识符用于此目的。
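As an illustration only (a user-space sketch, not kernel code), splitting the 16-bit class register into its base class and sub-class is a pair of shifts and masks; the value 0x0200 used in the test below is the specification's code for the ethernet class in the network group:

仅作演示(用户态示意代码,并非内核代码):将 16 位的 class 寄存器拆分为基础类别和子类别只需移位和掩码;下面测试中使用的 0x0200 是规范中"网络"组下"以太网"类别的编码:

```c
#include <stdint.h>

/* The 16-bit class register: the top 8 bits are the base class
 * (the "group"), the low 8 bits the sub-class within that group. */
static uint8_t base_class(uint16_t class_reg) { return class_reg >> 8; }
static uint8_t sub_class(uint16_t class_reg)  { return class_reg & 0xff; }
```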
Using these different identifiers, a PCI driver can tell the kernel what kind of devices it supports. The struct pci_device_id structure is used to define a list of the different types of PCI devices that a driver supports. This structure contains the following fields:
通过使用这些不同的标识符,PCI 驱动程序可以告诉内核它支持哪些类型的设备。struct pci_device_id 结构用于定义驱动程序支持的不同类型的 PCI 设备列表。这个结构包含以下字段:
-
__u32 vendor;
__u32 device;
- These specify the PCI vendor and device IDs of a device. If a driver can handle any vendor or device ID, the value
PCI_ANY_ID
should be used for these fields. - 这些字段指定设备的 PCI 供应商 ID 和设备 ID。如果驱动程序可以处理任何供应商或设备 ID,则应为这些字段使用
PCI_ANY_ID
值。
-
__u32 subvendor;
__u32 subdevice;
- These specify the PCI subsystem vendor and subsystem device IDs of a device. If a driver can handle any type of subsystem ID, the value
PCI_ANY_ID
should be used for these fields. - 这些字段指定设备的 PCI 子系统供应商和子系统设备 ID。如果驱动程序可以处理任何类型的子系统 ID,则应为这些字段使用
PCI_ANY_ID
值。
-
__u32 class;
__u32 class_mask;
- These two values allow the driver to specify that it supports a type of PCI class device. The different classes of PCI devices (a VGA controller is one example) are described in the PCI specification. A device matches an entry when (dev->class & class_mask) == class, so a driver that does not want to match on class should leave both fields set to 0.
- 这两个值允许驱动程序指定它支持一种 PCI 类别设备。PCI 规范中描述了不同类别的 PCI 设备(VGA 控制器就是一个例子)。当 (dev->class & class_mask) == class 时,设备与该表项匹配;因此,不按类别匹配的驱动程序应将这两个字段都保留为 0。
-
kernel_ulong_t driver_data;
- This value is not used to match a device but is used to hold information that the PCI driver can use to differentiate between different devices if it wants to.
- 此值不用于匹配设备,而是用于保存 PCI 驱动程序可以用来区分不同设备的信息(如果它想的话)。
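The matching rule these fields imply can be sketched in user space; the structure and function names below are hypothetical stand-ins invented for this sketch, but the comparison logic mirrors what the PCI core's pci_match_one_device() does:

这些字段所隐含的匹配规则可以用用户态代码来示意;下面的结构体和函数名是为演示而虚构的,但比较逻辑与 PCI 核心中 pci_match_one_device() 的做法一致:

```c
#include <stdint.h>

#define ANY_ID 0xffffffffu  /* stand-in for the kernel's PCI_ANY_ID */

struct id_entry {   /* simplified stand-in for struct pci_device_id */
    uint32_t vendor, device, subvendor, subdevice;
    uint32_t class, class_mask;
};

struct dev_ids {    /* the identifiers read from a device */
    uint32_t vendor, device, subvendor, subdevice, class;
};

/* An entry matches a device when every wildcard-able field is either
 * ANY_ID or equal, and the masked class value agrees. */
static int id_matches(const struct id_entry *id, const struct dev_ids *d)
{
    return (id->vendor    == ANY_ID || id->vendor    == d->vendor) &&
           (id->device    == ANY_ID || id->device    == d->device) &&
           (id->subvendor == ANY_ID || id->subvendor == d->subvendor) &&
           (id->subdevice == ANY_ID || id->subdevice == d->subdevice) &&
           (id->class == (d->class & id->class_mask));
}
```

With class and class_mask both 0, the class test always succeeds, which is why drivers that match purely on vendor/device IDs can ignore those two fields.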
There are two helper macros that should be used to initialize a struct
pci_device_id
structure:
有两个辅助宏可用于初始化 struct
pci_device_id
结构:
-
PCI_DEVICE(vendor, device)
- This creates a
struct
pci_device_id
that matches only the specific vendor and device ID. The macro sets thesubvendor
andsubdevice
fields of the structure toPCI_ANY_ID
. - 此宏创建一个
struct
pci_device_id
,仅匹配特定的供应商和设备 ID。该宏将结构的subvendor
和subdevice
字段设置为PCI_ANY_ID
。
-
PCI_DEVICE_CLASS(device_class, device_class_mask)
- This creates a
struct
pci_device_id
that matches a specific PCI class. - 此宏创建一个
struct
pci_device_id
,匹配特定的 PCI 类别。
An example of using these macros to define the type of devices a driver supports can be found in the following kernel files:
以下内核文件中可以找到使用这些宏定义驱动程序支持的设备类型的示例:
drivers/usb/host/ehci-hcd.c:
static const struct pci_device_id pci_ids[] = { {
/* handle any USB 2.0 EHCI controller */
PCI_DEVICE_CLASS(((PCI_CLASS_SERIAL_USB << 8) | 0x20), ~0),
.driver_data = (unsigned long) &ehci_driver,
},
{ /* end: all zeroes */ }
};
drivers/i2c/busses/i2c-i810.c:
static struct pci_device_id i810_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810_IG1) },
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810_IG3) },
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82810E_IG) },
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82815_CGC) },
{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82845G_IG) },
{ 0, },
};
These examples create a list of struct
pci_device_id
structures, with an empty structure set to all zeros as the last value in the list. This array of IDs is used in the struct
pci_driver
(described below), and it is also used to tell user space which devices this specific driver supports.
这些示例创建了一个 struct
pci_device_id
结构列表,列表的最后一个值是一个设置为全零的空结构。这个 ID 数组用于 struct
pci_driver
(稍后描述),也用于告诉用户空间这个特定驱动程序支持哪些设备。
MODULE_DEVICE_TABLE
模块设备表
This pci_device_id
structure needs to be exported to user space to allow the hotplug and module loading systems to know what module works with what hardware devices. The macro MODULE_DEVICE_TABLE
accomplishes this. An example is:
需要将此 pci_device_id
结构导出到用户空间,以便让热插拔和模块加载系统知道哪个模块与哪个硬件设备配合工作。MODULE_DEVICE_TABLE
宏可以实现这一点。示例如下:
MODULE_DEVICE_TABLE(pci, i810_ids);
This statement creates a local variable called __mod_pci_device_table that points to the list of struct pci_device_id. Later in the kernel build process, the depmod program searches all modules for the symbol __mod_pci_device_table. If that symbol is found, it pulls the data out of the module and adds it to the file /lib/modules/KERNEL_VERSION/modules.pcimap. After depmod completes, all PCI devices that are supported by modules in the kernel are listed, along with their module names, in that file. When the kernel tells the hotplug system that a new PCI device has been found, the hotplug system uses the modules.pcimap file to find the proper driver to load.
该语句创建了一个名为 __mod_pci_device_table 的局部变量,指向 struct pci_device_id 列表。在内核构建过程的后续阶段中,depmod 程序会在所有模块中搜索 __mod_pci_device_table 符号。如果找到该符号,它会将模块中的数据提取出来,并将其添加到文件 /lib/modules/KERNEL_VERSION/modules.pcimap 中。depmod 完成后,内核中模块支持的所有 PCI 设备都会被列在该文件中,并附上它们的模块名称。当内核告知热插拔系统发现了一个新的 PCI 设备时,热插拔系统会使用 modules.pcimap 文件来找到要加载的正确驱动程序。
Registering a PCI Driver
注册 PCI 驱动程序
The main structure that all PCI drivers must create in order to be registered with the kernel properly is the struct pci_driver
structure. This structure consists of a number of function callbacks and variables that describe the PCI driver to the PCI core. Here are the fields in this structure that a PCI driver needs to be aware of:
所有 PCI 驱动程序必须创建的主要结构是 struct pci_driver
结构,以便正确地在内核中注册。该结构包含许多函数回调和变量,用于向 PCI 核心描述 PCI 驱动程序。以下是 PCI 驱动程序需要了解的该结构中的字段:
-
const char *name;
- The name of the driver. It must be unique among all PCI drivers in the kernel and is normally set to the same name as the module name of the driver. It shows up in sysfs under /sys/bus/pci/drivers/ when the driver is in the kernel.
- 驱动程序的名称。它必须在内核中的所有 PCI 驱动程序中是唯一的,并且通常设置为与驱动程序模块名称相同的名称。当驱动程序在内核中时,它会出现在 sysfs 的 /sys/bus/pci/drivers/ 下。
-
const struct pci_device_id *id_table;
- Pointer to the
struct
pci_device_id
table described earlier in this chapter. - 指向本章前面描述的
struct
pci_device_id
表的指针。
-
int (*probe) (struct pci_dev *dev, const struct pci_device_id *id);
- Pointer to the probe function in the PCI driver. This function is called by the PCI core when it has a
struct pci_dev
that it thinks this driver wants to control. A pointer to thestruct
pci_device_id
that the PCI core used to make this decision is also passed to this function. If the PCI driver claims thestruct
pci_dev
that is passed to it, it should initialize the device properly and return0
. If the driver does not want to claim the device, or an error occurs, it should return a negative error value. More details about this function follow later in this chapter. - 指向 PCI 驱动程序中的探测函数的指针。当 PCI 核心有一个
struct pci_dev
,它认为这个驱动程序想要控制时,会调用这个函数。PCI 核心用来做出这个决定的struct
pci_device_id
的指针也会传递给这个函数。如果 PCI 驱动程序认领(claim)传递给它的 struct
pci_dev
,它应该正确初始化设备并返回 0
。如果驱动程序不想认领该设备,或者发生错误,它应该返回一个负的错误值。关于这个函数的更多细节将在本章后面介绍。
-
void (*remove) (struct pci_dev *dev);
- Pointer to the function that the PCI core calls when the
struct
pci_dev
is being removed from the system, or when the PCI driver is being unloaded from the kernel. More details about this function follow later in this chapter. - 指向 PCI 核心在从系统中移除
struct
pci_dev
,或者从内核中卸载 PCI 驱动程序时调用的函数的指针。关于这个函数的更多细节将在本章后面介绍。
-
int (*suspend) (struct pci_dev *dev, u32 state);
- Pointer to the function that the PCI core calls when the
struct
pci_dev
is being suspended. The suspend state is passed in thestate
variable. This function is optional; a driver does not have to provide it. - 指向 PCI 核心在挂起
struct
pci_dev
时调用的函数的指针。挂起状态通过state
变量传递。这个函数是可选的;驱动程序不需要提供它。
-
int (*resume) (struct pci_dev *dev);
- Pointer to the function that the PCI core calls when the
struct
pci_dev
is being resumed. It is always called aftersuspend
has been called. This function is optional; a driver does not have to provide it. - 指向 PCI 核心在恢复
struct
pci_dev
时调用的函数的指针。它总是在调用suspend
之后被调用。这个函数是可选的;驱动程序不需要提供它。
In summary, to create a proper struct
pci_driver
structure, only four fields need to be initialized:
总之,要创建一个合适的 struct
pci_driver
结构,只需要初始化四个字段:
static struct pci_driver pci_driver = {
.name = "pci_skel",
.id_table = ids,
.probe = probe,
.remove = remove,
};
To register the struct pci_driver
with the PCI core, a call to pci_register_driver is made with a pointer to the struct
pci_driver
. This is traditionally done in the module initialization code for the PCI driver:
要将 struct pci_driver
注册到 PCI 核心,需要使用指向 struct
pci_driver
的指针调用 pci_register_driver。这通常在 PCI 驱动程序的模块初始化代码中完成:
static int __init pci_skel_init(void)
{
return pci_register_driver(&pci_driver);
}
Note that the pci_register_driver function either returns a negative error number or 0
if everything was registered successfully. It does not return the number of devices that were bound to the driver or an error number if no devices were bound to the driver. This is a change from kernels prior to the 2.6 release and was done because of the following situations:
请注意,pci_register_driver 函数如果注册成功会返回 0
,否则返回一个负的错误编号。它不会返回绑定到驱动程序的设备数量,或者如果没有设备绑定到驱动程序,则返回一个错误编号。这与 2.6 版本之前的内核有所不同,原因是以下情况:
-
On systems that support PCI hotplug, or CardBus systems, a PCI device can appear or disappear at any point in time. It is helpful if drivers can be loaded before the device appears, to reduce the time it takes to initialize a device.
- 在支持 PCI 热插拔或 CardBus 的系统中,PCI 设备可以在任何时候出现或消失。如果驱动程序可以在设备出现之前加载,将有助于减少初始化设备所需的时间。
-
The 2.6 kernel allows new PCI IDs to be dynamically allocated to a driver after it has been loaded. This is done through the file
new_id
that is created in all PCI driver directories in sysfs. This is very useful if a new device is being used that the kernel doesn't know about just yet. A user can write the PCI ID values to the new_id file, and then the driver binds to the new device. If a driver was not allowed to load until a device was present in the system, this interface would not be able to work. - 2.6 内核允许在驱动程序加载后动态分配新的 PCI ID。这是通过在 sysfs 中所有 PCI 驱动程序目录中创建的文件
new_id
来完成的。如果正在使用内核尚未知晓的新设备,这将非常有用。用户可以将 PCI ID 值写入 new_id 文件,然后驱动程序将绑定到新设备。如果驱动程序在系统中存在设备之前不允许加载,这个接口将无法工作。
When the PCI driver is to be unloaded, the struct pci_driver
needs to be unregistered from the kernel. This is done with a call to pci_unregister_driver . When this call happens, any PCI devices that were currently bound to this driver are removed, and the remove function for this PCI driver is called before the pci_unregister_driver function returns.
当要卸载 PCI 驱动程序时,需要从内核中注销 struct pci_driver
。这是通过调用 pci_unregister_driver 来完成的。当进行此调用时,当前绑定到此驱动程序的所有 PCI 设备都将被移除,并且在 pci_unregister_driver 函数返回之前,将调用此 PCI 驱动程序的 remove 函数。
static void __exit pci_skel_exit(void)
{
pci_unregister_driver(&pci_driver);
}
Old-Style PCI Probing
旧式 PCI 探测
In older kernel versions, the function, pci_register_driver , was not always used by PCI drivers. Instead, they would either walk the list of PCI devices in the system by hand, or they would call a function that could search for a specific PCI device. The ability to walk the list of PCI devices in the system within a driver has been removed from the 2.6 kernel in order to prevent drivers from crashing the kernel if they happened to modify the PCI device lists while a device was being removed at the same time.
在旧版本的内核中,PCI 驱动程序并不总是使用 pci_register_driver 函数。相反,它们会手动遍历系统中的 PCI 设备列表,或者调用一个可以搜索特定 PCI 设备的函数。在 2.6 内核中,已经移除了驱动程序在系统中遍历 PCI 设备列表的能力,以防止驱动程序在设备被移除的同时修改 PCI 设备列表而导致内核崩溃。
If the ability to find a specific PCI device is really needed, the following functions are available:
如果确实需要找到一个特定的 PCI 设备,以下函数是可用的:
struct pci_dev *pci_get_device(unsigned int vendor, unsigned int device, struct pci_dev *from);
- This function scans the list of PCI devices currently present in the system, and if the input arguments match the specified
vendor
anddevice
IDs, it increments the reference count on thestruct
pci_dev
variable found, and returns it to the caller. This prevents the structure from disappearing without any notice and ensures that the kernel does not oops. After the driver is done with thestruct
pci_dev
returned by the function, it must call the function pci_dev_put to decrement the usage count properly back to allow the kernel to clean up the device if it is removed. - 此函数扫描系统中当前存在的 PCI 设备列表,如果输入参数与指定的
vendor
和device
ID 匹配,则会增加找到的struct
pci_dev
变量的引用计数,并将其返回给调用者。这可以防止该结构在没有任何通知的情况下消失,并确保内核不会发生 oops。驱动程序在使用完函数返回的 struct
pci_dev
后,必须调用 pci_dev_put 函数来正确减少使用计数,以便内核在设备被移除时清理设备。
The from
argument is used to get hold of multiple devices with the same signature; the argument should point to the last device that has been found, so that the search can continue instead of restarting from the head of the list. To find the first device, from
is specified as NULL
. If no (further) device is found, NULL
is returned.
from
参数用于获取具有相同签名的多个设备;该参数应指向已找到的最后一个设备,以便继续搜索而不是从列表头部重新开始。要查找第一个设备,from
应指定为 NULL
。如果没有(进一步)找到设备,则返回 NULL
。
An example of how to use this function properly is:
正确使用此函数的示例如下:
struct pci_dev *dev;
dev = pci_get_device(PCI_VENDOR_FOO, PCI_DEVICE_FOO, NULL);
if (dev) {
/* Use the PCI device */
...
pci_dev_put(dev);
}
This function cannot be called from interrupt context. If it is, a warning is printed out to the system log.
此函数不能从中断上下文中调用。如果这样做了,系统日志中会打印出警告信息。
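The cursor-style iteration that the from argument enables can be sketched with a plain array standing in for the kernel's device list (the names below are hypothetical, invented for this sketch; the real function additionally manages reference counts):

from 参数所支持的"游标式"遍历可以用一个普通数组来模拟内核的设备列表(下面的名称是为演示而虚构的;真实函数还会额外管理引用计数):

```c
#include <stddef.h>
#include <stdint.h>

struct fake_dev { uint32_t vendor, device; };

/* Resume the search just past `from`, return the next device whose
 * signature matches, or NULL when the list is exhausted -- the same
 * contract pci_get_device() offers. */
static struct fake_dev *find_next(struct fake_dev *devs, size_t n,
                                  uint32_t vendor, uint32_t device,
                                  struct fake_dev *from)
{
    size_t i = from ? (size_t)(from - devs) + 1 : 0;

    for (; i < n; i++)
        if (devs[i].vendor == vendor && devs[i].device == device)
            return &devs[i];
    return NULL;
}
```

Passing the previous result back as from is what lets a driver find every board with the same signature without restarting from the head of the list.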
struct pci_dev *pci_get_subsys(unsigned int vendor, unsigned int device, unsigned int ss_vendor, unsigned int ss_device, struct pci_dev *from);
- This function works just like pci_get_device, but it allows the subsystem vendor and subsystem device IDs to be specified when looking for the device.
- 此函数的工作方式与 pci_get_device 完全相同,但在查找设备时,它允许指定子系统供应商和子系统设备 ID。
This function cannot be called from interrupt context. If it is, a warning is printed out to the system log.
此函数不能从中断上下文中调用。如果这样做了,系统日志中会打印出警告信息。
struct pci_dev *pci_get_slot(struct pci_bus *bus, unsigned int devfn);
- This function searches the list of PCI devices in the system on the specified
struct pci_bus
for the specified device and function number of the PCI device. If a device is found that matches, its reference count is incremented and a pointer to it is returned. When the caller is finished accessing thestruct
pci_dev
, it must call pci_dev_put. - 此函数在系统中指定的
struct pci_bus
上的 PCI 设备列表中搜索指定的设备和功能号的 PCI 设备。如果找到匹配的设备,其引用计数将增加,并返回指向它的指针。调用者在访问完struct
pci_dev
后,必须调用 pci_dev_put。
All of these functions cannot be called from interrupt context. If they are, a warning is printed out to the system log.
所有这些函数都不能从中断上下文中调用。如果这样做了,系统日志中会打印出警告信息。
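The devfn argument packs the slot (device) number and the function number into a single byte; a user-space sketch of the encoding, mirroring the PCI_DEVFN, PCI_SLOT, and PCI_FUNC macros from <linux/pci.h>:

devfn 参数将插槽(设备)号和功能号打包进一个字节;下面是该编码的用户态示意,对应 <linux/pci.h> 中的 PCI_DEVFN、PCI_SLOT 和 PCI_FUNC 宏:

```c
#include <stdint.h>

/* devfn layout: bits 7..3 hold the 5-bit slot number,
 * bits 2..0 the 3-bit function number. */
static uint8_t devfn_make(uint8_t slot, uint8_t func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

static uint8_t devfn_slot(uint8_t devfn) { return devfn >> 3; }
static uint8_t devfn_func(uint8_t devfn) { return devfn & 0x07; }
```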
Enabling the PCI Device
启用 PCI 设备
In the probe function for the PCI driver, before the driver can access any device resource (I/O region or interrupt) of the PCI device, the driver must call the pci_enable_device function:
在 PCI 驱动程序的 probe 函数中,在驱动程序可以访问 PCI 设备的任何设备资源(I/O 区域或中断)之前,驱动程序必须调用 pci_enable_device 函数:
int pci_enable_device(struct pci_dev *dev);
- This function actually enables the device. It wakes up the device and in some cases also assigns its interrupt line and I/O regions. This happens, for example, with CardBus devices (which have been made completely equivalent to PCI at the driver level).
- 此函数实际上启用了设备。它唤醒设备,在某些情况下还会为其分配中断线和 I/O 区域。例如,CardBus 设备(在驱动程序级别已完全等同于 PCI)就是这种情况。
Accessing the Configuration Space
访问配置空间
After the driver has detected the device, it usually needs to read from or write to the three address spaces: memory, port, and configuration. In particular, accessing the configuration space is vital to the driver, because it is the only way it can find out where the device is mapped in memory and in the I/O space.
在驱动程序检测到设备之后,它通常需要读取或写入三个地址空间:内存、端口和配置。特别是,访问配置空间对驱动程序至关重要,因为这是它唯一可以找到设备在内存和 I/O 空间中映射位置的方法。
Because the microprocessor has no way to access the configuration space directly, the computer vendor has to provide a way to do it. To access configuration space, the CPU must write and read registers in the PCI controller, but the exact implementation is vendor dependent and not relevant to this discussion, because Linux offers a standard interface to access the configuration space.
由于微处理器无法直接访问配置空间,因此计算机供应商必须提供一种方法来实现。为了访问配置空间,CPU 必须在 PCI 控制器中写入和读取寄存器,但具体实现取决于供应商,与本次讨论无关,因为 Linux 提供了一个标准接口来访问配置空间。
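On the PC, for instance, the traditional "configuration mechanism #1" forms a 32-bit address word that is written to I/O port 0xCF8 before the data itself is transferred through port 0xCFC. The sketch below shows only the address computation, as an illustration of why drivers should leave these details to the standard interface:

例如在 PC 上,传统的"配置机制 #1"会先构造一个 32 位地址字写入 I/O 端口 0xCF8,然后数据本身通过端口 0xCFC 传输。下面的示意代码只展示地址的计算,用以说明为什么驱动程序应把这些细节留给标准接口:

```c
#include <stdint.h>

/* The address word for the PC's configuration mechanism #1:
 * enable bit (31), bus number (23..16), devfn (15..8), and the
 * register offset aligned down to a dword boundary (7..2). */
static uint32_t conf1_addr(uint8_t bus, uint8_t devfn, uint8_t where)
{
    return 0x80000000u | ((uint32_t)bus << 16) |
           ((uint32_t)devfn << 8) | (where & 0xfcu);
}
```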
As far as the driver is concerned, the configuration space can be accessed through 8-bit, 16-bit, or 32-bit data transfers. The relevant functions are prototyped in <linux/pci.h> :
就驱动程序而言,配置空间可以通过 8 位、16 位或 32 位数据传输来访问。相关的函数在 <linux/pci.h> 中声明:
-
int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val);
int pci_read_config_word(struct pci_dev *dev, int where, u16 *val);
int pci_read_config_dword(struct pci_dev *dev, int where, u32 *val);
- Read one, two, or four bytes from the configuration space of the device identified by
dev
. Thewhere
argument is the byte offset from the beginning of the configuration space. The value fetched from the configuration space is returned through theval
pointer, and the return value of the functions is an error code. The word and dword functions convert the value just read from little-endian to the native byte order of the processor, so you need not deal with byte ordering. - 从由
dev
指定的设备的配置空间中读取一个、两个或四个字节。where
参数是从配置空间开头的字节偏移量。从配置空间中获取的值通过val
指针返回,函数的返回值是一个错误代码。word 和 dword 函数将刚刚读取的值从小端模式转换为处理器的本地字节顺序,因此您无需处理字节顺序。
-
int pci_write_config_byte(struct pci_dev *dev, int where, u8 val);
int pci_write_config_word(struct pci_dev *dev, int where, u16 val);
int pci_write_config_dword(struct pci_dev *dev, int where, u32 val);
- Write one, two, or four bytes to the configuration space. The device is identified by
dev
as usual, and the value being written is passed asval
. The word and dword functions convert the value to little-endian before writing to the peripheral device. - 向配置空间写入一个、两个或四个字节。设备如往常一样通过
dev
识别,要写入的值作为val
传递。word 和 dword 函数在写入外围设备之前将值转换为小端模式。
All of the previous functions are implemented as inline functions that really call the following functions. Feel free to use these functions instead of the above in case the driver does not have access to a struct pci_dev
at any particular moment in time:
所有前面的函数都作为内联函数实现,实际上调用了以下函数。如果驱动程序在某个特定时刻没有访问 struct pci_dev
的权限,可以自由使用这些函数代替上述函数:
-
int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn, int where, u8 *val);
int pci_bus_read_config_word(struct pci_bus *bus, unsigned int devfn, int where, u16 *val);
int pci_bus_read_config_dword(struct pci_bus *bus, unsigned int devfn, int where, u32 *val);
- Just like the pci_read_ functions, but
struct
pci_bus
*
anddevfn
variables are needed instead of astruct
pci_dev *
. - 与 pci_read_ 函数类似,但需要
struct
pci_bus
*
和devfn
变量,而不是struct
pci_dev *
。
-
int pci_bus_write_config_byte(struct pci_bus *bus, unsigned int devfn, int where, u8 val);
int pci_bus_write_config_word(struct pci_bus *bus, unsigned int devfn, int where, u16 val);
int pci_bus_write_config_dword(struct pci_bus *bus, unsigned int devfn, int where, u32 val);
- Just like the pci_write_ functions, but
struct
pci_bus
*
anddevfn
variables are needed instead of astruct pci_dev *
. - 与 pci_write_ 函数类似,但需要
struct
pci_bus
*
和devfn
变量,而不是struct pci_dev *
。
The best way to address the configuration variables using the pci_read_ functions is by means of the symbolic names defined in <linux/pci.h> . For example, the following small function retrieves the revision ID of a device by passing the symbolic name for where
to pci_read_config_byte :
使用 pci_read_ 函数访问配置变量的最佳方式是通过在 <linux/pci.h> 中定义的符号名称。例如,以下小函数通过将 where
的符号名称传递给 pci_read_config_byte 来检索设备的修订 ID:
static unsigned char skel_get_revision(struct pci_dev *dev)
{
u8 revision;
pci_read_config_byte(dev, PCI_REVISION_ID, &revision);
return revision;
}
Accessing the I/O and Memory Spaces
访问 I/O 和内存空间
A PCI device implements up to six I/O address regions. Each region consists of either memory or I/O locations. Most devices implement their I/O registers in memory regions, because it's generally a saner approach. However, unlike normal memory, I/O registers should not be cached by the CPU because each access can have side effects. The PCI device that implements I/O registers as a memory region marks the difference by setting a "memory-is-prefetchable" bit in its configuration register.[4] If the memory region is marked as prefetchable, the CPU can cache its contents and do all sorts of optimization with it; nonprefetchable memory access, on the other hand, can't be optimized because each access can have side effects, just as with I/O ports. Peripherals that map their control registers to a memory address range declare that range as nonprefetchable, whereas something like video memory on PCI boards is prefetchable. In this section, we use the word region to refer to a generic I/O address space that is memory-mapped or port-mapped.
PCI 设备实现了多达六个 I/O 地址区域。每个区域由内存或 I/O 位置组成。大多数设备在其内存区域中实现 I/O 寄存器,因为这通常是一种更合理的方法。然而,与普通内存不同,I/O 寄存器不应被 CPU 缓存,因为每次访问都可能产生副作用。实现 I/O 寄存器为内存区域的 PCI 设备通过在其配置寄存器中设置"内存可预取"位来标记差异。[4] 如果内存区域被标记为可预取,CPU 可以缓存其内容并对其进行各种优化;而非预取内存访问则无法优化,因为每次访问都可能产生副作用,就像 I/O 端口一样。将控制寄存器映射到内存地址范围的外设会声明该范围为不可预取,而像 PCI 板上的视频内存则是可预取的。在本节中,我们使用"区域"一词来指代一个通用的 I/O 地址空间,它可以是内存映射或端口映射的。
An interface board reports the size and current location of its regions using configuration registers---the six 32-bit registers shown in Figure 12-2, whose symbolic names are PCI_BASE_ADDRESS_0
through PCI_BASE_ADDRESS_5
. Since the I/O space defined by PCI is a 32-bit address space, it makes sense to use the same configuration interface for memory and I/O. If the device uses a 64-bit address bus, it can declare regions in the 64-bit memory space by using two consecutive PCI_BASE_ADDRESS
registers for each region, low bits first. It is possible for one device to offer both 32-bit regions and 64-bit regions.
接口板使用配置寄存器报告其区域的大小和当前位置,即 Figure 12-2 中所示的六个 32 位寄存器,其符号名称为 PCI_BASE_ADDRESS_0
到 PCI_BASE_ADDRESS_5
。由于 PCI 定义的 I/O 空间是一个 32 位地址空间,因此使用相同的配置接口来处理内存和 I/O 是合理的。如果设备使用 64 位地址总线,它可以通过为每个区域使用两个连续的 PCI_BASE_ADDRESS
寄存器(先低后高)来声明 64 位内存空间中的区域。一个设备可以同时提供 32 位区域和 64 位区域。
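Combining the two halves of a 64-bit region is a shift and an OR, low bits first, as the following user-space sketch shows (real BAR values also carry type flags in their low bits, which must be masked off before use; they are ignored here for brevity):

组合 64 位区域的两半只需一次移位和一次按位或,低位在前,如下面的用户态示意所示(真实的 BAR 值的低位还带有类型标志位,使用前必须将其屏蔽;此处为简洁起见忽略):

```c
#include <stdint.h>

/* Two consecutive 32-bit base address registers describe one 64-bit
 * region: the first BAR holds the low half, the second the high half. */
static uint64_t bar_pair_to_addr(uint32_t bar_lo, uint32_t bar_hi)
{
    return ((uint64_t)bar_hi << 32) | bar_lo;
}
```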
In the kernel, the I/O regions of PCI devices have been integrated into the generic resource management. For this reason, you don't need to access the configuration variables in order to know where your device is mapped in memory or I/O space. The preferred interface for getting region information consists of the following functions:
在内核中,PCI 设备的 I/O 区域已被整合到通用资源管理中。因此,您无需访问配置变量来了解您的设备在内存或 I/O 空间中的映射位置。获取区域信息的首选接口由以下函数组成:
-
unsigned long pci_resource_start(struct pci_dev *dev, int bar);
- The function returns the first address (memory address or I/O port number) associated with one of the six PCI I/O regions. The region is selected by the integer
bar
(the base address register), ranging from 0-5 (inclusive). - 该函数返回与六个 PCI I/O 区域之一相关联的第一个地址(内存地址或 I/O 端口编号)。区域由整数
bar
(基地址寄存器)选择,范围为 0-5(含)。
-
unsigned long pci_resource_end(struct pci_dev *dev, int bar);
- The function returns the last address that is part of the I/O region number
bar
. Note that this is the last usable address, not the first address after the region. - 该函数返回 I/O 区域号
bar
的最后一个地址。请注意,这是该区域的最后一个可用地址,而不是该区域之后的第一个地址。
-
unsigned long pci_resource_flags(struct pci_dev *dev, int bar);
- This function returns the flags associated with this resource.
- 此函数返回与该资源相关联的标志。
Resource flags are used to define some features of the individual resource. For PCI resources associated with PCI I/O regions, the information is extracted from the base address registers, but can come from elsewhere for resources not associated with PCI devices.
资源标志用于定义各个资源的一些特性。对于与 PCI I/O 区域相关的 PCI 资源,信息是从基地址寄存器中提取的,但对于不与 PCI 设备相关的资源,信息可能来自其他地方。
All resource flags are defined in <linux/ioport.h> ; the most important are:
所有资源标志都在 <linux/ioport.h> 中定义,其中最重要的是:
-
IORESOURCE_IO
IORESOURCE_MEM
- If the associated I/O region exists, one and only one of these flags is set.
- 如果相关联的 I/O 区域存在,则有且仅有其中一个标志被设置。
-
IORESOURCE_PREFETCH
IORESOURCE_READONLY
- These flags tell whether a memory region is prefetchable and/or write protected. The latter flag is never set for PCI resources.
- 这些标志表明内存区域是否可预取和/或是否受写保护。后一个标志永远不会为 PCI 资源设置。
By making use of the pci_resource_ functions, a device driver can completely ignore the underlying PCI registers, since the system already used them to structure resource information.
通过使用 pci_resource_ 函数,设备驱动程序可以完全忽略底层的 PCI 寄存器,因为系统已经使用它们来组织资源信息。
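Since pci_resource_end() returns the last usable address, a region's length is end - start + 1; the kernel wraps this computation up as pci_resource_len(). A user-space sketch:

由于 pci_resource_end() 返回的是最后一个可用地址,区域长度为 end - start + 1;内核将这一计算封装为 pci_resource_len()。用户态示意如下:

```c
#include <stdint.h>

/* Length of a resource described by an inclusive [start, end] pair,
 * the convention the pci_resource_* functions use. */
static uint64_t region_len(uint64_t start, uint64_t end)
{
    return end - start + 1;
}
```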
PCI Interrupts
PCI 中断
As far as interrupts are concerned, PCI is easy to handle. By the time Linux boots, the computer's firmware has already assigned a unique interrupt number to the device, and the driver just needs to use it. The interrupt number is stored in configuration register 60 (PCI_INTERRUPT_LINE
), which is one byte wide. This allows for as many as 256 interrupt lines, but the actual limit depends on the CPU being used. The driver doesn't need to bother checking the interrupt number, because the value found in PCI_INTERRUPT_LINE
is guaranteed to be the right one.
就中断而言,PCI 很容易处理。在 Linux 启动时,计算机的固件已经为设备分配了一个唯一的中断号,驱动程序只需使用它即可。中断号存储在配置寄存器 60 (PCI_INTERRUPT_LINE
) 中,这是一个字节宽。这允许最多有 256 个中断线,但实际限制取决于所使用的 CPU。驱动程序无需费心检查中断号,因为 PCI_INTERRUPT_LINE
中的值保证是正确的。
If the device doesn't support interrupts, register 61 (PCI_INTERRUPT_PIN
) is 0
; otherwise, it's nonzero. However, since the driver knows if its device is interrupt driven or not, it doesn't usually need to read PCI_INTERRUPT_PIN
.
如果设备不支持中断,则寄存器 61 (PCI_INTERRUPT_PIN
) 为 0
;否则,它不为零。然而,由于驱动程序知道其设备是否由中断驱动,因此它通常不需要读取 PCI_INTERRUPT_PIN
。
Thus, PCI-specific code for dealing with interrupts just needs to read the configuration byte to obtain the interrupt number that is saved in a local variable, as shown in the following code. Beyond that, the information in Chapter 10 applies.
因此,处理中断的 PCI 特定代码只需读取配置字节,以获取保存在本地变量中的中断号,如下面的代码所示。除此之外,Chapter 10 中的内容同样适用。
result = pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &myirq);
if (result) {
/* deal with error */
}
The rest of this section provides additional information for the curious reader but isn't needed for writing drivers.
本节的其余部分为好奇的读者提供了额外的信息,但对编写驱动程序并非必需。
A PCI connector has four interrupt pins, and peripheral boards can use any or all of them. Each pin is individually routed to the motherboard's interrupt controller, so interrupts can be shared without any electrical problems. The interrupt controller is then responsible for mapping the interrupt wires (pins) to the processor's hardware; this platform-dependent operation is left to the controller in order to achieve platform independence in the bus itself.
PCI 连接器有四个中断引脚,外围板卡可以使用其中的任意一个或全部。每个引脚都单独连接到主板的中断控制器,因此中断可以在没有任何电气问题的情况下共享。然后由中断控制器负责将中断线(引脚)映射到处理器的硬件;这种依赖于平台的操作留给控制器是为了实现总线本身的平台独立性。
The read-only configuration register located at PCI_INTERRUPT_PIN
is used to tell the computer which single pin is actually used. It's worth remembering that each device board can host up to eight devices; each device uses a single interrupt pin and reports it in its own configuration register. Different devices on the same device board can use different interrupt pins or share the same one.
位于 PCI_INTERRUPT_PIN
的只读配置寄存器用于告诉计算机实际使用的是哪一个引脚。值得注意的是,每个设备板可以托管多达八个设备;每个设备使用一个单独的中断引脚,并在其自己的配置寄存器中报告它。同一个设备板上的不同设备可以使用不同的中断引脚,或者共享同一个引脚。
The `PCI_INTERRUPT_LINE` register, on the other hand, is read/write. When the computer is booted, the firmware scans its PCI devices and sets the register for each device according to how the interrupt pin is routed for its PCI slot. The value is assigned by the firmware, because only the firmware knows how the motherboard routes the different interrupt pins to the processor. For the device driver, however, the `PCI_INTERRUPT_LINE` register is read-only. Interestingly, recent versions of the Linux kernel under some circumstances can assign interrupt lines without resorting to the BIOS.
另一方面,`PCI_INTERRUPT_LINE` 寄存器是可读写的。当计算机启动时,固件会扫描其 PCI 设备,并根据其 PCI 插槽的中断引脚路由方式为每个设备设置该寄存器。该值由固件分配,因为只有固件才知道主板如何将不同的中断引脚路由到处理器。然而,对于设备驱动程序来说,`PCI_INTERRUPT_LINE` 寄存器是只读的。有趣的是,在某些情况下,最近版本的 Linux 内核可以不依赖 BIOS 来分配中断线。
Hardware Abstractions
硬件抽象
We complete the discussion of PCI by taking a quick look at how the system handles the plethora of PCI controllers available on the marketplace. This is just an informational section, meant to show the curious reader how the object-oriented layout of the kernel extends down to the lowest levels.
我们通过快速了解系统如何处理市场上众多的 PCI 控制器来完成对 PCI 的讨论。这只是一个信息性的小节,旨在向好奇的读者展示内核的面向对象布局如何延伸到最低层。
The mechanism used to implement hardware abstraction is the usual structure containing methods. It's a powerful technique that adds just the minimal overhead of dereferencing a pointer to the normal overhead of a function call. In the case of PCI management, the only hardware-dependent operations are the ones that read and write configuration registers, because everything else in the PCI world is accomplished by directly reading and writing the I/O and memory address spaces, and those are under direct control of the CPU.
实现硬件抽象的机制是通常包含方法的结构。这是一种强大的技术,它只是在函数调用的正常开销上增加了最小的指针解引用开销。在 PCI 管理的情况下,唯一依赖硬件的操作是读取和写入配置寄存器的操作,因为 PCI 世界中的其他一切都是通过直接读取和写入 I/O 和内存地址空间来完成的,而这些空间直接受 CPU 控制。
Thus, the relevant structure for configuration register access includes only two fields:
因此,用于配置寄存器访问的相关结构仅包含两个字段:
```c
struct pci_ops {
    int (*read)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val);
    int (*write)(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 val);
};
```
The structure is defined in <linux/pci.h> and used by drivers/pci/pci.c, where the actual public functions are defined.
该结构在 <linux/pci.h> 中定义,并由 drivers/pci/pci.c 使用,实际的公共函数在那里定义。
The two functions that act on the PCI configuration space have more overhead than dereferencing a pointer; they use cascading pointers due to the high object-orientedness of the code, but the overhead is not an issue in operations that are performed quite rarely and never in speed-critical paths. The actual implementation of pci_read_config_byte(dev, where, val), for instance, expands to:
对 PCI 配置空间操作的两个函数的开销比指针解引用要大;由于代码的高度面向对象性,它们使用了级联指针,但在执行相当罕见且从未在速度关键路径中的操作中,开销并不是问题。例如,pci_read_config_byte(dev, where, val) 的实际实现展开为:
dev->bus->ops->read(bus, devfn, where, 8, val);
The various PCI buses in the system are detected at system boot, and that's when the `struct pci_bus` items are created and associated with their features, including the `ops` field.
系统启动时会检测系统中的各个 PCI 总线,这也是创建 `struct pci_bus` 项并将其与包括 `ops` 字段在内的特性相关联的时候。
Implementing hardware abstraction via "hardware operations" data structures is typical in the Linux kernel. One important example is the `struct alpha_machine_vector` data structure. It is defined in <asm-alpha/machvec.h> and takes care of everything that may change across different Alpha-based computers.
通过"硬件操作"数据结构实现硬件抽象在 Linux 内核中是典型的。一个重要的例子是 `struct alpha_machine_vector` 数据结构。它在 <asm-alpha/machvec.h> 中定义,负责处理可能在不同基于 Alpha 的计算机之间发生变化的所有内容。
A Look Back: ISA
回顾:ISA
The ISA bus is quite old in design and is a notoriously poor performer, but it still holds a good part of the market for extension devices. If speed is not important and you want to support old motherboards, an ISA implementation is preferable to PCI. An additional advantage of this old standard is that if you are an electronic hobbyist, you can easily build your own ISA devices, something definitely not possible with PCI.
ISA 总线设计相当陈旧,性能也很差,但它仍然占据了扩展设备市场的很大一部分。如果速度不重要,且你想支持旧主板,那么 ISA 实现比 PCI 更可取。此外,这一旧标准的另一个优点是,如果你是一个电子爱好者,你可以很容易地制作自己的 ISA 设备,这在 PCI 中肯定是不可能的。
On the other hand, a great disadvantage of ISA is that it's tightly bound to the PC architecture; the interface bus has all the limitations of the 80286 processor and causes endless pain to system programmers. The other great problem with the ISA design (inherited from the original IBM PC) is the lack of geographical addressing, which has led to many problems and lengthy unplug-rejumper-plug-test cycles to add new devices. It's interesting to note that even the oldest Apple II computers were already exploiting geographical addressing, and they featured jumperless expansion boards.
另一方面,ISA 的一个巨大缺点是它紧密绑定于 PC 架构;接口总线具有 80286 处理器的所有限制,并给系统程序员带来了无尽的痛苦。ISA 设计的另一个大问题是(从最初的 IBM PC 继承而来)缺乏地理寻址,这导致了许多问题,并且为了添加新设备,需要经历漫长的拔插-重新跳线-插回-测试周期。有趣的是,即使是最早的 Apple II 计算机也已经利用了地理寻址,并且它们的扩展板是无跳线的。
Despite its great disadvantages, ISA is still used in several unexpected places. For example, the VR41xx series of MIPS processors used in several palmtops features an ISA-compatible expansion bus, strange as it seems. The reason behind these unexpected uses of ISA is the extreme low cost of some legacy hardware, such as 8390-based Ethernet cards, so a CPU with ISA electrical signaling can easily exploit the awful, but cheap, PC devices.
尽管存在诸多缺点,ISA 仍然被用于一些意想不到的地方。例如,用于多个掌上电脑的 MIPS 处理器的 VR41xx 系列具有一个与 ISA 兼容的扩展总线,这似乎有些奇怪。这些对 ISA 的意外使用背后的原因是一些旧硬件的极低成本,例如基于 8390 的以太网卡,因此具有 ISA 电气信号的 CPU 可以轻松利用这些糟糕但便宜的 PC 设备。
Hardware Resources
硬件资源
An ISA device can be equipped with I/O ports, memory areas, and interrupt lines.
ISA 设备可以配备 I/O 端口、内存区域和中断线。
Even though the x86 processors support 64 KB of I/O port memory (i.e., the processor asserts 16 address lines), some old PC hardware decodes only the lowest 10 address lines. This limits the usable address space to 1024 ports, because any address in the range 1 KB to 64 KB is mistaken for a low address by any device that decodes only the low address lines. Some peripherals circumvent this limitation by mapping only one port into the low kilobyte and using the high address lines to select between different device registers. For example, a device mapped at `0x340` can safely use ports `0x740`, `0xB40`, and so on.
尽管 x86 处理器支持 64 KB 的 I/O 端口内存(即,处理器断言 16 个地址线),但某些旧的 PC 硬件仅解码最低的 10 个地址线。这将可用地址空间限制为 1024 个端口,因为任何在 1 KB 到 64 KB 范围内的地址都会被仅解码低地址线的设备误认为是低地址。一些外围设备通过将仅一个端口映射到低千字节,并使用高地址线选择不同的设备寄存器来绕过这一限制。例如,映射在 `0x340` 的设备可以安全地使用端口 `0x740`、`0xB40` 等。
If the availability of I/O ports is limited, memory access is still worse. An ISA device can use only the memory range between 640 KB and 1 MB and between 15 MB and 16 MB for I/O register and device control. The 640-KB to 1-MB range is used by the PC BIOS, by VGA-compatible video boards, and by various other devices, leaving little space available for new devices. Memory at 15 MB, on the other hand, is not directly supported by Linux, and hacking the kernel to support it is a waste of programming time nowadays.
如果 I/O 端口的可用性有限,那么内存访问的情况更糟。ISA 设备只能使用 640 KB 到 1 MB 和 15 MB 到 16 MB 之间的内存范围用于 I/O 寄存器和设备控制。640 KB 到 1 MB 的范围被 PC BIOS、VGA 兼容视频板和各种其他设备使用,留给新设备的空间很少。另一方面,Linux 不直接支持 15 MB 的内存,如今试图修改内核来支持它是一种浪费编程时间的行为。
The third resource available to ISA device boards is interrupt lines. A limited number of interrupt lines is routed to the ISA bus, and they are shared by all the interface boards. As a result, if devices aren't properly configured, they can find themselves using the same interrupt lines.
ISA 设备板可用的第三种资源是中断线。有限数量的中断线连接到 ISA 总线,并且由所有接口板共享。因此,如果设备没有正确配置,它们可能会发现自己使用了相同的中断线。
Although the original ISA specification doesn't allow interrupt sharing across devices, most device boards allow it.[5] Interrupt sharing at the software level is described in Chapter 10.
尽管最初的 ISA 规范不允许设备之间共享中断,但大多数设备板允许这样做。[5] 软件级别的中断共享在 Chapter 10 中有描述。
ISA Programming
ISA 编程
As far as programming is concerned, there's no specific aid in the kernel or the BIOS to ease access to ISA devices (like there is, for example, for PCI). The only facilities you can use are the registries of I/O ports and IRQ lines, described in Section 10.2.
就编程而言,内核或 BIOS 中没有任何特定的辅助功能可以方便地访问 ISA 设备(例如,对于 PCI 就有)。您可以使用的唯一设施是 I/O 端口和 IRQ 线的注册表,这些在 Section 10.2 中有描述。
The programming techniques shown throughout the first part of this book apply to ISA devices; the driver can probe for I/O ports, and the interrupt line must be autodetected with one of the techniques shown in Section 10.2.2.
本书第一部分展示的编程技术适用于 ISA 设备;驱动程序可以探测 I/O 端口,中断线必须使用 Section 10.2.2 中展示的技术之一自动检测。
The helper functions isa_readb and friends have been briefly introduced in Chapter 9, and there's nothing more to say about them.
辅助函数 isa_readb 及其相关函数已在 Chapter 9 中简要介绍,关于它们没有更多要说的了。
The Plug-and-Play Specification
即插即用规范
Some new ISA device boards follow peculiar design rules and require a special initialization sequence intended to simplify installation and configuration of add-on interface boards. The specification for the design of these boards is called plug and play (PnP) and consists of a cumbersome rule set for building and configuring jumperless ISA devices. PnP devices implement relocatable I/O regions; the PC's BIOS is responsible for the relocation---reminiscent of PCI.
一些新的 ISA 设备板遵循特殊的设计规则,并需要一个特殊的初始化序列,旨在简化附加接口板的安装和配置。这些板卡的设计规范被称为 即插即用(PnP),它包含了一套繁琐的规则,用于构建和配置无跳线的 ISA 设备。PnP 设备实现了可重新定位的 I/O 区域;PC 的 BIOS 负责重新定位------这让人想起了 PCI。
In short, the goal of PnP is to obtain the same flexibility found in PCI devices without changing the underlying electrical interface (the ISA bus). To this end, the specs define a set of device-independent configuration registers and a way to geographically address the interface boards, even though the physical bus doesn't carry per-board (geographical) wiring---every ISA signal line connects to every available slot.
简而言之,PnP 的目标是在不改变底层电气接口(ISA 总线)的情况下,获得与 PCI 设备相同的灵活性。为此,规范定义了一组与设备无关的配置寄存器,以及一种对接口板进行地理寻址的方法,尽管物理总线没有携带每块板(地理)的布线------每条 ISA 信号线都连接到每个可用的插槽。
Geographical addressing works by assigning a small integer, called the card select number (CSN), to each PnP peripheral in the computer. Each PnP device features a unique serial identifier, 64 bits wide, that is hardwired into the peripheral board. CSN assignment uses the unique serial number to identify the PnP devices. But the CSNs can be assigned safely only at boot time, which requires the BIOS to be PnP aware. For this reason, old computers require the user to obtain and insert a specific configuration diskette, even if the device is PnP capable.
地理寻址通过为计算机中的每个 PnP 外围设备分配一个小整数,称为 卡选择号(CSN)来工作。每个 PnP 设备都有一个唯一的序列标识符,宽度为 64 位,它被硬连接到外围设备板中。CSN 分配使用唯一的序列号来识别 PnP 设备。但是,只有在启动时才能安全地分配 CSNs,这要求 BIOS 支持 PnP。因此,旧计算机需要用户获取并插入一个特定的配置软盘,即使设备支持 PnP 也是如此。
Interface boards following the PnP specs are complicated at the hardware level. They are much more elaborate than PCI boards and require complex software. It's not unusual to have difficulty installing these devices, and even if the installation goes well, you still face the performance constraints and the limited I/O space of the ISA bus. It's much better to install PCI devices whenever possible and enjoy the new technology instead.
遵循 PnP 规范的接口板在硬件层面很复杂。它们比 PCI 板复杂得多,需要复杂的软件。安装这些设备时遇到困难并不罕见,即使安装顺利,您仍然会面临 ISA 总线的性能限制和有限的 I/O 空间。如果可能的话,最好安装 PCI 设备并享受新技术。
If you are interested in the PnP configuration software, you can browse drivers/net/3c509.c , whose probing function deals with PnP devices. The 2.6 kernel saw a lot of work in the PnP device support area, so a lot of the inflexible interfaces have been cleaned up compared to previous kernel releases.
如果您对 PnP 配置软件感兴趣,可以查看 drivers/net/3c509.c,其探测函数处理 PnP 设备。2.6 内核在 PnP 设备支持方面做了大量工作,与之前的内核版本相比,许多不灵活的接口已经被清理。
PC/104 and PC/104+
PC/104 和 PC/104+
Currently in the industrial world, two bus architectures are quite fashionable: PC/104 and PC/104+. Both are standard in PC-class single-board computers.
目前在工业领域,有两种总线架构相当流行:PC/104 和 PC/104+。两者都是 PC 类单板计算机的标准。
Both standards refer to specific form factors for printed circuit boards, as well as electrical/mechanical specifications for board interconnections. The practical advantage of these buses is that they allow circuit boards to be stacked vertically using a plug-and-socket kind of connector on one side of the device.
这两个标准都涉及印刷电路板的特定外形尺寸,以及板间连接的电气/机械规范。这些总线的实际优势在于,它们允许使用设备一侧的插头和插座式连接器将电路板垂直堆叠。
The electrical and logical layout of the two buses is identical to ISA (PC/104) and PCI (PC/104+), so software won't notice any difference between the usual desktop buses and these two.
这两种总线的电气和逻辑布局分别与 ISA(PC/104)和 PCI(PC/104+)相同,因此软件不会察觉到这两种总线与普通桌面总线之间的任何差异。
Other PC Buses
其他 PC 总线
PCI and ISA are the most commonly used peripheral interfaces in the PC world, but they aren't the only ones. Here's a summary of the features of other buses found in the PC market.
PCI 和 ISA 是 PC 世界中最常用的外围接口,但它们并不是唯一的。以下是 PC 市场上其他总线的特点总结。
MCA
微通道架构
Micro Channel Architecture (MCA) is an IBM standard used in PS/2 computers and some laptops. At the hardware level, Micro Channel has more features than ISA. It supports multimaster DMA, 32-bit address and data lines, shared interrupt lines, and geographical addressing to access per-board configuration registers. Such registers are called Programmable Option Select (POS), but they don't have all the features of the PCI registers. Linux support for Micro Channel includes functions that are exported to modules.
微通道架构(MCA)是 IBM 的一个标准,用于 PS/2 计算机和某些笔记本电脑。在硬件层面,微通道比 ISA 具有更多的功能。它支持多主 DMA、32 位地址和数据线、共享中断线以及用于访问每块板的配置寄存器的地理寻址。这些寄存器被称为 可编程选项选择(POS),但它们并不具备 PCI 寄存器的所有功能。Linux 对微通道的支持包括导出到模块的函数。
A device driver can read the integer value `MCA_bus` to see if it is running on a Micro Channel computer. If the symbol is a preprocessor macro, the macro `MCA_bus__is_a_macro` is defined as well. If `MCA_bus__is_a_macro` is undefined, then `MCA_bus` is an integer variable exported to modularized code. Both `MCA_bus` and `MCA_bus__is_a_macro` are defined in <asm/processor.h>.
设备驱动程序可以读取整数值 `MCA_bus` 来判断是否运行在微通道计算机上。如果该符号是一个预处理器宏,则宏 `MCA_bus__is_a_macro` 也会被定义。如果 `MCA_bus__is_a_macro` 未定义,那么 `MCA_bus` 是导出到模块化代码的整数变量。`MCA_bus` 和 `MCA_bus__is_a_macro` 都在 <asm/processor.h> 中定义。
EISA
扩充型工业标准架构
The Extended ISA (EISA) bus is a 32-bit extension to ISA, with a compatible interface connector; ISA device boards can be plugged into an EISA connector. The additional wires are routed under the ISA contacts.
扩展 ISA(EISA)总线是 ISA 的一个 32 位扩展,具有兼容的接口连接器;ISA 设备板可以插入 EISA 连接器。额外的电线在 ISA 接触点下方布线。
Like PCI and MCA, the EISA bus is designed to host jumperless devices, and it has the same features as MCA: 32-bit address and data lines, multimaster DMA, and shared interrupt lines. EISA devices are configured by software, but they don't need any particular operating system support. EISA drivers already exist in the Linux kernel for Ethernet devices and SCSI controllers.
与 PCI 和 MCA 一样,EISA 总线旨在托管无跳线设备,并且它具有与 MCA 相同的特性:32 位地址和数据线、多主 DMA 和共享中断线。EISA 设备通过软件进行配置,但它们不需要任何特定操作系统的支持。Linux 内核已经存在用于以太网设备和 SCSI 控制器的 EISA 驱动程序。
An EISA driver checks the value `EISA_bus` to determine if the host computer carries an EISA bus. Like `MCA_bus`, `EISA_bus` is either a macro or a variable, depending on whether `EISA_bus__is_a_macro` is defined. Both symbols are defined in <asm/processor.h>.
EISA 驱动程序会检查值 `EISA_bus`,以确定宿主计算机是否带有 EISA 总线。与 `MCA_bus` 一样,`EISA_bus` 是一个宏还是一个变量,取决于是否定义了 `EISA_bus__is_a_macro`。这两个符号都在 <asm/processor.h> 中定义。
The kernel has full EISA support for devices with sysfs and resource management functionality. This is located in the drivers/eisa directory.
内核对带有 sysfs 和资源管理功能的 EISA 设备提供了完整的支持。这些功能位于 drivers/eisa 目录中。
VLB
VESA 局部总线
Another extension to ISA is the VESA Local Bus (VLB) interface bus, which extends the ISA connectors by adding a third lengthwise slot. A device can just plug into this extra connector (without plugging in the two associated ISA connectors), because the VLB slot duplicates all important signals from the ISA connectors. Such "standalone" VLB peripherals not using the ISA slot are rare, because most devices need to reach the back panel so that their external connectors are available.
ISA 的另一个扩展是 VESA 本地总线(VLB)接口总线,它通过增加第三个纵向插槽来扩展 ISA 连接器。设备可以直接插入这个额外的连接器(而无需插入两个相关的 ISA 连接器),因为 VLB 插槽复制了 ISA 连接器的所有重要信号。不使用 ISA 插槽的"独立"VLB 外围设备很少见,因为大多数设备都需要到达后挡板,以便其外部连接器可用。
The VESA bus is much more limited in its capabilities than the EISA, MCA, and PCI buses and is disappearing from the market. No special kernel support exists for VLB. However, both the Lance Ethernet driver and the IDE disk driver in Linux 2.0 can deal with VLB versions of their devices.
与 EISA、MCA 和 PCI 总线相比,VESA 总线的能力要有限得多,并且正在从市场上消失。内核对 VLB 没有特殊的支持。然而,Linux 2.0 中的 Lance 以太网驱动程序和 IDE 磁盘驱动程序都可以处理其设备的 VLB 版本。
SBus
While most computers nowadays are equipped with a PCI or ISA interface bus, most older SPARC-based workstations use SBus to connect their peripherals.
虽然如今大多数计算机都配备了 PCI 或 ISA 接口总线,但大多数较旧的基于 SPARC 的工作站使用 SBus 来连接其外围设备。
SBus is quite an advanced design, although it has been around for a long time. It is meant to be processor independent (even though only SPARC computers use it) and is optimized for I/O peripheral boards. In other words, you can't plug additional RAM into SBus slots (RAM expansion boards have long been forgotten even in the ISA world, and PCI does not support them either). This optimization is meant to simplify the design of both hardware devices and system software, at the expense of some additional complexity in the motherboard.
SBus 是一种相当先进的设计,尽管它已经存在很长时间了。它旨在成为处理器无关的(尽管只有 SPARC 计算机使用它),并且针对 I/O 外围设备板进行了优化。换句话说,您不能将额外的 RAM 插入 SBus 插槽(即使在 ISA 领域,RAM 扩展板也早已被遗忘,PCI 也不支持它们)。这种优化旨在简化硬件设备和系统软件的设计,尽管这可能会使主板增加一些额外的复杂性。
This I/O bias of the bus results in peripherals using virtual addresses to transfer data, thus bypassing the need to allocate a contiguous DMA buffer. The motherboard is responsible for decoding the virtual addresses and mapping them to physical addresses. This requires attaching an MMU (memory management unit) to the bus; the chipset in charge of the task is called IOMMU. Although somehow more complex than using physical addresses on the interface bus, this design is greatly simplified by the fact that SPARC processors have always been designed by keeping the MMU core separate from the CPU core (either physically or at least conceptually). Actually, this design choice is shared by other smart processor designs and is beneficial overall. Another feature of this bus is that device boards exploit massive geographical addressing, so there's no need to implement an address decoder in every peripheral or to deal with address conflicts.
这种总线的 I/O 倾向导致外围设备使用 虚拟 地址来传输数据,从而避免了分配连续 DMA 缓冲区的需要。主板负责解码虚拟地址并将它们映射到物理地址。这需要将一个 MMU(内存管理单元)连接到总线上;负责这项任务的芯片组称为 IOMMU。尽管这种设计比在接口总线上使用物理地址要复杂一些,但由于 SPARC 处理器始终被设计为将 MMU 核心与 CPU 核心分开(无论是物理上还是至少概念上),这种设计得到了极大的简化。实际上,这种设计选择被其他一些智能处理器设计所共享,并且总体上是有益的。这种总线的另一个特点是设备板利用了大量的地理寻址,因此无需在每个外围设备中实现地址解码器,也无需处理地址冲突。
SBus peripherals use the Forth language in their PROMs to initialize themselves. Forth was chosen because the interpreter is lightweight and, therefore, can be easily implemented in the firmware of any computer system. In addition, the SBus specification outlines the boot process, so that compliant I/O devices fit easily into the system and are recognized at system boot. This was a great step to support multi-platform devices; it's a completely different world from the PC-centric ISA stuff we were used to. However, it didn't succeed for a variety of commercial reasons.
SBus 外围设备在其 PROM 中使用 Forth 语言来初始化自身。选择 Forth 是因为它的解释器很轻量级,因此可以轻松地实现在任何计算机系统的固件中。此外,SBus 规范概述了启动过程,以便符合规范的 I/O 设备能够轻松地融入系统并在系统启动时被识别。这是支持多平台设备的一个重要举措;它与我们习惯的以 PC 为中心的 ISA 设备完全不同。然而,由于各种商业原因,它并未成功。
Although current kernel versions offer quite full-featured support for SBus devices, the bus is used so little nowadays that it's not worth covering in detail here. Interested readers can look at source files in arch/sparc/kernel and arch/sparc/mm .
尽管当前的内核版本为 SBus 设备提供了相当完整的支持,但如今这种总线的使用如此之少,以至于在这里详细讨论并不值得。有兴趣的读者可以查看 arch/sparc/kernel 和 arch/sparc/mm 中的源文件。
NuBus
Another interesting, but nearly forgotten, interface bus is NuBus. It is found on older Mac computers (those with the M68k family of CPUs).
另一个有趣但几乎被遗忘的接口总线是 NuBus。它出现在较旧的 Mac 电脑上(那些使用 M68k 系列 CPU 的电脑)。
All of the bus is memory-mapped (like everything with the M68k), and the devices are only geographically addressed. This is good and typical of Apple, as the much older Apple II already had a similar bus layout. What is bad is that it's almost impossible to find documentation on NuBus, due to the close-everything policy Apple has always followed with its Mac computers (and unlike the previous Apple II, whose source code and schematics were available at little cost).
整个总线都是内存映射的(就像 M68k 的一切),并且设备仅通过地理寻址。这很好,也很符合苹果的风格,因为更早的 Apple II 已经有了类似的总线布局。糟糕的是,由于苹果公司一直以来对其 Mac 电脑采取的封闭政策,几乎不可能找到关于 NuBus 的文档(这与之前的 Apple II 不同,其源代码和原理图几乎可以免费获取)。
The file drivers/nubus/nubus.c includes almost everything we know about this bus, and it's interesting reading; it shows how much hard reverse engineering developers had to do.
文件 drivers/nubus/nubus.c 包含了我们几乎知道的关于这种总线的所有内容,值得一读;它展示了开发人员不得不进行的艰难的逆向工程工作。
External Buses
外部总线
One of the most recent entries in the field of interface buses is the whole class of external buses. This includes USB, FireWire, and IEEE1284 (parallel-port-based external bus). These interfaces are somewhat similar to older and not-so-external technology, such as PCMCIA/CardBus and even SCSI.
在接口总线领域中,最近的一个新类别是整个外部总线类别。这包括 USB、FireWire 和 IEEE1284(基于并行端口的外部总线)。这些接口与较旧的、不太外部的技术(如 PCMCIA/CardBus,甚至是 SCSI)有些相似。
Conceptually, these buses are neither full-featured interface buses (like PCI is) nor dumb communication channels (like the serial ports are). It's hard to classify the software that is needed to exploit their features, as it's usually split into two levels: the driver for the hardware controller (like drivers for PCI SCSI adaptors or PCI controllers introduced in the Section 12.1) and the driver for the specific "client" device (like sd.c handles generic SCSI disks and so-called PCI drivers deal with cards plugged in the bus).
从概念上讲,这些总线既不是功能完备的接口总线(像 PCI 那样),也不是简单的通信通道(像串行端口那样)。很难对利用其功能所需的软件进行分类,因为它们通常被分为两个层次:硬件控制器的驱动程序(如 Section 12.1 中介绍的 PCI SCSI 适配器或 PCI 控制器的驱动程序)和特定"客户端"设备的驱动程序(如 sd.c 处理通用 SCSI 磁盘,所谓的 PCI 驱动程序处理插入总线的卡)。
Quick Reference
快速参考
This section summarizes the symbols introduced in the chapter:
本节总结了本章介绍的符号:
-
#include <linux/pci.h>
- Header that includes symbolic names for the PCI registers and several vendor and device ID values.
- 包含 PCI 寄存器的符号名称以及多个供应商和设备 ID 值的头文件。
-
struct pci_dev;
- Structure that represents a PCI device within the kernel.
- 在内核中表示 PCI 设备的结构。
-
struct pci_driver;
- Structure that represents a PCI driver. All PCI drivers must define this.
- 表示 PCI 驱动程序的结构。所有 PCI 驱动程序都必须定义它。
-
struct pci_device_id;
- Structure that describes the types of PCI devices this driver supports.
- 描述该驱动程序支持的 PCI 设备类型的结构。
-
int pci_register_driver(struct pci_driver *drv);
int pci_module_init(struct pci_driver *drv);
void pci_unregister_driver(struct pci_driver *drv);
- Functions that register or unregister a PCI driver from the kernel.
- 用于在内核中注册或注销 PCI 驱动程序的函数。
-
struct pci_dev *pci_find_device(unsigned int vendor, unsigned int device, struct pci_dev *from);
struct pci_dev *pci_find_device_reverse(unsigned int vendor, unsigned int device, const struct pci_dev *from);
struct pci_dev *pci_find_subsys(unsigned int vendor, unsigned int device, unsigned int ss_vendor, unsigned int ss_device, const struct pci_dev *from);
struct pci_dev *pci_find_class(unsigned int class, struct pci_dev *from);
- Functions that search the device list for devices with a specific signature or those belonging to a specific class. The return value is `NULL` if none is found. `from` is used to continue a search; it must be `NULL` the first time you call either function, and it must point to the device just found if you are searching for more devices. These functions are deprecated; use the `pci_get_` variants instead.
- 搜索设备列表以查找具有特定签名或属于特定类别的设备的函数。如果未找到任何设备,则返回值为 `NULL`。`from` 用于继续搜索;首次调用这些函数时,`from` 必须为 `NULL`;如果要继续搜索更多设备,它必须指向刚刚找到的设备。这些函数已不推荐使用,请改用 `pci_get_` 变体。
- Functions that search the device list for devices with a specific signature or those belonging to a specific class. The return value is
-
struct pci_dev *pci_get_device(unsigned int vendor, unsigned int device, struct pci_dev *from);
struct pci_dev *pci_get_subsys(unsigned int vendor, unsigned int device, unsigned int ss_vendor, unsigned int ss_device, struct pci_dev *from);
struct pci_dev *pci_get_slot(struct pci_bus *bus, unsigned int devfn);
- Functions that search the device list for devices with a specific signature or belonging to a specific class. The return value is `NULL` if none is found. `from` is used to continue a search; it must be `NULL` the first time you call either function, and it must point to the device just found if you are searching for more devices. The structure returned has its reference count incremented; after the caller is finished with it, `pci_dev_put` must be called.
- 搜索设备列表以查找具有特定签名或属于特定类别的设备的函数。如果未找到任何设备,则返回值为 `NULL`。`from` 用于继续搜索;首次调用这些函数时,`from` 必须为 `NULL`;如果要继续搜索更多设备,它必须指向刚刚找到的设备。返回的结构会增加其引用计数;调用者使用完毕后,必须调用 `pci_dev_put`。
- Functions that search the device list for devices with a specific signature or belonging to a specific class. The return value is
-
int pci_read_config_byte(struct pci_dev *dev, int where, u8 *val);
int pci_read_config_word(struct pci_dev *dev, int where, u16 *val);
int pci_read_config_dword(struct pci_dev *dev, int where, u32 *val);
int pci_write_config_byte(struct pci_dev *dev, int where, u8 val);
int pci_write_config_word(struct pci_dev *dev, int where, u16 val);
int pci_write_config_dword(struct pci_dev *dev, int where, u32 val);
- Functions that read or write a PCI configuration register. Although the Linux kernel takes care of byte ordering, the programmer must be careful about byte ordering when assembling multibyte values from individual bytes. The PCI bus is little-endian.
- 用于读取或写入 PCI 配置寄存器的函数。尽管 Linux 内核会处理字节顺序,但在从单个字节组装多字节值时,程序员必须小心字节顺序。PCI 总线是小端模式。
-
int pci_enable_device(struct pci_dev *dev);
- Enables a PCI device.
- 启用一个 PCI 设备。
-
unsigned long pci_resource_start(struct pci_dev *dev, int bar);
unsigned long pci_resource_end(struct pci_dev *dev, int bar);
unsigned long pci_resource_flags(struct pci_dev *dev, int bar);
- Functions that handle PCI device resources.
- 处理 PCI 设备资源的函数。
Writing a PCI device driver for Linux
Oleg Kutkov / January 7, 2021
In this article, I want to discuss some basics of Linux PCI/PCIe driver development. I think this topic is not properly covered, and some existing information might be outdated.
在本文中,我想讨论一些关于 Linux PCI/PCIe 驱动程序开发的基础知识。我认为这个问题没有得到充分的覆盖,并且一些现有的信息可能已经过时了。
I will show basic concepts and important structures, and this might be a good beginner guide for newbie driver developers.
我将介绍一些基本概念和重要的结构,这可能是一个适合新手驱动程序开发者的入门指南。
The PCI bus is the most popular way to connect high-speed peripherals inside a modern computer system: video and network adapters, sound cards, storage devices, etc. Custom and special hardware, such as acquisition boards with ADCs or other purpose-built interfaces, may use it as well. Even your modern laptop uses this bus to connect internal devices to the CPU, even without actual physical connectors.
PCI 总线是现代计算机系统中连接高速外设的最流行方式,例如视频和网络适配器、声卡、存储设备等。定制和特殊的硬件,例如带有 ADC 的采集卡或其他专用接口,也可能使用它。即使是现代笔记本电脑,也使用这种总线将内部设备连接到 CPU,即使没有实际的物理连接器。
This bus is widely available on different platforms, like x86 and ARM. These days, it's quite common to use a PCI bus to connect a high-performance wireless chip to the SoC inside WiFi routers.
这种总线在不同的平台上广泛使用,例如 x86 和 ARM。如今,使用 PCI 总线将高性能无线芯片连接到 WiFi 路由器中的 SoC 已经相当常见。
PCI and PCI Express
PCI 与 PCI Express
The original PCI bus was a parallel bus with a lot of contacts and is now obsolete. I will not focus on it here.
最初的 PCI 总线是并行的,有大量的连接点,目前已经过时。我不会关注这种过时的 PCI 总线。
The modern and faster PCIe bus uses one or multiple (1-16) lanes, each consisting of two differential wire pairs (one pair for TX and one for RX). You can tell the number of lanes from the bus name: x1, x4, or x16. More lanes give a bigger throughput. Another difference between PCI Express and the older PCI is the bus topology: PCI uses a shared parallel bus architecture, in which the PCI host and all devices share a common set of address, data, and control lines, while PCI Express is based on a point-to-point topology, with separate serial links connecting every device to the root complex controller, which can be integrated into the CPU. You can read an excellent architecture explanation in this Wikipedia article.
现代且速度更快的 PCIe 总线使用一个或多个(1-16)通道(lane),每个通道由两对差分线组成(一对用于 TX,一对用于 RX)。您可以通过总线名称(如 x1、x4 和 x16)判断通道数量。通道越多,吞吐量越大。PCI Express 与旧的 PCI 总线的另一个区别是总线拓扑结构:PCI 使用共享并行总线架构,PCI 主机和所有设备共享一组公共的地址、数据和控制线;而 PCI Express 基于点对点拓扑,通过单独的串行链路将每个设备连接到根复合体控制器,该控制器可以集成到 CPU 中。您可以在 这个 Wikipedia 文章中阅读到一个优秀的架构解释。
From the typical driver's point of view, there is no difference between PCI and PCI Express. All differences are handled by the hardware and lower bus drivers of the Linux kernel. For the driver developer, API is the same.
从典型驱动程序的角度来看,PCI 和 PCI Express 之间没有区别。所有区别都由硬件和 Linux 内核的低级总线驱动程序处理。对于驱动程序开发者来说,API 是相同的。
Linux PCI subsystem
Linux PCI 子系统
The operating system PCI subsystem reflects the actual hardware configuration and interconnections. There might be multiple PCI buses and multiple devices on those buses. Every bus and device is assigned a unique number, which allows identifying each module. Also, a PCI device might have different "functions" or "endpoints"; all those endpoints are also numbered. The full system path to the device might look like this: `<bus number>:<device number>:<function number>`
操作系统中的 PCI 子系统反映了实际的硬件配置和连接关系。可能存在多个 PCI 总线以及这些总线上的多个设备。每个总线和设备都被分配了一个唯一的编号,用于识别每个模块。此外,PCI 设备可能具有不同的"功能"或"端点",这些端点也被编号。设备的完整系统路径可能如下所示:<总线编号>:<设备编号>:<功能编号>
Additionally, every PCI device contains factory-programmed Vendor and Device IDs. These IDs are also unique and assigned by the PCI regulatory consortium. The Linux kernel can properly identify a device and load the proper driver using these IDs. Of course, every driver should have ID verification routines.
此外,每个 PCI 设备都包含工厂编程的供应商 ID 和设备 ID。这些 ID 也是唯一的,由 PCI 管理协会 分配。Linux 内核可以利用这些 ID 正确识别设备并加载相应的驱动程序。当然,每个驱动程序都应具备 ID 验证功能。
The primary userspace utility is `lspci`. This command can show a lot of useful information. Run it with the "-nn" argument to get all devices with IDs.
主要的用户空间工具是 `lspci`,该命令可以显示大量有用信息。使用"-nn"参数运行该命令,可以获取所有带有 ID 的设备信息。
You can see many internal PCIe devices here: bridges, USB controllers, audio and network controllers, etc. All this information can be obtained manually from the **sysfs**:

您可以在此处看到许多内部 PCIe 设备,如桥接器、USB 控制器、音频控制器和网络控制器等。所有这些信息也可以手动从 **sysfs** 中获取:

> ls -la /sys/bus/pci/devices
>
> lrwxrwxrwx 1 root root 0 Dec 21 14:05 0000:00:00.0 -> .../.../.../devices/pci0000:00/0000:00:00.0
>
> lrwxrwxrwx 1 root root 0 Dec 21 14:05 0000:00:00.2 -> .../.../.../devices/pci0000:00/0000:00:00.2
>
> lrwxrwxrwx 1 root root 0 Dec 21 14:05 0000:00:01.0 -> .../.../.../devices/pci0000:00/0000:00:01.0
>
> lrwxrwxrwx 1 root root 0 Dec 21 14:05 0000:00:01.2 -> .../.../.../devices/pci0000:00/0000:00:01.2
>
> ...

The human-readable strings are not taken from the hardware. They come from a local database of `lspci`: **/usr/share/hwdata/pci.ids**. You can always find the latest PCI ID database here: [The PCI ID Repository](https://pci-ids.ucw.cz/). Or you can check the Vendor ID here: [Member Companies | PCI-SIG](https://pcisig.com/membership/member-companies)

这些可读的字符串并非来自硬件,而是来自 `lspci` 的本地数据库:**/usr/share/hwdata/pci.ids**。您始终可以在以下位置找到最新的 PCI ID 数据库:[The PCI ID Repository](https://pci-ids.ucw.cz/)。您也可以在此处查看供应商 ID:[Member Companies | PCI-SIG](https://pcisig.com/membership/member-companies)

The Linux kernel assigns special memory regions, "Base Address Registers" (BARs), to communicate with the hardware. These memory addresses (and region lengths) are written to the PCI controller hardware during the system boot. You can find something like this in `dmesg`:

Linux 内核分配了特殊的内存区域------"基地址寄存器"(BARs),用于与硬件通信。这些内存地址(以及区域长度)在系统启动时写入 PCI 控制器硬件。您可以在 `dmesg` 中找到类似以下内容:

> [ 0.959296] pci_bus 0001:00: root bus resource [bus 00-ff]
>
> [ 0.964853] pci_bus 0001:00: root bus resource [io 0x10000-0x1ffff] (bus address [0x0000-0xffff])
>
> [ 0.973943] pci_bus 0001:00: root bus resource [mem 0x4840000000-0x487fffffff] (bus address [0x40000000-0x7fffffff])
>
> [ 0.999755] pci 0001:00:00.0: **BAR** 14: assigned [mem 0x4840000000-0x48402fffff]
>
> [ 1.007107] pci 0001:00:00.0: **BAR** 6: assigned [mem 0x4840300000-0x48403007ff pref]
>
> [ 1.014769] pci 0001:01:00.0: **BAR** 0: assigned [mem 0x4840000000-0x48401fffff 64bit]
>
> [ 1.022579] pci 0001:01:00.0: **BAR** 6: assigned [mem 0x4840200000-0x484020ffff pref]
>
> [ 1.030265] pci 0001:00:00.0: PCI bridge to [bus 01-ff]
>
> [ 1.035563] pci 0001:00:00.0: bridge window [mem 0x4840000000-0x48402fffff]
There is no direct way to determine what PCI hardware is installed, so the bus must be enumerated. Bus enumeration is performed by attempting to read the Vendor ID and Device ID (VID/DID) register for each combination of bus number and device number at the device's function #0.
无法确定已安装的 PCI 硬件,因此必须对总线进行枚举。总线枚举是通过尝试读取每个总线编号和设备编号组合的设备功能 #0 处的供应商 ID 和设备 ID(VID/DID)寄存器来完成的。
During the enumeration stage, the kernel can call the corresponding driver for a compatible VID/PID pair. Some devices (like PCI bridges) might be statically described in the device tree on an embedded system. Such static hardware configuration is supported with "platform drivers".
在枚举阶段,内核可以根据兼容的 VID/PID 对调用相应的驱动程序。某些设备(如 PCI 桥接器)可能在嵌入式系统中的设备树中静态描述。静态硬件配置由"平台驱动程序"支持。
Every PCI compliant device should implement a basic set of registers -- the configuration registers. The Linux kernel attempts to read these registers to identify and properly configure the device. All these registers are mapped to memory and available to the driver developer for reading and writing.
每个符合 PCI 规范的设备都应实现一组基本寄存器------配置寄存器。Linux 内核尝试读取这些寄存器以识别和正确配置设备。所有这些寄存器都映射到内存中,可供驱动程序开发者读取和写入。
The first 64 bytes of the registers are mandatory and should be implemented (by the hardware vendor) in any case.
无论如何,硬件供应商都必须实现寄存器的前 64 字节。

The optional registers may contain zero values if there is nothing to provide from the hardware.
如果硬件没有提供任何内容,则可选寄存器可以包含零值。
Please note that byte order is always little-endian. This might be important if you are working on a big-endian system.
请注意,字节顺序始终是小端模式。如果您正在处理某些大端模式的系统,这可能很重要。
Let's dig into some of these registers more deeply.
让我们深入研究一些寄存器。
Vendor ID and Device ID are already well known and should contain valid identifiers of the hardware vendor.
供应商 ID 和 设备 ID 已广为人知,应包含硬件供应商的有效标识符。
The Command register controls basic device capabilities, such as memory space access and bus mastering. The operating system initializes these bits.
命令 寄存器控制设备的一些基本功能,例如内存空间访问和总线主控。操作系统会初始化这些位。

The Status register reports various events on the PCI bus and is filled by the hardware.
状态 寄存器保存了 PCI 总线的不同事件,由硬件填充。

Class code defines a class of the device (Network adapter, for example).
类别代码 定义了设备的类别(例如网络适配器)。
Base Address Registers -- the "BAR" registers are filled by the Linux kernel and used for I/O operations.
基地址寄存器 -- "BAR" 寄存器由 Linux 内核填充,用于 I/O 操作。
Subsystem Vendor ID and Subsystem Device ID -- help to differentiate the specific board/device model. These are optional, of course.
子系统供应商 ID 和 子系统设备 ID -- 有助于区分特定的板卡/设备型号。当然,这是可选的。
The Linux kernel PCI implementation can be found in the drivers/pci directory of the kernel source tree. For driver developers, the kernel provides the header file include/linux/pci.h. Here you can find all the required structures and functions.
Linux 内核的 PCI 实现在内核源代码树的 drivers/pci 目录中。对于驱动程序开发者,内核提供了头文件 include/linux/pci.h,其中包含了所有必需的结构和函数。
The main PCI driver structure is struct pci_dev. This is quite a big structure representing an actual device; it can be used for register access and I/O operations. Typically you don't need to remember all fields of the structure, only the basic concepts.
主要的 PCI 驱动程序结构是 struct pci_dev。这是一个相当大的结构,用于表示实际的设备,并可用于寄存器访问和 I/O 操作。通常,您不需要记住该结构的所有字段,只需了解基本概念即可。
The PCI driver entry point is struct pci_driver. The driver developer should initialize this structure (set the callbacks) and pass it to the kernel.
PCI 驱动程序的入口点是 struct pci_driver。驱动程序开发者应初始化此结构(设置回调函数)并将其传递给内核。
c
struct pci_driver {
    struct list_head node;
    const char *name;
    const struct pci_device_id *id_table;  /* Must be non-NULL for probe to be called */
    int  (*probe)(struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
    void (*remove)(struct pci_dev *dev);   /* Device removed (NULL if not a hot-plug capable driver) */
    int  (*suspend)(struct pci_dev *dev, pm_message_t state); /* Device suspended */
    int  (*resume)(struct pci_dev *dev);   /* Device woken up */
    void (*shutdown)(struct pci_dev *dev);
    int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
    const struct pci_error_handlers *err_handler;
    const struct attribute_group **groups;
    struct device_driver driver;
    struct pci_dynids dynids;
};
The structure field "id_table" should be initialized with an array of IDs. These IDs define the compatible Vendor and Device IDs for devices. You can set multiple VID/PID pairs here if your driver supports multiple devices. For example, to declare support for VID = 0x0F1F + PID = 0x0F0E and VID = 0x0F2F + PID = 0x0F0D:
结构字段"id_table"应使用 ID 数组初始化。这些 ID 定义了设备兼容的供应商 ID 和产品 ID。如果您的驱动程序支持多个设备,可以在此处设置多个 VID/PID 对。例如,声明支持 VID = 0F1F + PID = 0F0E 和 VID = 0F2F + PID = 0F0D:
c
static struct pci_device_id my_driver_id_table[] = {
    { PCI_DEVICE(0x0F1F, 0x0F0E) },
    { PCI_DEVICE(0x0F2F, 0x0F0D) },
    { 0, }
};
It's important to end this array with a zeroed sentinel entry.
以一个全零的哨兵项结束此数组非常重要。
Most drivers should export this table using MODULE_DEVICE_TABLE(pci, ...).
大多数驱动程序应使用 MODULE_DEVICE_TABLE(pci, ...) 导出此表。
This macro does a few important things. If your driver is built-in and compiled with the kernel, the driver information (the device ID table) will be statically integrated into the global device table. This allows the kernel to run your driver automatically when compatible hardware is found. If your driver is built as a separate module, the device table can be extracted with the depmod utility. This information is added to a cache, and your driver kernel object is loaded automatically when compatible hardware is found.
此宏执行了一些重要的操作。如果您的驱动程序是内置的,并且与内核一起编译,则驱动程序信息(设备 ID 表)将静态集成到全局设备表中。这使得内核可以在找到兼容硬件时自动运行您的驱动程序。如果您的驱动程序是作为独立模块构建的,则可以使用 depmod 工具提取设备表。此信息将添加到缓存中,并在找到兼容硬件时自动加载您的驱动程序内核对象。
Other important fields of struct pci_driver are:
struct pci_driver 的其他重要字段包括:
.name -- unique driver name; this string will be displayed in /sys/bus/pci/drivers
.name -- 唯一的驱动程序名称,该字符串将显示在 /sys/bus/pci/drivers 中
.probe -- A callback function called by the kernel after the driver registration.
.probe -- 在驱动程序注册后由内核调用的回调函数。
.remove -- A callback function called by the kernel during the driver unloading.
.remove -- 在驱动程序卸载期间由内核调用的回调函数。
.suspend -- A callback function called by the kernel when the system is going to suspend mode.
.suspend -- 在系统进入挂起模式时由内核调用的回调函数。
.resume -- A callback function called when the system resumes after the suspend mode.
.resume -- 在系统从挂起模式恢复时调用的回调函数。
The configured pci_driver should be registered and unregistered during driver module loading and unloading. This allows the kernel to run your driver.
在驱动程序模块加载和卸载期间,应注册和注销已配置的 pci_driver。这使得内核可以运行您的驱动程序。
c
int pci_register_driver(struct pci_driver *drv);
void pci_unregister_driver(struct pci_driver *drv);
Device access
设备访问
To access PCI configuration registers, the kernel provides a set of functions:
为了访问 PCI 配置寄存器,内核提供了一组函数:
c
int pci_read_config_byte(const struct pci_dev *dev, int where, u8 *val);
int pci_read_config_word(const struct pci_dev *dev, int where, u16 *val);
int pci_read_config_dword(const struct pci_dev *dev, int where, u32 *val);
int pci_write_config_byte(const struct pci_dev *dev, int where, u8 val);
int pci_write_config_word(const struct pci_dev *dev, int where, u16 val);
int pci_write_config_dword(const struct pci_dev *dev, int where, u32 val);
You can read and write 8-, 16-, and 32-bit data.
您可以读取和写入 8 位、16 位和 32 位数据。
The argument "where" specifies the actual register offset. All accessible values are defined in linux/pci_regs.h.
参数 "where" 指定实际的寄存器偏移量。所有可访问的值都在 linux/pci_regs.h 中定义。
For example, read the PCI device Vendor ID and Device ID:
例如,读取 PCI 设备的 供应商 ID 和 设备 ID:
c
#include <linux/pci.h>

u16 vendor, device;

pci_read_config_word(dev, PCI_VENDOR_ID, &vendor);
pci_read_config_word(dev, PCI_DEVICE_ID, &device);
Read the "Interrupt state" of the Status register:
读取 状态 寄存器的"中断状态":
c
#include <linux/pci.h>

u16 status_reg;

pci_read_config_word(dev, PCI_STATUS, &status_reg);

/* Check bit 3 */
if ((status_reg >> 3) & 0x1) {
    printk("Interrupt bit is set\n");
} else {
    printk("Interrupt bit is not set\n");
}
Sure, the kernel has many other functions, but we will not discuss them here.
当然,内核还有许多其他函数,但在这里我们不会讨论它们。
Actual device control and data communication are done through the mapped memory (BARs). It's a little bit tricky. Of course, it's just a memory region (or regions); what to read and write depends on the actual hardware. You need to get the actual offsets, data types, and "magic" numbers from somewhere. Typically this is done by reverse engineering the Windows driver, but that is outside the scope of this article. Sometimes hardware vendors are kind enough to share their protocols and specifications.
实际的设备控制和数据通信是通过映射的内存(BARs)完成的。这有点棘手。当然,它只是一个内存区域。读取和写入什么内容取决于实际的硬件。需要从某处获取实际的偏移量、数据类型和"魔法"数字。通常,这是通过逆向工程 Windows 驱动程序完成的。但这超出了本文的范围。有时,硬件供应商会慷慨地分享他们的协议和规格。
To access the device memory, we need to request the memory region, get its start offset and length, and map this memory region to a local pointer.
为了访问设备内存,我们需要请求内存区域,获取其起始偏移量和长度,并将该内存区域映射到某个本地指针。
c
#include <linux/pci.h>

int bar;
unsigned long mmio_start, mmio_len;
u8 __iomem *hwmem;     /* Memory pointer for the I/O operations */
struct pci_dev *pdev;  /* Initialized pci_dev */

...

/* Select the memory BARs (returns a bitmask of matching BARs) */
bar = pci_select_bars(pdev, IORESOURCE_MEM);

/* "Enable" the device memory */
pci_enable_device_mem(pdev);

/* Request the memory region */
pci_request_region(pdev, bar, "My PCI driver");

/* Get the start position and length of BAR 0 */
mmio_start = pci_resource_start(pdev, 0);
mmio_len = pci_resource_len(pdev, 0);

/* Map the provided resource to a local memory pointer */
hwmem = ioremap(mmio_start, mmio_len);
Now it's possible to use hwmem to read from and write to the device. The only correct way is to use the special kernel routines. Data can be read and written in 8-, 16-, and 32-bit chunks.
现在可以使用 hwmem 从设备读取和写入数据。唯一正确的方法是使用特殊的内核例程。数据可以以 8 位、16 位和 32 位块的形式读取和写入。
c
void iowrite8(u8 b, void __iomem *addr);
void iowrite16(u16 b, void __iomem *addr);
void iowrite32(u32 b, void __iomem *addr);
unsigned int ioread8(void __iomem *addr);
unsigned int ioread16(void __iomem *addr);
unsigned int ioread32(void __iomem *addr);
You might note that there is an alternative I/O API that can be found in some drivers.
您可能会注意到,在某些驱动程序中可以找到另一种 I/O API。
c
#include <linux/io.h>
unsigned readb(address);
unsigned readw(address);
unsigned readl(address);
void writeb(unsigned value, address);
void writew(unsigned value, address);
void writel(unsigned value, address);
On x86 and ARM platforms, the ioreadX/iowriteX functions are just inline wrappers around these readX/writeX functions. But for better portability and compatibility, it's highly recommended to use the io* functions.
在 x86 和 ARM 平台上,ioreadX/iowriteX 函数只是这些 readX/writeX 函数的内联包装器。但为了更好的可移植性和兼容性,强烈建议使用 io* 函数。
PCI DMA
PCI 直接内存访问
High-performance devices support Direct Memory Access (DMA). This is implemented with bus mastering: the capability of devices on the PCI bus to take control of the bus and perform transfers to the mapped memory directly.
高性能设备支持直接内存访问(DMA)。这是通过总线主控实现的。总线主控是指 PCI 总线上的设备能够接管总线,并直接对映射的内存执行传输。
Bus mastering (if supported) can be enabled and disabled with the following functions:
如果支持,可以使用以下函数启用和禁用总线主控:
c
void pci_set_master(struct pci_dev *dev);
void pci_clear_master(struct pci_dev *dev);
PCI interrupts
PCI 中断
Interrupt handling is critical in device drivers. Hardware may generate an interrupt on a data reception event, an error, a state change, and so on. All interrupts should be handled as efficiently as possible.
中断处理在设备驱动程序中至关重要。硬件可能会在数据接收事件、错误、状态变化等情况下生成中断。所有中断都应尽可能高效地处理。
There are two types of PCI interrupts:
有两种类型的 PCI 中断:

- Pin-based (INTx) interrupts, an old and classic way
- MSI/MSI-X interrupts, a modern and more optimal way, introduced in PCI 2.2

- 基于引脚(INTx)的中断,一种古老而经典的方式
- MSI/MSI-X 中断,一种现代且更优的方式,引入于 PCI 2.2
It's highly recommended to use MSI interrupts when possible. There are a few reasons why using MSIs can be advantageous over traditional pin-based interrupts.
如果可能,强烈建议使用 MSI 中断。使用 MSI 而非传统的基于引脚的中断有几个优点。
Pin-based PCI interrupts are often shared amongst several devices. To support this, the kernel must call each interrupt handler associated with an interrupt, which leads to reduced performance for the system. MSIs are never shared, so this problem cannot arise.
基于引脚的 PCI 中断通常由多个设备共享。为了支持这一点,内核必须调用与中断相关联的每个中断处理程序,这会导致系统性能下降。MSI 永不共享,因此不会出现此问题。
When a device writes data to memory, then raises a pin-based interrupt, the interrupt may arrive before all the data has arrived in memory (this becomes more likely with devices behind PCI-PCI bridges). The interrupt handler must read a register on the device that raised the interrupt to ensure that all the data has arrived in memory. PCI transaction ordering rules require that all the data arrive in memory before the value may be returned from the register. Using MSIs avoids this problem, as the interrupt-generating write cannot pass the data writes, so by the time the interrupt is raised, the driver knows that all the data has arrived in memory.
当设备将数据写入内存,然后引发基于引脚的中断时,中断可能会在所有数据到达内存之前到达(在 PCI - PCI 桥接器后面的设备中,这种情况更有可能发生)。中断处理程序必须读取引发中断的设备上的一个寄存器,以确保所有数据都已到达内存。PCI 事务排序规则要求所有数据必须在返回寄存器的值之前到达内存。使用 MSI 可以避免此问题,因为引发中断的写操作不能超越数据写操作,因此当引发中断时,驱动程序知道所有数据都已到达内存。
Please note that not all machines support MSIs correctly.
请注意,并非所有机器都能正确支持 MSI。
You can find information about currently allocated interrupts in /proc/interrupts. This information includes how interrupts are spread over the CPU cores and the interrupt types (MSI or pin-based). Typically, interrupts are assigned to CPU cores dynamically. On some systems, a special daemon (typically irqbalance) tries to spread interrupts in the most optimal way.
您可以在 /proc/interrupts 中找到有关当前已分配中断的信息。这些信息包括中断在 CPU 核心上的分布以及中断类型(MSI 或基于引脚)。通常,中断会动态分配给 CPU 核心。在某些系统上,一个特殊的守护进程(通常是 irqbalance)会尝试以最优化的方式分配中断。
Also, you can manually select the CPU core for a given interrupt. This might be helpful in some fine-tuning situations. The core assignment can be done via the SMP affinity mechanism: select the required cores as a bitmask and write this value (as a hex number) to /proc/irq/X/smp_affinity, where X is the interrupt number. For example, to put IRQ 44 on the first and third cores (set bits 0 and 2, counting from the least significant bit):
此外,您还可以手动为选定的中断选择 CPU 核心。在某些微调情况下,这可能会有所帮助。可以通过 SMP 亲和性机制完成核心分配:以位掩码形式选择所需的核心,并将该值(以十六进制数字表示)写入 /proc/irq/X/smp_affinity,其中 X 是中断编号。例如,将 IRQ 44 分配给第一和第三个核心(设置第 0 位和第 2 位,从最低有效位算起):
bash
echo 5 > /proc/irq/44/smp_affinity
Now let's see how to use both types of interrupts.
现在让我们看看如何使用这两种类型的中断。
INTx interrupts
INTx 中断
The classic pin-based interrupt can be requested with request_threaded_irq() or request_irq():
经典的基于引脚的中断可以通过 request_threaded_irq() 或 request_irq() 请求:
c
int request_threaded_irq(unsigned int irq,
                         irq_handler_t handler,
                         irq_handler_t thread_fn,
                         unsigned long irqflags,
                         const char *devname,
                         void *dev_id);

int request_irq(unsigned int irq,
                irq_handler_t handler,
                unsigned long irqflags,
                const char *devname,
                void *dev_id);
For new drivers, it's recommended to use request_threaded_irq().
对于新的驱动程序,建议使用 request_threaded_irq()。
The first parameter, irq, specifies the interrupt number to allocate. For some devices -- for example, legacy PC devices such as the system timer or keyboard -- this value is typically hard-coded. For most other devices, it is probed or otherwise determined programmatically and dynamically.
第一个参数 irq 指定要分配的中断编号。对于某些设备,例如系统时钟或键盘等传统的 PC 设备,此值通常是硬编码的。对于大多数其他设备,它会通过探测或其他方式动态地、程序化地确定。
The second parameter, handler, is the function to be called when the IRQ occurs.
第二个参数 handler 是在发生 IRQ 时要调用的函数。
thread_fn -- the function called from the IRQ handler thread. If NULL, no IRQ thread is created.
thread_fn -- 在 IRQ 处理程序线程中调用的函数。如果为 NULL,则不创建 IRQ 线程。
irqflags
-- Interrupt type flags; Possible values can be found here.
irqflags
-- 中断类型标志;可能的值可以在 这里 找到。
devname -- the string passed to request_irq(); it is used in /proc/interrupts to show the owner of the interrupt.
devname -- 传递给 request_irq() 的字符串,用于在 /proc/interrupts 中显示中断的所有者。
dev_id
-- This pointer is used for shared interrupt lines. It is a unique identifier used when the interrupt line is freed, and the driver may also use that to point to its own private data area (to identify which device is interrupting).
dev_id
-- 此指针用于共享中断线。它是一个唯一的标识符,用于在释放中断线时使用,驱动程序也可以用它指向自己的私有数据区域(以识别哪个设备正在中断)。
In the end, all requested IRQs should be released with free_irq():
最终,所有请求的 IRQ 都应通过 free_irq() 释放:
c
void free_irq(unsigned int irq, void *dev_id);
A simple example, install and use interrupt #42:
一个简单示例,安装并使用中断 #42:
c
static irqreturn_t sample_irq(int irq, void *dev_id)
{
    printk("IRQ %d\n", irq);
    return IRQ_HANDLED;
}

...

if (request_irq(42, sample_irq, 0, "sample_irq", NULL) < 0) {
    return -1;
}
MSI interrupts
MSI 中断
In the case of MSI/MSI-X interrupts, everything is almost the same, but you must tell the PCI subsystem that you want to use MSI/MSI-X interrupts.
在 MSI/MSI-X 中断的情况下,几乎所有内容都相同,但需要告诉 PCI 子系统我们想使用 MSI/MSI-X 中断。
Use the following function:
使用以下函数:
c
int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
                          unsigned int max_vecs, unsigned int flags);
This function allocates up to max_vecs interrupt vectors for a PCI device. It returns the number of vectors allocated or a negative error code.
此函数为 PCI 设备分配多达 max_vecs 个中断向量。它返回分配的向量数量或一个负的错误码。
The flags argument is used to specify which types of interrupt can be used by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX). A convenient shorthand (PCI_IRQ_ALL_TYPES) is also available to ask for any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set, pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
参数 flags 用于指定设备和驱动程序可以使用的中断类型(PCI_IRQ_LEGACY、PCI_IRQ_MSI、PCI_IRQ_MSIX)。还可以使用一个方便的简写(PCI_IRQ_ALL_TYPES)来请求任何可能的中断类型。如果设置了 PCI_IRQ_AFFINITY 标志,pci_alloc_irq_vectors() 将在可用的 CPU 之间分配中断。
Of course, interrupt type (MSI/MSIX) and the number of MSI interrupts depend on your hardware.
当然,中断类型(MSI/MSIX)和 MSI 中断的数量取决于您的硬件。
Free the allocated resources with:
使用以下函数释放分配的资源:
c
void pci_free_irq_vectors(struct pci_dev *dev);
PCI driver skeleton
PCI 驱动程序框架
I think that's enough boring theory. Here is an example of a PCI device driver. This driver can load and register itself for the specified VID/PID pairs. Some basic operations (config register reads, memory read/write) are performed.
我认为理论部分已经足够了。这是一个 PCI 设备驱动程序的示例。该驱动程序可以加载并为指定的 VID/PID 对注册。它执行了一些基本操作(配置寄存器读取、内存读写)。
c
/* Sample Linux PCI device driver */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/pci.h>
#define MY_DRIVER "my_pci_driver"
/* This sample driver supports a device with VID = 0x010F and PID = 0x0F0E */
static struct pci_device_id my_driver_id_table[] = {
    { PCI_DEVICE(0x010F, 0x0F0E) },
    { 0, }
};
MODULE_DEVICE_TABLE(pci, my_driver_id_table);
static int my_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
static void my_driver_remove(struct pci_dev *pdev);
/* Driver registration structure */
static struct pci_driver my_driver = {
    .name = MY_DRIVER,
    .id_table = my_driver_id_table,
    .probe = my_driver_probe,
    .remove = my_driver_remove
};

/* This is a "private" data structure */
/* You can store here any data that should be passed between the driver's functions */
struct my_driver_priv {
    u8 __iomem *hwmem;
};
static int __init mypci_driver_init(void)
{
    /* Register the new PCI driver */
    return pci_register_driver(&my_driver);
}

static void __exit mypci_driver_exit(void)
{
    /* Unregister */
    pci_unregister_driver(&my_driver);
}
void release_device(struct pci_dev *pdev)
{
    /* Free the IRQ of the first allocated vector */
    free_irq(pci_irq_vector(pdev, 0), pdev);

    /* Free memory region */
    pci_release_region(pdev, pci_select_bars(pdev, IORESOURCE_MEM));

    /* And disable the device */
    pci_disable_device(pdev);
}

static irqreturn_t irq_handler(int irq, void *cookie)
{
    (void) cookie;
    printk("Handle IRQ #%d\n", irq);
    return IRQ_HANDLED;
}

/* Request an interrupt and set up the handler */
int set_interrupts(struct pci_dev *pdev)
{
    /* We want MSI interrupts, 3 vectors (just an example) */
    int ret = pci_alloc_irq_vectors(pdev, 3, 3, PCI_IRQ_MSI);

    if (ret < 0) {
        return ret;
    }

    /* Request the IRQ of the first allocated vector */
    return request_threaded_irq(pci_irq_vector(pdev, 0), irq_handler,
                                NULL, 0, "TEST IRQ", pdev);
}
/* Write some data to the device */
void write_sample_data(struct pci_dev *pdev)
{
    u32 data_to_write = 0xDEADBEEF; /* Just random trash */
    struct my_driver_priv *drv_priv = (struct my_driver_priv *) pci_get_drvdata(pdev);

    if (!drv_priv) {
        return;
    }

    /* Write 32-bit data to the device memory */
    iowrite32(data_to_write, drv_priv->hwmem);
}
/* This function is called by the kernel when a matching device is found */
static int my_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{
    int bar, err;
    u16 vendor, device;
    unsigned long mmio_start, mmio_len;
    struct my_driver_priv *drv_priv;

    /* Read data from the PCI device configuration registers */
    pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);
    pci_read_config_word(pdev, PCI_DEVICE_ID, &device);

    printk(KERN_INFO "Device vid: 0x%X pid: 0x%X\n", vendor, device);

    /* Select the memory BARs */
    bar = pci_select_bars(pdev, IORESOURCE_MEM);

    /* Enable device memory */
    err = pci_enable_device_mem(pdev);
    if (err) {
        return err;
    }

    /* Request the memory region for the BARs */
    err = pci_request_region(pdev, bar, MY_DRIVER);
    if (err) {
        pci_disable_device(pdev);
        return err;
    }

    /* Get the start position and length of BAR 0 */
    mmio_start = pci_resource_start(pdev, 0);
    mmio_len = pci_resource_len(pdev, 0);

    /* Allocate memory for the driver private data */
    drv_priv = kzalloc(sizeof(struct my_driver_priv), GFP_KERNEL);
    if (!drv_priv) {
        release_device(pdev);
        return -ENOMEM;
    }

    /* Remap BAR 0 to a local pointer */
    drv_priv->hwmem = ioremap(mmio_start, mmio_len);
    if (!drv_priv->hwmem) {
        release_device(pdev);
        return -EIO;
    }

    /* Set the driver private data */
    /* Now the mapped "hwmem" is accessible from any of the driver's functions */
    pci_set_drvdata(pdev, drv_priv);

    write_sample_data(pdev);

    return set_interrupts(pdev);
}
/* Clean up */
static void my_driver_remove(struct pci_dev *pdev)
{
    struct my_driver_priv *drv_priv = pci_get_drvdata(pdev);

    if (drv_priv) {
        if (drv_priv->hwmem) {
            iounmap(drv_priv->hwmem);
        }

        pci_free_irq_vectors(pdev);
        kfree(drv_priv);
    }

    release_device(pdev);
}
MODULE_LICENSE("GPL");
MODULE_AUTHOR("OlegKutkov <[email protected]>");
MODULE_DESCRIPTION("Test PCI driver");
MODULE_VERSION("0.1");
module_init(mypci_driver_init);
module_exit(mypci_driver_exit);
And Makefile:
以及 Makefile:
makefile
BINARY   := test_pci_module
KERNEL   := /lib/modules/$(shell uname -r)/build
ARCH     := x86
C_FLAGS  := -Wall
KMOD_DIR := $(shell pwd)
OBJECTS  := test_pci.o

ccflags-y += $(C_FLAGS)
obj-m += $(BINARY).o
$(BINARY)-y := $(OBJECTS)

$(BINARY).ko:
	make -C $(KERNEL) M=$(KMOD_DIR) modules

clean:
	rm -f $(BINARY).ko
Real - life example
实际案例
I worked at the Crimean Astrophysical Observatory a few years ago and found a PCI interface board for 4 incremental linear or angular encoders. I decided to use this board, but there was no Linux driver. I contacted the vendor and proposed to write an open-source driver for Linux. They were kind enough to share the full documentation: a table with memory offsets and data sizes -- basically, at which offsets I could read the sensor data and so on.
几年前,我在克里米亚天体物理天文台工作时,发现了一块用于 4 个增量式线性或角度编码器的 PCI 接口板。我决定使用这块板,但当时没有 Linux 驱动程序。我联系了供应商,建议为 Linux 编写一个开源驱动程序。他们非常慷慨,提供了完整的文档:一个包含内存偏移量和数据大小的表格。基本上,它告诉我可以在哪些偏移量处读取传感器数据等信息。
I wrote this driver. The driver uses a character device interface to interact with the user. It's quite simple and might be a good example to start PCI driver development with.
我编写了这个驱动程序。该驱动程序使用字符设备接口与用户交互。它非常简单,可能是开始 PCI 驱动程序开发的一个很好的例子。
via:

- 如何写 Linux PCI 驱动 --- The Linux Kernel documentation
  https://www.kernel.org/doc/html/v6.15-rc1/translations/zh_CN/PCI/pci.html
- How To Write Linux PCI Drivers --- The Linux Kernel documentation
  https://docs.kernel.org/PCI/pci.html
- PCI Drivers - Linux Device Drivers, 3rd Edition [Book]
  https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch12.html
- Writing a PCI device driver for Linux -- Oleg Kutkov personal blog
  https://olegkutkov.me/2021/01/07/writing-a-pci-device-driver-for-linux/
- pcie tlp preparation in linux driver | Linux.org
  https://www.linux.org/threads/pcie-tlp-preparation-in-linux-driver.45843/