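How do you enable PCIe ASPM?

1. ASPM Overview

PCIe power management has two parts: ASPM (Active State Power Management) and software-driven power management. ASPM is power management that the PCIe link performs on its own, without any involvement from system software. In the ASPM state machine, L1 is mandatory, whereas L0s is optional.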

2. Debugging

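2.1 How to check the ASPM state

On Linux, the ASPM state can be checked with "lspci -vvv". The field to look at is the LnkCtl line of the Express capability.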

2.1.1 When ASPM is enabled

The following is an example of a device with PCIe ASPM enabled:

05:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Atheros Communications Inc. Device 3099
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at dbdf0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [90] MSI-X: Enable- Count=1 Masked-
                Vector table: BAR=0 offset=00000000
                PBA: BAR=0 offset=00000000
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140] Virtual Channel <?>
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: ath9k
        Kernel modules: ath9k

2.1.2 When ASPM is disabled

The following is an example of a device with PCIe ASPM disabled:

localhost ~ # lspci -vvvv -s 03:00
03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Atheros Communications Inc. Device 309a
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f0100000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM unknown, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [90] MSI-X: Enable- Count=1 Masked-
                Vector table: BAR=0 offset=00000000
                PBA: BAR=0 offset=00000000
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140] Virtual Channel <?>
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: ath9k
        Kernel modules: ath9k
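
The two listings above differ only in their LnkCtl line. As a minimal sketch (the function name aspm_state is hypothetical, and the regular expression assumes the LnkCtl layout shown in these listings), the field can be pulled out of captured "lspci -vvv" text like this:

```python
import re

def aspm_state(lspci_text):
    """Return the ASPM field of the first LnkCtl line, or None if there is none."""
    # Assumes the "LnkCtl: ASPM <state>; ..." layout seen in the listings above.
    match = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_text)
    return match.group(1).strip() if match else None

enabled = "LnkCtl: ASPM L1 Enabled; RCB 128 bytes Disabled- Retrain- CommClk+"
disabled = "LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+"
print(aspm_state(enabled))
print(aspm_state(disabled))
```

Other lspci versions may format this line differently, so treat the pattern as a starting point rather than a stable interface.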

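2.1.3 Why is ASPM not enabled on my device?

ASPM should be negotiated automatically between the RC and all EPs. If you query your device and find that ASPM is disabled, the possible reasons include:

  • The ASPM-related settings are not enabled in the BIOS
  • L0s is optional, so your device may support only L1
  • The BIOS may have run into some unknown problem
  • ASPM needs support from the RC as well as from the EP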
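
Because both ends of the link must support a state before it can be used, one quick check is to compare the LnkCap lines of the RC port and the EP. The sketch below assumes that lspci lists the supported states after "ASPM" on the LnkCap line (for example "ASPM L0s L1"); the two sample lines and the function names are made up for illustration.

```python
import re

def supported_aspm(lspci_text):
    """Return the set of ASPM states named on the LnkCap line."""
    match = re.search(r"LnkCap:.*?ASPM ([^,]+),", lspci_text)
    if not match:
        return set()
    # Anything other than L0s/L1 ("unknown", "not supported") counts as no support.
    return {tok for tok in match.group(1).split() if tok in ("L0s", "L1")}

def common_aspm(rc_text, ep_text):
    """States that both the root port and the endpoint advertise."""
    return supported_aspm(rc_text) & supported_aspm(ep_text)

# Hypothetical LnkCap lines, not taken from a real machine.
rc = "LnkCap: Port #2, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <4us"
ep = "LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L1, Latency L0 <512ns, L1 <64us"
print(common_aspm(rc, ep))
```

An empty result suggests that no ASPM state is usable on that link, whatever the BIOS or kernel settings are.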

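2.2 How to enable ASPM

Most of our current Wi-Fi chips use a PCIe interface, and most of them end up in handheld devices, so power consumption is a major requirement. These devices therefore need to support the ASPM L0s state in order to reach the best power state.

2.2.1 How to enable ASPM in the kernel

The operating system generally does not interfere with ASPM, but the kernel can be used to debug the ASPM settings of the PCIe RC/EP. For that, the kernel should be built with the CONFIG_PCIEASPM option enabled: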

config PCIEASPM
        bool "PCI Express ASPM support(Experimental)"
        depends on PCI && EXPERIMENTAL && PCIEPORTBUS
        default n
        help
          This enables PCI Express ASPM (Active State Power Management) and
          Clock Power Management. ASPM supports state L0/L0s/L1.

          When in doubt, say N.

2.2.1.1 Forcing the ASPM state

ASPM can also be forcibly enabled or disabled with a boot parameter:

        pcie_aspm=      [PCIE] Forcibly enable or disable PCIe Active State Power
                        Management.
                off     Disable ASPM.
                force   Enable ASPM even on devices that claim not to support it.
                        WARNING: Forcing ASPM on may cause system lockups.
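
To confirm which value, if any, the running kernel was booted with, the kernel command line can be inspected. A minimal sketch, assuming the command line has been read from /proc/cmdline into a string (the function name is hypothetical):

```python
def pcie_aspm_param(cmdline):
    """Return the value given to pcie_aspm= on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            return token.split("=", 1)[1]
    return None

# On a live system: pcie_aspm_param(open("/proc/cmdline").read())
print(pcie_aspm_param("root=/dev/sda1 ro pcie_aspm=force quiet"))
```

The sketch returns the first occurrence only; it does not model how the kernel treats a parameter that is given more than once.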

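2.2.2 Enabling ASPM with enable_aspm

enable_aspm is a script that can be used to enable ASPM. See the script itself for details.

Only the following three parameters need to be changed: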

ROOT_COMPLEX="00:1c.1"
ENDPOINT="03:00.0"

# We'll only enable the last 2 bits by using a mask
# of :3 to setpci, this will ensure we keep the existing
# values on the byte.
#
# Hex  Binary  Meaning
# -------------------------
# 0    0b00    L0 only
# 1    0b01    L0s only
# 2    0b10    L1 only
# 3    0b11    L1 and L0s
ASPM_SETTING=3
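
The comment above relies on setpci's value:mask form, in which only the bits set in the mask are changed. As a sketch of how the three parameters could be combined into such a command (the helper name is hypothetical, and the Link Control offset 0x50 in the example is an assumption that must be looked up per device):

```python
def setpci_command(device, offset, aspm_setting):
    """Build a setpci command that rewrites only the two ASPM bits (mask 3)."""
    if not 0 <= aspm_setting <= 3:
        raise ValueError("ASPM setting must be between 0 and 3")
    # "value:mask" tells setpci to touch only the bits set in the mask.
    return f"setpci -s {device} {offset:#x}.B={aspm_setting:x}:3"

# Assumed Link Control offset 0x50 for the example root port.
print(setpci_command("00:1c.1", 0x50, 3))
```

Building the command as a string keeps the sketch free of side effects; run the resulting command as root only after checking the offset for your own device.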

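2.2.3 Enabling ASPM with setpci

The PCIe Link Control Register can be read with "lspci -vvv", and the "setpci" tool can modify it to enable PCIe ASPM.

The ASPM field occupies the low two bits of the Link Control Register and is encoded as follows: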

0b00 = L0 only
0b01 = L0s only
0b10 = L1 only
0b11 = L1 and L0s
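
As a small sketch, the table above maps directly onto a lookup on the low two bits of the register value (the names ASPM_STATES and decode_aspm are illustrative):

```python
# ASPM field of the Link Control Register, bits 1:0, as listed above.
ASPM_STATES = {
    0b00: "L0 only",
    0b01: "L0s only",
    0b10: "L1 only",
    0b11: "L1 and L0s",
}

def decode_aspm(link_control):
    """Describe the ASPM setting encoded in a Link Control value."""
    return ASPM_STATES[link_control & 0b11]

for value in (0x40, 0x41, 0x42, 0x43):
    print(f"{value:#04x}: {decode_aspm(value)}")
```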

First, find the device you are interested in:

user@tux ~ $ lspci | grep -i atheros
03:00.0 Network controller: Atheros Communications Inc. Device 0030 (rev 01)

03:00.0 is the bus address. Now use "lspci -t" to check which RC port the device sits behind.

-[0000:00]-+-00.0
           +-02.0
           +-02.1
           +-03.0
           +-03.2
           +-03.3
           +-19.0
           +-1a.0
           +-1a.1
           +-1a.7
           +-1b.0
           +-1c.0-[0000:02]--
           +-1c.1-[0000:03]----00.0
           +-1c.2-[0000:04]--
           +-1c.3-[0000:05-0c]--
           +-1c.4-[0000:0d-14]--
           +-1d.0
           +-1d.1
           +-1d.2
           +-1d.7
           +-1e.0-[0000:15-18]--+-00.0
           |                    \-00.1
           +-1f.0
           +-1f.1
           +-1f.2
           \-1f.3
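
The parent port can also be read from sysfs instead of from the tree drawing. The sketch below assumes the usual Linux layout in which the resolved path of /sys/bus/pci/devices/<address> nests each device under its upstream bridge; the function name is hypothetical, and paths on some platforms contain extra components.

```python
import os

def parent_bridge(resolved_path):
    """Given the resolved sysfs path of a PCI device, return its parent's address."""
    parent = os.path.basename(os.path.dirname(resolved_path.rstrip("/")))
    # Devices on the root bus hang off "pciDDDD:BB", which is not a bridge function.
    return None if parent.startswith("pci") else parent

# On a live system:
#   parent_bridge(os.path.realpath("/sys/bus/pci/devices/0000:03:00.0"))
print(parent_bridge("/sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0"))
```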

In the "lspci -t" tree, 03:00.0 sits behind 00:1c.1. You can run "lspci -s 00:1c.1 -xxx" to dump that device's PCI configuration space. The specification defines a neat little algorithm, a walk of the capability list, for locating the Link Control Register in configuration space. The logic is as follows:

  • Read the byte at 0x34 (the capabilities pointer) and go to the offset it holds

  • If the byte at that offset (the capability ID) is not 0x10, read the byte right after it (the pointer to the next capability) and go to the offset it holds

  • Repeat this until you reach an offset whose first byte is 0x10 (the PCI Express capability)

  • The Link Control Register is at that final offset + 0x10

Let's analyze a real-world example: the root complex port identified above.

    user@tux ~ $ sudo lspci -s 00:1c.1 -xxx
    00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
    00: 86 80 41 28 07 05 10 00 03 00 04 06 10 00 81 00
    10: 00 00 00 00 00 00 00 00 00 03 03 00 30 30 00 00
    20: 00 dc 30 df e1 df e1 df 00 00 00 00 00 00 00 00
    30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 02 04 00
    40: 10 80 41 01 c0 8f 00 00 00 00 10 00 11 2c 11 02
    50: 40 00 11 30 e0 a0 18 00 00 00 48 01 00 00 00 00
    60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    80: 05 90 01 00 0c 30 e0 fe 69 41 00 00 00 00 00 00
    90: 0d a0 00 00 aa 17 ad 20 00 00 00 00 00 00 00 00
    a0: 01 00 02 c8 00 00 00 00 00 00 00 00 00 00 00 00
    b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    d0: 00 00 00 00 00 00 00 00 80 00 11 08 00 00 00 00
    e0: 00 0f c7 00 06 07 08 00 33 00 00 00 00 00 00 00
    f0: 00 00 00 00 00 00 00 00 86 0f 05 00 00 00 00 00

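First read address 0x34: it holds 0x40 (this is a pointer, so do not move on to the next byte here). We read 0x40 and see 0x10, so the PCI Express capability starts at 0x40. Adding 0x40 + 0x10 = 0x50 gives the offset of the Link Control Register. The value at 0x50 is 0x40: its low two bits are 0b00, which means only L0 is enabled and ASPM is completely disabled. To adjust ASPM on this RC, we first need to preserve the original value and then OR in the new ASPM setting.

Note: 0x50 also turns out to be the Link Control Register offset on ICH6, ICH7, ICH8, and ICH9.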

# Disables ASPM, enables only L0 (this was the existing setting)
sudo setpci -s 00:1c.1 0x50.B=0x40

# Enable L0s only 
sudo setpci -s 00:1c.1 0x50.B=0x41

# Enable L1 only
sudo setpci -s 00:1c.1 0x50.B=0x42

# Enable L1 and L0s
sudo setpci -s 00:1c.1 0x50.B=0x43
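
The four values above all come from the same rule: keep every bit of the original byte except the two ASPM bits, then OR in the new setting. A minimal sketch of that arithmetic follows (the names are illustrative). It clears the old ASPM bits first, which a plain OR would not do when moving from a higher setting to a lower one.

```python
ASPM_MASK = 0b11  # the ASPM field occupies bits 1:0 of Link Control

def with_aspm(original, setting):
    """Return the Link Control low byte with only the ASPM bits replaced."""
    if setting & ~ASPM_MASK:
        raise ValueError("ASPM setting must be between 0 and 3")
    # Clear the old ASPM bits, keep everything else, then OR in the new setting.
    return (original & 0xFF & ~ASPM_MASK) | setting

for setting in range(4):
    print(f"{setting:#04b} -> {with_aspm(0x40, setting):#04x}")
```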

Now let's adjust your device. Its PCIe configuration space looks like this:

user@tux ~ $ sudo lspci -s 03:00.0 -xxx
03:00.0 Network controller: Atheros Communications Inc. Device 0030 (rev 01)
00: 8c 16 30 00 03 01 10 40 01 00 80 02 10 00 00 00
10: 04 00 3e df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 16 31
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 84 01 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 02 00 00 87 04 05 10 20 0b 00 11 5c 03 00
80: 41 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This example is a little more involved, so we will go through it line by line:

00: 8c 16 30 00 03 01 10 40 01 00 80 02 10 00 00 00
10: 04 00 3e df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 8c 16 16 31
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
    ^            ^
    |            |
   0x30         0x34

So 0x34 = 0x40. This is a pointer to the first capability, so we go read 0x40 now

40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
    ^
    |
   0x40 = 0x01, this is not 0x10 so read the next byte

40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
       ^
       |
      0x41 = 0x50, so go read that register next

50: 05 70 84 01 00 00 00 00 00 00 00 00 00 00 00 00
    ^
    |
   0x50 = 0x05, this is not 0x10, so go read the next byte.
   The next byte 0x51 = 0x70 so we go read that register next.

70: 10 00 02 00 00 87 04 05 10 20 0b 00 11 5c 03 00
    ^
    |
    At last, 0x70 = 0x10. So now we do 0x70 + 0x10 = 0x80 and go read 0x80.

80: 41 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
    ^
    |
    0x80 = 0x41
    0x41 = 0b1000001 so this has ASPM L0s on only.
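
The walk traced above can be sketched as a short loop over the raw configuration bytes. This is a minimal sketch under a few assumptions: the function and constant names are illustrative, the input is at least 256 bytes of configuration space (for example read as root from /sys/bus/pci/devices/<address>/config), and the Status register's capability-list bit is not checked.

```python
PCI_CAP_PTR = 0x34     # capabilities pointer in the config header
PCI_CAP_ID_EXP = 0x10  # capability ID of the PCI Express capability
LINK_CONTROL = 0x10    # Link Control offset inside that capability

def find_link_control(config):
    """Return the offset of the Link Control Register, or None if not found."""
    pointer = config[PCI_CAP_PTR] & 0xFC  # low two pointer bits are reserved
    seen = set()
    while pointer and pointer not in seen:  # stop on end of list or a loop
        seen.add(pointer)
        cap_id, next_pointer = config[pointer], config[pointer + 1]
        if cap_id == PCI_CAP_ID_EXP:
            return pointer + LINK_CONTROL
        pointer = next_pointer & 0xFC
    return None

# Rebuild just the bytes the walk touches in the endpoint dump above.
cfg = bytearray(256)
cfg[0x34] = 0x40
cfg[0x40:0x42] = b"\x01\x50"
cfg[0x50:0x52] = b"\x05\x70"
cfg[0x70:0x72] = b"\x10\x00"
cfg[0x80] = 0x41
offset = find_link_control(cfg)
print(hex(offset), hex(cfg[offset]))
```

The seen set guards against a malformed list that points back at itself, which would otherwise loop forever.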

With the Link Control Register at 0x80, use the following commands to change the device's PCIe ASPM state:

# Disables ASPM, enables only L0
sudo setpci -s 03:00.0 0x80.B=0x40

# Enable L0s only (this was the existing setting)
sudo setpci -s 03:00.0 0x80.B=0x41

# Enable L1 only
sudo setpci -s 03:00.0 0x80.B=0x42

# Enable L1 and L0s
sudo setpci -s 03:00.0 0x80.B=0x43