A detailed look at clock rates in NVIDIA / CUDA

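This article covers:

Which clock frequencies does CUDA expose?
How can the clock rate be adjusted?

In CUDA, the clock rate can be adjusted through NVML functions or from the command line.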

Introduction

The command nvidia-smi -q -i 0 queries the device's parameters; the following command filters for the clock-related ones:


$ nvidia-smi -q -d CLOCK -i 0
    Clocks
        Graphics Clock                  : 1410 MHz
        SM Clock                        : 1410 MHz
        Memory Clock                    : 1215 MHz
        Video Clock                     : 1275 MHz
    Applications Clocks                
        Graphics Clock                  : 1410 MHz
        Memory Clock                    : 1215 MHz
    Default Applications Clocks        
        Graphics Clock                  : 765 MHz
        Memory Clock                    : 1215 MHz
    Deferred Clocks                    
        Memory Clock                    : N/A
    Max Clocks                         
        Graphics Clock                  : 1410 MHz
        SM Clock                        : 1410 MHz
        Memory Clock                    : 1215 MHz
        Video Clock                     : 1290 MHz
    Max Customer Boost Clocks          
        Graphics Clock                  : 1410 MHz
    Clock Policy                       
        Auto Boost                      : N/A  # (disable/enable)
        Auto Boost Default              : N/A  # (disable/enable)

As the output above shows, the latest CUDA 12 release reports the following clock groups:

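Clocks
The current, real-time frequencies.
Applications Clocks
The application clocks, i.e. the frequencies used once the CUDA runtime has started; after startup they match the first group, "Clocks".
Once application clocks are set, the real-time clock stays low while nothing is running (the idle clock, roughly 200-600 MHz); while a kernel runs, it matches the Applications Clocks value.
On machines that do not support application clocks, the value after setting a locked clock is that locked clock (when the requested clock rate exceeds the boost clock, the actual clock varies dynamically above the boost clock under GPU Boost 4.0).
Default Applications Clocks
The default Applications Clocks. After a user has changed the Applications Clocks, a reset returns them to these defaults.
Deferred Clocks
Not investigated here.
Max Clocks
The maximum clock frequencies, including overclocking.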

Some other clock rates do not appear in the output above, specifically the following two:

Base Clock:

(nvidia-smi base-clocks -i 0) The Base Clock of a graphics card (also sometimes referred to as the "Core Clock") is the minimum speed at which the GPU is advertised to run. In normal conditions, the GPU of the card will not drop below this clock speed unless conditions are significantly altered. This number is more significant in older cards but is becoming less and less relevant as boosting technologies take center stage.

Boost Clock:

The advertised Boost Clock of the card is the maximum clock speed that the graphics card can achieve under normal conditions before the GPU Boost is activated. This clock speed number is generally quite a bit higher than the Base Clock and the card uses up most of its power budget to achieve this number. Unless the card is thermally constrained, it will hit this advertised boost clock. This is also the parameter that is altered in "Factory Overclocked" cards from AIB partners.

The ordering is Base Clock <= Boost Clock <= Max Clocks <= Max Boost Clocks.

Notes on Auto Boost:

Most GPUs do not support Auto Boost:

GPU Boost technology allows the card to boost much higher than the advertised "Boost Clock" that may be listed on the box or on the product page.

Increase Performance with GPU Boost and K80 Autoboost | NVIDIA Technical Blog

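nvmlDeviceSetAutoBoostedClocksEnabled (nvidia-smi --auto-boost-default=ENABLED -i 0)

"Not supported" here means that Auto Boost cannot be switched on or off.

The clock_rate reported by cudaGetDeviceProperties (the clockRate field of cudaDeviceProp, in kHz) is obtained as follows: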


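cudaDeviceProp device_prop;
// Query the properties of `device`; device_prop.clockRate holds the clock rate in kHz.
cudaError_t err = cudaGetDeviceProperties(&device_prop, device);
if (err != cudaSuccess) {
  // Error_t is the caller's own error type; propagate the CUDA error code.
  return (Error_t)err;
}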

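On devices that support application clocks, this value corresponds to Applications Clocks -> Graphics Clock / SM Clock in the output above.
On cards that do not support application clocks, the value is the boost clock, not the locked clock. Note that this value can only be looked up in the spec: NVML does not report it and exposes only the base clock.

For example, on an RTX 4090 properties.clock_rate = 2520 MHz, yet the output of nvidia-smi -q -i 3 does not contain 2520 MHz anywhere.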
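
Because cudaDeviceProp::clockRate is expressed in kHz while nvidia-smi prints MHz, comparing the two needs a unit conversion. The following is a minimal sketch in C; the helper name clock_khz_to_mhz is hypothetical, and 2520000 kHz simply restates the RTX 4090 figure above.

```c
#include <assert.h>

/* Hypothetical helper: convert a clock rate in kHz (the unit used by
 * cudaDeviceProp::clockRate) to MHz (the unit printed by nvidia-smi). */
double clock_khz_to_mhz(int clock_khz) {
    assert(clock_khz >= 0);      /* a negative clock rate is a caller bug */
    return clock_khz / 1000.0;
}
```

Under this sketch, clock_khz_to_mhz(2520000) gives 2520.0, the value that would be compared against the MHz figures in the nvidia-smi output.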

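When benchmarking, should you set the max clock rate, or reset to the default clock rate?

Yes, you must do one of them.

The running clock rates described above can be changed from the command line or through the API (covered in detail in section 1.4). Our cards are shared by several people; once someone changes these parameters to lower values, measured performance drops sharply and becomes unstable.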


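# Set the application clocks (memory,graphics in MHz), then reset them
nvidia-smi --applications-clocks=9001,2520 -i 0
nvidia-smi --reset-applications-clocks -i 0
# Lock the GPU clock to a min,max range in MHz, then reset it
nvidia-smi --lock-gpu-clocks=3105,3105 -i 0
nvidia-smi --reset-gpu-clocks
# Lock the memory clock to a min,max range in MHz, then reset it
nvidia-smi --lock-memory-clocks=680,680 -i 0
nvidia-smi --reset-memory-clocks

# Query the supported clocks
nvidia-smi --query-supported-clocks=gr --format=csv -i 0
nvidia-smi -i 0 --query-supported-clocks=mem,gr --format=csv
nvidia-smi --help-query-supported-clocks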
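
The supported-clocks query is typically used to pick a valid memory/graphics pair before setting application clocks. The sketch below, in C, picks the highest graphics clock listed for a given memory clock. It assumes each CSV data row has the form "9001 MHz, 2520 MHz" (memory first, then graphics); that row format and both function names are assumptions for illustration, so check them against the output on your own machine.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one CSV data row of the assumed form "<mem> MHz, <gr> MHz".
 * Returns 1 on success, 0 if the row does not match (e.g. a header row). */
int parse_supported_clock_row(const char *row, unsigned *mem_mhz, unsigned *gr_mhz) {
    assert(row && mem_mhz && gr_mhz);
    return sscanf(row, "%u MHz, %u MHz", mem_mhz, gr_mhz) == 2;
}

/* Return the highest graphics clock listed for mem_mhz, or 0 if none is listed. */
unsigned max_graphics_clock_for(const char *rows[], int n_rows, unsigned mem_mhz) {
    unsigned best = 0;
    for (int i = 0; i < n_rows; ++i) {
        unsigned mem, gr;
        if (parse_supported_clock_row(rows[i], &mem, &gr) && mem == mem_mhz && gr > best) {
            best = gr;
        }
    }
    return best;
}
```

Rows that do not parse, such as a header line, are skipped rather than treated as errors, which keeps the helper usable on raw command output split into lines.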

So can you simply reset the settings?

Resetting to the default clock rate actually puts the device into Dynamic Clocking (GPU Boost).

Starting with our GPU and later, every application and game runs at a guaranteed, minimum Base Clock speed.

If there's extra power available, a Boost Clock is enabled increasing clock speeds until the graphics card hits its predetermined Power Target.

This dynamic clock speed adjustment is controlled by GPU Boost, which monitors a raft of data and makes real-time changes to speeds and voltages several times per second, maximizing performance in each and every application.

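As the figure below shows, performance differs across clock frequencies, including Boost mode.

Which functions can set the clock rate?

Functions for setting and resetting clocks: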

SM clock rate

nvmlDeviceSetApplicationsClocks

nvmlDeviceResetApplicationsClocks

nvmlDeviceSetGpuLockedClocks

nvmlDeviceResetGpuLockedClocks

Mem clock rate

nvmlDeviceSetApplicationsClocks

nvmlDeviceResetApplicationsClocks

nvmlDeviceSetMemoryLockedClocks

nvmlDeviceResetMemoryLockedClocks

Differences between the two:

nvmlDeviceSetApplicationsClocks/nvmlDeviceResetApplicationsClocks

With application clocks set, the GPU locks to the configured frequency only while a kernel is running on it.

nvmlDeviceSetGpuLockedClocks/nvmlDeviceSetMemoryLockedClocks

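With locked clocks set, the GPU stays at the configured frequency all the time, even when no kernel is running on it.

As the figure below shows, with the GPU/memory locked clocks set, the actual GPU frequencies remain at the configured 2520 and 9001 MHz even when no process is running.

Note: some machines, such as the A100, support nvmlDeviceSetGpuLockedClocks but not nvmlDeviceSetMemoryLockedClocks; once the former is set, the memory clock is automatically set to its maximum.

As the figure below shows, after the GPU/memory locked clocks are reset and with no process running, the actual GPU frequencies are the idle values, 210 and 405 MHz.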

How do you set the application clocks?

Before you can change the application clocks you need to put the GPU in Persistence Mode (covered in section 1.7) and query the available application clock rates.

Persistence mode ensures that the driver stays loaded even when no CUDA or X applications are running on the GPU.

This maintains current state, including requested applications clocks. Persistence Mode is necessary to make application clock changes persistent until the application runs. Enable Persistence Mode with the following command line (for GPU 0).

sudo nvidia-smi -pm ENABLED -i 0

Increase Performance with GPU Boost and K80 Autoboost | NVIDIA Technical Blog


nvmlReturn_t DECLDIR nvmlDeviceSetApplicationsClocks(nvmlDevice_t device, unsigned int memClockMHz, unsigned int graphicsClockMHz);

Set clocks that applications will lock to.
Sets the clocks that compute and graphics applications will be running at.

e.g. CUDA driver requests these clocks during context creation which means this property defines clocks at which CUDA applications will be running unless some overspec event occurs (e.g. over power, over thermal or external HW brake).

Can be used as a setting to request constant performance.
On Pascal and newer hardware, this will automatically disable automatic boosting of clocks.
On K80 and newer Kepler and Maxwell GPUs, users desiring fixed performance should also call nvmlDeviceSetAutoBoostedClocksEnabled (nvidia-smi --auto-boost-default=DISABLED -i 0) to prevent clocks from automatically boosting above the clock value being set.
For Kepler™ or newer non-GeForce fully supported devices and Maxwell or newer GeForce devices. Requires root/admin permissions.
See nvmlDeviceGetSupportedMemoryClocks and nvmlDeviceGetSupportedGraphicsClocks for details on how to list available clocks combinations (nvidia-smi -q -i 0 -d SUPPORTED_CLOCKS).
After system reboot or driver reload applications clocks go back to their default value. See nvmlDeviceResetApplicationsClocks.

reset application clock


nvmlReturn_t DECLDIR nvmlDeviceResetApplicationsClocks(nvmlDevice_t device);

Resets the application clock to the default value

This is the applications clock that will be used after system reboot or driver reload.

Default value is constant, but the current value can be changed using nvmlDeviceSetApplicationsClocks.

On Pascal and newer hardware, if clocks were previously locked with nvmlDeviceSetApplicationsClocks, this call will unlock clocks.

This returns clocks to their default behavior of automatically boosting above base clocks as thermal limits allow.

Application clock helper functions: Auto Boosted clocks

Purpose:

Auto Boosted clocks are enabled by default on some hardware, allowing the GPU to run at higher clock rates, to maximize performance as thermal limits allow.

AutoBoosted clocks should be disabled if fixed clock rates are desired.

1.4.3.1 Set auto boosted clock function


/*
 * @param device                               The identifier of the target device
 * @param enabled                              What state to try to set Auto Boosted clocks of the target device to
 *
 * @return
 *         - \ref NVML_SUCCESS                 If the Auto Boosted clocks were successfully set to the state specified by \a enabled
 *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized
 *         - \ref NVML_ERROR_INVALID_ARGUMENT  if \a device is invalid
 *         - \ref NVML_ERROR_NOT_SUPPORTED     if the device does not support Auto Boosted clocks
 *         - \ref NVML_ERROR_GPU_IS_LOST       if the target GPU has fallen off the bus or is otherwise inaccessible
 *         - \ref NVML_ERROR_UNKNOWN           on any unexpected error
*/
nvmlReturn_t DECLDIR nvmlDeviceSetAutoBoostedClocksEnabled(nvmlDevice_t device, nvmlEnableState_t enabled);

Try to set the current state of Auto Boosted clocks on a device. For Kepler™ or newer fully supported devices.
On Pascal and newer hardware, Auto Boosted clocks are controlled through application clocks. Use nvmlDeviceSetApplicationsClocks and nvmlDeviceResetApplicationsClocks to control Auto Boost behavior.
Non-root users may use this API by default but can be restricted by root from using this API by calling nvmlDeviceSetAPIRestriction with apiType=NVML_RESTRICTED_API_SET_AUTO_BOOSTED_CLOCKS.

Note: Persistence Mode is required to modify current Auto Boost settings, therefore, it must be enabled.

Reset auto boosted clock function

Try to set the default state of Auto Boosted clocks on a device. This is the default state that Auto Boosted clocks will return to when no compute processes (e.g. CUDA applications that have an active context) are running.


/* @param device                               The identifier of the target device
 * @param enabled                              What state to try to set default Auto Boosted clocks of the target device to
 * @param flags                                Flags that change the default behavior. Currently Unused.
 */
nvmlReturn_t DECLDIR nvmlDeviceSetDefaultAutoBoostedClocksEnabled(nvmlDevice_t device, nvmlEnableState_t enabled, unsigned int flags);

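Locked clock functions

1.5.1 nvmlDeviceSetGpuLockedClocks/nvmlDeviceSetMemoryLockedClocks

After the clock rate is locked with locked clocks (at or below the boost clock rate), the frequency stays within the configured range while kernels execute.

However, when the locked-clock range [min, max] lies between the boost clock and the max clock, the actual clock rate is not necessarily the configured value (called TBD here); it stops increasing at some value in between.

For example, on an RTX 4090 with TBD = 3105 MHz the actual clock rate is 2775 MHz, so any time computed from 3105 MHz will be off.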
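
The size of that error follows directly from the two frequencies. The sketch below, in C, converts a cycle count to time and computes the relative error made by assuming the requested clock instead of the actual one; both function names are hypothetical and the figures are the RTX 4090 values quoted above.

```c
#include <assert.h>

/* Convert a cycle count to microseconds at a given clock in MHz
 * (1 MHz corresponds to 1 cycle per microsecond). */
double cycles_to_us(unsigned long long cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return (double)cycles / clock_mhz;
}

/* Relative error of a time computed with assumed_mhz when the GPU really
 * ran at actual_mhz: (t_assumed - t_actual) / t_actual.
 * Since t = cycles / f, this simplifies to actual_mhz / assumed_mhz - 1. */
double relative_time_error(double assumed_mhz, double actual_mhz) {
    assert(assumed_mhz > 0.0 && actual_mhz > 0.0);
    return actual_mhz / assumed_mhz - 1.0;
}
```

With the values above, relative_time_error(3105, 2775) is about -0.106, meaning the elapsed time is underestimated by roughly 10.6%. Reading the real-time clock during the run, rather than trusting the requested value, avoids this.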


/*
 * @param device                               The identifier of the target device
 * @param minGpuClockMHz                       Requested minimum gpu clock in MHz
 * @param maxGpuClockMHz                       Requested maximum gpu clock in MHz
 *
 * @return
 *         - \ref NVML_SUCCESS                 if new settings were successfully set
 *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized
 *         - \ref NVML_ERROR_INVALID_ARGUMENT  if \a device is invalid or \a minGpuClockMHz and \a maxGpuClockMHz
 *                                                 is not a valid clock combination
 *         - \ref NVML_ERROR_NO_PERMISSION     if the user doesn't have permission to perform this operation
 *         - \ref NVML_ERROR_NOT_SUPPORTED     if the device doesn't support this feature
 *         - \ref NVML_ERROR_GPU_IS_LOST       if the target GPU has fallen off the bus or is otherwise inaccessible
 *         - \ref NVML_ERROR_UNKNOWN           on any unexpected error
 */
nvmlReturn_t DECLDIR nvmlDeviceSetGpuLockedClocks(nvmlDevice_t device, unsigned int minGpuClockMHz, unsigned int maxGpuClockMHz);
 
 
/*
 * @param device                               The identifier of the target device
 * @param minMemClockMHz                       Requested minimum memory clock in MHz
 * @param maxMemClockMHz                       Requested maximum memory clock in MHz
 *
 * @return
 *         - \ref NVML_SUCCESS                 if new settings were successfully set
 *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized
 *         - \ref NVML_ERROR_INVALID_ARGUMENT  if \a device is invalid or \a minGpuClockMHz and \a maxGpuClockMHz
 *                                                 is not a valid clock combination
 *         - \ref NVML_ERROR_NO_PERMISSION     if the user doesn't have permission to perform this operation
 *         - \ref NVML_ERROR_NOT_SUPPORTED     if the device doesn't support this feature
 *         - \ref NVML_ERROR_GPU_IS_LOST       if the target GPU has fallen off the bus or is otherwise inaccessible
 *         - \ref NVML_ERROR_UNKNOWN           on any unexpected error
 */
nvmlReturn_t DECLDIR nvmlDeviceSetMemoryLockedClocks(nvmlDevice_t device, unsigned int minMemClockMHz, unsigned int maxMemClockMHz);

Set clocks that device will lock to.
Sets the clocks that the device will be running at to the value in the range of minGpuClockMHz to maxGpuClockMHz.

Setting this will supersede application clock values and take effect regardless if a cuda app is running.
Can be used as a setting to request constant performance.
This can be called with a pair of integer clock frequencies in MHz, or a pair of nvmlClockLimitId_t values.

See the table below for valid combinations of these values.
minGpuClock | maxGpuClock | Effect
------------+-------------+--------------------------------------------------
    tdp     |     tdp     | Lock clock to TDP
 unlimited  |     tdp     | Upper bound is TDP but clock may drift below this
    tdp     |  unlimited  | Lower bound is TDP but clock may boost above this
 unlimited  |  unlimited  | Unlocked (== nvmlDeviceResetGpuLockedClocks)

If one arg takes one of these values, the other must be one of these values as well. Mixed numeric and symbolic calls return NVML_ERROR_INVALID_ARGUMENT.
Requires root/admin permissions.
After system reboot or driver reload applications clocks go back to their default value.
For Volta™ or newer fully supported devices.
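
The rules for symbolic limits quoted above can be restated as a small lookup. The sketch below, in C, is only a model of that table: LIMIT_TDP and LIMIT_UNLIMITED are hypothetical stand-ins for the nvmlClockLimitId_t values (the real constants live in nvml.h and are not reproduced here), and locked_range_effect is an illustrative name, not an NVML function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the symbolic nvmlClockLimitId_t values. */
enum { LIMIT_TDP = -1, LIMIT_UNLIMITED = -2 };

int is_symbolic(long v) { return v == LIMIT_TDP || v == LIMIT_UNLIMITED; }

/* Describe the effect of a (min, max) request following the table above.
 * Returns NULL for a mixed numeric/symbolic pair, which the quoted text
 * says is rejected with NVML_ERROR_INVALID_ARGUMENT. */
const char *locked_range_effect(long min_clk, long max_clk) {
    /* Only MHz values (>= 0) or the two stand-in symbols are accepted. */
    assert(min_clk >= LIMIT_UNLIMITED && max_clk >= LIMIT_UNLIMITED);
    if (is_symbolic(min_clk) != is_symbolic(max_clk)) return NULL;
    if (!is_symbolic(min_clk)) return "Clock kept within the numeric range [min, max] MHz";
    if (min_clk == LIMIT_TDP && max_clk == LIMIT_TDP) return "Lock clock to TDP";
    if (min_clk == LIMIT_UNLIMITED && max_clk == LIMIT_TDP)
        return "Upper bound is TDP but clock may drift below this";
    if (min_clk == LIMIT_TDP && max_clk == LIMIT_UNLIMITED)
        return "Lower bound is TDP but clock may boost above this";
    return "Unlocked";
}
```

Checking a request against such a model before calling NVML makes a mixed numeric/symbolic mistake visible early, instead of surfacing as an invalid-argument error from the driver.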

Question: does AutoBoosted need to be set?

No, it does not.

nvmlDeviceResetGpuLockedClocks/nvmlDeviceResetMemoryLockedClocks


/*
 * @param device                               The identifier of the target device
 *
 * @return
 *         - \ref NVML_SUCCESS                 if new settings were successfully set
 *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized
 *         - \ref NVML_ERROR_INVALID_ARGUMENT  if \a device is invalid
 *         - \ref NVML_ERROR_NOT_SUPPORTED     if the device does not support this feature
 *         - \ref NVML_ERROR_GPU_IS_LOST       if the target GPU has fallen off the bus or is otherwise inaccessible
 *         - \ref NVML_ERROR_UNKNOWN           on any unexpected error
 */
nvmlReturn_t DECLDIR nvmlDeviceResetGpuLockedClocks(nvmlDevice_t device);
 
 
/*
 * @param device                               The identifier of the target device
 *
 * @return
 *         - \ref NVML_SUCCESS                 if new settings were successfully set
 *         - \ref NVML_ERROR_UNINITIALIZED     if the library has not been successfully initialized
 *         - \ref NVML_ERROR_INVALID_ARGUMENT  if \a device is invalid
 *         - \ref NVML_ERROR_NOT_SUPPORTED     if the device does not support this feature
 *         - \ref NVML_ERROR_GPU_IS_LOST       if the target GPU has fallen off the bus or is otherwise inaccessible
 *         - \ref NVML_ERROR_UNKNOWN           on any unexpected error
 */
nvmlReturn_t DECLDIR nvmlDeviceResetMemoryLockedClocks(nvmlDevice_t device);

Resets the gpu clock to the default value
This is the gpu clock that will be used after system reboot or driver reload.

Default values are idle clocks, but the current values can be changed using nvmlDeviceSetApplicationsClocks.
For Volta™ or newer fully supported devices.

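On machines that support application clocks, using nvmlDeviceSetApplicationsClocks and nvmlDeviceSetGpuLockedClocks together produces locked-clock behavior, but the clock rate is the value passed to nvmlDeviceSetApplicationsClocks; the locked-clock value is silently ignored.

Details: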

Item | Value | application clock | real clock (no kernel running) | real clock (kernel running)
nvidia-smi --applications-clocks=405,405 -i 0 | x | 405,405 | 210,405 | 405,405
nvidia-smi --lock-gpu-clocks=2520,9001 -m 0 -i 0 | nvidia-smi --applications-clocks=405,405 -i 0 | 405,405 | 405,405 | 405,405
nvidia-smi --applications-clocks=405,405 -i 0 | nvidia-smi --lock-gpu-clocks=2520,9001 -m 0 -i 0 | 405,405 | 405,405 | 405,405
x | nvidia-smi --lock-gpu-clocks=2520,9001 -m 0 -i 0 | 2520,9001 | 2520,9001 | 2520,9001
In summary:

On a device that supports application clocks, once you have used nvmlDeviceSetApplicationsClocks, do not also use nvmlDeviceSetGpuLockedClocks.

On a device that does not support application clocks, use nvmlDeviceSetGpuLockedClocks.
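
The four observations in the table can be condensed into one small model. The sketch below, in C, is not NVML code: clock_config and expected_sm_clock are hypothetical names, 0 means "not set", and the rules encode only what the table reports for the first value of each pair (application 405, locked 2520, idle 210) on a device that supports application clocks.

```c
#include <assert.h>

/* Hypothetical model of the table above; a value of 0 means "not set". */
typedef struct {
    unsigned app_mhz;     /* application clock, 0 if not set */
    unsigned locked_mhz;  /* locked clock, 0 if not set      */
    unsigned idle_mhz;    /* idle clock of the device        */
} clock_config;

/* Real clock the model expects, with or without a running kernel.
 * Returns 0 when neither clock is set, because dynamic boosting is
 * outside what the table measured. */
unsigned expected_sm_clock(clock_config c, int kernel_running) {
    assert(c.idle_mhz > 0);
    if (c.app_mhz && c.locked_mhz) return c.app_mhz;  /* both set: pinned at the application value */
    if (c.locked_mhz) return c.locked_mhz;            /* locked only: pinned even when idle */
    if (c.app_mhz) return kernel_running ? c.app_mhz : c.idle_mhz;
    return 0;                                         /* nothing set: not modeled */
}
```

Writing the behavior down this way makes the surprising case explicit: with both settings active, the locked value never appears in the result, which is why mixing the two APIs is discouraged above.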

Persistence mode


nvidia-smi -i <target gpu> -q
    ==============NVSMI LOG==============
 
    Timestamp                           : ----
    Driver Version                      : ----
 
    Attached GPUs                       : ----
    GPU 0000:01:00.0
        Product Name                    : ----
        Display Mode                    : ----
        Display Active                  : ----
        Persistence Mode                : Enabled
        Accounting Mode                 : ----

Persistence Mode is the term for a user-settable driver property that keeps a target GPU initialized even when no clients are connected to it.

The GPU state remains loaded in the driver whenever one or more clients have the device file open. Once all clients have closed the device file, the GPU state will be unloaded unless persistence mode is enabled.

Application start latency

Applications that trigger GPU initialization may incur a short (order of 1-3 second) startup cost per GPU due to ECC scrubbing behavior. If the GPU is already initialized this scrubbing does not take place.

Preservation of driver state

If the driver deinitializes a GPU some non-persistent state associated with that GPU will be lost and revert back to defaults the next time the GPU is initialized. See Data Persistence. To avoid this the GPU should be kept initialized.

nvidia-persistenced --help

The effect of persistence mode on NVML is shown below; the table shows that enabling it has a large effect on initialization.

API call times:

                  | set max clock | set default clock | getML info
application clock | 10.07 ms      | 16.63 ms          | 57.02 ms
locked clock      | 37.41 ms      | 20 ms             | 54.81 ms

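With persistence mode disabled, whenever the process exits (i.e. it is idle, technically: no contexts of any kind are instantiated on the GPU), the next launch takes just as long as the previous one (fairly long).

So it is best to set persistence mode in your scripts.

The persistence mode command (nvidia-smi -pm 0|1 -i 0) switches the current driver to disabled or enabled mode. It affects all cards, so the -i 0 in this command actually has no effect.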