Exploring How Android Reads the Temperature of Its Components


Preface

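This started while I was building a performance monitoring tool that collects GPU and battery temperature data. To see how it is done elsewhere, I decompiled the pluginGPU-GGPM shared library (.so) of Snapdragon Profiler, which reads the GPU thermal data, i.e. the GPU temperature. Snapdragon Profiler writes the name of the file it reads the temperature from to logcat, as this decompiled call shows: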

sub_94FE0(v69, 1LL, "GGPMProvider", "GGPM_DP: Reading GPU Temperature from '/sys/class/kgsl/kgsl-3d0/temp'");

Once Snapdragon Profiler is connected to the phone, the same message appears in the logcat output:

grus:/ $ logcat | grep Temperature
11-23 11:01:53.264 17088 17088 I SDP     : SDPCore.Metric: Metric 'GPU Temperature' activated for all processes
11-23 11:10:12.597 23936 23936 I SDP     : GGPMProvider: GGPM_DP: Reading GPU Temperature from '/sys/devices/virtual/thermal/thermal_zone29/temp'

In other words, on this device Snapdragon Profiler reads the GPU temperature from the file /sys/devices/virtual/thermal/thermal_zone29/temp.
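When this check has to be repeated on several devices, the path can be pulled out of the log line automatically. The following is a minimal sketch that assumes the message keeps the exact wording shown in the logcat output above; the function name and the regular expression are illustrative and not part of Snapdragon Profiler.

```python
import re

# Matches the quoted path at the end of the GGPM_DP message shown above.
# Assumption: the message wording is the same on other devices and versions.
PATH_PATTERN = re.compile(r"Reading GPU Temperature from '([^']+)'")


def extract_gpu_temp_path(logcat_line):
    """Return the sysfs path named in a GGPM_DP log line, or None."""
    match = PATH_PATTERN.search(logcat_line)
    return match.group(1) if match else None


line = (
    "11-23 11:10:12.597 23936 23936 I SDP     : GGPMProvider: GGPM_DP: "
    "Reading GPU Temperature from "
    "'/sys/devices/virtual/thermal/thermal_zone29/temp'"
)
print(extract_gpu_temp_path(line))
```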

Before this, my logic for getting the CPU temperature also read the value from a list of candidate files, for example:

            "/sys/class/thermal/thermal_zone7/temp",
            "/sys/devices/virtual/thermal/thermal_zone7/temp",
//            "/sys/kernel/debug/tegra_thermal/temp_tj",
//            "/sys/devices/platform/s5p-tmu/curr_temp",
//            "/sys/devices/virtual/thermal/thermal_zone1/temp", // commonly used
//            "/sys/devices/system/cpu/cpufreq/cput_attributes/cur_temp",
//            "/sys/devices/virtual/hwmon/hwmon2/temp1_input",
//            "/sys/devices/platform/coretemp.0/temp2_input",
//            "/sys/devices/virtual/thermal/thermal_zone0/temp",
//            "/sys/devices/system/cpu/cpu0/cpufreq/cpu_temp",
//            "/sys/devices/platform/omap/omap_temp_sensor.0/temperature",
//            "/sys/class/thermal/thermal_zone1/temp",
//            "/sys/devices/platform/s5p-tmu/temperature",
//            "/sys/devices/w1 bus master/w1_master_attempts",
//            "/sys/class/thermal/thermal_zone0/temp"

​

 
            "/sys/devices/virtual/thermal/thermal_zone0/temp",
            "/sys/class/thermal/thermal_zone0/temp",
            "/sys/kernel/debug/tegra_thermal/temp_tj",
            "/sys/devices/platform/s5p-tmu/curr_temp",
            "/sys/devices/virtual/thermal/thermal_zone1/temp",
            "/sys/devices/system/cpu/cpufreq/cput_attributes/cur_temp",
            "/sys/devices/virtual/hwmon/hwmon2/temp1_input",
            "/sys/devices/platform/coretemp.0/temp2_input",
            "/sys/devices/platform/omap/omap_temp_sensor.0/temperature",
            "/sys/class/thermal/thermal_zone1/temp",
            "/sys/devices/platform/s5p-tmu/temperature",
            "/sys/devices/w1 bus master/w1_master_attempts",
            "/sys/devices/system/cpu/cpu0/cpufreq/cpu_temp",
            "/sys/devices/system/cpu/cpu0/cpufreq/FakeShmoo_cpu_temp",
            "/sys/class/i2c-adapter/i2c-4/4-004c/temperature",
            "/sys/devices/platform/tegra-i2c.3/i2c-4/4-004c/temperature",
            "/sys/devices/platform/tegra_tmon/temp1_input",
            "/sys/class/hwmon/hwmon0/device/temp1_input",
            "/sys/devices/virtual/thermal/thermal_zone1/temp",
            "/sys/class/thermal/thermal_zone3/temp",
            "/sys/class/thermal/thermal_zone4/temp",
            "/sys/class/hwmon/hwmonX/temp1_input",
            "/sys/devices/platform/s5p-tmu/curr_temp"

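As the lists show, my CPU temperature logic mainly relied on /sys/devices/virtual/thermal/thermal_zone7/temp. This raises two questions: what does each of these thermal_zone entries actually represent, and how can one thermal_zone be told apart from another?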
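The "try a list of candidate files" approach can be sketched as follows. This is only an illustration of the idea, not the code of the original tool: the function name is made up, and the two paths are simply the active entries from the first list above.

```python
from pathlib import Path


def read_first_temperature(candidates):
    """Return (path, raw_value) for the first candidate that yields an integer."""
    for candidate in candidates:
        try:
            text = Path(candidate).read_text().strip()
            return candidate, int(text)
        except (OSError, ValueError):
            # Missing file, permission denied, or non-numeric content:
            # fall through to the next candidate.
            continue
    return None


# The two active entries of the first list above.
CANDIDATES = [
    "/sys/class/thermal/thermal_zone7/temp",
    "/sys/devices/virtual/thermal/thermal_zone7/temp",
]

if __name__ == "__main__":
    print(read_first_temperature(CANDIDATES))
```

The weakness of this approach is that it says nothing about which component the value belongs to. That is the question the rest of the article looks into.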

Manuscript

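The investigation below is an attempt to answer these questions.

Behavior can differ noticeably between vendors and devices, so for the record the test environment here is a Xiaomi Mi 9 SE running MIUI 12.0.3 on Android 10.

A first look at thermal_zone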

126|grus:/ $ ls /sys/class/thermal/
cooling_device0  cooling_device16 cooling_device7 thermal_zone12 thermal_zone2  thermal_zone27 thermal_zone34 thermal_zone41 thermal_zone8
cooling_device1  cooling_device17 cooling_device8 thermal_zone13 thermal_zone20 thermal_zone28 thermal_zone35 thermal_zone42 thermal_zone9
cooling_device10 cooling_device18 cooling_device9 thermal_zone14 thermal_zone21 thermal_zone29 thermal_zone36 thermal_zone43
cooling_device11 cooling_device2  thermal_message thermal_zone15 thermal_zone22 thermal_zone3  thermal_zone37 thermal_zone44
cooling_device12 cooling_device3  thermal_zone0   thermal_zone16 thermal_zone23 thermal_zone30 thermal_zone38 thermal_zone45
cooling_device13 cooling_device4  thermal_zone1   thermal_zone17 thermal_zone24 thermal_zone31 thermal_zone39 thermal_zone5
cooling_device14 cooling_device5  thermal_zone10  thermal_zone18 thermal_zone25 thermal_zone32 thermal_zone4  thermal_zone6
cooling_device15 cooling_device6  thermal_zone11  thermal_zone19 thermal_zone26 thermal_zone33 thermal_zone40 thermal_zone7
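  1. The /sys/class/thermal directory contains many subdirectories, mainly those whose names start with cooling_device and those whose names start with thermal_zone.
  2. Looking further, every thermal_zone directory contains files such as temp, type, and subsystem, as listed below. What each file does is described in sysfs-api.txt in Xiaomi's kernel source: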
https://github.com/MiCode/Xiaomi_Kernel_OpenSource/blob/c218005419cfebd4332773623d464588752d7b11/Documentation/thermal/sysfs-api.txt#L264
​
Thermal zone device sys I/F, created once it's registered:
/sys/class/thermal/thermal_zone[0-*]:
    |---type:           Type of the thermal zone   # category
    |---temp:           Current temperature        # temperature
    |---mode:           Working mode of the thermal zone
    |---policy:         Thermal governor used for this zone
    |---available_policies: Available thermal governors for this zone
    |---trip_point_[0-*]_temp:  Trip point temperature
    |---trip_point_[0-*]_type:  Trip point type
    |---trip_point_[0-*]_hyst:  Hysteresis value for this trip point
    |---emul_temp:      Emulated temperature set node
    |---sustainable_power:      Sustainable dissipatable power
    |---k_po:                   Proportional term during temperature overshoot
    |---k_pu:                   Proportional term during temperature undershoot
    |---k_i:                    PID's integral term in the power allocator gov
    |---k_d:                    PID's derivative term in the power allocator
    |---integral_cutoff:        Offset above which errors are accumulated
    |---slope:                  Slope constant applied as linear extrapolation
    |---offset:                 Offset constant applied as linear extrapolation
    
    
# More detailed information
***************************
* Thermal zone attributes *
***************************
​
type
    Strings which represent the thermal zone type.
    This is given by thermal zone driver as part of registration.
    E.g: "acpitz" indicates it's an ACPI thermal device.
    In order to keep it consistent with hwmon sys attribute; this should
    be a short, lowercase string, not containing spaces nor dashes.
    RO, Required
​
temp
    Current temperature as reported by thermal zone (sensor).
    Unit: millidegree Celsius
    RO, Required
​
mode
    One of the predefined values in [enabled, disabled].
    This file gives information about the algorithm that is currently
    managing the thermal zone. It can be either default kernel based
    algorithm or user space application.
    enabled     = enable Kernel Thermal management.
    disabled    = Preventing kernel thermal zone driver actions upon
              trip points so that user application can take full
              charge of the thermal management.
    RW, Optional
​
policy
    One of the various thermal governors used for a particular zone.
    RW, Required
​
available_policies
    Available thermal governors which can be used for a particular zone.
    RO, Required
​
trip_point_[0-*]_temp
    The temperature above which trip point will be fired.
    Unit: millidegree Celsius
    RO, Optional
​
trip_point_[0-*]_type
    Strings which indicate the type of the trip point.
    E.g. it can be one of critical, hot, passive, active[0-*] for ACPI
    thermal zone.
    RO, Optional
​
trip_point_[0-*]_hyst
    The hysteresis value for a trip point, represented as an integer
    Unit: Celsius
    RW, Optional
​
cdev[0-*]
    Sysfs link to the thermal cooling device node where the sys I/F
    for cooling device throttling control represents.
    RO, Optional
​
cdev[0-*]_trip_point
    The trip point in this thermal zone which cdev[0-*] is associated
    with; -1 means the cooling device is not associated with any trip
    point.
    RO, Optional
​
cdev[0-*]_weight
        The influence of cdev[0-*] in this thermal zone. This value
        is relative to the rest of cooling devices in the thermal
        zone. For example, if a cooling device has a weight double
        than that of other, it's twice as effective in cooling the
        thermal zone.
        RW, Optional
​
passive
    Attribute is only present for zones in which the passive cooling
    policy is not supported by native thermal driver. Default is zero
    and can be set to a temperature (in millidegrees) to enable a
    passive trip point for the zone. Activation is done by polling with
    an interval of 1 second.
    Unit: millidegrees Celsius
    Valid values: 0 (disabled) or greater than 1000
    RW, Optional
​
emul_temp
    Interface to set the emulated temperature method in thermal zone
    (sensor). After setting this temperature, the thermal zone may pass
    this temperature to platform emulation function if registered or
    cache it locally. This is useful in debugging different temperature
    threshold and its associated cooling action. This is write only node
    and writing 0 on this node should disable emulation.
    Unit: millidegree Celsius
    WO, Optional
​
      WARNING: Be careful while enabling this option on production systems,
      because userland can easily disable the thermal policy by simply
      flooding this sysfs node with low temperature values.
​
sustainable_power
    An estimate of the sustained power that can be dissipated by
    the thermal zone. Used by the power allocator governor. For
    more information see Documentation/thermal/power_allocator.txt
    Unit: milliwatts
    RW, Optional
​
k_po
    The proportional term of the power allocator governor's PID
    controller during temperature overshoot. Temperature overshoot
    is when the current temperature is above the "desired
    temperature" trip point. For more information see
    Documentation/thermal/power_allocator.txt
    RW, Optional
​
k_pu
    The proportional term of the power allocator governor's PID
    controller during temperature undershoot. Temperature undershoot
    is when the current temperature is below the "desired
    temperature" trip point. For more information see
    Documentation/thermal/power_allocator.txt
    RW, Optional
​
k_i
    The integral term of the power allocator governor's PID
    controller. This term allows the PID controller to compensate
    for long term drift. For more information see
    Documentation/thermal/power_allocator.txt
    RW, Optional
​
k_d
    The derivative term of the power allocator governor's PID
    controller. For more information see
    Documentation/thermal/power_allocator.txt
    RW, Optional
​
integral_cutoff
    Temperature offset from the desired temperature trip point
    above which the integral term of the power allocator
    governor's PID controller starts accumulating errors. For
    example, if integral_cutoff is 0, then the integral term only
    accumulates error when temperature is above the desired
    temperature trip point. For more information see
    Documentation/thermal/power_allocator.txt
    Unit: millidegree Celsius
    RW, Optional
​
slope
    The slope constant used in a linear extrapolation model
    to determine a hotspot temperature based off the sensor's
    raw readings. It is up to the device driver to determine
    the usage of these values.
    RW, Optional
​
offset
    The offset constant used in a linear extrapolation model
    to determine a hotspot temperature based off the sensor's
    raw readings. It is up to the device driver to determine
    the usage of these values.
    RW, Optional
​
*****************************
* Cooling device attributes *
*****************************
​
type
    String which represents the type of device, e.g:
    - for generic ACPI: should be "Fan", "Processor" or "LCD"
    - for memory controller device on intel_menlow platform:
      should be "Memory controller".
    RO, Required
​
max_state
    The maximum permissible cooling state of this cooling device.
    RO, Required
​
cur_state
    The current cooling state of this cooling device.
    The value can any integer numbers between 0 and max_state:
    - cur_state == 0 means no cooling
    - cur_state == max_state means the maximum cooling.
    RW, Required
​
1|grus:/ # ls /sys/class/thermal/thermal_zone29/
available_policies cdev0_trip_point  integral_cutoff k_po offset        polling_delay subsystem         trip_point_0_hyst type
cdev0              cdev0_upper_limit k_d             k_pu passive_delay power         sustainable_power trip_point_0_temp uevent
cdev0_lower_limit  cdev0_weight      k_i             mode policy        slope         temp              trip_point_0_type
grus:/ # ls /sys/class/thermal/thermal_zone7/
available_policies k_i  mode          policy        slope             temp              trip_point_0_type trip_point_1_type trip_point_2_type
integral_cutoff    k_po offset        polling_delay subsystem         trip_point_0_hyst trip_point_1_hyst trip_point_2_hyst type
k_d                k_pu passive_delay power         sustainable_power trip_point_0_temp trip_point_1_temp trip_point_2_temp uevent
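  3. The temp file is clearly where the temperature value is stored, and the type file records the category of the thermal_zone. The following command prints the temp and type of every thermal_zone, sorted by value: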
grus:/ # find /sys/class/thermal/thermal_zone* | while read -r a; do cat $a/temp | awk '{printf $1 " " }'; cat $a/type | awk '{printf $1 " "}'; echo $a; done | sort -nr  
274000 soc /sys/class/thermal/thermal_zone5
75000 lmh-dcvs-01 /sys/class/thermal/thermal_zone35
75000 lmh-dcvs-00 /sys/class/thermal/thermal_zone36
57500 dual-gold-max-step /sys/class/thermal/thermal_zone31
56500 cpu1-gold-usr /sys/class/thermal/thermal_zone18
56500 cpu0-gold-usr /sys/class/thermal/thermal_zone17
54900 cpuss-0-usr /sys/class/thermal/thermal_zone13
54600 hexa-silv-max-step /sys/class/thermal/thermal_zone30
54600 cpu0-silver-usr /sys/class/thermal/thermal_zone9
53900 cpuss-1-usr /sys/class/thermal/thermal_zone14
53600 cpu4-silver-usr /sys/class/thermal/thermal_zone15
53300 cpu5-silver-usr /sys/class/thermal/thermal_zone16
53000 cpu2-silver-usr /sys/class/thermal/thermal_zone11
53000 cpu1-silver-usr /sys/class/thermal/thermal_zone10
52300 cpu3-silver-usr /sys/class/thermal/thermal_zone12
51700 mdm-dsp-usr /sys/class/thermal/thermal_zone22
51100 camera-usr /sys/class/thermal/thermal_zone26
50400 mmss-usr /sys/class/thermal/thermal_zone27
49400 mdm-core-usr /sys/class/thermal/thermal_zone28
49100 wlan-usr /sys/class/thermal/thermal_zone24
49100 pop-mem-step /sys/class/thermal/thermal_zone32
49100 gpu-virt-max-step /sys/class/thermal/thermal_zone29
49100 ddr-usr /sys/class/thermal/thermal_zone23
49100 compute-hvx-usr /sys/class/thermal/thermal_zone25
48800 gpu1-usr /sys/class/thermal/thermal_zone20
48800 gpu0-usr /sys/class/thermal/thermal_zone19
48800 aoss0-usr /sys/class/thermal/thermal_zone8
48500 aoss1-usr /sys/class/thermal/thermal_zone21
48500 aoss1-lowf /sys/class/thermal/thermal_zone34
48500 aoss0-lowf /sys/class/thermal/thermal_zone33
47789 pm660_tz /sys/class/thermal/thermal_zone6
47207 cam_therm0 /sys/class/thermal/thermal_zone39
43953 xo_therm /sys/class/thermal/thermal_zone38
43953 xo-therm-step /sys/class/thermal/thermal_zone37
43604 pa_therm1 /sys/class/thermal/thermal_zone42
41453 quiet_therm /sys/class/thermal/thermal_zone43
40813 slave_therm /sys/class/thermal/thermal_zone40
39000 bms /sys/class/thermal/thermal_zone45
39000 battery /sys/class/thermal/thermal_zone44
37000 pm660l_tz /sys/class/thermal/thermal_zone7
35677 conn_therm /sys/class/thermal/thermal_zone41
3666 vbat_too_low /sys/class/thermal/thermal_zone4
3666 vbat_low /sys/class/thermal/thermal_zone3
3666 vbat_adc /sys/class/thermal/thermal_zone2
1760 ibat-high /sys/class/thermal/thermal_zone0
240 ibat-vhigh /sys/class/thermal/thermal_zone1
​
  4. The output shows that thermal_zone7 has the type pm660l_tz and thermal_zone29 has the type gpu-virt-max-step. There is also a zone that represents the battery temperature:
39000 battery /sys/class/thermal/thermal_zone44
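The same enumeration can be done from a script. The sketch below is a rough Python equivalent of the shell pipeline above; the function name and the `root` parameter are illustrative, and the conversion relies on the kernel documentation quoted earlier, which gives the unit of temp as millidegree Celsius.

```python
from pathlib import Path


def list_thermal_zones(root="/sys/class/thermal"):
    """Return [(raw_temp, type, zone_name)] sorted by raw_temp, highest first."""
    zones = []
    for zone in Path(root).glob("thermal_zone*"):
        try:
            zone_type = (zone / "type").read_text().strip()
            raw_temp = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric zone: skip it
        zones.append((raw_temp, zone_type, zone.name))
    return sorted(zones, reverse=True)


if __name__ == "__main__":
    for raw_temp, zone_type, name in list_thermal_zones():
        # temp is documented as millidegree Celsius, so divide by 1000.
        print(f"{raw_temp / 1000:8.1f}  {zone_type:24s} {name}")
```

Dividing by 1000 is only meaningful for zones that really report a temperature. In the listing above, values such as 274000 for soc or 3666 for vbat_adc do not look like plausible temperatures, so those zones probably expose some other quantity and should be filtered by type.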

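Each of these sensors or virtual sensors corresponds to a component, which may be the CPU, the GPU, the battery, or something else. Phone vendors and Android versions differ in how they map sensors to components; for one concrete example, see thermal-helper.cpp in the Android 9 source: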

// This is a golden set of thermal sensor type and their temperature types.
// Used when we read in sensor values.
const std::map<std::string, TemperatureType>
kValidThermalSensorTypeMap = {
    {"cpu0-silver-usr", TemperatureType::CPU},  // CPU0
    {"cpu1-silver-usr", TemperatureType::CPU},  // CPU1
    {"cpu2-silver-usr", TemperatureType::CPU},  // CPU2
    {"cpu3-silver-usr", TemperatureType::CPU},  // CPU3
    {"cpu0-gold-usr", TemperatureType::CPU},    // CPU4
    {"cpu1-gold-usr", TemperatureType::CPU},    // CPU5
    {"cpu2-gold-usr", TemperatureType::CPU},    // CPU6
    {"cpu3-gold-usr", TemperatureType::CPU},    // CPU7
    // GPU thermal sensors.
    {"gpu0-usr", TemperatureType::GPU},
    {"gpu1-usr", TemperatureType::GPU},
    // Battery thermal sensor.
    {"battery", TemperatureType::BATTERY},
    // USBC thermal sensor.
    {"usbc-therm-adc", TemperatureType::UNKNOWN},
    // Skin sensors.
    {"quiet-therm-adc", TemperatureType::SKIN},  // Used by EVT devices
    {"fps-therm-adc", TemperatureType::SKIN},    // Used by prod devices
};
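A monitoring tool can apply the same idea by keeping its own table from sensor type to component. The sketch below copies a subset of the type names from the map above into a Python dictionary and groups raw readings by component; the category strings and the function name are illustrative, and the table would need to be extended for other SoCs and vendors.

```python
# Sensor type names taken from the kValidThermalSensorTypeMap excerpt above.
# The category strings are illustrative stand-ins for TemperatureType.
SENSOR_CATEGORY = {
    "cpu0-silver-usr": "CPU",
    "cpu1-silver-usr": "CPU",
    "cpu2-silver-usr": "CPU",
    "cpu3-silver-usr": "CPU",
    "cpu0-gold-usr": "CPU",
    "cpu1-gold-usr": "CPU",
    "cpu2-gold-usr": "CPU",
    "cpu3-gold-usr": "CPU",
    "gpu0-usr": "GPU",
    "gpu1-usr": "GPU",
    "battery": "BATTERY",
}


def group_by_category(readings):
    """Group (zone_type, raw_temp) pairs by component, skipping unknown types."""
    grouped = {}
    for zone_type, raw_temp in readings:
        category = SENSOR_CATEGORY.get(zone_type)
        if category is None:
            continue  # type not in the table, e.g. pm660l_tz
        grouped.setdefault(category, []).append(raw_temp)
    return grouped


# Sample values taken from the thermal_zone listing earlier in the article.
sample = [
    ("cpu0-silver-usr", 54600),
    ("cpu0-gold-usr", 56500),
    ("gpu0-usr", 48800),
    ("battery", 39000),
    ("pm660l_tz", 37000),
]
print(group_by_category(sample))
```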

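After comparing the value with GPU-Z, I confirmed that the thermal_zone whose type is battery does measure the battery temperature.

The sections above identified the thermal_zone entries used to read the CPU, GPU, and battery temperatures. What follows takes a closer look at the mechanism behind them.

A deeper look at thermal_zone

Take the gpu-virt-max-step thermal_zone as an example. It is defined in the Android kernel source, in qti_virtual_sensor.c: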

​
static const struct virtual_sensor_data qti_virtual_sensors[] = {
    {
        .virt_zone_name = "gpu-virt-max-step",
        .num_sensors = 2,
        .sensor_names = {"gpu0-usr",
                "gpu1-usr"},
        .logic = VIRT_MAXIMUM,
    },
    ....
};

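In this structure, virt_zone_name is the type of the thermal_zone, and sensor_names determines where its data comes from. Here the data comes from two temperature sensors named gpu0-usr and gpu1-usr, and .logic = VIRT_MAXIMUM indicates that the virtual zone reports the maximum of the two.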
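The effect of a maximum-type virtual sensor can be sketched in a few lines. This is only an illustration of the aggregation, under the assumption that VIRT_MAXIMUM means "take the largest reading of the listed sensors"; it is not the kernel implementation, and the dictionary and function names are made up.

```python
# Mirrors the qti_virtual_sensors entry shown above.
VIRTUAL_SENSORS = {
    "gpu-virt-max-step": ("gpu0-usr", "gpu1-usr"),
}


def virtual_max(zone_name, readings):
    """Return the highest raw reading among the sensors behind a virtual zone.

    readings maps a sensor type to its raw temp value (millidegree Celsius).
    """
    return max(readings[name] for name in VIRTUAL_SENSORS[zone_name])


# Hypothetical readings, chosen only to show which value wins.
print(virtual_max("gpu-virt-max-step", {"gpu0-usr": 48800, "gpu1-usr": 49100}))
```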

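How many sensors exist and what each one does depends on the SoC vendor, but the vendor writes this information into the Android source.

Qualcomm SoC sensors

In the Android source, platform/hardware/qcom/ contains hardware information for a number of Qualcomm SoCs.

For example, thermal_target.c lists some of the sensors of the sdm845 processor under Android 13.

// TODO: The logic behind pm660l_tz and battery is still unclear to me. "pm" stands for power management, so pm660l_tz belongs to the power management IC.

Summary

The GPU, CPU, and battery temperatures can be obtained by reading the temp file in the corresponding thermal_zone directory. Note, however, that vendors implement this differently, so the type values and the file permissions are not necessarily the same across devices. What I have observed so far is listed below.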

Xiaomi phones:

Files and directories under /sys/class/thermal/ can be read normally.

For example, on the Xiaomi Mi 9 SE:
grus:/ $ cat /sys/class/thermal/thermal_zone44/type
battery
​
grus:/ $ cat /sys/class/thermal/thermal_zone29/type
gpu-virt-max-step
​

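Huawei phones:

Files and directories under /sys/class/thermal/ cannot be read because of insufficient permissions;
the data can be read from the files and directories under /sys/devices/virtual/thermal/ instead.

For example, on the Huawei P30: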
HWELE:/ $ cat /sys/devices/virtual/thermal/thermal_zone3/type
Battery
​
HWELE:/ $ cat /sys/devices/virtual/thermal/thermal_zone9/type
gpu
​

iQOO phones:

Files and directories under /sys/class/thermal/ can be read normally.

For example, on the iQOO Z3:
PD2073:/ $ cat /sys/class/thermal/thermal_zone90/type
battery
​
PD2073:/ $ cat /sys/class/thermal/thermal_zone45/type
gpuss-max-step
​

OPPO phones:

Files and directories under /sys/class/thermal/ can be read normally.

For example, on the OPPO Reno:
OP46B1:/ $ cat /sys/class/thermal/thermal_zone44/type
battery
​
OP46B1:/ $ cat /sys/class/thermal/thermal_zone33/type
gpu-virt-max-step
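Since the zone index and even the capitalization of type differ between these devices (battery versus Battery, gpu-virt-max-step versus gpu versus gpuss-max-step), a lookup by type is more robust than a hard-coded zone number. The sketch below tries both directories mentioned above and compares types case-insensitively; the function name and the `roots` parameter are illustrative, and the behavior on devices other than the four listed is an assumption.

```python
from pathlib import Path

# Both locations observed above; the second one is the fallback seen on Huawei.
ROOTS = ("/sys/class/thermal", "/sys/devices/virtual/thermal")


def find_zone_temp(wanted_type, roots=ROOTS):
    """Return (zone_dir, raw_temp) of the first readable zone whose type matches."""
    wanted = wanted_type.lower()
    for root in roots:
        try:
            zones = sorted(Path(root).glob("thermal_zone*"))
        except OSError:
            continue  # directory cannot be listed: try the next root
        for zone in zones:
            try:
                if (zone / "type").read_text().strip().lower() != wanted:
                    continue
                return str(zone), int((zone / "temp").read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable or non-numeric: keep searching
    return None


if __name__ == "__main__":
    print(find_zone_temp("battery"))
```

For the GPU, a caller would have to try several type names in turn, for example gpu-virt-max-step, gpuss-max-step, and gpu, because no single name covers the devices listed here.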
​

