Exploring How Android Reads the Temperature of Various Components
Preface
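This started while I was building a performance monitoring tool that collects GPU and battery temperature data. Along the way I reverse-engineered the pluginGPU-GGPM .so shipped with Snapdragon's profiler. That library is responsible for the GPU thermal metric under GPU General, that is, the GPU temperature. The profiler prints the name of the file it reads the temperature from to logcat, as the decompiled call below shows: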
erlang
sub_94FE0(v69, 1LL, "GGPMProvider", "GGPM_DP: Reading GPU Temperature from '/sys/class/kgsl/kgsl-3d0/temp'");
Once the Snapdragon profiler is connected to the phone, the corresponding line shows up in the logcat output:
less
grus:/ $ logcat | grep Temperature
11-23 11:01:53.264 17088 17088 I SDP : SDPCore.Metric: Metric 'GPU Temperature' activated for all processes
11-23 11:10:12.597 23936 23936 I SDP : GGPMProvider: GGPM_DP: Reading GPU Temperature from '/sys/devices/virtual/thermal/thermal_zone29/temp'
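In other words, on this device the Snapdragon profiler reads the GPU temperature from /sys/devices/virtual/thermal/thermal_zone29/temp, not from the /sys/class/kgsl/kgsl-3d0/temp path that appears in the decompiled string.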
Before this, my own logic for getting the CPU temperature also read the value from a set of files, for example:
ruby
"/sys/class/thermal/thermal_zone7/temp",
"/sys/devices/virtual/thermal/thermal_zone7/temp",
// "/sys/kernel/debug/tegra_thermal/temp_tj",
// "/sys/devices/platform/s5p-tmu/curr_temp",
// "/sys/devices/virtual/thermal/thermal_zone1/temp", // commonly used
// "/sys/devices/system/cpu/cpufreq/cput_attributes/cur_temp",
// "/sys/devices/virtual/hwmon/hwmon2/temp1_input",
// "/sys/devices/platform/coretemp.0/temp2_input",
// "/sys/devices/virtual/thermal/thermal_zone0/temp",
// "/sys/devices/system/cpu/cpu0/cpufreq/cpu_temp",
// "/sys/devices/platform/omap/omap_temp_sensor.0/temperature",
// "/sys/class/thermal/thermal_zone1/temp",
// "/sys/devices/platform/s5p-tmu/temperature",
// "/sys/devices/w1 bus master/w1_master_attempts",
// "/sys/class/thermal/thermal_zone0/temp"
"/sys/devices/virtual/thermal/thermal_zone0/temp",
"/sys/class/thermal/thermal_zone0/temp",
"/sys/kernel/debug/tegra_thermal/temp_tj",
"/sys/devices/platform/s5p-tmu/curr_temp",
"/sys/devices/virtual/thermal/thermal_zone1/temp",
"/sys/devices/system/cpu/cpufreq/cput_attributes/cur_temp",
"/sys/devices/virtual/hwmon/hwmon2/temp1_input",
"/sys/devices/platform/coretemp.0/temp2_input",
"/sys/devices/platform/omap/omap_temp_sensor.0/temperature",
"/sys/class/thermal/thermal_zone1/temp",
"/sys/devices/platform/s5p-tmu/temperature",
"/sys/devices/w1 bus master/w1_master_attempts",
"/sys/devices/system/cpu/cpu0/cpufreq/cpu_temp",
"/sys/devices/system/cpu/cpu0/cpufreq/FakeShmoo_cpu_temp",
"/sys/class/i2c-adapter/i2c-4/4-004c/temperature",
"/sys/devices/platform/tegra-i2c.3/i2c-4/4-004c/temperature",
"/sys/devices/platform/tegra_tmon/temp1_input",
"/sys/class/hwmon/hwmon0/device/temp1_input",
"/sys/devices/virtual/thermal/thermal_zone1/temp",
"/sys/class/thermal/thermal_zone3/temp",
"/sys/class/thermal/thermal_zone4/temp",
"/sys/class/hwmon/hwmonX/temp1_input",
"/sys/devices/platform/s5p-tmu/curr_temp"
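As the list shows, the CPU temperature was mainly read from /sys/devices/virtual/thermal/thermal_zone7/temp. This raises two questions: what do these thermal_zone entries actually represent, and how can the purpose of each thermal_zone be told apart?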
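The path-probing approach described above can be sketched roughly as follows. This is a minimal illustration, not the code of the original tool: the function name read_first_temp and the shortened CANDIDATE_PATHS list are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Hypothetical subset of the candidate list shown above; extend as needed.
CANDIDATE_PATHS = [
    "/sys/class/thermal/thermal_zone7/temp",
    "/sys/devices/virtual/thermal/thermal_zone7/temp",
    "/sys/devices/virtual/thermal/thermal_zone0/temp",
    "/sys/class/thermal/thermal_zone0/temp",
]


def read_first_temp(paths: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (path, raw_value) for the first path that yields an integer."""
    for p in paths:
        try:
            return p, int(Path(p).read_text().strip())
        except (OSError, ValueError):
            # Missing file, permission denied, or non-numeric content:
            # fall through to the next candidate.
            continue
    return None
```

The weakness of this approach is visible in the sketch itself: it returns whichever file happens to be readable first, without knowing which component that file measures.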
Manuscript
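The investigation below is an attempt to answer these questions.
Different vendors and devices can behave quite differently, so for the record: the test environment here is a Xiaomi Mi 9 SE running MIUI 12.0.3 on Android 10.
A first look at thermal_zone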
javascript
126|grus:/ $ ls /sys/class/thermal/
cooling_device0 cooling_device16 cooling_device7 thermal_zone12 thermal_zone2 thermal_zone27 thermal_zone34 thermal_zone41 thermal_zone8
cooling_device1 cooling_device17 cooling_device8 thermal_zone13 thermal_zone20 thermal_zone28 thermal_zone35 thermal_zone42 thermal_zone9
cooling_device10 cooling_device18 cooling_device9 thermal_zone14 thermal_zone21 thermal_zone29 thermal_zone36 thermal_zone43
cooling_device11 cooling_device2 thermal_message thermal_zone15 thermal_zone22 thermal_zone3 thermal_zone37 thermal_zone44
cooling_device12 cooling_device3 thermal_zone0 thermal_zone16 thermal_zone23 thermal_zone30 thermal_zone38 thermal_zone45
cooling_device13 cooling_device4 thermal_zone1 thermal_zone17 thermal_zone24 thermal_zone31 thermal_zone39 thermal_zone5
cooling_device14 cooling_device5 thermal_zone10 thermal_zone18 thermal_zone25 thermal_zone32 thermal_zone4 thermal_zone6
cooling_device15 cooling_device6 thermal_zone11 thermal_zone19 thermal_zone26 thermal_zone33 thermal_zone40 thermal_zone7
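- As the listing shows, /sys/class/thermal contains many directories, mainly ones whose names start with cooling_device and ones whose names start with thermal_zone.
- Looking further, every thermal_zone directory contains files such as temp, type and subsystem, as listed below. For the meaning of these files, see the description in the Xiaomi kernel source: sysfs-api.txt.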
vbnet
https://github.com/MiCode/Xiaomi_Kernel_OpenSource/blob/c218005419cfebd4332773623d464588752d7b11/Documentation/thermal/sysfs-api.txt#L264
Thermal zone device sys I/F, created once it's registered:
/sys/class/thermal/thermal_zone[0-*]:
|---type: Type of the thermal zone # category
|---temp: Current temperature # temperature
|---mode: Working mode of the thermal zone
|---policy: Thermal governor used for this zone
|---available_policies: Available thermal governors for this zone
|---trip_point_[0-*]_temp: Trip point temperature
|---trip_point_[0-*]_type: Trip point type
|---trip_point_[0-*]_hyst: Hysteresis value for this trip point
|---emul_temp: Emulated temperature set node
|---sustainable_power: Sustainable dissipatable power
|---k_po: Proportional term during temperature overshoot
|---k_pu: Proportional term during temperature undershoot
|---k_i: PID's integral term in the power allocator gov
|---k_d: PID's derivative term in the power allocator
|---integral_cutoff: Offset above which errors are accumulated
|---slope: Slope constant applied as linear extrapolation
|---offset: Offset constant applied as linear extrapolation
# More detailed information
***************************
* Thermal zone attributes *
***************************
type
Strings which represent the thermal zone type.
This is given by thermal zone driver as part of registration.
E.g: "acpitz" indicates it's an ACPI thermal device.
In order to keep it consistent with hwmon sys attribute; this should
be a short, lowercase string, not containing spaces nor dashes.
RO, Required
temp
Current temperature as reported by thermal zone (sensor).
Unit: millidegree Celsius
RO, Required
mode
One of the predefined values in [enabled, disabled].
This file gives information about the algorithm that is currently
managing the thermal zone. It can be either default kernel based
algorithm or user space application.
enabled = enable Kernel Thermal management.
disabled = Preventing kernel thermal zone driver actions upon
trip points so that user application can take full
charge of the thermal management.
RW, Optional
policy
One of the various thermal governors used for a particular zone.
RW, Required
available_policies
Available thermal governors which can be used for a particular zone.
RO, Required
trip_point_[0-*]_temp
The temperature above which trip point will be fired.
Unit: millidegree Celsius
RO, Optional
trip_point_[0-*]_type
Strings which indicate the type of the trip point.
E.g. it can be one of critical, hot, passive, active[0-*] for ACPI
thermal zone.
RO, Optional
trip_point_[0-*]_hyst
The hysteresis value for a trip point, represented as an integer
Unit: Celsius
RW, Optional
cdev[0-*]
Sysfs link to the thermal cooling device node where the sys I/F
for cooling device throttling control represents.
RO, Optional
cdev[0-*]_trip_point
The trip point in this thermal zone which cdev[0-*] is associated
with; -1 means the cooling device is not associated with any trip
point.
RO, Optional
cdev[0-*]_weight
The influence of cdev[0-*] in this thermal zone. This value
is relative to the rest of cooling devices in the thermal
zone. For example, if a cooling device has a weight double
than that of other, it's twice as effective in cooling the
thermal zone.
RW, Optional
passive
Attribute is only present for zones in which the passive cooling
policy is not supported by native thermal driver. Default is zero
and can be set to a temperature (in millidegrees) to enable a
passive trip point for the zone. Activation is done by polling with
an interval of 1 second.
Unit: millidegrees Celsius
Valid values: 0 (disabled) or greater than 1000
RW, Optional
emul_temp
Interface to set the emulated temperature method in thermal zone
(sensor). After setting this temperature, the thermal zone may pass
this temperature to platform emulation function if registered or
cache it locally. This is useful in debugging different temperature
threshold and its associated cooling action. This is write only node
and writing 0 on this node should disable emulation.
Unit: millidegree Celsius
WO, Optional
WARNING: Be careful while enabling this option on production systems,
because userland can easily disable the thermal policy by simply
flooding this sysfs node with low temperature values.
sustainable_power
An estimate of the sustained power that can be dissipated by
the thermal zone. Used by the power allocator governor. For
more information see Documentation/thermal/power_allocator.txt
Unit: milliwatts
RW, Optional
k_po
The proportional term of the power allocator governor's PID
controller during temperature overshoot. Temperature overshoot
is when the current temperature is above the "desired
temperature" trip point. For more information see
Documentation/thermal/power_allocator.txt
RW, Optional
k_pu
The proportional term of the power allocator governor's PID
controller during temperature undershoot. Temperature undershoot
is when the current temperature is below the "desired
temperature" trip point. For more information see
Documentation/thermal/power_allocator.txt
RW, Optional
k_i
The integral term of the power allocator governor's PID
controller. This term allows the PID controller to compensate
for long term drift. For more information see
Documentation/thermal/power_allocator.txt
RW, Optional
k_d
The derivative term of the power allocator governor's PID
controller. For more information see
Documentation/thermal/power_allocator.txt
RW, Optional
integral_cutoff
Temperature offset from the desired temperature trip point
above which the integral term of the power allocator
governor's PID controller starts accumulating errors. For
example, if integral_cutoff is 0, then the integral term only
accumulates error when temperature is above the desired
temperature trip point. For more information see
Documentation/thermal/power_allocator.txt
Unit: millidegree Celsius
RW, Optional
slope
The slope constant used in a linear extrapolation model
to determine a hotspot temperature based off the sensor's
raw readings. It is up to the device driver to determine
the usage of these values.
RW, Optional
offset
The offset constant used in a linear extrapolation model
to determine a hotspot temperature based off the sensor's
raw readings. It is up to the device driver to determine
the usage of these values.
RW, Optional
*****************************
* Cooling device attributes *
*****************************
type
String which represents the type of device, e.g:
- for generic ACPI: should be "Fan", "Processor" or "LCD"
- for memory controller device on intel_menlow platform:
should be "Memory controller".
RO, Required
max_state
The maximum permissible cooling state of this cooling device.
RO, Required
cur_state
The current cooling state of this cooling device.
The value can any integer numbers between 0 and max_state:
- cur_state == 0 means no cooling
- cur_state == max_state means the maximum cooling.
RW, Required
typescript
1|grus:/ # ls /sys/class/thermal/thermal_zone29/
available_policies cdev0_trip_point integral_cutoff k_po offset polling_delay subsystem trip_point_0_hyst type
cdev0 cdev0_upper_limit k_d k_pu passive_delay power sustainable_power trip_point_0_temp uevent
cdev0_lower_limit cdev0_weight k_i mode policy slope temp trip_point_0_type
typescript
grus:/ # ls /sys/class/thermal/thermal_zone7/
available_policies k_i mode policy slope temp trip_point_0_type trip_point_1_type trip_point_2_type
integral_cutoff k_po offset polling_delay subsystem trip_point_0_hyst trip_point_1_hyst trip_point_2_hyst type
k_d k_pu passive_delay power sustainable_power trip_point_0_temp trip_point_1_temp trip_point_2_temp uevent
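The temp file is clearly where the temperature value is recorded (in millidegrees Celsius, per the documentation quoted above), and the type file records the category of the thermal_zone. The following command prints the temp and type of every thermal_zone: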
bash
grus:/ # find /sys/class/thermal/thermal_zone* | while read -r a; do cat $a/temp | awk '{printf $1 " " }'; cat $a/type | awk '{printf $1 " "}'; echo $a; done | sort -nr
274000 soc /sys/class/thermal/thermal_zone5
75000 lmh-dcvs-01 /sys/class/thermal/thermal_zone35
75000 lmh-dcvs-00 /sys/class/thermal/thermal_zone36
57500 dual-gold-max-step /sys/class/thermal/thermal_zone31
56500 cpu1-gold-usr /sys/class/thermal/thermal_zone18
56500 cpu0-gold-usr /sys/class/thermal/thermal_zone17
54900 cpuss-0-usr /sys/class/thermal/thermal_zone13
54600 hexa-silv-max-step /sys/class/thermal/thermal_zone30
54600 cpu0-silver-usr /sys/class/thermal/thermal_zone9
53900 cpuss-1-usr /sys/class/thermal/thermal_zone14
53600 cpu4-silver-usr /sys/class/thermal/thermal_zone15
53300 cpu5-silver-usr /sys/class/thermal/thermal_zone16
53000 cpu2-silver-usr /sys/class/thermal/thermal_zone11
53000 cpu1-silver-usr /sys/class/thermal/thermal_zone10
52300 cpu3-silver-usr /sys/class/thermal/thermal_zone12
51700 mdm-dsp-usr /sys/class/thermal/thermal_zone22
51100 camera-usr /sys/class/thermal/thermal_zone26
50400 mmss-usr /sys/class/thermal/thermal_zone27
49400 mdm-core-usr /sys/class/thermal/thermal_zone28
49100 wlan-usr /sys/class/thermal/thermal_zone24
49100 pop-mem-step /sys/class/thermal/thermal_zone32
49100 gpu-virt-max-step /sys/class/thermal/thermal_zone29
49100 ddr-usr /sys/class/thermal/thermal_zone23
49100 compute-hvx-usr /sys/class/thermal/thermal_zone25
48800 gpu1-usr /sys/class/thermal/thermal_zone20
48800 gpu0-usr /sys/class/thermal/thermal_zone19
48800 aoss0-usr /sys/class/thermal/thermal_zone8
48500 aoss1-usr /sys/class/thermal/thermal_zone21
48500 aoss1-lowf /sys/class/thermal/thermal_zone34
48500 aoss0-lowf /sys/class/thermal/thermal_zone33
47789 pm660_tz /sys/class/thermal/thermal_zone6
47207 cam_therm0 /sys/class/thermal/thermal_zone39
43953 xo_therm /sys/class/thermal/thermal_zone38
43953 xo-therm-step /sys/class/thermal/thermal_zone37
43604 pa_therm1 /sys/class/thermal/thermal_zone42
41453 quiet_therm /sys/class/thermal/thermal_zone43
40813 slave_therm /sys/class/thermal/thermal_zone40
39000 bms /sys/class/thermal/thermal_zone45
39000 battery /sys/class/thermal/thermal_zone44
37000 pm660l_tz /sys/class/thermal/thermal_zone7
35677 conn_therm /sys/class/thermal/thermal_zone41
3666 vbat_too_low /sys/class/thermal/thermal_zone4
3666 vbat_low /sys/class/thermal/thermal_zone3
3666 vbat_adc /sys/class/thermal/thermal_zone2
1760 ibat-high /sys/class/thermal/thermal_zone0
240 ibat-vhigh /sys/class/thermal/thermal_zone1
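The same listing can be produced programmatically. The sketch below is an assumed Python equivalent of the shell one-liner above; the function names list_thermal_zones and to_celsius are made up for this example. Note that some zones in the output above (vbat_adc at 3666, for instance) do not look like millidegree temperatures, so the conversion is only meaningful for zones that actually measure temperature.

```python
from pathlib import Path
from typing import List, Tuple


def list_thermal_zones(root: str = "/sys/class/thermal") -> List[Tuple[int, str, str]]:
    """Return (raw_temp, type, path) for every readable thermal_zone, highest first."""
    zones = []
    for zone in Path(root).glob("thermal_zone*"):
        try:
            raw = int((zone / "temp").read_text().strip())
            ztype = (zone / "type").read_text().strip()
        except (OSError, ValueError):
            continue  # unreadable or non-numeric zone: skip it
        zones.append((raw, ztype, str(zone)))
    return sorted(zones, reverse=True)


def to_celsius(raw_millidegrees: int) -> float:
    """sysfs-api.txt documents temp as millidegree Celsius."""
    return raw_millidegrees / 1000.0


if __name__ == "__main__":
    for raw, ztype, path in list_thermal_zones():
        print(raw, ztype, path)
```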
- As the output shows, the type of thermal_zone7 is pm660l_tz and the type of thermal_zone29 is gpu-virt-max-step. There is also a zone that represents the battery temperature:
arduino
39000 battery /sys/class/thermal/thermal_zone44
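Each of these sensors or virtual sensors corresponds to a component, which may be the CPU, the GPU, the battery or something else. Phone vendors and Android versions differ in how they map sensors to components; for one concrete mapping, see thermal-helper.cpp in the Android 9 source: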
php
// This is a golden set of thermal sensor type and their temperature types.
// Used when we read in sensor values.
const std::map<std::string, TemperatureType>
kValidThermalSensorTypeMap = {
{"cpu0-silver-usr", TemperatureType::CPU}, // CPU0
{"cpu1-silver-usr", TemperatureType::CPU}, // CPU1
{"cpu2-silver-usr", TemperatureType::CPU}, // CPU2
{"cpu3-silver-usr", TemperatureType::CPU}, // CPU3
{"cpu0-gold-usr", TemperatureType::CPU}, // CPU4
{"cpu1-gold-usr", TemperatureType::CPU}, // CPU5
{"cpu2-gold-usr", TemperatureType::CPU}, // CPU6
{"cpu3-gold-usr", TemperatureType::CPU}, // CPU7
// GPU thermal sensors.
{"gpu0-usr", TemperatureType::GPU},
{"gpu1-usr", TemperatureType::GPU},
// Battery thermal sensor.
{"battery", TemperatureType::BATTERY},
// USBC thermal sensor.
{"usbc-therm-adc", TemperatureType::UNKNOWN},
// Skin sensors.
{"quiet-therm-adc", TemperatureType::SKIN}, // Used by EVT devices
{"fps-therm-adc", TemperatureType::SKIN}, // Used by prod devices
};
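A monitoring tool can apply the same idea on its side: look up the type string of a thermal_zone in a table to decide which component it belongs to. The sketch below mirrors only a few entries of the map quoted above; the dictionary name, the category strings and the function classify_zone are this example's own, not part of the Android source.

```python
from typing import Optional

# Mirrors a few entries of kValidThermalSensorTypeMap quoted above;
# the category strings are labels chosen for this sketch.
SENSOR_TYPE_TO_CATEGORY = {
    "cpu0-silver-usr": "CPU",
    "cpu0-gold-usr": "CPU",
    "gpu0-usr": "GPU",
    "gpu1-usr": "GPU",
    "battery": "BATTERY",
    "quiet-therm-adc": "SKIN",
}


def classify_zone(zone_type: str) -> Optional[str]:
    """Return the component category for a thermal_zone type, or None if unlisted."""
    return SENSOR_TYPE_TO_CATEGORY.get(zone_type.strip())
```

With this table, the pm660l_tz zone that the earlier path list relied on would come back as None, which is a useful signal that it is not a known CPU sensor.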
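After comparing the readings with GPU-Z, I found that the thermal_zone whose type is battery is indeed the one that measures the battery temperature.
The sections above describe the thermal_zone entries used to read the CPU, GPU and battery temperatures. What follows digs deeper into the mechanism behind them.
Digging deeper into thermal_zone
Take the gpu-virt-max-step thermal_zone as an example. It is defined in the Android kernel source, in qti_virtual_sensor.c: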
ini
static const struct virtual_sensor_data qti_virtual_sensors[] = {
{
.virt_zone_name = "gpu-virt-max-step",
.num_sensors = 2,
.sensor_names = {"gpu0-usr",
"gpu1-usr"},
.logic = VIRT_MAXIMUM,
},
....
};
In this structure, virt_zone_name is the type of the thermal_zone, and sensor_names determines where its data comes from. Here the data comes from the two temperature sensors named gpu0-usr and gpu1-usr.
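The aggregation can be illustrated with a small sketch. It assumes that VIRT_MAXIMUM means the virtual zone reports the highest reading among its underlying sensors, which is what the name suggests; the kernel implementation itself is not shown in this article, and the names VIRTUAL_SENSORS and virtual_temp are hypothetical.

```python
from typing import Dict

# Hypothetical Python mirror of the gpu-virt-max-step entry quoted above.
VIRTUAL_SENSORS = {
    "gpu-virt-max-step": {"sensors": ["gpu0-usr", "gpu1-usr"], "logic": "max"},
}


def virtual_temp(name: str, readings: Dict[str, int]) -> int:
    """Combine the underlying sensor readings for one virtual zone."""
    spec = VIRTUAL_SENSORS[name]
    values = [readings[s] for s in spec["sensors"]]
    if spec["logic"] == "max":
        # Assumed semantics of VIRT_MAXIMUM: report the hottest sensor.
        return max(values)
    raise ValueError(f"unsupported logic: {spec['logic']}")
```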
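How many sensors there are and what each one does depends on the SoC vendor, although the vendor does write this information into the Android source.
Qualcomm SoC sensors
In the Android source, platform/hardware/qcom/ contains hardware information for a number of Qualcomm SoCs.
For example, thermal_target.c lists some of the sensor information for the sdm845 processor under Android 13.
// TODO: the logic behind pm660l_tz and battery is still unclear to me, although pm stands for power management, i.e. the Power Control IC.
Summary
The GPU, CPU and battery temperatures can be obtained by reading the temp file in the corresponding thermal_zone directory. Note, however, that phone vendors use different schemes, so neither the type values nor the file permissions are guaranteed to be the same across devices. This is what I have seen so far: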
Xiaomi phones:
typescript
The directories and files under /sys/class/thermal/ can be read normally.
For example, on the Xiaomi Mi 9 SE:
grus:/ $ cat /sys/class/thermal/thermal_zone44/type
battery
grus:/ $ cat /sys/class/thermal/thermal_zone29/type
gpu-virt-max-step
Huawei phones:
bash
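The directories and files under /sys/class/thermal/ cannot be read due to insufficient permissions;
the data can be read from the directories and files under /sys/devices/virtual/thermal/ instead.
For example, on the Huawei P30: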
HWELE:/ $ cat /sys/devices/virtual/thermal/thermal_zone3/type
Battery
HWELE:/ $ cat /sys/devices/virtual/thermal/thermal_zone9/type
gpu
iQOO phones:
typescript
The directories and files under /sys/class/thermal/ can be read normally.
For example, on the iQOO Z3:
PD2073:/ $ cat /sys/class/thermal/thermal_zone90/type
battery
PD2073:/ $ cat /sys/class/thermal/thermal_zone45/type
gpuss-max-step
OPPO phones:
typescript
The directories and files under /sys/class/thermal/ can be read normally.
For example, on the OPPO Reno:
OP46B1:/ $ cat /sys/class/thermal/thermal_zone44/type
battery
OP46B1:/ $ cat /sys/class/thermal/thermal_zone33/type
gpu-virt-max-step
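Putting the observations above together, a lookup that tolerates these vendor differences might search by type rather than by zone number, and fall back to the second root when the first is not readable. This is a sketch under those assumptions; find_zone_temp and DEFAULT_ROOTS are names invented for the example, and the type strings passed in are simply the ones observed on the four devices listed above.

```python
from pathlib import Path
from typing import Iterable, Optional, Tuple

DEFAULT_ROOTS = ("/sys/class/thermal", "/sys/devices/virtual/thermal")


def find_zone_temp(wanted_types: Iterable[str],
                   roots: Iterable[str] = DEFAULT_ROOTS) -> Optional[Tuple[str, int]]:
    """Return (zone_path, raw_temp) for the first zone whose type matches."""
    # Compare case-insensitively: the Huawei P30 reports "Battery".
    wanted = {t.lower() for t in wanted_types}
    for root in roots:
        try:
            zones = sorted(Path(root).glob("thermal_zone*"))
        except OSError:
            continue  # root not listable on this device
        for zone in zones:
            try:
                ztype = (zone / "type").read_text().strip().lower()
                if ztype in wanted:
                    return str(zone), int((zone / "temp").read_text().strip())
            except (OSError, ValueError):
                continue  # permission denied or unreadable: keep looking
    return None
```

For instance, find_zone_temp(["battery"]) would cover the battery zone on all four devices above, and find_zone_temp(["gpu-virt-max-step", "gpuss-max-step", "gpu"]) would cover the GPU types seen there. Other devices may use other type strings.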