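A short story
In the stock market, three algorithms have long been used to measure how a stock's price moves over a period of time: the simple moving average (SMA), the weighted moving average (WMA), and the exponential moving average (EMA). An example of each follows: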
- Example: closing prices for the last 6 days (raw data): 1020, 1030, 1000, 1010, 1020, 1060
- 3-day simple moving average (SMA)
  - Weights every value in the window equally.
  - = (1010 + 1020 + 1060) / 3
  - = 1030
- 3-day weighted moving average (WMA)
  - Gives higher weight to more recent data (e.g. w1=1/6, w2=2/6, w3=3/6).
  - = 1010 * w1 + 1020 * w2 + 1060 * w3
  - ≈ 168 + 340 + 530
  - ≈ 1038
- 3-day exponential moving average (EMA: Exponential Moving Average, or EWMA: Exponentially Weighted Moving Average)
  - Computed from only the previous day's exponential moving average and today's value, giving more weight to recent data.
  - With the exponential smoothing factor k, multiply the previous day's exponential moving average by (1-k), multiply today's value by k, and add the two.
  - The smoothing factor k varies depending on the environment.
  - In the stock market, k is derived from the averaging period n (in days) as k = a * b with a=2 and b=1/(n+1):
    - k = 2 * 1/(n+1) = 2/(n+1)
    - Example: n=2 days, k=2/(n+1)=0.666666...
    - Example: n=3 days, k=2/(n+1)=0.5
    - Example: n=10 days, k=2/(n+1)=0.181818...
  - Each day's result is: previous exponential moving average * (1-k) + today's value * k.
    - Day 1: 1020 (today's value counts 100%)
    - Day 2: 2-day average (k=2/3) = 1020 * 1/3 + 1030 * 2/3 ≈ 1026.7
    - Day 3: 3-day average (k=1/2) = 1026.7 * 50% + 1000 * 50% ≈ 1013.3
    - Day 4: 3-day average (k=1/2) = 1013.3 * 50% + 1010 * 50% ≈ 1011.7
    - Day 5: 3-day average (k=1/2) = 1011.7 * 50% + 1020 * 50% ≈ 1015.8
    - Day 6: 3-day average (k=1/2) = 1015.8 * 50% + 1060 * 50% ≈ 1037.9, i.e. about 1038
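The examples above show that:
1. SMA is the plain average of the last three days' prices.
2. WMA also averages only the last three days' prices, but applies a weight to each one (the more recent the price, the larger the weight).
3. EMA nominally computes a 3-day moving average as well, but through the smoothing factor k it both gives more weight to recent data and draws on all earlier data, so historical values still contribute. Its storage cost is also small: rather than a window of past prices, only the previous day's average has to be kept.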
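The three averages can be reproduced with a few lines of C. This is a minimal sketch rather than code from any trading library: the function names `sma`, `wma` and `ema` are made up for illustration, the WMA weights are assumed to be the linear weights 1..window used in the example, and the EMA follows the day-by-day warm-up shown above (k = 2/(days+1) until the period is reached). All three assume `n >= window`.

```c
#include <stddef.h>

/* Simple moving average over the last `window` of `n` values. */
double sma(const double *x, size_t n, size_t window)
{
	double sum = 0.0;

	for (size_t i = n - window; i < n; i++)
		sum += x[i];
	return sum / (double)window;
}

/* Weighted moving average with linear weights 1..window (newest is heaviest). */
double wma(const double *x, size_t n, size_t window)
{
	double sum = 0.0, wsum = 0.0;

	for (size_t i = 0; i < window; i++) {
		double w = (double)(i + 1);

		sum += w * x[n - window + i];
		wsum += w;
	}
	return sum / wsum;
}

/* Exponential moving average with k = 2/(days+1), days capped at `period`. */
double ema(const double *x, size_t n, size_t period)
{
	double avg = 0.0;

	for (size_t i = 0; i < n; i++) {
		size_t days = (i + 1 < period) ? i + 1 : period;
		double k = 2.0 / (double)(days + 1);

		/* On day 1, k is 1.0, so the first value is taken as-is. */
		avg = avg * (1.0 - k) + x[i] * k;
	}
	return avg;
}
```

For the six closing prices in the example and a window of 3, `sma()` returns 1030, `wma()` about 1038.3, and `ema()` about 1037.9. Note that `ema()` keeps a single running value, while the other two need the whole window.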
The algorithm Linux uses to compute CPU load is similar to EMA; it is also called EDA (Exponential Decaying Average) or EDMA (Exponential Damped Moving Average).
When the global CPU load is updated
- Fixed tick handler
  - tick_handle_periodic() -> tick_periodic() -> do_timer() -> calc_global_load()
- tick nohz handler
  - tick_nohz_handler() -> tick_sched_do_timer() -> tick_do_update_jiffies64() -> do_timer() -> calc_global_load()
  - tick_nohz_update_jiffies() -> tick_do_update_jiffies64() -> do_timer() -> calc_global_load()
  - tick_nohz_restart_sched_tick() -> tick_do_update_jiffies64() -> do_timer() -> calc_global_load()
- hrtimer tick handler
  - tick_sched_timer() -> tick_sched_do_timer() -> tick_do_update_jiffies64() -> do_timer() -> calc_global_load()
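Although calc_global_load() is called on every scheduler tick, it does not update the load each time. Updates follow a period tracked by calc_load_update, which is 5 seconds by default. More precisely, the update happens at calc_load_update + 10 ticks: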
/*
 * calc_load - update the avenrun load estimates 10 ticks after the
 * CPUs have updated calc_load_tasks.
 *
 * Called from the global timer code.
 */
void calc_global_load(unsigned long ticks)
{
	unsigned long sample_window;
	long active, delta;

	sample_window = READ_ONCE(calc_load_update);
	if (time_before(jiffies, sample_window + 10))
		return;
	......
}
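How the global CPU load is computed
As an example of what a load value means: two runnable tasks on a single-CPU system give a load of 2.0. With the same two runnable tasks on a 4-CPU system, the reported value is still 2.0, because the kernel sums tasks over all CPUs without dividing by the CPU count; relative to capacity, that is 0.5 per CPU.
The algorithm the Linux kernel uses for the global load average is described in the comment at the top of kernel/sched/loadavg.c: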
/*
* Global load-average calculations
*
* We take a distributed and async approach to calculating the global load-avg
* in order to minimize overhead.
*
* The global load average is an exponentially decaying average of nr_running +
* nr_uninterruptible.
*
* Once every LOAD_FREQ:
*
* nr_active = 0;
* for_each_possible_cpu(cpu)
* nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
*
* avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
*
* Due to a number of reasons the above turns in the mess below:
*
* - for_each_possible_cpu() is prohibitively expensive on machines with
* serious number of CPUs, therefore we need to take a distributed approach
* to calculating nr_active.
*
* \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
* = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
*
* So assuming nr_active := 0 when we start out -- true per definition, we
* can simply take per-CPU deltas and fold those into a global accumulate
* to obtain the same result. See calc_load_fold_active().
*
* Furthermore, in order to avoid synchronizing all per-CPU delta folding
* across the machine, we assume 10 ticks is sufficient time for every
* CPU to have completed this task.
*
* This places an upper-bound on the IRQ-off latency of the machine. Then
* again, being late doesn't loose the delta, just wrecks the sample.
*
* - cpu_rq()->nr_uninterruptible isn't accurately tracked per-CPU because
* this would add another cross-CPU cacheline miss and atomic operation
* to the wakeup path. Instead we increment on whatever CPU the task ran
* when it went into uninterruptible state and decrement on whatever CPU
* did the wakeup. This means that only the sum of nr_uninterruptible over
* all CPUs yields the correct result.
*
* This covers the NO_HZ=n code, for extra head-aches, see the comment below.
*/
As the comment says, the original formula for the load in Linux is:
avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
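However, on systems with a large number of CPUs, computing the system-wide nr_active is expensive, so an improved scheme is used:
First, nr_active is set to 0 at boot, when the load is of course 0.
Then, from its scheduler tick, each CPU computes the delta (increase or decrease) of its own nr_active and adds it to the global variable calc_load_tasks; see calc_load_fold_active().
Next, because calc_load_tasks has to be updated atomically and every CPU must get its delta in, the kernel does not synchronize the CPUs explicitly but allows 10 ticks for all of them to finish (this is the reason for the calc_load_update + 10 ticks in the previous section).
Finally, since the work of computing nr_active has been spread out over time, once the calc_load_update period has elapsed the global calc_load_tasks can simply be read and used as nr_active.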
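The delta-folding idea can be sketched in plain, single-threaded C. This is an illustration under stated assumptions, not the kernel code: `struct cpu_state`, its `folded` field, `global_active` and `fold_active` are hypothetical names, and the real kernel updates calc_load_tasks with atomic operations from each CPU.

```c
/* Per-CPU counters, simplified for the sketch. */
struct cpu_state {
	long nr_running;         /* runnable tasks on this CPU */
	long nr_uninterruptible; /* may be negative on a single CPU */
	long folded;             /* value last added to the global sum */
};

/* Stands in for calc_load_tasks; an atomic counter in the kernel. */
long global_active;

/* Fold this CPU's change since its previous fold into the global sum. */
void fold_active(struct cpu_state *cpu)
{
	long now = cpu->nr_running + cpu->nr_uninterruptible;
	long delta = now - cpu->folded;

	if (delta != 0) {
		global_active += delta;
		cpu->folded = now;
	}
}
```

Because every CPU only ever adds the difference from its own last contribution, `global_active` always equals the sum over all CPUs that have folded, without any loop over the CPUs at sampling time. A per-CPU `nr_uninterruptible` can be negative (a task may go to sleep on one CPU and be woken from another); only the sum is meaningful, as the kernel comment points out.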
Linux exposes three recent load averages to users, covering the last 1 minute, 5 minutes, and 15 minutes, stored in the global array avenrun[3]. They can be viewed with the "uptime" command or in the "/proc/loadavg" file:
$ uptime
20:41:18 up 1:09, 2 users, load average: 0.57, 0.15, 0.11
$ cat /proc/loadavg
0.55 0.28 0.16 1/318 2112
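A program can read the same values by parsing a line in the /proc/loadavg format. The sketch below assumes the five-field layout shown above: three load averages, then runnable/total scheduling entities, then the most recently created PID. `struct loadavg_line` and `parse_loadavg` are names invented for this example.

```c
#include <stdio.h>

/* Fields of one /proc/loadavg line. */
struct loadavg_line {
	double avg1, avg5, avg15; /* 1, 5 and 15 minute load averages */
	int runnable, total;      /* e.g. "1/318" */
	int last_pid;             /* most recently created PID */
};

/* Returns 0 on success, -1 if the line does not have the expected shape. */
int parse_loadavg(const char *line, struct loadavg_line *out)
{
	int n = sscanf(line, "%lf %lf %lf %d/%d %d",
		       &out->avg1, &out->avg5, &out->avg15,
		       &out->runnable, &out->total, &out->last_pid);

	return n == 6 ? 0 : -1;
}
```

In a real program the line would come from `fgets()` on an open `/proc/loadavg`; the parsing is kept separate here so it can be tried on a literal string.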
The three load averages are computed as follows:
- nr_active = the number of running plus uninterruptible tasks, summed over the runqueues of all CPUs.
- avenrun[0] = avenrun[0]*k1 + nr_active*(1-k1) // 1 minute
- avenrun[1] = avenrun[1]*k2 + nr_active*(1-k2) // 5 minutes
- avenrun[2] = avenrun[2]*k3 + nr_active*(1-k3) // 15 minutes
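In these formulas k = e^(-1/n), where n is the number of update periods the average spans. avenrun[0] covers 1 minute and is updated every 5 s, so it spans 12 periods; avenrun[1] spans 60 periods and avenrun[2] spans 180. Therefore:
- n=12, k = e^(-1/12) = 0.920044 (about 92%);
- n=60, k = e^(-1/60) = 0.9835;
- n=180, k = e^(-1/180) = 0.9945.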
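To see why k = e^(-1/n) corresponds to an average "over n periods", the recurrence can be unrolled. This is the standard identity for exponential averages rather than something stated in the kernel source; the symbols a_t (the running average after period t) and x_t (the nr_active sample in period t) are introduced only for this derivation.

```latex
\begin{aligned}
a_t &= k\,a_{t-1} + (1-k)\,x_t \\
    &= (1-k)\sum_{j=0}^{t-1} k^{j}\,x_{t-j} + k^{t}\,a_0 ,
\qquad k = e^{-1/n} \;\Rightarrow\; k^{j} = e^{-j/n}.
\end{aligned}
```

A sample taken j periods ago therefore carries the weight (1-k)e^(-j/n). After n periods (12 samples for the 1-minute average) its weight has dropped to 1/e, about 37%, of what it was when the sample was taken.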
Note: nr_active here includes not only the runnable tasks in the runqueues but also tasks in uninterruptible sleep. For the reasons behind this, see:
https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
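In the implementation, for the sake of precision, the kernel uses fixed-point arithmetic in which 2^11 = 2048 represents a real load of 1.0, and it hardcodes the factors for the three load averages: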
#define FSHIFT 11 /* nr of bits of precision */
#define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */
#define LOAD_FREQ (5*HZ+1) /* 5 sec intervals */
#define EXP_1 1884 /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5 2014 /* 1/exp(5sec/5min) */
#define EXP_15 2037 /* 1/exp(5sec/15min) */
where:
- EXP_1 = 1/(exp(1/12)) * FIXED_1 = 92.00% * 2048 = 1884
- EXP_5 = 1/(exp(1/60)) * FIXED_1 = 98.35% * 2048 = 2014
- EXP_15 = 1/(exp(1/180)) * FIXED_1 = 99.45% * 2048 = 2037
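One way to sanity-check the three constants without a math library is to apply each factor for the number of periods it is meant to span and confirm that the remaining weight is close to 1/e (about 0.368). The helper below is a sketch written for this purpose; `remaining_weight` and `FIXED_1_F` are hypothetical names, not kernel identifiers.

```c
/* Fixed-point 1.0 (2^11) as a double, for this check only. */
#define FIXED_1_F 2048.0

/*
 * Weight left on an old sample after `samples` updates when each
 * update multiplies it by exp_fixed / 2048.
 */
double remaining_weight(unsigned int exp_fixed, int samples)
{
	double k = exp_fixed / FIXED_1_F;
	double w = 1.0;

	for (int i = 0; i < samples; i++)
		w *= k;
	return w;
}
```

Calling it as `remaining_weight(1884, 12)`, `remaining_weight(2014, 60)` and `remaining_weight(2037, 180)` gives roughly 0.37, 0.37 and 0.38. The small deviations from 1/e come from rounding the factors to integers.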
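In the code, the formula is rearranged slightly into fixed-point form, as calc_load() below shows.
The FIXED_1-1 (2047) added at the end acts as a round-up: when active >= load, it keeps the rising trend of the load from being truncated away by the integer division.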
/*
 * a1 = a0 * e + a * (1 - e)
 */
static inline unsigned long
calc_load(unsigned long load, unsigned long exp, unsigned long active)
{
	unsigned long newload;

	newload = load * exp + active * (FIXED_1 - exp);
	if (active >= load)
		newload += FIXED_1-1;

	return newload / FIXED_1;
}
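The effect of the round-up can be observed by running the update repeatedly in user space. The sketch below copies the arithmetic of calc_load() so that it compiles on its own. `calc_load_trunc` (the same formula without the round-up) and `run_samples` are hypothetical helpers added for comparison only. `active` is passed in fixed-point form, so one active task is FIXED_1.

```c
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)
#define EXP_1   1884UL

/* Same arithmetic as calc_load() above. */
unsigned long calc_load(unsigned long load, unsigned long exp,
			unsigned long active)
{
	unsigned long newload = load * exp + active * (FIXED_1 - exp);

	if (active >= load)
		newload += FIXED_1 - 1;
	return newload / FIXED_1;
}

/* Comparison variant without the round-up (not kernel code). */
unsigned long calc_load_trunc(unsigned long load, unsigned long exp,
			      unsigned long active)
{
	return (load * exp + active * (FIXED_1 - exp)) / FIXED_1;
}

/* Apply the 1-minute update `samples` times with a constant `active`. */
unsigned long run_samples(unsigned long load, unsigned long active,
			  int samples, int round_up)
{
	for (int i = 0; i < samples; i++)
		load = round_up ? calc_load(load, EXP_1, active)
				: calc_load_trunc(load, EXP_1, active);
	return load;
}
```

Starting from a load of 0 with one permanently active task, the first update yields 164 (about 0.08). With the round-up the value eventually reaches exactly 2048 (1.00); with plain truncation it stalls at 2036 and never gets there.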