Hypothesis Testing for A/B Experiments: the Delta Method

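In freight A/B experiments, the randomization unit and the analysis unit are not always the same. When they differ, the usual hypothesis tests miscompute the variance of the metric, which distorts the test statistic and the p-value and undermines the test result.

Take, as an example, testing whether a difference in GTV (Gross Transaction Value) match rate is significant in an experiment randomized by order.

The randomization unit of this experiment is the order ID, whereas the analysis unit of the GTV match rate is the transaction unit, i.e. ¥1. The randomization unit and the analysis unit therefore do not match.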

The current hypothesis test for the GTV match rate is the order-count correction method. Define the test statistic $Z$ as

$$
Z = \dfrac{\Delta r}{\sqrt{Var(\Delta r)}} = \dfrac{\widehat{r_1} - \widehat{r_2}}{\sqrt{\widehat{r}(1-\widehat{r})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}
$$

The variables and the metrics they stand for are listed in the table below.
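As a rough illustration of the statistic above, the sketch below evaluates it in Python. The variable table is not reproduced in this text, so the reading used here is an assumption: $n_1, n_2$ are taken to be the order counts of the two groups and $\widehat{r}$ the pooled rate weighted by those counts. The function name `pooled_z` and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_z(r1, n1, r2, n2):
    """Z statistic of the order-count correction formula (assumed reading)."""
    # Assumption: r_hat is the pooled rate, weighted by each group's order count.
    r_hat = (r1 * n1 + r2 * n2) / (n1 + n2)
    se = math.sqrt(r_hat * (1 - r_hat) * (1 / n1 + 1 / n2))
    return (r1 - r2) / se


# Hypothetical inputs: match rates of 52% and 50% with 10,000 orders per group.
z = pooled_z(0.52, 10_000, 0.50, 10_000)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(round(z, 3), round(p_value, 4))
```

The formula treats every order as one independent Bernoulli trial, which is exactly the assumption that breaks down when the metric is weighted by order value.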

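This article proposes a hypothesis test based on the Delta Method. It constructs a consistent estimate of $\sqrt{Var(\Delta r)}$ and derives the test statistic from it, which addresses the inflated Type I error rate that arises when the randomization unit and the analysis unit differ. The Delta Method is also compared with the Bootstrap to assess the efficiency of the two corrections.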

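1. Theoretical foundations

1. Delta Method

In statistics, the Delta Method is commonly used to derive asymptotic distributions. It applies a continuously differentiable function to one or more random variables to obtain a new random variable, and is typically applied to sample means and to functions of sample moments.

1. Formula

The Delta Method has a univariate and a multivariate form. The multivariate form is as follows: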

For a sequence of multivariate random variables $\{Y^{(1)}, \cdots, Y^{(p)}\}$, define $\vec{Y}=[Y^{(1)}, \cdots, Y^{(p)}]^T$ and suppose $\vec{Y}$ is asymptotically normal, i.e.

$$
\sqrt{n}\,[\vec{Y} - \vec{\theta}] \stackrel{d}{\longrightarrow} N(0, \Sigma)
$$

where $\vec{\theta}=[\theta_1, \cdots, \theta_p]^T$ and $\Sigma$ are the mean vector and the covariance matrix of $\vec{Y}$, respectively.

Given a multivariate function $g$, if the gradient $\nabla g(\vec{\theta})$ of $g$ at $\vec{\theta}$ exists and is not $\vec{0}$, then

$$
\sqrt{n}\left[g(\vec{Y}) - g(\vec{\theta})\right] \stackrel{d}{\longrightarrow} N\left(0, \nabla^T g(\vec{\theta}) \cdot \Sigma \cdot \nabla g(\vec{\theta}) \right)
$$

The gradient is

$$
\nabla g(\vec{\theta})=\left[\frac{\partial g}{\partial Y^{(1)}}(\vec{\theta}), \cdots, \frac{\partial g}{\partial Y^{(p)}}(\vec{\theta}) \right]^T
$$

and the covariance matrix is

$$
\Sigma = \begin{bmatrix} Var(Y^{(1)}) & \cdots & Cov(Y^{(1)}, Y^{(p)})\\ \vdots & \ddots & \vdots\\ Cov(Y^{(p)}, Y^{(1)}) & \cdots & Var(Y^{(p)}) \end{bmatrix}
$$
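For comparison, the univariate form mentioned at the start of this subsection is not written out in the text. It is the standard $p = 1$ special case, where the gradient reduces to the ordinary derivative $g'(\theta)$ and $\Sigma$ to a scalar variance $\sigma^2$ (the symbols $Y_n$ and $\sigma^2$ are introduced here only for illustration):

```latex
\sqrt{n}\,[Y_n - \theta] \stackrel{d}{\longrightarrow} N(0, \sigma^2)
\quad\Longrightarrow\quad
\sqrt{n}\,[g(Y_n) - g(\theta)] \stackrel{d}{\longrightarrow} N\!\left(0,\; [g'(\theta)]^2 \sigma^2\right),
\qquad g'(\theta) \neq 0
```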

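2. Applying the Delta Method in A/B experiments

Taking the GTV match rate as an example

The GTV match rate measures how efficiently transaction value is converted. It is computed as GTV match rate = matched GTV / executed-order GTV. In experiments randomized by order ID, business teams and analysts compare this metric across experiment groups to judge whether a strategy improves transaction efficiency.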

2.1 Aligning the randomization unit and the analysis unit

Denote the GTV match rate by $r$ and rewrite the formula as

$$
r = \text{GTV match rate} = \frac{\text{matched GTV}}{\text{executed-order GTV}} = \frac{\text{matched GTV} / \text{executed-order count}}{\text{executed-order GTV} / \text{executed-order count}}
$$

定义 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ˉ \bar{X} </math>Xˉ为配对GTV /执行单量，代表每笔执行单对应的配对金额均值；

<math xmlns="http://www.w3.org/1998/Math/MathML"> Y ˉ \bar{Y} </math>Yˉ为执行单GTV/执行单量，即执行单的单均价格。

此时 <math xmlns="http://www.w3.org/1998/Math/MathML"> X ˉ \bar{X} </math>Xˉ和 <math xmlns="http://www.w3.org/1998/Math/MathML"> Y ˉ \bar{Y} </math>Yˉ都是分析单位为订单ID的指标，

GTV配对率也变成了两个分流单元与分析单元一致的指标之商。
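The rewrite changes nothing numerically: dividing numerator and denominator by the same order count leaves the ratio unchanged. A minimal check on made-up per-order data (the five orders below are hypothetical, not taken from any experiment):

```python
# Hypothetical per-order data, in yuan: matched GTV (x) and executed-order GTV (y).
# An unmatched order contributes 0 to x but its full value to y.
x = [0.0, 120.0, 80.0, 0.0, 200.0]
y = [50.0, 120.0, 80.0, 90.0, 200.0]

n = len(x)
x_bar = sum(x) / n  # average matched amount per order
y_bar = sum(y) / n  # average price per order

r_from_totals = sum(x) / sum(y)  # matched GTV / executed-order GTV
r_from_means = x_bar / y_bar     # ratio of the two per-order means
print(r_from_totals, r_from_means)
```

Both expressions give the same rate; the second form is the one whose components are averages over the randomization unit.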

2.2 Computing the variance with the Delta Method

$\bar{X}$ and $\bar{Y}$ are per-order averages, so by the central limit theorem they are asymptotically normal:

$$
\sqrt{n}[\bar{X} - \mu_X] \stackrel{d}{\longrightarrow} N(0, \sigma_X^2), \qquad \sqrt{n}[\bar{Y} - \mu_Y] \stackrel{d}{\longrightarrow} N(0, \sigma_Y^2)
$$

Define the function $g(x,y) = x/y$, so that $r=g(\bar{X}, \bar{Y})$.

$g$ is continuously differentiable wherever $y \neq 0$, with gradient $\nabla g = [1/y, -x/y^2]^T$.

Applying the Delta Method gives the variance of $r$:

$$
\begin{aligned}
Var(r) & = Var(g(\bar{X}, \bar{Y})) = \nabla^T g(\bar{X}, \bar{Y})\cdot\Sigma\cdot\nabla g(\bar{X}, \bar{Y}) \\
& = \begin{bmatrix} 1/\bar{Y} & -\bar{X}/\bar{Y}^2 \end{bmatrix}
\begin{bmatrix} Var(\bar{X}) & Cov(\bar{X},\bar{Y})\\ Cov(\bar{X},\bar{Y}) & Var(\bar{Y}) \end{bmatrix}
\begin{bmatrix} 1/\bar{Y}\\ -\bar{X}/\bar{Y}^2 \end{bmatrix} \\
& = \dfrac{Var(\bar{X})}{\bar{Y}^2} - 2\dfrac{\bar{X}}{\bar{Y}^3}Cov(\bar{X},\bar{Y}) + \dfrac{\bar{X}^2}{\bar{Y}^4}Var(\bar{Y}) \\
& \approx \frac{1}{n}\frac{\mu_X^2}{\mu_Y^2} \left[\dfrac{\sigma_X^2}{\mu_X^2} - 2\dfrac{\sigma_{XY}}{\mu_X\mu_Y} + \dfrac{\sigma_Y^2}{\mu_Y^2} \right]
\end{aligned}
$$

If $(X_1, Y_1), \cdots, (X_n, Y_n)$ are the sample pairs, the following unbiased estimators are available:

$$
\left\{
\begin{aligned}
\widehat{\mu_X}&=\frac{1}{n}\sum_{i=1}^nX_i\\
\widehat{\mu_Y}&=\frac{1}{n}\sum_{i=1}^nY_i\\
\widehat{\sigma_X^2}&=\frac{1}{n-1}\sum_{i=1}^n(X_i-\widehat{\mu_X})^2\\
\widehat{\sigma_Y^2}&=\frac{1}{n-1}\sum_{i=1}^n(Y_i-\widehat{\mu_Y})^2\\
\widehat{\sigma_{XY}}&=\frac{1}{n-1}\sum_{i=1}^n(X_i-\widehat{\mu_X})(Y_i-\widehat{\mu_Y})
\end{aligned}
\right.
$$

Substituting these estimators into the formula gives the variance estimate $\widehat{Var(r)}$ of $r$.
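The plug-in computation can be sketched in a few lines of standard-library Python. The function name `delta_ratio_variance` is made up for illustration, and the code expands the bracketed formula into the equivalent three-term form $\sigma_X^2/\mu_Y^2 - 2\mu_X\sigma_{XY}/\mu_Y^3 + \mu_X^2\sigma_Y^2/\mu_Y^4$, all divided by $n$.

```python
def delta_ratio_variance(x, y):
    """Delta Method estimate of Var(r) for r = mean(x) / mean(y).

    x and y are equal-length lists with one entry per randomization unit.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Sample variances and covariance with the n - 1 denominator.
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return (
        var_x / mu_y**2
        - 2 * mu_x * cov_xy / mu_y**3
        + mu_x**2 * var_y / mu_y**4
    ) / n


# If y is constant, only the first term survives: Var(x_bar) / y^2.
print(delta_ratio_variance([1, 2, 3, 4], [2, 2, 2, 2]))
# If x is an exact multiple of y, the ratio never varies and the estimate is 0.
print(delta_ratio_variance([1, 2, 3], [2, 4, 6]))
```

The two calls are sanity checks on degenerate inputs rather than realistic order data.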

2.3 Constructing the hypothesis test and computing the statistic

In an A/B experiment randomized by order ID, let the treatment-group sample pairs be $(X_1^{(t)}, Y_1^{(t)}), \cdots, (X_n^{(t)}, Y_n^{(t)})$, with GTV match rate $\widehat{r^{(t)}}=\sum_{i=1}^nX_i^{(t)} \bigg/ \sum_{i=1}^nY_i^{(t)}$, and let the control-group sample pairs be $(X_1^{(c)}, Y_1^{(c)}), \cdots, (X_m^{(c)}, Y_m^{(c)})$, with GTV match rate $\widehat{r^{(c)}}=\sum_{j=1}^mX_j^{(c)} \bigg/ \sum_{j=1}^mY_j^{(c)}$. The business team wants to know whether the strategy has a significant effect on the GTV match rate, so we set up the hypothesis test

$$
H_0: r^{(t)} = r^{(c)} \quad H_1: r^{(t)} \neq r^{(c)}
$$

Using the derivation in 2.2 and the sample moment estimates of each experiment group, we obtain the variance estimates $Var(\widehat{r^{(t)}})$ and $Var(\widehat{r^{(c)}})$ of the GTV match rate.

With the point estimates and the variance estimates in hand, the test statistic is

$$
Z=\dfrac{\widehat{\Delta r}}{\sqrt{\widehat{Var(\Delta r)}}}
=\dfrac{\widehat{r^{(t)}}-\widehat{r^{(c)}}}{\sqrt{Var\left(\widehat{r^{(t)}}-\widehat{r^{(c)}}\right)}}
=\dfrac{\widehat{r^{(t)}}-\widehat{r^{(c)}}}{\sqrt{Var(\widehat{r^{(t)}})+Var(\widehat{r^{(c)}})}}
$$

The last equality holds because the two groups are independent under random assignment, so the covariance term vanishes.

With a sufficiently large sample, $Z$ is approximately standard normal, so the p-value can be computed directly and compared with $\alpha$ to decide whether the difference in GTV match rate is significant.
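Put together, the test amounts to two per-group variance estimates and one normal tail probability. The sketch below is a self-contained illustration using only the Python standard library; the helper names `ratio_and_variance` and `delta_z_test` and the six orders per group are hypothetical, and a real experiment would need far more data for the normal approximation to hold.

```python
import math
from statistics import NormalDist


def ratio_and_variance(x, y):
    """Return (r_hat, Delta Method estimate of Var(r_hat)) for r = sum(x) / sum(y)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    var_r = (
        var_x / mu_y**2 - 2 * mu_x * cov_xy / mu_y**3 + mu_x**2 * var_y / mu_y**4
    ) / n
    return mu_x / mu_y, var_r


def delta_z_test(x_t, y_t, x_c, y_c):
    """Two-sided Z test of H0: r_t == r_c; the groups are assumed independent."""
    r_t, v_t = ratio_and_variance(x_t, y_t)
    r_c, v_c = ratio_and_variance(x_c, y_c)
    z = (r_t - r_c) / math.sqrt(v_t + v_c)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical per-order data (yuan): matched GTV x, executed-order GTV y.
x_t = [0.0, 120.0, 80.0, 0.0, 200.0, 60.0]
y_t = [50.0, 120.0, 80.0, 90.0, 200.0, 60.0]
x_c = [0.0, 0.0, 80.0, 150.0, 0.0, 60.0]
y_c = [50.0, 120.0, 80.0, 150.0, 200.0, 60.0]

z, p_value = delta_z_test(x_t, y_t, x_c, y_c)
print(z, p_value)
```

The group sizes need not be equal; each group's variance is already scaled by its own sample size inside `ratio_and_variance`.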

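2. Bootstrap

The Bootstrap is a popular method in modern statistics. It resamples a given data set with replacement to create many resampled data sets and thereby generates an empirical distribution of a statistic, from which standard errors, confidence intervals, and hypothesis tests for many kinds of sample statistics can be obtained. Its range of application keeps growing, and it is now a common way to evaluate the results of industrial A/B tests.

The experiments here use the nonparametric Bootstrap. Its core idea and basic steps are as follows: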

1. Let the original sample size be $N$. Draw $N$ observations from the original sample by resampling with replacement, so the same observation may be drawn more than once.
2. Compute the estimate $\hat{\theta}$ of the statistic of interest from the resample.
3. Repeat the steps above $B$ times (usually more than 1000) to obtain $B$ statistics $\hat{\theta_1},\cdots,\hat{\theta_B}$.
4. Compute summaries of these $B$ statistics, such as their mean and variance, to estimate the mean and variance of the statistic for the original sample.

a. The Bootstrap estimate of the mean of the statistic is $\bar{\hat{\theta}}=\sum_{i=1}^B\hat{\theta_i}/B$, and the unbiased estimate of its variance is $\hat{se}^2(\hat{\theta})=\frac{1}{B-1}\sum_{i=1}^B(\hat{\theta_i}-\bar{\hat{\theta}})^2$.
5. Compute the confidence interval (CI). If the treatment group's observed metric falls inside the confidence interval, the experiment groups do not differ significantly; otherwise the difference is significant.

a. Standard Bootstrap (SB)

The standard Bootstrap builds the usual normal-theory CI from the sample mean and sample variance obtained by the Bootstrap. By the central limit theorem, $\frac{\hat{\theta}-E(\hat{\theta})}{se(\hat{\theta})}$ is approximately standard normal, so the standard Bootstrap confidence interval is

$$
\left(\bar{\hat{\theta}}-z_{1-\alpha/2}\cdot\hat{se}(\hat{\theta}),\ \bar{\hat{\theta}}+z_{1-\alpha/2}\cdot\hat{se}(\hat{\theta})\right)
$$

b. Percentile Bootstrap (PB)

The percentile Bootstrap works directly with the distribution of $\hat{\theta}$. Once the Bootstrap has produced the replicates of $\hat{\theta}$ and their distribution, the confidence interval is formed from its quantiles:

$$
\left(\hat{\theta}_{\alpha/2},\ \hat{\theta}_{1-\alpha/2}\right)
$$

c. t-percentile Bootstrap (PTB)

The t-percentile Bootstrap combines SB and PB, whose requirements on the distribution of the estimator (unbiased and approximately normal) are demanding, and it usually gives a more accurate CI than PB. The statistic $\frac{\hat{\theta}-E(\hat{\theta})}{se(\hat{\theta})}$ is taken to follow a Student t distribution with $B-1$ degrees of freedom. Let $t^{\star}=t_{1-\alpha/2}(B-1)$ be the $1-\alpha/2$ quantile of that distribution; the PTB confidence interval is then

$$
\left(\bar{\hat{\theta}}-t^{\star}\cdot\hat{se}(\hat{\theta}),\ \bar{\hat{\theta}}+t^{\star}\cdot\hat{se}(\hat{\theta})\right)
$$
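The resampling procedure and the SB and PB intervals above can be sketched with the Python standard library alone. Everything named here is illustrative: the function `bootstrap_ratio_ci`, the synthetic order data, and the simple index-based percentile rule are assumptions, and orders are resampled as whole $(X_i, Y_i)$ pairs so that the resampling unit matches the randomization unit. PTB is omitted only because the standard library has no t-quantile function; it differs from SB solely in replacing $z_{1-\alpha/2}$ with $t_{1-\alpha/2}(B-1)$.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_ratio_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Return (SB interval, PB interval) for the ratio r = sum(x) / sum(y)."""
    rng = random.Random(seed)
    n = len(x)
    replicates = []
    for _ in range(n_boot):
        # Resample order indices with replacement, keeping each (x_i, y_i) pair intact.
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(sum(x[i] for i in idx) / sum(y[i] for i in idx))
    replicates.sort()

    theta_bar = mean(replicates)  # Bootstrap mean of the statistic
    se = stdev(replicates)        # Bootstrap standard error (B - 1 denominator)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sb = (theta_bar - z * se, theta_bar + z * se)

    # Percentile interval from the sorted replicates (simple index rule).
    lo = replicates[int(round(alpha / 2 * n_boot))]
    hi = replicates[int(round((1 - alpha / 2) * n_boot)) - 1]
    return sb, (lo, hi)


# Synthetic orders: price uniform on [50, 500] yuan, each matched with probability 0.7.
data_rng = random.Random(1)
y = [data_rng.uniform(50, 500) for _ in range(200)]
x = [price if data_rng.random() < 0.7 else 0.0 for price in y]

r_hat = sum(x) / sum(y)
sb_ci, pb_ci = bootstrap_ratio_ci(x, y)
print(r_hat, sb_ci, pb_ci)
```

Each replicate touches all $N$ observations, so the cost grows with $N \times B$, which is the source of the Bootstrap's computational burden on large experiments.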

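2. Comparing hypothesis-testing methods on simulated data

1. Randomization by order ID

1.1 Comparison setup

To make the harm of using a wrong variance more concrete, AA experiments are simulated from current real data. The experiment setup is as follows:

Evaluation metrics	- Probability of a Type I error - Distribution of results by significance of the metric difference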

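1.2 Comparison results

In the hypothesis test for the GTV match-rate difference, the Delta Method keeps the Type I error rate under control and is more efficient than the Bootstrap.

For the Delta Method and the percentile Bootstrap, the Type I error rate is close to $\alpha$. For the original method and the order-count correction method it is far too high. The original method's confidence interval is so narrow that even a slight deviation produces a Type I error.

The Bootstrap is accurate but computationally expensive: it has to resample the full sample with replacement, which takes a long time. The Delta Method is more efficient by comparison.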

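Method	Number of Type I errors	Type I error rate	Confidence interval length	Confidence interval standard deviation
Match-rate difference test	55	5.5%
Original method	948	94.8%	8E-04	8E-09
Order-count correction method	431	43.1%	8E-03	0E+00
Delta Method	59	5.9%	2E-02	3E-05
Percentile Bootstrap	58	5.8%	1E-02	3E-04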
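The qualitative pattern in the table can be reproduced on synthetic data. The sketch below is not the simulation behind the table, which used real orders; it generates hypothetical AA experiments and compares the Delta Method with a per-yuan binomial variance. Reading the "original method" as that per-yuan binomial test is an assumption made here for illustration, as are the function names and the data-generating process.

```python
import math
import random
from statistics import NormalDist


def delta_var(x, y):
    """Delta Method estimate of Var(sum(x) / sum(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return (vx / my**2 - 2 * mx * cxy / my**3 + mx**2 * vy / my**4) / n


def simulate_aa(n_runs=300, n_orders=200, alpha=0.05, seed=0):
    """Return (naive rejection rate, Delta Method rejection rate) over AA runs."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    naive_rejects = delta_rejects = 0
    for _ in range(n_runs):
        groups = []
        for _ in range(2):  # both groups share one distribution: no true effect
            y = [rng.uniform(50, 500) for _ in range(n_orders)]
            x = [price if rng.random() < 0.7 else 0.0 for price in y]
            groups.append((x, y))
        (xt, yt), (xc, yc) = groups
        diff = sum(xt) / sum(yt) - sum(xc) / sum(yc)
        # Assumed "original method": every yuan is an independent Bernoulli trial.
        pooled = (sum(xt) + sum(xc)) / (sum(yt) + sum(yc))
        naive_se = math.sqrt(pooled * (1 - pooled) * (1 / sum(yt) + 1 / sum(yc)))
        delta_se = math.sqrt(delta_var(xt, yt) + delta_var(xc, yc))
        naive_rejects += abs(diff) > z_crit * naive_se
        delta_rejects += abs(diff) > z_crit * delta_se
    return naive_rejects / n_runs, delta_rejects / n_runs


naive_rate, delta_rate = simulate_aa()
print(naive_rate, delta_rate)
```

Because all yuan within an order are matched or unmatched together, the per-yuan variance is far too small, and the naive test rejects in most AA runs while the Delta Method stays near the nominal level.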

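2. Randomization by user ID

2.1 Comparison setup

The experiment setup is as follows:

Evaluation metrics	- Probability of a Type I error - Distribution of results by significance of the metric difference

2.2 Comparison results

In the hypothesis test for the user order match rate, the Delta Method keeps the Type I error rate under control and is more efficient than the Bootstrap.

The three methods have similar MDEs, but their Type I error rates differ considerably. For the Delta Method and the percentile Bootstrap, the Type I error rate is close to $\alpha$, whereas for the original method it is too high.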

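Method	Number of Type I errors	Type I error rate	Confidence interval length	Confidence interval standard deviation
Orders-per-user difference test	55	5.5%
Original method	130	13.0%	8E-03	5E-08
Delta Method	38	3.8%	1E-02	2E-06
Percentile Bootstrap	44	4.4%	5E-03	2E-04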

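3. Case studies

1. GTV match-rate difference test in an experiment randomized by order ID

In an experiment randomized by order ID, the treatment group's GTV match rate is higher by 0.4 p.p.

Question: is the difference in GTV match rate significantly different from 0?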

Value	Treatment group	Control group	Absolute difference
Orders	xxx	xxx	xxx
Placed-order GTV (yuan)	xxx	xxx	xxx
Matched GTV (yuan)	xxx	xxx	xxx
GTV match rate	xxx	xxx	0.4 p.p.

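The randomization unit of the experiment is the order ID, whereas the analysis unit of the GTV match rate is ¥1, so the two do not match.

Define $\bar{X}$ as matched GTV / executed-order count, $\bar{Y}$ as executed-order GTV / executed-order count, and the function $g(x,y) = x/y$, so that the GTV match rate is $r=g(\bar{X}, \bar{Y})$.

The Delta Method can then be used to test the significance of the difference in GTV match rate.

Comparing the original method, the order-count correction method, and the Delta Method gives the table below:

The original method's variance is too small, which inflates the statistic and makes a non-significant conclusion almost impossible to reach, so it is not statistically sound.
The Delta Method and the order-count correction method both conclude that the difference is not significant, but the Delta Method's variance is larger and its test result is the more reliable of the two.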

Method	Variance	Statistic	p_value	Test result
Original method	6E-09	60.00	0.00	Significant
Order-count correction method	7E-06	1.60	0.11	Not significant
Delta Method	2E-05	1.00	0.36	Not significant

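2. Difference test in an experiment randomized by user ID

In an experiment randomized by user ID, the treatment group's user order match rate is lower by 2 p.p. and its orders per user are higher by 0.6%.

Question: is the difference in user order match rate significantly different from 0?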

Value	Treatment group	Control group	Absolute difference	Relative difference
Users	xxx	xxx	xxx	xxx
Orders placed by users	xxx	xxx	xxx	0.60%
Matched orders by users	xxx	xxx	xxx	xxx
Match rate	xxx	xxx	-2 p.p.	xxx

2.1 User order match rate

The randomization unit of the experiment is the user ID, whereas the analysis unit of the user order match rate is the order ID, so the two do not match.

Define $\bar{X}$ as matched orders per user, $\bar{Y}$ as executed orders per user, and the function $g(x,y) = x/y$, so that the user order match rate is $r=g(\bar{X}, \bar{Y})$.

The Delta Method can then be used to test the significance of the metric difference.
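In practice the only extra step compared with the order-ID case is aggregating order-level rows up to the randomization unit before any variance is computed. A minimal sketch, assuming a hypothetical input of `(user_id, is_matched)` tuples with one row per order (the function name and the data layout are illustrative, not the author's pipeline):

```python
from collections import defaultdict


def per_user_pairs(orders):
    """Aggregate order rows into per-user (matched orders, total orders) lists."""
    placed = defaultdict(int)
    matched = defaultdict(int)
    for user_id, is_matched in orders:
        placed[user_id] += 1
        matched[user_id] += int(is_matched)
    users = sorted(placed)
    x = [matched[u] for u in users]  # X_i: matched orders of user i
    y = [placed[u] for u in users]   # Y_i: all orders of user i
    return x, y


# Hypothetical order log: three users with 2, 1 and 3 orders.
orders = [
    ("u1", True), ("u1", False),
    ("u2", True),
    ("u3", True), ("u3", True), ("u3", False),
]
x, y = per_user_pairs(orders)
match_rate = sum(x) / sum(y)
print(x, y, match_rate)
```

The resulting `x` and `y` lists have one entry per user, which is the form the Delta Method variance formula expects.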

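The original method applies a z-test directly to the order count and the matched-order count, and its result is significant. Once the variance is estimated with the Delta Method, it becomes larger and the difference in user order match rate changes from significant to not significant.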

Method	Variance	Statistic	p_value	Test result
Original method	1E-05	-7.00	0.00	Significant
Delta Method	2E-04	-2.00	0.08	Not significant

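Summary

In experiment analysis, correctly estimating a metric's variance is key to reliable results. When the randomization unit and the analysis unit differ, both the Bootstrap and the Delta Method give statistically sound conclusions. On large data sets, however, the Bootstrap is computationally expensive, whereas the Delta Method, provided its basic assumptions hold, estimates the variance accurately at low cost and is therefore more efficient.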