Paired-Sample t-Test

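The paired-sample t-test (also called the matched-pairs t-test, or simply the paired t-test) compares the means of the same group of subjects under two conditions to determine whether the mean difference is statistically significant.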

1. Use cases

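The paired-sample t-test applies to two sets of related (dependent) measurements. Common cases include:

Pre-test & post-test designs: e.g., testing whether a training course improves students' exam scores (the same students' scores before and after the course).
Repeated measures on the same subjects under different conditions: e.g., the blood pressure of the same patients before and after taking a drug.
Matched-pairs designs: e.g., the two members of each pair of twins receive different teaching methods, and their exam scores are compared.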

2. Hypotheses

Null hypothesis (H₀): the population mean of the paired differences is zero, $\mu_d = 0$.
Alternative hypothesis (H₁): the population mean of the paired differences is not zero, $\mu_d \neq 0$.

Here $\mu_d$ is the population mean of the differences between the paired measurements.

3. Calculation

(1) Compute the differences

For each pair $(X_i, Y_i)$, compute the difference:
$$d_i = X_i - Y_i$$

This gives the set of differences $\{d_1, d_2, \dots, d_n\}$.
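A minimal Python sketch of step (1), using the before/after numbers from the example in Section 4 as sample data; the variable names are illustrative only.

```python
# Paired observations: X = before, Y = after (example data from Section 4)
before = [50, 60, 45, 70, 65, 80, 75, 85, 90, 95]
after = [55, 62, 50, 72, 68, 82, 78, 88, 93, 98]

# Pairs must line up one-to-one; zip() would silently drop unmatched items
assert len(before) == len(after)

# d_i = X_i - Y_i, computed pair by pair
d = [x - y for x, y in zip(before, after)]
print(d)  # [-5, -2, -5, -2, -3, -2, -3, -3, -3, -3]
```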

(2) Compute the mean and standard deviation

Compute the mean $\bar{d}$ and the sample standard deviation $s_d$ of the differences:
$$\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$$
$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$$
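Both formulas map directly onto Python's standard `statistics` module: `statistics.mean` is the plain average, and `statistics.stdev` is the sample standard deviation with the $n - 1$ denominator used above. A short sketch on the example differences:

```python
import statistics

# Differences d_i = X_i - Y_i from the example in Section 4
d = [-5, -2, -5, -2, -3, -2, -3, -3, -3, -3]

d_bar = statistics.mean(d)   # (1/n) * sum(d_i)
s_d = statistics.stdev(d)    # sample SD, divides by n - 1 (not n)
print(round(d_bar, 2), round(s_d, 2))  # -3.1 1.1
```

Note that `statistics.pstdev` would divide by $n$ instead and is not the quantity used in the t statistic.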

(3) Compute the t statistic

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where $n$ is the number of pairs.
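Steps (1) to (3) can be combined into one function. The helper name `paired_t_statistic` is hypothetical; this is a sketch using only the standard library, and it does not guard against the degenerate case where all differences are equal ($s_d = 0$, which causes a division by zero).

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Hypothetical helper: t = d_bar / (s_d / sqrt(n)) for paired samples x, y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # step (1): d_i = X_i - Y_i
    n = len(d)
    d_bar = statistics.mean(d)             # step (2): mean difference
    s_d = statistics.stdev(d)              # step (2): sample SD (n - 1)
    return d_bar / (s_d / math.sqrt(n))    # step (3): t statistic


before = [50, 60, 45, 70, 65, 80, 75, 85, 90, 95]
after = [55, 62, 50, 72, 68, 82, 78, 88, 93, 98]
print(round(paired_t_statistic(before, after), 2))  # -8.91
```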

(4) Find the p-value

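Using the t distribution with $df = n - 1$ degrees of freedom (from a t table or statistical software), find the p-value. For the two-sided alternative $\mu_d \neq 0$, $p = 2\,P(T_{n-1} \ge |t|)$.

If the p-value is below the chosen significance level (e.g., 0.05), reject H₀ and conclude that the mean difference between the two measurements is statistically significant.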
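The table lookup in step (4) can be replaced by the t distribution functions in SciPy. A minimal sketch, assuming SciPy is installed and plugging in the rounded t value from the example in Section 4 ($t \approx -8.91$, $n = 10$):

```python
from scipy import stats

t_value = -8.91   # t statistic from the example (rounded)
n = 10            # number of pairs
df = n - 1

# Two-sided p-value: P(|T| >= |t|) = 2 * P(T >= |t|), T ~ t distribution with df degrees of freedom
p_value = 2 * stats.t.sf(abs(t_value), df)

# Equivalent of reading a printed t table: two-sided critical value at alpha = 0.05
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)

print(f"p = {p_value:.2e}, critical |t| = {t_crit:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Comparing the p-value with α and comparing $|t|$ with the critical value always lead to the same decision.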

4. Example

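Suppose a company wants to assess how training affects employee productivity. It records the output of 10 employees before (X) and after (Y) training: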

Employee	Before training (X)	After training (Y)	Difference (d = X - Y)
1	50	55	-5
2	60	62	-2
3	45	50	-5
4	70	72	-2
5	65	68	-3
6	80	82	-2
7	75	78	-3
8	85	88	-3
9	90	93	-3
10	95	98	-3

From the table, the mean difference is $\bar{d} = -3.1$ and the standard deviation of the differences is $s_d \approx 1.10$. Substitute these into the t formula with $n = 10$, then use the p-value to decide whether the difference is significant.
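As a sketch, the hand calculation for this example runs as follows (values rounded; $n = 10$, $df = 9$):

```latex
\begin{aligned}
\sum_{i=1}^{10} d_i &= -31, \qquad \bar{d} = \frac{-31}{10} = -3.1 \\
\sum_{i=1}^{10} (d_i - \bar{d})^2 &= 10.9, \qquad s_d = \sqrt{\frac{10.9}{9}} \approx 1.10 \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-3.1}{1.10 / \sqrt{10}} \approx -8.91
\end{aligned}
```

With $df = 9$, the two-sided critical value at α = 0.05 is about 2.262. Since $|t| \approx 8.91$ is far larger (p < 0.001), H₀ is rejected. Because $d = X - Y$ is negative, output after training is higher.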

5. Python implementation

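```python
import scipy.stats as stats

# Output of the same 10 employees before and after training (same order in both lists)
before = [50, 60, 45, 70, 65, 80, 75, 85, 90, 95]
after = [55, 62, 50, 72, 68, 82, 78, 88, 93, 98]

# Paired-sample t-test; the differences are taken as before - after
t_stat, p_value = stats.ttest_rel(before, after)

print(f"t statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.2e}")  # scientific notation, since p is very small here
```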

If the p-value is below 0.05, the difference in output before and after training is statistically significant at the 5% level.
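Because the test uses only the differences $d_i$, a paired t-test is equivalent to a one-sample t-test of the differences against 0. The sketch below checks this with `scipy.stats.ttest_1samp`, assuming SciPy is installed and reusing the example data.

```python
import scipy.stats as stats

before = [50, 60, 45, 70, 65, 80, 75, 85, 90, 95]
after = [55, 62, 50, 72, 68, 82, 78, 88, 93, 98]

# Paired test on the raw pairs
paired = stats.ttest_rel(before, after)

# One-sample test of d_i = X_i - Y_i against a population mean of 0
d = [x - y for x, y in zip(before, after)]
one_sample = stats.ttest_1samp(d, 0.0)

# Same t and p (up to floating-point rounding)
print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```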

6. Summary

Purpose: compares the means of the same subjects under two different conditions.
Key steps:
- Compute the paired differences $d_i$
- Compute their mean $\bar{d}$ and standard deviation $s_d$
- Compute the t statistic and find the p-value with $df = n - 1$
Applies to: repeated-measures experiments, pre-test/post-test designs, matched-pairs designs, and similar.

If your data are two independent samples, use the independent-samples t-test instead.
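To illustrate why the choice matters, the sketch below runs both tests on the example data (assuming SciPy is installed). Employee output ranges from 45 to 98, so treating the two columns as independent groups lets this between-employee spread swamp the consistent gain of about 3 units; the paired test removes it by working with within-employee differences. The independent test is shown only for contrast and is not appropriate for this design.

```python
import scipy.stats as stats

before = [50, 60, 45, 70, 65, 80, 75, 85, 90, 95]
after = [55, 62, 50, 72, 68, 82, 78, 88, 93, 98]

# Appropriate here: the same 10 employees measured twice
rel = stats.ttest_rel(before, after)

# Shown for contrast only: treats the columns as two unrelated groups
ind = stats.ttest_ind(before, after)

print(f"paired:      t = {rel.statistic:.2f}, p = {rel.pvalue:.2g}")
print(f"independent: t = {ind.statistic:.2f}, p = {ind.pvalue:.2f}")
```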
