Analyzing data with the Python statsmodels package

Original documentation: https://www.statsmodels.org/stable/index.html

Download the statsmodels wheels and their dependencies

aaa@kylin-pc:~/par$ python3 loong/pip-24.0.pyz download statsmodels -d 313 -i https://mirrors.aliyun.com/pypi/simple/ --platform manylinux2014_aarch64 --only-binary=:all: --python-version 3.13 --default-timeout=160
...
Successfully downloaded statsmodels numpy packaging pandas patsy scipy python-dateutil pytz tzdata six
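
Because the wheels were fetched for an explicit target (`--platform manylinux2014_aarch64`, `--python-version 3.13`), it can be worth checking the tags in the downloaded file names before using the directory for an offline install. The following is a minimal sketch, not part of the original walkthrough: it splits wheel file names according to the standard `name-version-pythontag-abitag-platformtag.whl` layout. The helper names `wheel_tags` and `list_wheel_tags` and the sample file names are assumptions for illustration; the real names in the `313` directory may differ.

```python
from pathlib import Path


def wheel_tags(filename):
    """Split a wheel file name into (python tag, ABI tag, platform tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # The last three dash-separated fields are the python, ABI and platform tags.
    return tuple(parts[-3:])


def list_wheel_tags(directory):
    """Map each *.whl file name in `directory` to its tags."""
    return {p.name: wheel_tags(p.name) for p in sorted(Path(directory).glob("*.whl"))}


# Hypothetical file names, shown only to illustrate the two kinds of wheel.
print(wheel_tags("example_pkg-1.0-cp313-cp313-manylinux2014_aarch64.whl"))
print(wheel_tags("example_pure-1.0-py3-none-any.whl"))
```

Calling `list_wheel_tags("313")` from the download directory's parent would list the tags of every downloaded wheel. Pure-Python packages normally carry `none-any`, while compiled ones should name an `aarch64` platform.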
Install statsmodels

aaa@kylin-pc:~/par$ cd tpy313
aaa@kylin-pc:~/par/tpy313$ source myenv/bin/activate

(myenv) aaa@kylin-pc:~/par/tpy313$ pip install --no-index -f 313 statsmodels
...
Successfully installed pandas-2.3.2 patsy-1.0.2 pytz-2026.1.post1 scipy-1.16.3 statsmodels-0.14.6 tzdata-2026.1
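
To confirm that the offline install picked up the expected versions, the installed distributions can be queried from inside the virtual environment. This is a small sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is an assumption for illustration, and packages that are not installed are reported as `None` rather than raising.

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip output above.
for name, version in installed_versions(
    ["statsmodels", "pandas", "patsy", "scipy", "numpy"]
).items():
    print(f"{name}: {version}")
```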
Run the examples from the documentation (network access required)

(myenv) aaa@kylin-pc:~/par/tpy313$ python3
Python 3.13.13 (main, Apr 7 2026, 20:43:47) [Clang 22.1.1 ] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import numpy as np
>>> import statsmodels.api as sm
>>> import statsmodels.formula.api as smf
>>> dat = sm.datasets.get_rdataset("Guerry", "HistData").data
>>> dat
    dept  Region    Department  Crime_pers  Crime_prop  Literacy  Donations  Infants  ...  Donation_clergy  Lottery  Desertion  Instruction  Prostitutes  Distance  Area  Pop1831
 0     1       E           Ain       28870       15890        37       5098    33120  ...               69       41         55           46           13   218.372  5762   346.03
 1     2       N         Aisne       26226        5521        51       8901    14572  ...               36       38         82           24          327    65.945  7369   513.00
 2     3       C        Allier       26747        7925        13      10973    17044  ...               76       66         16           85           34   161.927  7340   298.26
 3     4       E  Basses-Alpes       12935        7289        46       2733    23018  ...               37       80         32           29            2   351.399  6925   155.90
 4     5       E  Hautes-Alpes       17488        8174        69       6962    23076  ...               64       79         35            7            1   320.280  5549   129.10
..   ...     ...           ...         ...         ...       ...        ...      ...  ...              ...      ...        ...          ...          ...       ...   ...      ...
81    86       W        Vienne       15010        4710        25       8922    35224  ...               44       40         38           65           18   170.523  6990   282.73
82    87       C  Haute-Vienne       16256        6402        13      13817    19940  ...               78       55         11           84            7   198.874  5520   285.13
83    88       E        Vosges       18835        9044        62       4040    14978  ...                5       14         85           11           43   174.477  5874   397.99
84    89       C         Yonne       18006        6516        47       4276    16616  ...               35       51         66           27          272    81.797  7427   352.49
85   200     NaN         Corse        2199        4589        49      37015    24743  ...               84       83          9           25            1   539.213  8680   195.41

[86 rows x 23 columns]
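
`get_rdataset` fetches the Guerry data over the network, which is the only step in this walkthrough that cannot run offline. One possible workaround, sketched below, is the `cache` argument of `get_rdataset`: when given a directory path, statsmodels stores the downloaded files there and reads them back on later calls. The function name `load_guerry` and the directory `rdata_cache` are assumptions for illustration, and the first call still needs a connection to populate the cache.

```python
def load_guerry(cache_dir="rdata_cache"):
    """Load the Guerry dataset, caching the download under `cache_dir`."""
    # Imported lazily so that defining the helper does not require statsmodels.
    import statsmodels.api as sm

    # `cache` may be True (default statsmodels data folder) or a directory path.
    dataset = sm.datasets.get_rdataset("Guerry", "HistData", cache=cache_dir)
    return dataset.data


# Usage (needs network on the first run only):
# dat = load_guerry()
# print(dat.shape)
```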

>>> results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
>>> print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                Lottery   R-squared:                       0.348
Model:                            OLS   Adj. R-squared:                  0.333
Method:                 Least Squares   F-statistic:                     22.20
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           1.90e-08
Time:                        16:33:51   Log-Likelihood:                -379.82
No. Observations:                  86   AIC:                             765.6
Df Residuals:                      83   BIC:                             773.0
Df Model:                           2
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
==============================================================================
Omnibus:                        3.713   Durbin-Watson:                   2.019
Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
Skew:                          -0.487   Prob(JB):                        0.183
Kurtosis:                       3.003   Cond. No.                         702.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
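
Several entries in the summary are simple functions of others, which makes a quick consistency check possible. The sketch below recomputes adjusted R-squared, AIC and BIC from the printed R-squared, log-likelihood, number of observations and number of parameters, using the standard definitions (`AIC = 2k - 2*llf`, `BIC = k*ln(n) - 2*llf`, with `k` counting the intercept). The function names are assumptions for illustration, and the inputs are the rounded values from the table above, so agreement is only expected to within rounding.

```python
import math


def adjusted_r2(r2, nobs, df_resid):
    """Adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (nobs - 1) / df_resid


def aic(llf, k_params):
    """Akaike information criterion: 2k - 2*llf."""
    return 2 * k_params - 2 * llf


def bic(llf, k_params, nobs):
    """Bayesian information criterion: k*ln(n) - 2*llf."""
    return k_params * math.log(nobs) - 2 * llf


# Rounded values copied from the summary above (3 parameters incl. the intercept).
nobs, df_resid, k_params = 86, 83, 3
r2, llf = 0.348, -379.82

print(round(adjusted_r2(r2, nobs, df_resid), 3))  # table shows 0.333
print(round(aic(llf, k_params), 1))               # table shows 765.6
print(round(bic(llf, k_params, nobs), 1))         # table shows 773.0
```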

>>> nobs = 100
>>> X = np.random.random((nobs, 2))
>>> X = sm.add_constant(X)
>>> beta = [1, .1, .5]
>>> e = np.random.random(nobs)
>>> y = np.dot(X, beta) + e
>>> results = sm.OLS(y, X).fit()
>>> print(results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.263
Model:                            OLS   Adj. R-squared:                  0.248
Method:                 Least Squares   F-statistic:                     17.30
Date:                Fri, 17 Apr 2026   Prob (F-statistic):           3.75e-07
Time:                        16:35:40   Log-Likelihood:                -14.069
No. Observations:                 100   AIC:                             34.14
Df Residuals:                      97   BIC:                             41.95
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4461      0.085     17.023      0.000       1.277       1.615
x1             0.0461      0.104      0.443      0.658      -0.160       0.253
x2             0.5766      0.098      5.865      0.000       0.381       0.772
==============================================================================
Omnibus:                       49.277   Durbin-Watson:                   1.995
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                6.904
Skew:                           0.074   Prob(JB):                       0.0317
Kurtosis:                       1.721   Cond. No.                         6.04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
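
In the simulated example the fitted constant is close to 1.5 rather than the 1 in `beta`. A likely explanation is that `np.random.random` draws from the uniform distribution on [0, 1), whose mean is 0.5, so the error term is not centred and its mean is absorbed into the intercept; the low kurtosis (1.721) is also consistent with uniform rather than normal noise. The sketch below illustrates the effect with a deliberately simplified single-regressor version that uses only the standard library and the closed-form least-squares formulas. The function name `fit_line`, the seed and the sample size are assumptions for illustration.

```python
import random


def fit_line(x, y):
    """Closed-form simple least squares: returns (intercept, slope)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


rng = random.Random(0)  # fixed seed so the run is repeatable
nobs = 10_000
x = [rng.random() for _ in range(nobs)]

# True model: y = 1 + 0.5 * x + e
e_uniform = [rng.random() for _ in range(nobs)]  # mean 0.5, like the example
e_centred = [u - 0.5 for u in e_uniform]         # same noise shifted to mean 0

b0_u, b1_u = fit_line(x, [1 + 0.5 * xi + ei for xi, ei in zip(x, e_uniform)])
b0_c, b1_c = fit_line(x, [1 + 0.5 * xi + ei for xi, ei in zip(x, e_centred)])

print(f"uniform noise: intercept={b0_u:.3f}, slope={b1_u:.3f}")
print(f"centred noise: intercept={b0_c:.3f}, slope={b1_c:.3f}")
```

With centred noise the intercept estimate should land near 1, and with the uncentred uniform noise near 1.5, while the slope estimate is the same in both fits.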
