Analyzing data with the Python statsmodels package

Original documentation: https://www.statsmodels.org/stable/index.html

  1. Download the statsmodels wheels and their dependencies for aarch64 and Python 3.13 into the directory 313

    aaa@kylin-pc:~/par$ python3 loong/pip-24.0.pyz download statsmodels -d 313 -i https://mirrors.aliyun.com/pypi/simple/ --platform manylinux2014_aarch64 --only-binary=:all: --python-version 3.13 --default-timeout=160
    ...
    Successfully downloaded statsmodels numpy packaging pandas patsy scipy python-dateutil pytz tzdata six
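
The flags --platform manylinux2014_aarch64, --python-version 3.13 and --only-binary=:all: make pip fetch wheels for the target machine rather than for the machine running the download. Before you copy the 313 directory over, you can check the tags in the wheel file names to catch a wrong download. The sketch below is a minimal, simplified version of such a check. It assumes the target is CPython 3.13 on 64-bit ARM Linux. The helper names and the accepted tag sets are illustrative assumptions. The check is also much cruder than pip's own tag matching; for example, it rejects abi3 wheels built for older CPython tags even though pip would accept them.

```python
from __future__ import annotations

from pathlib import Path

# Assumed target: CPython 3.13 on 64-bit ARM Linux (deliberately simplified tag sets).
ACCEPTED_PYTHON = {"cp313", "py313", "py3"}
ACCEPTED_ABI = {"cp313", "none"}


def wheel_tags(filename: str) -> tuple[set[str], set[str], set[str]]:
    """Split a wheel file name into its python, abi and platform tag sets.

    Wheel names have the form name-version[-build]-python-abi-platform.whl,
    and each of the last three fields may hold several tags joined by '.'.
    """
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    python, abi, platform = parts[-3], parts[-2], parts[-1]
    return set(python.split(".")), set(abi.split(".")), set(platform.split("."))


def is_compatible(filename: str) -> bool:
    """Simplified check: accepted python and abi tag, platform 'any' or *aarch64."""
    python, abi, platform = wheel_tags(filename)
    platform_ok = "any" in platform or any(p.endswith("aarch64") for p in platform)
    return bool(python & ACCEPTED_PYTHON) and bool(abi & ACCEPTED_ABI) and platform_ok


def incompatible_wheels(directory: str) -> list[str]:
    """Return the names of .whl files in `directory` that fail the simplified check."""
    return sorted(
        p.name for p in Path(directory).glob("*.whl") if not is_compatible(p.name)
    )


if __name__ == "__main__":
    # "313" is the download directory used in the pip command above.
    print(incompatible_wheels("313"))
```

An empty list means that every wheel in the directory passed this simplified check.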

  2. Install statsmodels offline from the downloaded wheels

    aaa@kylin-pc:~/par$ cd tpy313
    aaa@kylin-pc:~/par/tpy313$ source myenv/bin/activate

    (myenv) aaa@kylin-pc:~/par/tpy313$ pip install --no-index -f 313 statsmodels
    ...
    Successfully installed pandas-2.3.2 patsy-1.0.2 pytz-2026.1.post1 scipy-1.16.3 statsmodels-0.14.6 tzdata-2026.1
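
You can confirm which versions were actually installed inside the activated myenv by querying the standard library's importlib.metadata, without importing the packages themselves. The sketch below is minimal. The package list is copied from the pip output above, and installed_versions is a hypothetical helper name.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distributions named in the pip download/install output above.
PACKAGES = ["statsmodels", "numpy", "scipy", "pandas", "patsy",
            "packaging", "python-dateutil", "pytz", "tzdata", "six"]


def installed_versions(names: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: dict[str, str | None] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:16} {ver or 'NOT INSTALLED'}")
```

Run it with the myenv interpreter. A line showing NOT INSTALLED means the distribution was not found in that environment.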

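  3. Run the examples from the documentation (network access is required, because sm.datasets.get_rdataset downloads the Guerry data from the online Rdatasets collection)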

    (myenv) aaa@kylin-pc:~/par/tpy313$ python3
    Python 3.13.13 (main, Apr 7 2026, 20:43:47) [Clang 22.1.1 ] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>> import numpy as np
    >>> import statsmodels.api as sm
    >>> import statsmodels.formula.api as smf
    >>> dat = sm.datasets.get_rdataset("Guerry", "HistData").data
    >>> dat
        dept  Region    Department  Crime_pers  Crime_prop  Literacy  Donations  Infants  ...  Donation_clergy  Lottery  Desertion  Instruction  Prostitutes  Distance  Area  Pop1831
    0      1       E           Ain       28870       15890        37       5098    33120  ...               69       41         55           46           13   218.372  5762   346.03
    1      2       N         Aisne       26226        5521        51       8901    14572  ...               36       38         82           24          327    65.945  7369   513.00
    2      3       C        Allier       26747        7925        13      10973    17044  ...               76       66         16           85           34   161.927  7340   298.26
    3      4       E  Basses-Alpes       12935        7289        46       2733    23018  ...               37       80         32           29            2   351.399  6925   155.90
    4      5       E  Hautes-Alpes       17488        8174        69       6962    23076  ...               64       79         35            7            1   320.280  5549   129.10
    ..   ...     ...           ...         ...         ...       ...        ...      ...  ...              ...      ...        ...          ...          ...       ...   ...      ...
    81    86       W        Vienne       15010        4710        25       8922    35224  ...               44       40         38           65           18   170.523  6990   282.73
    82    87       C  Haute-Vienne       16256        6402        13      13817    19940  ...               78       55         11           84            7   198.874  5520   285.13
    83    88       E        Vosges       18835        9044        62       4040    14978  ...                5       14         85           11           43   174.477  5874   397.99
    84    89       C         Yonne       18006        6516        47       4276    16616  ...               35       51         66           27          272    81.797  7427   352.49
    85   200     NaN         Corse        2199        4589        49      37015    24743  ...               84       83          9           25            1   539.213  8680   195.41

    [86 rows x 23 columns]

    >>> results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
    >>> print(results.summary())
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                Lottery   R-squared:                       0.348
    Model:                            OLS   Adj. R-squared:                  0.333
    Method:                 Least Squares   F-statistic:                     22.20
    Date:                Fri, 17 Apr 2026   Prob (F-statistic):           1.90e-08
    Time:                        16:33:51   Log-Likelihood:                -379.82
    No. Observations:                  86   AIC:                             765.6
    Df Residuals:                      83   BIC:                             773.0
    Df Model:                           2
    Covariance Type:            nonrobust
    ===================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
    -----------------------------------------------------------------------------------
    Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
    Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
    np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
    ==============================================================================
    Omnibus:                        3.713   Durbin-Watson:                   2.019
    Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
    Skew:                          -0.487   Prob(JB):                        0.183
    Kurtosis:                       3.003   Cond. No.                         702.
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

    >>> nobs = 100
    >>> X = np.random.random((nobs, 2))
    >>> X = sm.add_constant(X)
    >>> beta = [1, .1, .5]
    >>> e = np.random.random(nobs)
    >>> y = np.dot(X, beta) + e
    >>> results = sm.OLS(y, X).fit()
    >>> print(results.summary())
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.263
    Model:                            OLS   Adj. R-squared:                  0.248
    Method:                 Least Squares   F-statistic:                     17.30
    Date:                Fri, 17 Apr 2026   Prob (F-statistic):           3.75e-07
    Time:                        16:35:40   Log-Likelihood:                -14.069
    No. Observations:                 100   AIC:                             34.14
    Df Residuals:                      97   BIC:                             41.95
    Df Model:                           2
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const          1.4461      0.085     17.023      0.000       1.277       1.615
    x1             0.0461      0.104      0.443      0.658      -0.160       0.253
    x2             0.5766      0.098      5.865      0.000       0.381       0.772
    ==============================================================================
    Omnibus:                       49.277   Durbin-Watson:                   1.995
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):                6.904
    Skew:                           0.074   Prob(JB):                       0.0317
    Kurtosis:                       1.721   Cond. No.                         6.04
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
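
In the first example, the formula 'Lottery ~ Literacy + np.log(Pop1831)' is expanded (via patsy) into a design matrix with three columns: an intercept, Literacy, and the natural log of Pop1831. OLS then solves a least-squares problem on that matrix. The NumPy sketch below redoes this by hand, but only on the ten rows printed in the dat output above, so its numbers will NOT match the 86-row summary. Its purpose is to show what the coef, std err, t and R-squared entries are computed from, using the textbook nonrobust formulas:

- beta_hat = (X'X)^(-1) X'y
- sigma2_hat = RSS / (n - k)
- std err = sqrt(diag(sigma2_hat (X'X)^(-1)))
- t = coef / std err
- R-squared = 1 - RSS / TSS

The helper ols_by_hand is a hypothetical name. In statsmodels itself, the fitted results object exposes these values as results.params, results.bse, results.tvalues and results.rsquared.

```python
import numpy as np

# The ten rows visible in the dat printout above (rows 0-4 and 81-85).
literacy = np.array([37, 51, 13, 46, 69, 25, 13, 62, 47, 49], dtype=float)
pop1831 = np.array([346.03, 513.00, 298.26, 155.90, 129.10,
                    282.73, 285.13, 397.99, 352.49, 195.41])
lottery = np.array([41, 38, 66, 80, 79, 40, 55, 14, 51, 83], dtype=float)


def ols_by_hand(X: np.ndarray, y: np.ndarray):
    """Return OLS coefficients, nonrobust standard errors, t values and R-squared."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes ||y - X beta||^2
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)               # residual variance with n - k df
    cov = sigma2 * np.linalg.inv(X.T @ X)          # nonrobust covariance of beta
    se = np.sqrt(np.diag(cov))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, beta / se, r2


# Design matrix for 'Lottery ~ Literacy + np.log(Pop1831)': Intercept, Literacy, log(Pop1831).
X = np.column_stack([np.ones_like(literacy), literacy, np.log(pop1831)])
beta, se, t, r2 = ols_by_hand(X, lottery)
for name, b, s, tv in zip(["Intercept", "Literacy", "np.log(Pop1831)"], beta, se, t):
    print(f"{name:16} {b:10.4f} {s:9.3f} {tv:8.3f}")
print(f"R-squared: {r2:.3f}")
```

Running the same function on all 86 rows of dat (for example, by building the arrays from dat["Literacy"], dat["Pop1831"] and dat["Lottery"]) should reproduce the coef, std err and t columns of the summary above.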

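The second example simulates its own data, and two details of the simulation explain its summary.

- np.random.random draws from the uniform distribution on [0, 1), so the error term e has mean 0.5 rather than 0. OLS absorbs that mean into the constant, which is why const is estimated at 1.4461 (close to 1.5) although beta[0] is 1.
- The uniform errors also explain the low kurtosis: 1.721, against 1.8 for a uniform distribution and 3 for a normal one. They likewise explain why the Omnibus test rejects normality (Prob(Omnibus) 0.000).

Separately, the x1 slope of 0.1 is small relative to the noise, so with 100 observations its estimate (0.0461) is not significantly different from zero (P>|t| = 0.658).

The sketch below reruns the simulation in a modified form. It uses a seeded np.random.default_rng (seed 0 is an arbitrary choice) and a much larger sample. It fits the data with np.linalg.lstsq instead of sm.OLS, which yields the same least-squares point estimates for a full-rank design. The fit is done once with the raw errors and once with errors centered by subtracting 0.5.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients (the point estimates OLS reports as coef)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat


rng = np.random.default_rng(0)            # seeded so the run is reproducible
nobs = 100_000                            # far more than the 100 used above
x = rng.random((nobs, 2))                 # uniform on [0, 1), like np.random.random
X = np.column_stack([np.ones(nobs), x])   # constant first, as sm.add_constant does by default
beta = np.array([1.0, 0.1, 0.5])
e = rng.random(nobs)                      # uniform errors with mean 0.5

raw = fit_ols(X, X @ beta + e)                  # intercept absorbs E[e] = 0.5
centered = fit_ols(X, X @ beta + (e - 0.5))     # mean-zero errors
print("raw errors:     ", np.round(raw, 3))
print("centered errors:", np.round(centered, 3))
```

Because subtracting a constant from y only shifts the intercept when X contains a constant column, the two fits have identical slopes. Their intercepts differ by exactly 0.5, up to floating-point rounding.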