Analyzing data with the Python statsmodels package

Original documentation: https://www.statsmodels.org/stable/index.html

  1. Download the statsmodels packages

    aaa@kylin-pc:~/par$ python3 loong/pip-24.0.pyz download statsmodels -d 313 -i https://mirrors.aliyun.com/pypi/simple/ --platform manylinux2014_aarch64 --only-binary=:all: --python-version 3.13 --default-timeout=160
    ...
    Successfully downloaded statsmodels numpy packaging pandas patsy scipy python-dateutil pytz tzdata six

  2. Install statsmodels

    aaa@kylin-pc:~/par$ cd tpy313
    aaa@kylin-pc:~/par/tpy313$ source myenv/bin/activate

    (myenv) aaa@kylin-pc:~/par/tpy313$ pip install --no-index -f 313 statsmodels
    ...
    Successfully installed pandas-2.3.2 patsy-1.0.2 pytz-2026.1.post1 scipy-1.16.3 statsmodels-0.14.6 tzdata-2026.1
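
After an offline install like the one above, it can help to confirm which versions actually landed in the virtual environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the names that appear in the pip output.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the active environment.
            found[name] = None
    return found


# Package names taken from the pip output; adjust as needed.
report = installed_versions(["statsmodels", "numpy", "pandas", "patsy", "scipy"])
for name, ver in report.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Run it inside the activated environment (here, myenv) so that it inspects the same site-packages that pip installed into.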

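  3. Run the examples from the documentation (network access is required, because get_rdataset downloads the dataset from the internet)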

    (myenv) aaa@kylin-pc:~/par/tpy313$ python3
    Python 3.13.13 (main, Apr 7 2026, 20:43:47) [Clang 22.1.1 ] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    import numpy as np
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    dat = sm.datasets.get_rdataset("Guerry", "HistData").data
    dat
        dept  Region    Department  Crime_pers  Crime_prop  Literacy  Donations  Infants  ...  Donation_clergy  Lottery  Desertion  Instruction  Prostitutes  Distance  Area  Pop1831
    0      1       E           Ain       28870       15890        37       5098    33120  ...               69       41         55           46           13   218.372  5762   346.03
    1      2       N         Aisne       26226        5521        51       8901    14572  ...               36       38         82           24          327    65.945  7369   513.00
    2      3       C        Allier       26747        7925        13      10973    17044  ...               76       66         16           85           34   161.927  7340   298.26
    3      4       E  Basses-Alpes       12935        7289        46       2733    23018  ...               37       80         32           29            2   351.399  6925   155.90
    4      5       E  Hautes-Alpes       17488        8174        69       6962    23076  ...               64       79         35            7            1   320.280  5549   129.10
    ..   ...     ...           ...         ...         ...       ...        ...      ...  ...              ...      ...        ...          ...          ...       ...   ...      ...
    81    86       W        Vienne       15010        4710        25       8922    35224  ...               44       40         38           65           18   170.523  6990   282.73
    82    87       C  Haute-Vienne       16256        6402        13      13817    19940  ...               78       55         11           84            7   198.874  5520   285.13
    83    88       E        Vosges       18835        9044        62       4040    14978  ...                5       14         85           11           43   174.477  5874   397.99
    84    89       C         Yonne       18006        6516        47       4276    16616  ...               35       51         66           27          272    81.797  7427   352.49
    85   200     NaN         Corse        2199        4589        49      37015    24743  ...               84       83          9           25            1   539.213  8680   195.41

    [86 rows x 23 columns]

    results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
    print(results.summary())
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                Lottery   R-squared:                       0.348
    Model:                            OLS   Adj. R-squared:                  0.333
    Method:                 Least Squares   F-statistic:                     22.20
    Date:                Fri, 17 Apr 2026   Prob (F-statistic):           1.90e-08
    Time:                        16:33:51   Log-Likelihood:                -379.82
    No. Observations:                  86   AIC:                             765.6
    Df Residuals:                      83   BIC:                             773.0
    Df Model:                           2
    Covariance Type:            nonrobust
    ===================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
    -----------------------------------------------------------------------------------
    Intercept         246.4341     35.233      6.995      0.000     176.358     316.510
    Literacy           -0.4889      0.128     -3.832      0.000      -0.743      -0.235
    np.log(Pop1831)   -31.3114      5.977     -5.239      0.000     -43.199     -19.424
    ==============================================================================
    Omnibus:                        3.713   Durbin-Watson:                   2.019
    Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.394
    Skew:                          -0.487   Prob(JB):                        0.183
    Kurtosis:                       3.003   Cond. No.                         702.
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
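
To make the formula 'Lottery ~ Literacy + np.log(Pop1831)' more concrete, here is a rough sketch of the design matrix it implies: an intercept column, the Literacy values, and the log of Pop1831. It uses only NumPy and does not call statsmodels. The Literacy and Pop1831 values are the first five rows printed above, but the response y is synthetic and noise-free, and both the function name `ols_by_hand` and the coefficients in `true_coef` are assumptions made purely for illustration.

```python
import numpy as np


def ols_by_hand(y, literacy, pop):
    """Least-squares fit of y on [1, literacy, log(pop)] via numpy.linalg.lstsq."""
    X = np.column_stack([
        np.ones(len(literacy)),  # intercept column (added implicitly by the formula)
        literacy,                # Literacy enters unchanged
        np.log(pop),             # np.log(Pop1831) is evaluated before fitting
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# First five Literacy / Pop1831 values from the DataFrame printed above.
literacy = np.array([37.0, 51.0, 13.0, 46.0, 69.0])
pop = np.array([346.03, 513.00, 298.26, 155.90, 129.10])

# Hypothetical coefficients and a noise-free synthetic response (not real data).
true_coef = np.array([246.0, -0.5, -31.0])
y = true_coef[0] + true_coef[1] * literacy + true_coef[2] * np.log(pop)

coef = ols_by_hand(y, literacy, pop)
print(coef)  # intercept, Literacy slope, log(Pop1831) slope
```

Because the synthetic response has no noise, the fit recovers the assumed coefficients up to floating-point error; with the real Lottery column the estimates would be the ones in the summary table.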

    nobs = 100
    X = np.random.random((nobs, 2))
    X = sm.add_constant(X)
    beta = [1, .1, .5]
    e = np.random.random(nobs)
    y = np.dot(X, beta) + e
    results = sm.OLS(y, X).fit()
    print(results.summary())
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.263
    Model:                            OLS   Adj. R-squared:                  0.248
    Method:                 Least Squares   F-statistic:                     17.30
    Date:                Fri, 17 Apr 2026   Prob (F-statistic):           3.75e-07
    Time:                        16:35:40   Log-Likelihood:                -14.069
    No. Observations:                 100   AIC:                             34.14
    Df Residuals:                      97   BIC:                             41.95
    Df Model:                           2
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const          1.4461      0.085     17.023      0.000       1.277       1.615
    x1             0.0461      0.104      0.443      0.658      -0.160       0.253
    x2             0.5766      0.098      5.865      0.000       0.381       0.772
    ==============================================================================
    Omnibus:                       49.277   Durbin-Watson:                   1.995
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):                6.904
    Skew:                           0.074   Prob(JB):                       0.0317
    Kurtosis:                       1.721   Cond. No.                         6.04
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
