Data Analysis Essentials: A Step-by-Step Guide to Data Analysis with Pandas (14)

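1. Pandas Aggregation

Worked examples of Pandas aggregation

Once a rolling, expanding, or ewm object has been created, several methods are available for aggregating the data it covers.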
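
To make the three object types concrete, here is a minimal sketch on a small hand-checkable Series. The data and variable names are illustrative assumptions, not taken from the examples below. The rolling and expanding objects are aggregated with `aggregate("sum")`, while the ewm object calls its `mean()` method directly.

```python
import pandas as pd

# Illustrative data chosen so the results are easy to check by hand
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Rolling: sum over the current value and the one before it
rolling_sum = s.rolling(window=2, min_periods=1).aggregate("sum")

# Expanding: sum over every value seen so far
expanding_sum = s.expanding(min_periods=1).aggregate("sum")

# Exponentially weighted mean with the recursive form
# y[t] = 0.5 * y[t-1] + 0.5 * x[t]  (adjust=False)
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()

print(rolling_sum.tolist())    # [1.0, 3.0, 5.0, 7.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0]
print(ewm_mean.tolist())       # [1.0, 1.5, 2.25, 3.125]
```

The rest of this section uses only the rolling object, but the same aggregation style applies to the other two.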

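1.1. Creating a DataFrame and a rolling object

We first create a DataFrame and build a rolling-window object on it. The aggregations in the following sections are applied to this object.

import pandas as pd
import numpy as np
# 10 rows of random values, indexed by consecutive days
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
# Window of 3 rows; min_periods=1 produces a result from the first row onward
r = df.rolling(window=3, min_periods=1)
print(r)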

Output:
                   A         B         C         D
2024-06-01  1.441992  0.507236 -1.279692 -0.283955
2024-06-02  0.732984 -1.022779 -1.188695  0.899738
2024-06-03  0.363206 -0.610489  0.987919 -0.556534
2024-06-04  1.760517  0.513175 -1.952190 -0.371333
2024-06-05 -0.975915  0.941488  0.116632 -1.384646
2024-06-06  0.278110  2.193880  0.434967 -3.136830
2024-06-07  0.998929 -1.174505 -0.512467 -0.076176
2024-06-08 -0.836676  0.255251 -0.283001 -0.069504
2024-06-09 -1.042460  1.008820  1.203172  1.790213
2024-06-10 -0.000309  0.327030  0.235055  0.137578
Rolling [window=3,min_periods=1,center=False,axis=0,method=single]
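
The `min_periods=1` shown in the repr above controls how many observations a window needs before it yields a value. The following sketch, using made-up data, contrasts it with the default for an integer window, where `min_periods` equals the window size.

```python
import pandas as pd

# Illustrative data, not taken from the article's random DataFrame
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Default: a window of 3 needs 3 observations, so the first two results are NaN
strict = s.rolling(window=3).sum()

# min_periods=1: a result is produced as soon as one observation is available
lenient = s.rolling(window=3, min_periods=1).sum()

print(strict.tolist())   # [nan, nan, 6.0, 9.0]
print(lenient.tolist())  # [1.0, 3.0, 6.0, 9.0]
```

This is why the aggregated outputs in the sections below have no NaN values in their first two rows.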

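We can aggregate the whole DataFrame by passing a function (or the name of a built-in aggregation such as 'sum') to the rolling object, or first select a column with the standard get-item syntax (`[]`) and aggregate only that column.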

1.2. Aggregating the whole DataFrame

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# Rolling sum of every column; the name 'sum' selects the built-in rolling sum
print(r.aggregate('sum'))

Output:
                   A         B         C         D
2024-06-01  0.137541 -0.666472 -0.512313  0.124189
2024-06-02 -0.274006  0.546432  0.804729  0.444257
2024-06-03  0.656569  1.087017  0.546081 -0.645019
2024-06-04  0.287474 -0.037974  0.646037 -0.116104
2024-06-05  0.159287  0.242253  1.092559 -0.437320
2024-06-06 -1.081650  0.408552  0.273044 -0.802035
2024-06-07 -1.384118  0.366630  0.503155 -1.720862
2024-06-08  0.016059 -0.177049  0.066783  0.138181
2024-06-09  0.189092  1.099488  0.788672 -0.643970
2024-06-10  0.504482  0.307674 -1.186342 -1.958610
                   A         B         C         D
2024-06-01  0.137541 -0.666472 -0.512313  0.124189
2024-06-02 -0.136465 -0.120041  0.292416  0.568445
2024-06-03  0.520104  0.966977  0.838497 -0.076574
2024-06-04  0.670037  1.595475  1.996847 -0.316866
2024-06-05  1.103330  1.291296  2.284677 -1.198443
2024-06-06 -0.634889  0.612831  2.011640 -1.355459
2024-06-07 -2.306481  1.017435  1.868758 -2.960217
2024-06-08 -2.449709  0.598133  0.842982 -2.384716
2024-06-09 -1.178967  1.289068  1.358610 -2.226651
2024-06-10  0.709633  1.230113 -0.330886 -2.464399
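
`aggregate` is not limited to built-in reductions; it also accepts a user-defined callable that reduces each window to a single number. The sketch below assumes a hypothetical helper, `value_range`, and illustrative data. The callable receives each window as a Series.

```python
import pandas as pd

# Illustrative data so the result can be checked by hand
df = pd.DataFrame({"A": [1.0, 4.0, 2.0, 8.0], "B": [5.0, 5.0, 7.0, 6.0]},
                  index=pd.date_range("6/1/2024", periods=4))
r = df.rolling(window=3, min_periods=1)

def value_range(window):
    """Hypothetical helper: largest minus smallest value in one window."""
    return window.max() - window.min()

# Applied column by column to every window
result = r.aggregate(value_range)
print(result["A"].tolist())  # [0.0, 3.0, 3.0, 6.0]
print(result["B"].tolist())  # [0.0, 0.0, 2.0, 2.0]
```

A Python-level callable is generally slower than a built-in name such as 'sum', so it is best kept for reductions that have no built-in equivalent.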

1.3. Applying an aggregation to a single column of a DataFrame

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# Select column A from the rolling object, then take its rolling sum (returns a Series)
print(r['A'].aggregate('sum'))

Output:
                   A         B         C         D
2024-06-01  1.337425 -2.008430  0.487408  0.619035
2024-06-02 -1.057971 -0.454410  1.029195  1.031153
2024-06-03  0.180957 -1.598784  0.235843 -1.234636
2024-06-04 -0.215478 -0.283628 -0.159067 -0.441236
2024-06-05  0.568535  0.468742 -0.981265 -0.225904
2024-06-06  1.251656  0.045891  0.533743 -1.809453
2024-06-07 -0.118663  0.430278 -1.811598  1.199368
2024-06-08  1.103233 -0.909900  0.184519  0.363605
2024-06-09  0.499495  1.120610 -1.283629  0.073462
2024-06-10  1.182883 -0.573653  0.291168 -1.079381
2024-06-01    1.337425
2024-06-02    0.279453
2024-06-03    0.460411
2024-06-04   -1.092492
2024-06-05    0.534014
2024-06-06    1.604712
2024-06-07    1.701527
2024-06-08    2.236225
2024-06-09    1.484064
2024-06-10    2.785611
Freq: D, Name: A, dtype: float64

1.4. Applying an aggregation to multiple columns of a DataFrame

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# A list of column names selects a sub-frame; the result is a DataFrame with columns A and B
print(r[['A', 'B']].aggregate('sum'))

Output:
                   A         B         C         D
2024-06-01 -0.315264 -1.007784 -0.422830  0.240110
2024-06-02 -0.899798  1.220554  0.043764 -0.724214
2024-06-03 -0.506266 -1.114019  0.970437 -1.436598
2024-06-04 -0.567130 -0.358241 -2.330796  0.720396
2024-06-05  0.002677  0.358061 -0.191730 -2.024825
2024-06-06 -1.241444 -0.185388  1.539475 -0.398289
2024-06-07 -0.394370  0.899715 -0.235603  2.083027
2024-06-08 -0.063937 -0.703623 -0.771960  1.069107
2024-06-09 -0.997480 -0.145053 -2.013109  0.630082
2024-06-10 -1.323366  0.407704 -1.958234 -0.136122
                   A         B
2024-06-01 -0.315264 -1.007784
2024-06-02 -1.215063  0.212770
2024-06-03 -1.721329 -0.901249
2024-06-04 -1.973195 -0.251705
2024-06-05 -1.070719 -1.114198
2024-06-06 -1.805897 -0.185567
2024-06-07 -1.633136  1.072388
2024-06-08 -1.699751  0.010704
2024-06-09 -1.455787  0.051039
2024-06-10 -2.384783 -0.440971

1.5. Applying multiple functions to a single column of a DataFrame

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# A list of aggregations yields one output column per function
print(r['A'].aggregate(['sum', 'mean']))

Output:
                   A         B         C         D
2024-06-01 -1.353324  0.127682 -0.200629  0.450458
2024-06-02  0.949610 -1.400609  0.627148 -0.043679
2024-06-03  0.033043  0.892801 -0.425507  0.880760
2024-06-04 -0.717365 -0.126336 -0.688569  0.406762
2024-06-05 -1.432076  1.305415  0.316325  1.700087
2024-06-06 -0.130123  1.470843  0.255068 -0.466856
2024-06-07 -0.259649  0.972374 -0.294581 -0.246689
2024-06-08  0.451554  0.726053  1.198266 -0.721875
2024-06-09 -1.328514 -0.188786  0.499362 -0.998840
2024-06-10 -0.235946  0.063362 -1.474905 -1.410311
                 sum      mean
2024-06-01 -1.353324 -1.353324
2024-06-02 -0.403714 -0.201857
2024-06-03 -0.370671 -0.123557
2024-06-04  0.265288  0.088429
2024-06-05 -2.116398 -0.705466
2024-06-06 -2.279563 -0.759854
2024-06-07 -1.821847 -0.607282
2024-06-08  0.061783  0.020594
2024-06-09 -1.136609 -0.378870
2024-06-10 -1.112906 -0.370969

1.6. Applying multiple functions to multiple columns of a DataFrame

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# Each selected column gets every function; the result has two-level column labels
print(r[['A', 'B']].aggregate(['sum', 'mean']))

Output:
                   A         B         C         D
2024-06-01  0.688572  0.335234  0.752168  0.961081
2024-06-02  1.085028  1.130616 -0.536655  0.779873
2024-06-03  0.867040  0.676979 -0.389117 -2.827168
2024-06-04  0.964311  0.861692 -0.421859 -1.080160
2024-06-05 -0.203971 -1.289974 -0.553891 -0.809878
2024-06-06  1.126439  1.169267 -2.039094 -1.062846
2024-06-07  0.442940 -2.056051  0.917150 -0.204623
2024-06-08 -0.441348 -0.131800 -0.884501 -0.733120
2024-06-09 -0.529172 -0.652189 -1.366874 -0.988671
2024-06-10  0.189241  0.030703  0.020499  0.532722
                   A                   B          
                 sum      mean       sum      mean
2024-06-01  0.688572  0.688572  0.335234  0.335234
2024-06-02  1.773601  0.886800  1.465849  0.732925
2024-06-03  2.640640  0.880213  2.142829  0.714276
2024-06-04  2.916379  0.972126  2.669287  0.889762
2024-06-05  1.627379  0.542460  0.248697  0.082899
2024-06-06  1.886778  0.628926  0.740985  0.246995
2024-06-07  1.365408  0.455136 -2.176759 -0.725586
2024-06-08  1.128031  0.376010 -1.018584 -0.339528
2024-06-09 -0.527580 -0.175860 -2.840041 -0.946680
2024-06-10 -0.781280 -0.260427 -0.753286 -0.251095
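
The result above has two-level column labels: the original column name on top and the function name below it. Here is a short sketch of how individual pieces can be pulled out, using illustrative data rather than the random frame above.

```python
import pandas as pd

# Illustrative data so the values can be checked by hand
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
                  index=pd.date_range("6/1/2024", periods=3))
result = df.rolling(window=2, min_periods=1)[["A", "B"]].aggregate(["sum", "mean"])

# Column labels are (column, function) tuples
print(result.columns.tolist())
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

# One (column, function) pair gives a Series
print(result[("A", "mean")].tolist())  # [1.0, 1.5, 2.5]

# A top-level label gives a DataFrame with one column per function
print(result["B"]["sum"].tolist())     # [10.0, 30.0, 50.0]
```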

1.7. Applying different functions to different columns of a DataFrame

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 4),
                  index=pd.date_range('6/1/2024', periods=3),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# A dict maps each column to its own aggregation; columns not listed (C, D) are dropped
print(r.aggregate({'A': 'sum', 'B': 'mean'}))

Output:
                   A         B         C         D
2024-06-01  0.024827 -0.020137  1.930786 -0.481966
2024-06-02  0.301334  0.295961 -0.983852  0.401034
2024-06-03  0.025677  0.625714  0.948775 -0.490254
                   A         B
2024-06-01  0.024827 -0.020137
2024-06-02  0.326161  0.137912
2024-06-03  0.351838  0.300513
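
Because the examples above draw fresh random numbers on every run, their outputs cannot be reproduced exactly. The sketch below fixes a seed (an assumption added here, not part of the original code) and checks that the dict form gives the same values as aggregating each column separately.

```python
import numpy as np
import pandas as pd

# Fixed seed so repeated runs produce the same frame (assumption for reproducibility)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 4)),
                  index=pd.date_range("6/1/2024", periods=5),
                  columns=["A", "B", "C", "D"])
r = df.rolling(window=3, min_periods=1)

# Dict form: a different aggregation per column
by_dict = r.aggregate({"A": "sum", "B": "mean"})

# The same results computed one column at a time
expected = pd.DataFrame({"A": r["A"].sum(), "B": r["B"].mean()})

print(by_dict.columns.tolist())                               # ['A', 'B']
print(np.allclose(by_dict.to_numpy(), expected.to_numpy()))   # True
```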