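1. Pandas Aggregation
Worked examples of aggregation in Pandas
Once a rolling, expanding, or ewm (exponentially weighted) window object has been created, there are several ways to aggregate its data.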
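The same `aggregate` call works on all three window types. This sketch uses a small hypothetical Series `s` instead of random data, so the results can be checked by hand. It passes function names as strings such as `'sum'`.

```python
import pandas as pd

# Small, deterministic Series (hypothetical example data)
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Fixed-size window of 2 rows: the first row has too few observations, so it is NaN
rolling_sum = s.rolling(window=2).aggregate('sum')

# Expanding window: each result covers every row seen so far
expanding_sum = s.expanding().aggregate('sum')

# Exponentially weighted window: more recent rows get more weight
ewm_mean = s.ewm(alpha=0.5).aggregate('mean')

print(rolling_sum.tolist())    # [nan, 3.0, 5.0, 7.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0]
print(ewm_mean.round(4).tolist())
```

With `ewm`, `'mean'` is the exponentially weighted average (default `adjust=True`), not a plain average of the rows.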
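1.1 Creating a DataFrame and a rolling window object
First, create a DataFrame of random values and a rolling window object `r` over it; aggregations are then applied to `r`.
```python
import pandas as pd
import numpy as np

# 10 rows of random data indexed by 10 consecutive days
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

# Rolling window over 3 rows; min_periods=1 also computes the shorter windows at the start
r = df.rolling(window=3, min_periods=1)
print(r)
```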
Output:
```text
                   A         B         C         D
2024-06-01  1.441992  0.507236 -1.279692 -0.283955
2024-06-02  0.732984 -1.022779 -1.188695  0.899738
2024-06-03  0.363206 -0.610489  0.987919 -0.556534
2024-06-04  1.760517  0.513175 -1.952190 -0.371333
2024-06-05 -0.975915  0.941488  0.116632 -1.384646
2024-06-06  0.278110  2.193880  0.434967 -3.136830
2024-06-07  0.998929 -1.174505 -0.512467 -0.076176
2024-06-08 -0.836676  0.255251 -0.283001 -0.069504
2024-06-09 -1.042460  1.008820  1.203172  1.790213
2024-06-10 -0.000309  0.327030  0.235055  0.137578
Rolling [window=3,min_periods=1,center=False,axis=0,method=single]
```
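The printed `Rolling` object shows only its parameters; nothing is computed until an aggregation is requested. This sketch uses a hypothetical four-row frame. It compares `min_periods=1` with the default, where `min_periods` equals the window size.

```python
import pandas as pd

# Hypothetical four-row frame with easy-to-check values
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})

# Default: min_periods equals the window size, so the first two rows lack a full window
strict = df.rolling(window=3).sum()

# min_periods=1: the partial windows at the start are summed as well
lenient = df.rolling(window=3, min_periods=1).sum()

print(strict['A'].tolist())   # [nan, nan, 6.0, 9.0]
print(lenient['A'].tolist())  # [1.0, 3.0, 6.0, 9.0]
```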
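An aggregation can be applied to the whole DataFrame by passing a function to the rolling object. It can also be applied to a single column selected with the standard get-item syntax (`r['A']`).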
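Here are the two forms side by side on a small hypothetical frame. Aggregating the rolling object returns a DataFrame. Selecting a column first returns a Series whose values match that column of the DataFrame result.

```python
import pandas as pd

# Hypothetical deterministic data
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})
r = df.rolling(window=2, min_periods=1)

# Aggregate every column at once -> DataFrame
whole = r.aggregate('sum')

# Select one column with [] first -> Series
only_a = r['A'].aggregate('sum')

print(type(whole).__name__, type(only_a).__name__)  # DataFrame Series
print(only_a.tolist())                               # [1.0, 3.0, 5.0]
print(only_a.equals(whole['A']))                     # True
```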
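1.2 Aggregating the entire DataFrame
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Sum every column over each 3-row window. The string 'sum' selects Rolling.sum;
# recent pandas versions emit a FutureWarning when np.sum is passed instead
print(r.aggregate('sum'))
```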
Output:
```text
                   A         B         C         D
2024-06-01  0.137541 -0.666472 -0.512313  0.124189
2024-06-02 -0.274006  0.546432  0.804729  0.444257
2024-06-03  0.656569  1.087017  0.546081 -0.645019
2024-06-04  0.287474 -0.037974  0.646037 -0.116104
2024-06-05  0.159287  0.242253  1.092559 -0.437320
2024-06-06 -1.081650  0.408552  0.273044 -0.802035
2024-06-07 -1.384118  0.366630  0.503155 -1.720862
2024-06-08  0.016059 -0.177049  0.066783  0.138181
2024-06-09  0.189092  1.099488  0.788672 -0.643970
2024-06-10  0.504482  0.307674 -1.186342 -1.958610
                   A         B         C         D
2024-06-01  0.137541 -0.666472 -0.512313  0.124189
2024-06-02 -0.136465 -0.120041  0.292416  0.568445
2024-06-03  0.520104  0.966977  0.838497 -0.076574
2024-06-04  0.670037  1.595475  1.996847 -0.316866
2024-06-05  1.103330  1.291296  2.284677 -1.198443
2024-06-06 -0.634889  0.612831  2.011640 -1.355459
2024-06-07 -2.306481  1.017435  1.868758 -2.960217
2024-06-08 -2.449709  0.598133  0.842982 -2.384716
2024-06-09 -1.178967  1.289068  1.358610 -2.226651
2024-06-10  0.709633  1.230113 -0.330886 -2.464399
```
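Because the input is random, the numbers above differ on every run. This deterministic check uses a hypothetical five-row frame to show what each result row means. Each row is the sum of that row and up to two preceding rows, which is also what calling `r.sum()` directly returns.

```python
import pandas as pd

# Hypothetical deterministic data instead of np.random.randn
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'B': [5.0, 4.0, 3.0, 2.0, 1.0]})
r = df.rolling(window=3, min_periods=1)

via_aggregate = r.aggregate('sum')  # aggregate with a function name
via_method = r.sum()                # the dedicated Rolling method

# Recompute column A by hand: row i sums rows max(0, i - 2) through i
manual_a = [df['A'].iloc[max(0, i - 2): i + 1].sum() for i in range(len(df))]

print(via_aggregate.equals(via_method))         # True
print(via_aggregate['A'].tolist())              # [1.0, 3.0, 6.0, 9.0, 12.0]
print(via_aggregate['A'].tolist() == manual_a)  # True
```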
1.3 Applying an aggregation to a single DataFrame column
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Select column A with [] and sum each 3-row window; the result is a Series
print(r['A'].aggregate('sum'))
```
Output:
```text
                   A         B         C         D
2024-06-01  1.337425 -2.008430  0.487408  0.619035
2024-06-02 -1.057971 -0.454410  1.029195  1.031153
2024-06-03  0.180957 -1.598784  0.235843 -1.234636
2024-06-04 -0.215478 -0.283628 -0.159067 -0.441236
2024-06-05  0.568535  0.468742 -0.981265 -0.225904
2024-06-06  1.251656  0.045891  0.533743 -1.809453
2024-06-07 -0.118663  0.430278 -1.811598  1.199368
2024-06-08  1.103233 -0.909900  0.184519  0.363605
2024-06-09  0.499495  1.120610 -1.283629  0.073462
2024-06-10  1.182883 -0.573653  0.291168 -1.079381
2024-06-01    1.337425
2024-06-02    0.279453
2024-06-03    0.460411
2024-06-04   -1.092492
2024-06-05    0.534014
2024-06-06    1.604712
2024-06-07    1.701527
2024-06-08    2.236225
2024-06-09    1.484064
2024-06-10    2.785611
Freq: D, Name: A, dtype: float64
```
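Selecting a column with `r['A']` gives the same values as building the rolling window on that column alone with `df['A'].rolling(...)`. The result is a Series that keeps the column name, as the `Name: A` line in the output shows. A small deterministic check on hypothetical data:

```python
import pandas as pd

# Hypothetical deterministic data
df = pd.DataFrame({'A': [2.0, 4.0, 6.0, 8.0], 'B': [1.0, 1.0, 1.0, 1.0]})

# Window built on the whole frame, then column A selected
from_frame = df.rolling(window=3, min_periods=1)['A'].aggregate('sum')
# Window built directly on the Series df['A']
from_series = df['A'].rolling(window=3, min_periods=1).aggregate('sum')

print(from_frame.name)                 # A
print(from_frame.equals(from_series))  # True
print(from_frame.tolist())             # [2.0, 6.0, 12.0, 18.0]
```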
1.4 Applying an aggregation to multiple DataFrame columns
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Select columns A and B with a list, then sum each 3-row window
print(r[['A', 'B']].aggregate('sum'))
```
Output:
```text
                   A         B         C         D
2024-06-01 -0.315264 -1.007784 -0.422830  0.240110
2024-06-02 -0.899798  1.220554  0.043764 -0.724214
2024-06-03 -0.506266 -1.114019  0.970437 -1.436598
2024-06-04 -0.567130 -0.358241 -2.330796  0.720396
2024-06-05  0.002677  0.358061 -0.191730 -2.024825
2024-06-06 -1.241444 -0.185388  1.539475 -0.398289
2024-06-07 -0.394370  0.899715 -0.235603  2.083027
2024-06-08 -0.063937 -0.703623 -0.771960  1.069107
2024-06-09 -0.997480 -0.145053 -2.013109  0.630082
2024-06-10 -1.323366  0.407704 -1.958234 -0.136122
                   A         B
2024-06-01 -0.315264 -1.007784
2024-06-02 -1.215063  0.212770
2024-06-03 -1.721329 -0.901249
2024-06-04 -1.973195 -0.251705
2024-06-05 -1.070719 -1.114198
2024-06-06 -1.805897 -0.185567
2024-06-07 -1.633136  1.072388
2024-06-08 -1.699751  0.010704
2024-06-09 -1.455787  0.051039
2024-06-10 -2.384783 -0.440971
```
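Only the selected columns appear in the result. This sketch on a hypothetical three-column frame checks that column C is dropped. It also confirms that each window is summed per column.

```python
import pandas as pd

# Hypothetical deterministic data with a third column that is not selected
df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [10.0, 20.0, 30.0],
                   'C': [100.0, 200.0, 300.0]})
r = df.rolling(window=2, min_periods=1)

# A list of labels selects several columns; unselected columns are dropped
subset = r[['A', 'B']].aggregate('sum')

print(subset.columns.tolist())  # ['A', 'B']
print(subset['B'].tolist())     # [10.0, 30.0, 50.0]
```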
1.5 Applying multiple functions to a single column
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Pass a list of function names; each one becomes an output column
print(r['A'].aggregate(['sum', 'mean']))
```
Output:
```text
                   A         B         C         D
2024-06-01 -1.353324  0.127682 -0.200629  0.450458
2024-06-02  0.949610 -1.400609  0.627148 -0.043679
2024-06-03  0.033043  0.892801 -0.425507  0.880760
2024-06-04 -0.717365 -0.126336 -0.688569  0.406762
2024-06-05 -1.432076  1.305415  0.316325  1.700087
2024-06-06 -0.130123  1.470843  0.255068 -0.466856
2024-06-07 -0.259649  0.972374 -0.294581 -0.246689
2024-06-08  0.451554  0.726053  1.198266 -0.721875
2024-06-09 -1.328514 -0.188786  0.499362 -0.998840
2024-06-10 -0.235946  0.063362 -1.474905 -1.410311
                 sum      mean
2024-06-01 -1.353324 -1.353324
2024-06-02 -0.403714 -0.201857
2024-06-03 -0.370671 -0.123557
2024-06-04  0.265288  0.088429
2024-06-05 -2.116398 -0.705466
2024-06-06 -2.279563 -0.759854
2024-06-07 -1.821847 -0.607282
2024-06-08  0.061783  0.020594
2024-06-09 -1.136609 -0.378870
2024-06-10 -1.112906 -0.370969
```
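With `min_periods=1` the first windows are partial, so `mean` divides by the number of rows actually in the window rather than by 3. In the output above, the first row's mean equals its sum and the second row's mean is half its sum. This sketch on hypothetical data also requests `count`, which makes that relationship visible.

```python
import pandas as pd

# Hypothetical deterministic data
s = pd.Series([3.0, 6.0, 9.0, 12.0], name='A')
stats = s.rolling(window=3, min_periods=1).aggregate(['sum', 'mean', 'count'])

print(stats.columns.tolist())  # ['sum', 'mean', 'count']
print(stats['count'].tolist()) # [1.0, 2.0, 3.0, 3.0]
print(stats['mean'].tolist())  # [3.0, 4.5, 6.0, 9.0]

# mean is sum divided by the number of observations in each window
ratio = stats['sum'] / stats['count']
```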
1.6 Applying multiple functions to multiple columns
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('6/1/2024', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Each selected column gets one output column per function (two-level column labels)
print(r[['A', 'B']].aggregate(['sum', 'mean']))
```
Output:
```text
                   A         B         C         D
2024-06-01  0.688572  0.335234  0.752168  0.961081
2024-06-02  1.085028  1.130616 -0.536655  0.779873
2024-06-03  0.867040  0.676979 -0.389117 -2.827168
2024-06-04  0.964311  0.861692 -0.421859 -1.080160
2024-06-05 -0.203971 -1.289974 -0.553891 -0.809878
2024-06-06  1.126439  1.169267 -2.039094 -1.062846
2024-06-07  0.442940 -2.056051  0.917150 -0.204623
2024-06-08 -0.441348 -0.131800 -0.884501 -0.733120
2024-06-09 -0.529172 -0.652189 -1.366874 -0.988671
2024-06-10  0.189241  0.030703  0.020499  0.532722
                   A                   B
                 sum      mean       sum      mean
2024-06-01  0.688572  0.688572  0.335234  0.335234
2024-06-02  1.773601  0.886800  1.465849  0.732925
2024-06-03  2.640640  0.880213  2.142829  0.714276
2024-06-04  2.916379  0.972126  2.669287  0.889762
2024-06-05  1.627379  0.542460  0.248697  0.082899
2024-06-06  1.886778  0.628926  0.740985  0.246995
2024-06-07  1.365408  0.455136 -2.176759 -0.725586
2024-06-08  1.128031  0.376010 -1.018584 -0.339528
2024-06-09 -0.527580 -0.175860 -2.840041 -0.946680
2024-06-10 -0.781280 -0.260427 -0.753286 -0.251095
```
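The two header rows above come from a two-level column MultiIndex of (column name, function name). This sketch on hypothetical data shows how one column/function pair, or every function for one column, can be picked out of such a result.

```python
import pandas as pd

# Hypothetical deterministic data
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 8.0, 12.0]})
result = df.rolling(window=2, min_periods=1)[['A', 'B']].aggregate(['sum', 'mean'])

# Columns are a two-level MultiIndex: (column name, function name)
print(result.columns.tolist())
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

# One column/function pair -> Series; one top-level column -> DataFrame
print(result[('B', 'mean')].tolist())  # [4.0, 6.0, 10.0]
print(result['A'].columns.tolist())    # ['sum', 'mean']
```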
1.7 Applying different functions to different columns
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 4),
                  index=pd.date_range('6/1/2024', periods=3),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Map each column to its own function; columns not in the dict (C, D) are omitted
print(r.aggregate({'A': 'sum', 'B': 'mean'}))
```
Output:
```text
                   A         B         C         D
2024-06-01  0.024827 -0.020137  1.930786 -0.481966
2024-06-02  0.301334  0.295961 -0.983852  0.401034
2024-06-03  0.025677  0.625714  0.948775 -0.490254
                   A         B
2024-06-01  0.024827 -0.020137
2024-06-02  0.326161  0.137912
2024-06-03  0.351838  0.300513
```
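A dict maps each column to its own function, and only the listed columns appear in the result; C and D are absent from the output above. On hypothetical data, the dict form gives the same values as aggregating each selected column separately and joining the results.

```python
import pandas as pd

# Hypothetical deterministic data; column C is deliberately left out of the dict
df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [2.0, 4.0, 6.0],
                   'C': [0.0, 0.0, 0.0]})
r = df.rolling(window=3, min_periods=1)

combined = r.aggregate({'A': 'sum', 'B': 'mean'})

# The same result built column by column and joined side by side
separate = pd.concat([r['A'].sum(), r['B'].mean()], axis=1)

print(combined.columns.tolist())  # ['A', 'B']
print(combined['A'].tolist())     # [1.0, 3.0, 6.0]
print(combined['B'].tolist())     # [2.0, 3.0, 4.0]
print(combined.equals(separate))  # True
```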