08,DataFrame创建
DataFrame是一个【表格型】的数据结构,可以看做是【由Series组成的字典】(共用同一个索引)。DataFrame由按一定顺序排列的多列数据组成。设计初衷是将Series的使用场景从一维拓展到多维。DataFrame既有行索引,也有列索引。
- 行索引:index
- 列索引:columns
- 值:values(Numpy的二维数组)
(8.1)DataFrame的创建
最常用的方法是传递一个字典来创建。DataFrame以字典的键作为每一【列】的名称,以字典的值(一个数组)作为每一列。此外,DataFrame会自动加上每一行的索引(和Series一样)。
同Series一样,若传入的列与字典的键不匹配,则相应的值为NaN。
![](https://file.jishuzhan.net/article/1784629164377116674/1ef991e535c0b05222ce8a4c0e96846a.webp)
DataFrame的基本属性和方法:
- values 值,二维ndarray数组
- columns 列索引
- index 行索引
- shape 形状
- head() 查看前几条数据,默认5条
- tail() 查看后几条数据,默认5条
![](https://file.jishuzhan.net/article/1784629164377116674/9e54382d38560d1c23afb5d8f7bf329c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e5ab6631cefd06a0c2ab174b445f2ccf.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e0a5e15782e214c69a597153d8416fb1.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/8ef00487b1128228ab4075ae84a4cddb.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/84158522b7b0a7a60371fe74f89d1ef6.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/8ed48b9d7174fd682295d79f657ac940.webp)
其他创建DataFrame的方式
![](https://file.jishuzhan.net/article/1784629164377116674/19050da82297b0e1a9a48e966febfa2b.webp)
09,DataFrame切片
【注意】直接用中括号时:
- 索引优先对列进行操作
- 切片优先对行进行操作
![](https://file.jishuzhan.net/article/1784629164377116674/370db0c695f6e7bc1a996abd87625c25.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d615ad8cd0370eb9f72f497150499564.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/a6a616b5c5b49c7ebece73758548327f.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/2b4a2c7436ab97d7f5f8812657956f0c.webp)
总结:
- 要么取一行或一列:索引
- 要么取连续的多行或多列:切片
- 要么取不连续的多行或多列:中括号
![](https://file.jishuzhan.net/article/1784629164377116674/b1e0cf9aa5b156b5f4acfa0621fb9f42.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/4e765ea23c897e493ef26b08f671755f.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6405e2087dcc7cf9a99053f4a31a4239.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/cd5a1c1240c4089b6e7fec027b62da6b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/f12ed06a738a3f925fd780c53ffbc042.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/42057e1cadaebf541804b0770b27625c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d407ebc9d2679076c9f2ae49ce862cf8.webp)
010,DataFrame运算
(10.1)DataFrame之间的运算
- 在运算中自动补齐不同索引的数据
- 如果索引不对应,则补NaN
- DataFrame没有广播机制
创建DataFrame df1 不同人员的各科目成绩,月考一
![](https://file.jishuzhan.net/article/1784629164377116674/0edc764c0eaaf2f01e3d3a8ac7b9c06e.webp)
创建DataFrame df2 不同人员的各科目成绩,月考二
![](https://file.jishuzhan.net/article/1784629164377116674/a40b6c064f33b146cae04b4f20e68333.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/1a021687db3b166b1d09c2757d1aec71.webp)
DataFrame和标量之间的运算
![](https://file.jishuzhan.net/article/1784629164377116674/d9b98b3efda609bccd2ded09ff7dfdc7.webp)
DataFrame之间的运算
![](https://file.jishuzhan.net/article/1784629164377116674/b493aa138e9eb5e7a080ddb8414f5171.webp)
使用.add() 函数,填充数据
![](https://file.jishuzhan.net/article/1784629164377116674/ea3ef5d07510dceb7e10f1f5a63ee25c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/5c0c53041dec60b388563af3bd4702e2.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/57c9692b6bc6fd705a96a5e19c1ae71f.webp)
(10.2)Series与DataFrame之间的运算
- 使用Python操作符:以行为单位操作(参数必须是行),对所有行都有效。
- 类似于NumPy中二维数组与一维数组的运算,但可能出现NaN
- 使用Pandas操作函数:
- axis=0:以列为单位操作(参数必须为列),对所有列都有效。
- axis=1:以行为单位操作(参数必须为行),对所有行都有效。
![](https://file.jishuzhan.net/article/1784629164377116674/617a6b80c5d1a5b76b95e5d7ea6b9eaa.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d7afc36ec7558eaca28ea76106d1f952.webp)
011,创建层次化索引
(11.1)创建多层行索引
(11.1.1)隐式构造
最常见的方法是给DataFrame构造函数的index参数传递两个或更多的数组
![](https://file.jishuzhan.net/article/1784629164377116674/54f2d88bcecf90177c507e1b1a4be520.webp)
- Series也可以创建多层索引
![](https://file.jishuzhan.net/article/1784629164377116674/f574396bf568517c22a9d6e4da051b6a.webp)
(11.1.2)显示构造pd.MultiIndex
- 使用数组
![](https://file.jishuzhan.net/article/1784629164377116674/4e246443d2530efa9350898bdba3fafc.webp)
- 使用tuple
![](https://file.jishuzhan.net/article/1784629164377116674/a5d776e2edd12d109fd93ba6dba75338.webp)
- 使用product
- 笛卡尔积
![](https://file.jishuzhan.net/article/1784629164377116674/ff7433c2f3632529821f3720e0692a5f.webp)
(11.2)多层列索引
除了行索引index,列索引columns也能用同样的方法创建多层索引
就是把pd.MultIndex. 移到columns那里去
012,多层索引中Series的索引和切片操作
(12.1)Series的操作
- 对于Series的操作,直接中括号[] 与使用.loc() 完全一样
(12.1.1)索引
![](https://file.jishuzhan.net/article/1784629164377116674/fa4f964a51b9c9ac306645cbd0a11467.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6a159f8d4f47854691bfe3223de5cb5b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/953ed37250b4f089729a36e42220fe37.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/879f6dc6645bfed6f0747347f08fd4b5.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d78c3fb863ae19bced2e6f84cf168dd0.webp)
(12.1.2)切片
![](https://file.jishuzhan.net/article/1784629164377116674/fe77e35d7ba0e8cc494b0d93c4f9f6b0.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/749d80f0828ad166ae7bcfe327325767.webp)
013,多层索引中DataFrame的索引和切片操作
(13.1)索引
![](https://file.jishuzhan.net/article/1784629164377116674/942f2ed1e1178089d8979cf1634aec81.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/aae99bad86414349d27ac8e6a4376e68.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/26c8146fd1da703e50ab8f41d6e1daeb.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e7e0cad04dfdc058c0decc0a625dda6c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d19b190a1b05b9a9932d72bb3be6fd4e.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/0ba727dbfe88a1cf80bed04e2837e9bd.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/dc429dea9661a2592b7eb26ea70ec0f1.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/3e5a8c2c079fcbf13e65c44c73ff2301.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e6378d6d9e22c3bf6bb934f8e11a26ac.webp)
(13.2)切片
![](https://file.jishuzhan.net/article/1784629164377116674/cdfb4fb83dfac7886c53b2fcf972c42b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/f75796255624f140e6e50abacc4e26bc.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e395d21c1da575bd443fe1ff0be583f2.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/06d951fd1711bf3bfbae3d5de006c419.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/9b63bcf9c389b6bf6e608935ee176c05.webp)
014,索引的堆叠
![](https://file.jishuzhan.net/article/1784629164377116674/f15d20148f5d33e9f704f70f9fb7eeae.webp)
(14.1)stack():将列索引变成行索引
![](https://file.jishuzhan.net/article/1784629164377116674/883714fd789fa3fae535e73cc3129efc.webp)
(14.2)unstack():将行索引变成列索引
![](https://file.jishuzhan.net/article/1784629164377116674/d96261f9c059ac124d504c6b9c8a36f9.webp)
(14.3)使用fill_value填充
![](https://file.jishuzhan.net/article/1784629164377116674/72376424d2ce08fdfa74f0053fe08217.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/b004b6949de944ffc5d6d6d6384e0db7.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/29b345d13e4c4ea455fff28395bc170a.webp)
015,聚合操作
(15.1)DataFrame聚合函数
- 求和
- 平均值
- 最大值
- 最小值等
![](https://file.jishuzhan.net/article/1784629164377116674/006310d8ca18f1d6bd50e5487442121f.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d847a2fd729da5bfbe8e20e6584568e5.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/b25d2cb1837b1dce171d3ecf3fedb94b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/78f15d0d705149066d392b52f12f7900.webp)
(15.2)多层索引聚合操作
![](https://file.jishuzhan.net/article/1784629164377116674/e83c2517ee5df40548a7685b12bcec23.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6817e72e812906ffead29e13d83728bf.webp)
016,数据合并concat
为方便讲解,我们首先定义一个生成DataFrame的函数:
示例:
![](https://file.jishuzhan.net/article/1784629164377116674/5916568f618b0c6dc4286dcc867cbc82.webp)
使用pd.concat()级联
pandas使用pd.concat函数,与np.concatenate函数类似
(16.1)简单级联
![](https://file.jishuzhan.net/article/1784629164377116674/9552412d48a3255ad820245b68228bda.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/898aa9267abb0854958c499cd76910b8.webp)
忽略行索引 ignore_index
![](https://file.jishuzhan.net/article/1784629164377116674/dbe2c599062340c2885412bef385e0c4.webp)
使用多层索引 keys
![](https://file.jishuzhan.net/article/1784629164377116674/8e3cebea188fb5dcccc2f250f7a84e03.webp)
(16.2)不匹配级联
不匹配指的是级联的维度的索引不一致。例如纵向级联时列索引不一致,横向级联时行索引不一致
![](https://file.jishuzhan.net/article/1784629164377116674/77f5df82a81d35cbc5bf317a72241c82.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/31cbcfe7334262de0e7232093c1fe02f.webp)
外连接:补NaN(默认模式)
![](https://file.jishuzhan.net/article/1784629164377116674/ce83a9c591ac27e67f98da9ea6f01ee0.webp)
内连接:只连接匹配的项
![](https://file.jishuzhan.net/article/1784629164377116674/24942bdaf876a9be05089c975482a998.webp)
017,数据合并merge合并1
- 类似于MySQL中表和表直接的合并
- merge与concat的区别在于,merge需要依据某一共同的行或列来进行合并
- 使用pd.merge() 合并时,会自动根据两者相同column名称的那一列,作为key来进行合并。
- 每一列元素的顺序不要求一致
(17.1)一对一合并
![](https://file.jishuzhan.net/article/1784629164377116674/97770836afcce2918a423ddfe264da05.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/c1fdb981cb3f6437abc1a84f7f11d9f3.webp)
(17.2)多对一合并
![](https://file.jishuzhan.net/article/1784629164377116674/2ed6a1a8b31de16177bb36fa22a5db61.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/b388f5fd4a1036f2584171b00989782d.webp)
(17.3)多对多合并
![](https://file.jishuzhan.net/article/1784629164377116674/f7fb837748d4b51d7b4ff5d43abdee3d.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/a5401bd29425d47713c697cca1c60a1c.webp)
(17.4)key的规范化
- 使用on = 显式指定哪一列为key,当2个DataFrame有多列相同时使用
![](https://file.jishuzhan.net/article/1784629164377116674/9897ff2434d460d6947792d4f9b20493.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6eefaa02ea82b3f8b15fbad3fedd4796.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/21c9717d76cae7bca82fbe43e45b44e4.webp)
- 使用left_on和right_on指定左右两边的列作为key,当左右两边的key都不相等时使用
![](https://file.jishuzhan.net/article/1784629164377116674/da9066b917314270d05f44a62e33ce18.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/8653cb50a4345b60201cb84d45082cb9.webp)
- 当左边的列和右边的index相同的时候,使用right_index=True
![](https://file.jishuzhan.net/article/1784629164377116674/9f796158b90f18d826f896bbf88a8526.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/8d87216b4c851951cc2ec6e4ea542d88.webp)
018,数据合并merge合并2
(18.1)内合并与外合并
- 内合并:只保留两者都有的key(默认模式)
![](https://file.jishuzhan.net/article/1784629164377116674/a3b1b3c298f673c2de8e6b0644b09d3c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/b8eabeaaaab4301e7c23bb2ae82fdd33.webp)
- 外合并 how='outer':补NaN
![](https://file.jishuzhan.net/article/1784629164377116674/7b86576ec3be0c9f18add02d416a4eaa.webp)
- 左合并,右合并:how='left',how='right'
![](https://file.jishuzhan.net/article/1784629164377116674/ec80b568d3f96e85ca6fe979b7f95adf.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/02585d0cc428b6e9def5064a888ccea6.webp)
(18.2)列冲突的解决
当列冲突时,即有多个列名称相同时,需要使用on=来指定哪一个列作为key,配合suffixes指定冲突列名
可以使用suffixes=自己指定后缀
![](https://file.jishuzhan.net/article/1784629164377116674/a1a0e41eac936b5f0ac52252175f086b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/b12ed42d97ccd37ef21042ef85efcfb0.webp)
merge合并总结:
- 合并有三种现象:一对一,多对一,多对多。
- 合并默认会找相同的列名进行合并,如果有多个列名相同,用on来指定。
- 如果没有列名相同,但是数据又相同,可以通过left_on,right_on来分别指定要合并的列。
- 如果想和index合并,使用left_index,right_index来指定。
- 如果多个列相同,合并之后可以通过suffixes来区分。
- 还可以通过how来控制合并的结果,默认是内合并,还有外合并outer,左合并left,右合并right。
019,缺失值处理nan
np.nan是浮点类型,能参与到计算中。但计算的结果总是NaN。
![](https://file.jishuzhan.net/article/1784629164377116674/7a5a250962d7dfb9c9258148fa6eee74.webp)
020,缺失值检测
(20.1)Pandas中None与np.nan都视作np.nan
- 创建DataFrame
![](https://file.jishuzhan.net/article/1784629164377116674/e402c2cdedeb6e058957777efa8f7551.webp)
- 使用DataFrame行索引与列索引修改DataFrame数据
![](https://file.jishuzhan.net/article/1784629164377116674/6bc53f9a4bee0f8d391f9a4bf5f43840.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/ba8158a137f744f17dde43df5a234b44.webp)
(20.2)pandas中None与np.nan的操作
(20.2.1)判断函数
- isnull()
- notnull()
![](https://file.jishuzhan.net/article/1784629164377116674/412590bc48a264858d2821f184008827.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/54b0dcd2594163e5ff02a814c1a33e7b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/adb42f364b9ca9b32356a69db1dc5b6f.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/471cfc3236a557c5f541abc4d8514afb.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/3f21e0831caffd94dc10fcde0340eea3.webp)
021,缺失值处理_过滤数据
(21.1)使用bool值过滤数据
![](https://file.jishuzhan.net/article/1784629164377116674/49b179180788e7403d21fd101ad40211.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/a8e3e939e4dfd90b2042607443e31670.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e8a5f0d058027b1b9b9d2e35f370d2f3.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/f6372f578aad82b639a01c3f463190be.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/3adc0a3f5cbcb5711f12508c15fd982e.webp)
(21.2)过滤函数dropna
(21.2.1)可以选择过滤的是行还是列(默认为行)
ps:这里数据变了
![](https://file.jishuzhan.net/article/1784629164377116674/b52ee70a348aca5d215c038a5fc74587.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/4d5d7897900abcc8a541f05f73284f12.webp)
(21.2.2)也可以选择过滤的方式 how = 'all'
![](https://file.jishuzhan.net/article/1784629164377116674/46e007acc323993b4870248c202d803c.webp)
(21.2.3)inplace=True 修改原数据
![](https://file.jishuzhan.net/article/1784629164377116674/ee676fdfe4645b3f38bb5a1d1ab4d81f.webp)
022,缺失值处理_填充空值
(22.1)填充函数 fillna() Series/DataFrame
![](https://file.jishuzhan.net/article/1784629164377116674/a4fd0e8e3a991113674d2390bf5bed5d.webp)
(22.2)可以选择前向填充还是后向填充
![](https://file.jishuzhan.net/article/1784629164377116674/952466e792ff730b417821b40db53802.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/ad008067720f3a17c64b0e61d21caf9e.webp)
重新创建数据
![](https://file.jishuzhan.net/article/1784629164377116674/f478eeb86c9d4eaf15bb6940c4c18593.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/5d1b9e31a3a442fbf4fc2f8e38ae3488.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/bda5cecf0763ea6a0344e495c6c5ae4d.webp)
也可以不用重新创建数据,因为没有用到 inplace=True
023,重复值处理
(23.1)使用duplicated() 函数检测重复的行
- 返回元素为布尔类型的Series对象
- 每个元素对应一行,如果该行不是第一次出现,则元素为True
![](https://file.jishuzhan.net/article/1784629164377116674/e18ac2a8e4a9d7ab46aea1166f26e4b7.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/40f3e51399517cca709fe79ebf98091c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/2508abe2be86434cfd62e8446cb743ec.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/0052b474e0f0d5975964359a5aeaa6cc.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/c0cdf6fb229bfa047276fa80d1208916.webp)
(23.2)使用drop_duplicates() 函数删除重复的行
![](https://file.jishuzhan.net/article/1784629164377116674/3784f3a833b5327285d5bf0a567d0fab.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/b8232905a2893e7dbbb0273b944d6d0c.webp)
024,替换元素replace
使用replace() 函数,对values进行替换操作
![](https://file.jishuzhan.net/article/1784629164377116674/4dc739cb469043865a090ade5d09ac04.webp)
025,数据映射map
map()函数中可以使用lamba函数
![](https://file.jishuzhan.net/article/1784629164377116674/fd6e5462f3126d3c279875c65d420c38.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d8a3f0365415a0e7117716c6b06c0f5b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/2bc8b58b17aa8c8b47c3068a93f348f1.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6b38eda4436ae995903b6e0c39eb4a0f.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/c2c76b7fc210fbc2f5b3f30b6a764456.webp)
026,修改索引名rename
![](https://file.jishuzhan.net/article/1784629164377116674/a365a6ab218c40455f4cb8b721c264fd.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6da2d1ec3242e365b2448cc08ba085d5.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/4eef762f27ed586cd004a5e3f4d06f6c.webp)
027,重置索引reset_index和设置索引set_index
![](https://file.jishuzhan.net/article/1784629164377116674/df86153c45ac9c9bd74fd9c21730dfc1.webp)
028,数据处理apply
apply() 函数:既支持 Series,也支持DataFrame
![](https://file.jishuzhan.net/article/1784629164377116674/e70f56e46afb4ed9e4349109a3e20511.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/aa1dbf20ce66c9d0159bd64ee109d5fd.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/10944bd05225c5a8bd2c4bcf92638c40.webp)
029,数据处理transform
![](https://file.jishuzhan.net/article/1784629164377116674/bcddc555aeafb307d33d8097bad1bce1.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6a732ebcb13a2ff9a23908a5ece0827c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d33931745706f96f30a7375efd4ea573.webp)
030,异常值检测和过滤1
(30.1)describe() :查看每一列的描述性统计量
![](https://file.jishuzhan.net/article/1784629164377116674/e094c74e3e614b143fe90f8f21ed4f4c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/02f0601a80ac74e85d6e39f3fc09d75b.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/7470c80d90527e6f04d85bdc86fe8c36.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/e4a7f1e5dce21b17516403c98b76be80.webp)
(30.2)df.std() :可以求得DataFrame对象每一列的标准差
![](https://file.jishuzhan.net/article/1784629164377116674/f03ebb98b80aba2817c96bb721630494.webp)
(30.3)df.drop() :删除特定索引
![](https://file.jishuzhan.net/article/1784629164377116674/256f6b5f8d11832bb4ca50a2555f5af8.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/72d9a58a74da11614e4d6f45966ef91c.webp)
031, 异常值检测和过滤2
(31.1)unique() :唯一,去重
![](https://file.jishuzhan.net/article/1784629164377116674/3d4c931c900cfe80f9525ddbf8cd0e48.webp)
(31.2)df.query:按条件查询
![](https://file.jishuzhan.net/article/1784629164377116674/b6b2a38c808510f665f766bc999db2da.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6ebb5ab2a22433c8ac57d5ae0924c9e7.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/099a6e1fbdf43e24b0312f815f5a4ad8.webp)
032,异常值检测和过滤3
(32.1)df.sort_values():根据值排序;df.sort_index():根据索引排序
![](https://file.jishuzhan.net/article/1784629164377116674/905d36c1359eaeab9d4cf4d8191ca41e.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/80f660172ad6c026111166b45444448f.webp)
重新创建数据
![](https://file.jishuzhan.net/article/1784629164377116674/32ffdd477cd27c6b2e301e3503cf6174.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/6669c80e82a1930cdb617c74290e056b.webp)
035,常用聚合函数(count,max,min,median,sum,mean)
![](https://file.jishuzhan.net/article/1784629164377116674/e0ee161ae5ec6709ac013bc6f4bc8c9d.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/15e31fc6a9ec9530771485cef74bd05a.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/3efa6fc25105b60245e48e0d0d4701e2.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/21ef55e1c742ae4294907f2f856a3031.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d5c88dde10e41dc158bbf43ea6b61e0d.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/3c23d65546e43af86d82cdc729f5afbc.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/d153643282349744a26836d8a9c4db74.webp)
038,数据分组聚合
数据聚合是数据处理的最后一步,通常是要使每一个数组生成一个单个的数值。
数据分类处理:
- 分组:先把数据分为几组
- 用函数处理:为不同组的数据应用不同的函数以转换数据
- 合并:把不同组得到的结果合并起来
数据分类处理的核心:groupby()函数
![](https://file.jishuzhan.net/article/1784629164377116674/4e2ef9ccae88545a187a6ebc8a18e27c.webp)
![](https://file.jishuzhan.net/article/1784629164377116674/c9971022e97c1b7bdac8890f24174aa1.webp)
使用.groups属性查看各行的分组情况:
![](https://file.jishuzhan.net/article/1784629164377116674/7b66c048ccc57fff40749eb390fd7be4.webp)
039,CSV数据加载
(39.1)df.to_csv:保存到csv
![](https://file.jishuzhan.net/article/1784629164377116674/664c7c837067c97fd44d7dc4bc87f674.webp)
(39.2)df.read_csv:加载csv数据
![](https://file.jishuzhan.net/article/1784629164377116674/e9e39e01c2f8e8f26fa9305976686acf.webp)