数据分析-Pandas分类数据的get和set

数据分析和处理中，难免会遇到各种数据，那么数据呈现怎样的规律呢？不管金融数据，风控数据，营销数据等等，莫不如此。如何通过图示展示数据的规律？

数据表，时间序列数据在数据分析建模中很常见，例如天气预报，空气状态监测，股票交易等金融场景。数据分析过程中重新调整，重塑数据表是很重要的技巧，此处选择Titanic数据，以及巴黎、伦敦欧洲城市空气质量监测 N O 2 NO_2 NO2数据作为样例。

数据分析

数据分析-Pandas如何转换产生新列

数据分析-Pandas如何统计数据概况

数据分析-Pandas如何轻松处理时间序列数据

数据分析-Pandas如何选择数据子集

数据分析-Pandas如何重塑数据表-CSDN博客

实验数据分析处理，股票序列，时间序列，信号序列，有时候表格的数据并不完全是数值类型，也有可能是字符串，或者其他数据，需要做分类处理。pandas如何控制数据分类处理呢？需要配置哪些参数？

数据梳理

优化的 pandas 的.loc 、.iloc 、.at 和 .iat，使得数据访问方式正常。唯一的区别是返回类型（用于获取），和可以赋值已有的数值。categories

Getting

如果切片操作返回category类型的 DataFrame 或Series的列，则保留 dtype。

python 复制代码

In [144]: idx = pd.Index(["h", "i", "j", "k", "l", "m", "n"])
In [145]: cats = pd.Series(["a", "b", "b", "b", "c", "c", "c"], dtype="category", index=idx)
In [146]: values = [1, 2, 2, 2, 3, 4, 5]
In [147]: df = pd.DataFrame({"cats": cats, "values": values}, index=idx)
In [148]: df.iloc[2:4, :]
Out[148]: 
  cats  values
j    b       2
k    b       2

In [149]: df.iloc[2:4, :].dtypes
Out[149]: 
cats      category
values       int64
dtype: object

In [150]: df.loc["h":"j", "cats"]
Out[150]: 
h    a
i    b
j    b
Name: cats, dtype: category
Categories (3, object): ['a', 'b', 'c']

In [151]: df[df["cats"] == "b"]
Out[151]: 
  cats  values
i    b       2
j    b       2
k    b       2

但是，如果您只选择一个行，不保留类别类型，结果为 dtype：Series``object

python 复制代码

# get the complete "h" row as a Series
In [152]: df.loc["h", :]
Out[152]: 
cats      a
values    1
Name: h, dtype: object

如果从分类数据只返回一个项，也只返回该项的值，而不是长度为"1"的分类数据。

python 复制代码

In [153]: df.iat[0, 0]
Out[153]: 'a'

In [154]: df["cats"] = df["cats"].cat.rename_categories(["x", "y", "z"])
In [155]: df.at["h", "cats"]  # returns a string
Out[155]: 'x'

要获取类型的单个值，需要传入一个list，其中包含单个值

python 复制代码

In [156]: df.loc[["h"], "cats"]
Out[156]: 
h    x
Name: cats, dtype: category
Categories (3, object): ['x', 'y', 'z']

Setting

在包含categories列赋值数值，和其他的categories赋值一样，类型需要包含在内。

python 复制代码

In [169]: idx = pd.Index(["h", "i", "j", "k", "l", "m", "n"])
In [170]: cats = pd.Categorical(["a", "a", "a", "a", "a", "a", "a"], categories=["a", "b"])
In [171]: values = [1, 1, 1, 1, 1, 1, 1]
In [172]: df = pd.DataFrame({"cats": cats, "values": values}, index=idx)
In [173]: df.iloc[2:4, :] = [["b", 2], ["b", 2]]

In [174]: df
Out[174]: 
  cats  values
h    a       1
i    a       1
j    b       2
k    b       2
l    a       1
m    a       1
n    a       1

In [175]: try:
   .....:     df.iloc[2:4, :] = [["c", 3], ["c", 3]]
   .....: except TypeError as e:
   .....:     print("TypeError:", str(e))
   .....: 
TypeError: Cannot setitem on a Categorical with a new category, set the categories first

当然，通过赋值分类数据也能检查两者是否匹配：

python 复制代码

In [176]: df.loc["j":"k", "cats"] = pd.Categorical(["a", "a"], categories=["a", "b"])
In [177]: df
Out[177]: 
  cats  values
h    a       1
i    a       1
j    a       2
k    a       2
l    a       1
m    a       1
n    a       1

In [178]: try:
   .....:     df.loc["j":"k", "cats"] = pd.Categorical(["b", "b"], categories=["a", "b", "c"])
   .....: except TypeError as e:
   .....:     print("TypeError:", str(e))
   .....: 
TypeError: Cannot set a Categorical with another, without identical categories

如果把类型数据赋值DataFrame数值的一部分，那也只有赋值数值部分，类型不涉及。

python 复制代码

In [179]: df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
In [180]: df.loc[1:2, "a"] = pd.Categorical(["b", "b"], categories=["a", "b"])
In [181]: df.loc[2:3, "b"] = pd.Categorical(["b", "b"], categories=["a", "b"])
In [182]: df
Out[182]: 
   a  b
0  1  a
1  b  a
2  b  b
3  1  b
4  1  a

In [183]: df.dtypes
Out[183]: 
a    object
b    object
dtype: object

以上代码只是一个简单示例，示例代码中的表达式可以根据实际问题进行修改。

后面介绍下其他的展示形式。

觉得有用 收藏收藏收藏

点个赞点个赞点个赞

End

GPT专栏文章：

GPT实战系列-ChatGLM3本地部署CUDA11+1080Ti+显卡24G实战方案

GPT实战系列-LangChain + ChatGLM3构建天气查询助手

大模型查询工具助手之股票免费查询接口

GPT实战系列-简单聊聊LangChain

GPT实战系列-大模型为我所用之借用ChatGLM3构建查询助手

GPT实战系列-P-Tuning本地化训练ChatGLM2等LLM模型，到底做了什么？(二)