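24 polished plots from 4 of seaborn's built-in datasets

WeChat public account: 尤而小屋

Author: Peter

Editor: Peter

Hi everyone, I'm Peter.

This post uses four of seaborn's built-in datasets to draw 24 plots. Every code cell can be copied and run as-is.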

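0 What is seaborn

Seaborn is a high-level Python data visualization library designed for exploring and analyzing data.

Seaborn is built on top of Matplotlib and offers a more convenient interface for creating many kinds of charts, which makes analysis more direct and efficient. It works with Pandas and NumPy data structures, so it is flexible about the data it can process and visualize.

Seaborn's main features include:

Data visualization: a set of built-in example datasets plus a rich range of chart types, such as bar plots, box plots, and scatter plots, for exploratory analysis.
Style control: themes and color palettes are easy to adjust, so figures look polished and can meet publication standards.
Statistical estimation: beyond drawing raw data, many plotting functions compute statistics as they draw, such as means, confidence intervals, and kernel density estimates.
Multivariate visualization and grid plotting: several variables can be mapped onto one figure, or spread across a grid of subplots, for comparison and analysis.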

1 Importing the libraries

In [1]:

import pandas as pd
import numpy as np

import matplotlib
import matplotlib.pyplot as plt
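plt.rcParams["font.sans-serif"]=["SimHei"] # use the SimHei font so Chinese labels render (only needed for CJK text)
plt.rcParams["axes.unicode_minus"]=False # render the minus sign "-" correctly with this font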

import seaborn as sns
%matplotlib inline 

import warnings
warnings.filterwarnings("ignore")

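2 Relational scatter plots with relplot

relplot draws relational plots; its default kind is a scatter plot.

tips is one of seaborn's built-in datasets, widely used for statistical analysis and visualization examples. It records restaurant bills and the tips customers left.

Datasets are loaded with seaborn's load_dataset function, which fetches the file from seaborn's online data repository on first use and caches it locally.

In [2]: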

tips = sns.load_dataset("tips")
tips

Out[2]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

244 rows × 7 columns

In [3]:

tips.columns

Out[3]:

Index(['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'], dtype='object')

Draw a scatter plot of the relationship between two variables:

In [4]:

sns.relplot(tips, x="total_bill",y="tip")

plt.show()

Use the hue parameter to color the points by a variable:

In [5]:

sns.relplot(tips, x="total_bill",y="tip", hue="sex")

plt.show()

Use the col parameter to draw one subplot per category, laid out in a single row:

In [6]:

sns.relplot(tips, x="total_bill",y="tip", hue="sex", col="day")

plt.show()

Use col (column direction) and row (row direction) together:

In [7]:

sns.relplot(tips, x="total_bill",y="tip", hue="sex", col="day", row="smoker")

plt.show()

Use col_wrap to control how many subplots appear in each row:

In [8]:

sns.relplot(tips,            # data
            x="total_bill",  # x and y variables
            y="tip",
            hue="sex",       # color points by sex
            col="day",       # one subplot per day
            col_wrap=2       # start a new row after every 2 subplots
           )

plt.show()
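
To see what col_wrap does to the layout, here is a minimal sketch on a tiny made-up table (the values in toy are invented for illustration and are not the tips data). With four day levels and col_wrap=2, the grid should hold four panels arranged as two rows of two.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Made-up values with four "day" levels (illustration only)
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 25.0, 12.0, 30.0, 18.0, 22.0],
    "tip": [1.5, 3.0, 2.0, 4.0, 1.0, 5.0, 2.5, 3.5],
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
})

# One panel per day, wrapping onto a new row after every 2 panels
g = sns.relplot(data=toy, x="total_bill", y="tip", col="day", col_wrap=2)

# With col_wrap set, the grid stores its panels as a flat sequence of axes
n_panels = len(g.axes.flat)
print(n_panels)
```

Note that col_wrap cannot be combined with row, since the wrapped columns already occupy the row direction.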

A more elaborate relational scatter plot:

In [9]:

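sns.relplot(tips,              # data
            x="total_bill",    # x and y variables
            y="tip",
            hue="sex",         # color points by sex
            col="time",        # one subplot per meal time
            size="size",       # scale marker size by party size
            style="sex",       # a different marker shape per sex
            palette=["b","r"], # colors for the hue levels
            sizes=(10,100)     # smallest and largest marker size
           )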

plt.show()

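3 Relational line plots with relplot(kind="line")

The "fmri" dataset in seaborn comes from an event-related functional MRI study. It contains the subject, timepoint, event type, brain region, and measured signal, which makes it useful for seeing how brain activity changes over time under different stimuli.

In [10]: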

fmri = sns.load_dataset("fmri")
fmri

Out[10]:

	subject	timepoint	event	region	signal
0	s13	18	stim	parietal	-0.017552
1	s5	14	stim	parietal	-0.080883
2	s12	18	stim	parietal	-0.081033
3	s11	18	stim	parietal	-0.046134
4	s10	18	stim	parietal	-0.037970
...	...	...	...	...	...
1059	s0	8	cue	frontal	0.018165
1060	s13	7	cue	frontal	-0.029130
1061	s12	7	cue	frontal	-0.004939
1062	s11	7	cue	frontal	-0.025367
1063	s0	0	cue	parietal	-0.006899

1064 rows × 5 columns

In [11]:

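# relplot is a figure-level function and creates its own figure,
# so a preceding plt.figure(figsize=...) would only leave an empty extra figure
sns.relplot(data=fmri,
            x="timepoint",
            y="signal",
            kind="line",  # draw lines instead of the default scatter
            col="region",
            hue="event",
            style="event"
           )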

plt.show()

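Set the size of each subplot and its width-to-height ratio:

In [12]:

# Size is controlled by height and aspect; plt.figure(figsize=...) has no effect on relplot
sns.relplot(data=fmri,
            x="timepoint",
            y="signal",
            kind="line",  # draw lines instead of the default scatter
            col="region",
            hue="event",
            style="event",
            height=4,     # height of each subplot, in inches
            aspect=0.7,   # width of each subplot = aspect * height
           )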

plt.show()

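4 Line plots with lineplot

The "flights" dataset in seaborn records monthly airline passenger counts from 1949 to 1960. It has three columns: year, month, and number of passengers. Line plots show how the passenger counts change over time.

In [13]: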

flights = sns.load_dataset("flights")

flights.head()

Out[13]:

	year	month	passengers
0	1949	Jan	112
1	1949	Feb	118
2	1949	Mar	132
3	1949	Apr	129
4	1949	May	121

In [14]:

# Select the rows for a single month

feb = flights.query("month == 'Feb'")
feb

Out[14]:

	year	month	passengers
1	1949	Feb	118
13	1950	Feb	126
25	1951	Feb	150
37	1952	Feb	180
49	1953	Feb	196
61	1954	Feb	188
73	1955	Feb	233
85	1956	Feb	277
97	1957	Feb	301
109	1958	Feb	318
121	1959	Feb	342
133	1960	Feb	391

Draw a basic line plot:

In [15]:

sns.lineplot(data=feb, x="year", y="passengers")

plt.grid()
plt.show()

Reshape flights into wide form, with one row per year and one column per month:

In [16]:

flights_wide = flights.pivot(index="year", columns="month", values="passengers")

flights_wide.head()

Out[16]:

month	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
year
1949	112	118	132	129	121	135	148	148	136	119	104	118
1950	115	126	141	135	125	149	170	170	158	133	114	140
1951	145	150	178	163	172	178	199	199	184	162	146	166
1952	171	180	193	181	183	218	230	242	209	191	172	194
1953	196	196	236	235	229	243	264	272	237	211	180	201
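
For readers new to the long and wide layouts, here is a small pandas-only sketch on four hand-typed rows taken from the table above. It pivots the long table to wide form and then melts it back; the variable names are my own.

```python
import pandas as pd

# Long form: one row per (year, month) observation
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Wide form: one row per year, one column per month
wide_df = long_df.pivot(index="year", columns="month", values="passengers")
print(wide_df)

# melt reverses the pivot and returns to one row per observation
back_to_long = wide_df.reset_index().melt(
    id_vars="year", var_name="month", value_name="passengers"
)
print(back_to_long)
```

In this sketch month is a plain string column, so the pivoted columns come out in alphabetical order (Feb before Jan). In the real flights data month is a categorical column, which is why the wide table above keeps calendar order.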

Plotting the Feb column of the wide table gives the same figure as before:

In [17]:

sns.lineplot(data=flights_wide["Feb"])

plt.grid()
plt.show()

Plot every month at once:

In [18]:

sns.lineplot(data=flights_wide)   # wide-form data: one line per column

plt.show()

Pass hue to draw one colored line per month (size and style are mapped the same way):

In [19]:

sns.lineplot(data=flights, x="year",y="passengers", hue="month")   # long-form (original) data

plt.show()

Give each month its own marker:

In [20]:

sns.lineplot(data=flights,
             x="year",
             y="passengers",
             hue="month",
             style="month",
             markers=True,  # a different marker for each style level
            )

plt.show()

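Setting dashes=False draws every line solid, so the months differ only by color and marker:

In [21]:

sns.lineplot(data=flights,
             x="year",
             y="passengers",
             hue="month",
             style="month",
             markers=True,  # show markers
             dashes=False   # solid lines for every style level
            )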
plt.show()

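orient: sets the axis along which the data are sorted and aggregated. orient="x" (the default) sorts by x and aggregates y at each x value; orient="y" does the same along the vertical dimension.

In [22]: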

sns.lineplot(data=flights, x="year", y="passengers", orient="x")

plt.show()

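A line plot with error bars:

In [23]:

sns.lineplot(data=fmri,          # the fmri dataset loaded above
             x="timepoint",
             y="signal",
             hue="event",
             err_style="bars",   # draw error bars instead of a shaded band
             errorbar=("se", 2)  # bars span mean ± 2 standard errors
            )

plt.grid()  # show grid lines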
plt.show()
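
To make errorbar=("se", 2) concrete, the following pandas-only sketch computes the same kind of interval by hand on invented numbers: the mean at each timepoint, plus and minus two standard errors of the mean. The toy values and column names are assumptions for illustration.

```python
import pandas as pd

# Made-up signal readings: three observations at each of two timepoints
toy = pd.DataFrame({
    "timepoint": [0, 0, 0, 1, 1, 1],
    "signal": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Mean and standard error of the mean (sample std / sqrt(n)) per timepoint
stats = toy.groupby("timepoint")["signal"].agg(["mean", "sem"])

# Error bar limits for a multiplier of 2, matching ("se", 2)
stats["lower"] = stats["mean"] - 2 * stats["sem"]
stats["upper"] = stats["mean"] + 2 * stats["sem"]
print(stats)
```

A larger multiplier widens the bars; ("se", 1) would show one standard error on each side of the mean.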

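5 Distribution plots with displot

displot is the figure-level interface for distribution plots: it can draw histograms, kernel density estimates (KDE), and empirical cumulative distribution functions (ECDF).

The penguins dataset is widely used in data science and machine learning. It contains measurements of three penguin species recorded on islands of the Palmer Archipelago in Antarctica.

It is commonly used for teaching and practicing data exploration, visualization, and classification.

In [24]: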

penguins = sns.load_dataset("penguins")
penguins

Out[24]:

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	Male
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	Female
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	Female
3	Adelie	Torgersen	NaN	NaN	NaN	NaN	NaN
4	Adelie	Torgersen	36.7	19.3	193.0	3450.0	Female
...	...	...	...	...	...	...	...
339	Gentoo	Biscoe	NaN	NaN	NaN	NaN	NaN
340	Gentoo	Biscoe	46.8	14.3	215.0	4850.0	Female
341	Gentoo	Biscoe	50.4	15.7	222.0	5750.0	Male
342	Gentoo	Biscoe	45.2	14.8	212.0	5200.0	Female
343	Gentoo	Biscoe	49.9	16.1	213.0	5400.0	Male

344 rows × 7 columns

In [25]:

penguins.columns

Out[25]:

Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',
       'flipper_length_mm', 'body_mass_g', 'sex'],
      dtype='object')

1. Histogram (the default)

In [26]:

sns.displot(data=penguins,x="bill_length_mm")
plt.show()

2. KDE density plot:

In [27]:

sns.displot(data=penguins,x="bill_length_mm", kind="kde")

plt.grid()
plt.show()
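
For intuition about what a KDE curve represents, here is a minimal NumPy sketch of a one-dimensional Gaussian kernel density estimate. It is not seaborn's own implementation: the function name, the simulated sample, and the Scott-style bandwidth rule are assumptions chosen for illustration. The idea is to place a small Gaussian bump on every observation and average the bumps.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Average of Gaussian bumps centered on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    samples = samples[~np.isnan(samples)]  # ignore missing values
    n = samples.size
    if bandwidth is None:
        # Scott-style rule of thumb for 1-D data (an assumption of this sketch)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

# Simulated values standing in for a measurement column (not the penguins data)
rng = np.random.default_rng(0)
samples = rng.normal(loc=44, scale=5, size=200)

grid = np.linspace(20, 70, 501)
density = gaussian_kde_1d(samples, grid)

# A density should integrate to about 1 (trapezoid rule over the grid)
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the trade-off that seaborn's bw_adjust parameter scales.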

3. Empirical cumulative distribution function (ECDF)

In [28]:

sns.displot(data=penguins, x="bill_length_mm", kind="ecdf")

plt.grid()
plt.show()
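
The ECDF is simple enough to compute by hand. The sketch below (the ecdf helper is my own, not a seaborn function) sorts the non-missing values and assigns each one the fraction of observations less than or equal to it, using three bill lengths from the table above plus one missing value.

```python
import numpy as np

def ecdf(values):
    """Return sorted non-missing values and the fraction of data <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    x = x[~np.isnan(x)]  # drop missing measurements
    y = np.arange(1, x.size + 1) / x.size
    return x, y

x, y = ecdf([39.1, np.nan, 40.3, 36.7])
print(x)  # sorted values
print(y)  # cumulative proportions, ending at 1.0
```

Unlike a histogram or KDE, the ECDF has no bin width or bandwidth to tune, and every observation is shown directly.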

4. Histogram with a KDE curve overlaid

In [29]:

sns.displot(data=penguins,x="bill_length_mm",kde=True)

plt.show()

5. Bivariate plots of two variables (supported only for histograms and KDEs)

In [30]:

# 1 - default: bivariate histogram
sns.displot(data=penguins,x="bill_length_mm",y="flipper_length_mm")

plt.show()

Switch to a KDE:

In [31]:

# 2 - bivariate KDE
sns.displot(data=penguins,x="bill_length_mm",y="flipper_length_mm",kind="kde")

plt.show()

6. Add a rug plot with the rug parameter:

In [32]:

sns.displot(data=penguins,
            x="bill_length_mm",
            y="flipper_length_mm",
            kind="kde",  # bivariate KDE
            rug=True     # draw a tick for each observation along the axes
           )

plt.show()

7. Use hue to draw a separate curve for each category:

In [33]:

sns.displot(data=penguins,x="bill_length_mm", kind="kde",hue="species")

# plt.grid()
plt.show()