Artificial Intelligence [Part 7] Data Visualization: Matplotlib and Seaborn in Practice (Long Read + Complete Code)

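A note from the author: Data visualization is an indispensable skill in data analysis and AI development. A good chart is worth a thousand words! This article starts from scratch and walks you through two of Python's most widely used visualization libraries, Matplotlib and Seaborn, so that you can build professional-quality charts with confidence.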

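I. Matplotlib: The Foundation of Python Visualization

1.1 What Is Matplotlib?

Matplotlib is Python's most fundamental and most commonly used data visualization library. Its pyplot module provides a MATLAB-like plotting interface.

Key features:

Full control over every element of a chart
Support for many output formats (PNG, PDF, SVG, and more)
Seamless integration with NumPy and Pandas
A highly customizable style system

Installation:

pip install matplotlib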

1.2 Quick Start: Your First Chart

import matplotlib.pyplot as plt
import numpy as np

# Prepare the data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create the figure
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='sin(x)', color='blue', linewidth=2)

# Add a title and axis labels
plt.title('Sine Curve', fontsize=16)
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)

# Add a legend and a light grid
plt.legend()
plt.grid(True, alpha=0.3)

# Display the figure
plt.show()

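1.3 Common Chart Types

Chart type	Function	Typical use
Line chart	plt.plot()	Trends, time series
Scatter plot	plt.scatter()	Correlation analysis, showing distributions
Bar chart	plt.bar()	Comparing categories
Histogram	plt.hist()	Distribution of a variable
Pie chart	plt.pie()	Showing proportions
Box plot	plt.boxplot()	Distribution and outliers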
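
The histogram, pie chart, and box plot from the table are not detailed later in this part, so here is a minimal sketch of all three in one figure. The random samples and category shares are made-up illustration data, not taken from any real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the figure is reproducible
samples = rng.normal(loc=50, scale=10, size=500)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: distribution of a single numeric variable
counts, bin_edges, _ = axes[0].hist(samples, bins=20, edgecolor='black')
axes[0].set_title('Histogram')

# Pie chart: share of each category (values are normalized automatically)
shares = [40, 30, 20, 10]
axes[1].pie(shares, labels=['A', 'B', 'C', 'D'], autopct='%1.1f%%')
axes[1].set_title('Pie chart')

# Box plot: median, quartiles, and outliers for each group
groups = [samples, samples + 10, samples * 0.5]
box = axes[2].boxplot(groups)
axes[2].set_title('Box plot')

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

The return values (`counts`, `bin_edges`, `box`) are kept only to show that each call also hands back the computed statistics or artists, which can be handy for further styling.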

1.4 Line Charts in Detail

# Continues from 1.2: assumes np and plt are already imported
# Compare several lines in one figure
x = np.arange(0, 10, 0.1)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.sin(x) * np.cos(x)

plt.figure(figsize=(12, 6))

plt.plot(x, y1, label='sin(x)', linestyle='-', color='red', linewidth=2)
plt.plot(x, y2, label='cos(x)', linestyle='--', color='blue', linewidth=2)
plt.plot(x, y3, label='sin(x)*cos(x)', linestyle=':', color='green', linewidth=2)

plt.title('Trigonometric Functions Compared', fontsize=16)
plt.xlabel('Radians', fontsize=12)
plt.ylabel('Function value', fontsize=12)
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)
# Draw thin reference lines along y=0 and x=0
plt.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.5)

plt.show()

1.5 Scatter Plots in Detail

# Generate the data: y depends linearly on x, plus noise
np.random.seed(42)
n = 100
x = np.random.randn(n)
y = 2 * x + np.random.randn(n) * 0.5
colors = np.random.rand(n)          # one color value per point
sizes = 1000 * np.random.rand(n)    # one marker area per point

plt.figure(figsize=(10, 6))
plt.scatter(x, y, c=colors, s=sizes, alpha=0.6, cmap='viridis')
plt.colorbar(label='Color value')
plt.title('Scatter Plot Example', fontsize=16)
plt.xlabel('x value', fontsize=12)
plt.ylabel('y value', fontsize=12)
plt.show()
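
A scatter plot shows correlation visually; the Pearson coefficient puts a number on it. The short sketch below regenerates the same `x` and `y` as the example above and computes the coefficient with `np.corrcoef`. It is meant as a companion check, not as part of the plotting code.

```python
import numpy as np

np.random.seed(42)
n = 100
x = np.random.randn(n)
y = 2 * x + np.random.randn(n) * 0.5

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

Because the noise term is small relative to the slope, `r` should come out close to 1, which matches the tight diagonal band in the plot.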

1.6 Bar Charts in Detail

# Single-series bar chart
categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]

plt.figure(figsize=(10, 6))
bars = plt.bar(categories, values, color=['red', 'blue', 'green', 'orange', 'purple'])

# Write each value above its bar
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height}', ha='center', va='bottom')

plt.title('Values by Category', fontsize=16)
plt.xlabel('Category', fontsize=12)
plt.ylabel('Value', fontsize=12)
plt.show()

# Grouped bar chart: shift each series by half a bar width
x = np.arange(5)
width = 0.35
men_means = [20, 34, 30, 35, 27]
women_means = [25, 32, 34, 20, 25]

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width/2, men_means, width, label='Men', color='skyblue')
rects2 = ax.bar(x + width/2, women_means, width, label='Women', color='lightcoral')

ax.set_xlabel('Group')
ax.set_ylabel('Value')
ax.set_title('Values by Group and Gender')
ax.set_xticks(x)
ax.set_xticklabels(['G1', 'G2', 'G3', 'G4', 'G5'])
ax.legend()
plt.show()
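
The manual `plt.text` loop for value labels works on any Matplotlib version. On Matplotlib 3.4 or newer, `Axes.bar_label` does the same job in one call. A minimal sketch, reusing the single-series data from above:

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D', 'E']
values = [23, 45, 56, 78, 32]

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(categories, values, color='steelblue')

# bar_label places one text annotation above each bar (requires Matplotlib 3.4+)
labels = ax.bar_label(bars, padding=3)
ax.set_ylabel('Value')
# plt.show()  # uncomment to display the figure
```

`padding` is the gap in points between the bar top and the label; the loop version remains useful when each label needs custom text or placement.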

1.7 Subplot Layouts

# Create a 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Subplot 1: line chart
x = np.linspace(0, 10, 100)
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('Sine Function')

# Subplot 2: scatter plot
axes[0, 1].scatter(np.random.randn(100), np.random.randn(100))
axes[0, 1].set_title('Scatter Plot')

# Subplot 3: bar chart
axes[1, 0].bar(['A', 'B', 'C'], [10, 20, 15])
axes[1, 0].set_title('Bar Chart')

# Subplot 4: histogram
axes[1, 1].hist(np.random.randn(1000), bins=30)
axes[1, 1].set_title('Histogram')

# Adjust spacing so titles and tick labels do not overlap
plt.tight_layout()
plt.show()
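
`plt.subplots` always produces a regular grid. When one panel should span a whole row, `plt.subplot_mosaic` (Matplotlib 3.3+) is one option. The sketch below is an illustration with hypothetical panel names ('top', 'left', 'right'), not a layout from the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Each inner list is a row; a repeated name makes that panel span several cells
fig, axd = plt.subplot_mosaic(
    [['top', 'top'],
     ['left', 'right']],
    figsize=(10, 8),
)

x = np.linspace(0, 10, 100)
axd['top'].plot(x, np.sin(x))
axd['top'].set_title('Wide panel')

axd['left'].bar(['A', 'B', 'C'], [10, 20, 15])
axd['left'].set_title('Bottom left')

axd['right'].hist(np.random.default_rng(0).normal(size=1000), bins=30)
axd['right'].set_title('Bottom right')

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

The function returns a dictionary of axes keyed by name, which tends to read more clearly than positional indices such as `axes[1, 0]`.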

1.8 Chart Styling Tips

# Apply a built-in style first (the 'seaborn-v0_8-*' names require Matplotlib 3.6+).
# Style sheets can reset font settings, so configure fonts after this call.
plt.style.use('seaborn-v0_8-darkgrid')

# Configure a CJK-capable font (only needed when labels contain Chinese text)
plt.rcParams['font.sans-serif'] = ['SimHei', 'Arial Unicode MS']
plt.rcParams['axes.unicode_minus'] = False  # keep minus signs rendering correctly

# Full example: a presentation-quality chart with two y-axes
fig, ax = plt.subplots(figsize=(12, 7))

# Data
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [120, 150, 180, 160, 200, 220]
profit = [20, 30, 45, 35, 50, 60]

# Primary axis: sales as bars
ax.bar(months, sales, alpha=0.7, color='steelblue', label='Sales')
ax.set_xlabel('Month', fontsize=12)
ax.set_ylabel('Sales (10,000 CNY)', fontsize=12, color='steelblue')
ax.tick_params(axis='y', labelcolor='steelblue')

# Secondary axis: profit as a line, sharing the same x-axis
ax2 = ax.twinx()
ax2.plot(months, profit, color='red', marker='o', linewidth=2, label='Profit')
ax2.set_ylabel('Profit (10,000 CNY)', fontsize=12, color='red')
ax2.tick_params(axis='y', labelcolor='red')

# Title, plus one legend that merges the entries from both axes
ax.set_title('Sales Performance, First Half of 2024', fontsize=16, pad=20)
lines1, labels1 = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines1 + lines2, labels1 + labels2, loc='upper left')

plt.show()
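
SimHei ships with Windows but is usually missing on macOS and Linux, so a hard-coded font list can silently fall back to a font without Chinese glyphs. One possible approach is to check which fonts Matplotlib can actually find. In the sketch below, the helper `pick_installed_font` and the candidate list are assumptions for illustration; adjust the names to the fonts on your system.

```python
from matplotlib import font_manager
import matplotlib.pyplot as plt

def pick_installed_font(candidates):
    """Return the first candidate font name Matplotlib knows about, or None."""
    installed = {f.name for f in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None

# Candidate names are assumptions; extend or reorder them for your OS
candidates = ['SimHei', 'Microsoft YaHei', 'PingFang SC',
              'Noto Sans CJK SC', 'Arial Unicode MS']
font = pick_installed_font(candidates)

if font is not None:
    plt.rcParams['font.sans-serif'] = [font]
    plt.rcParams['axes.unicode_minus'] = False
else:
    print('No CJK font found; Chinese labels may render as empty boxes.')
```

Run this after any `plt.style.use(...)` call, for the same reason given in the comments above: a style sheet may overwrite `font.sans-serif`.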

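II. Seaborn: A Powerful Tool for Statistical Visualization

2.1 What Is Seaborn?

Seaborn is a high-level statistical data visualization library built on top of Matplotlib. It offers more attractive default styles and a more concise API.

Key features:

Attractive default styles
Built-in statistical estimation and confidence intervals
Works directly with Pandas DataFrames
A rich set of statistical chart types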
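
To make "built-in statistical estimation and confidence intervals" concrete: by default, `sns.barplot` draws the mean of each group and a 95% bootstrap confidence interval as the error bar. The sketch below reproduces that idea by hand with NumPy on made-up numbers. The helper name `bootstrap_ci` is hypothetical, and Seaborn's internal implementation may differ in detail.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of every resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), low, high

bills = [12.5, 18.0, 22.3, 9.8, 30.1, 15.4, 27.6, 19.9]  # made-up sample
mean, low, high = bootstrap_ci(bills)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With Seaborn you get this estimate for every bar without writing any of it, which is the main convenience the feature list refers to.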

Installation:

pip install seaborn

2.2 Getting Started

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Set the style and color palette
sns.set_style('whitegrid')
sns.set_palette('husl')

# Load a sample dataset (downloaded on first use, so this needs internet access)
tips = sns.load_dataset('tips')
print(tips.head())

2.3 Visualizing Categorical Data

# Box plot
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex')
plt.title('Total Bill Distribution by Day')
plt.show()

# Violin plot (split=True draws the two hue levels as two halves of one violin)
plt.figure(figsize=(10, 6))
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex', split=True)
plt.title('Total Bill Distribution (Violin Plot)')
plt.show()

# Bar plot (bar height is the group mean; error bars show a confidence interval)
plt.figure(figsize=(10, 6))
sns.barplot(data=tips, x='day', y='total_bill', hue='sex')
plt.title('Average Total Bill')
plt.show()

2.4 Visualizing Distributions

# Histogram and kernel density estimate (KDE)
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.histplot(tips['total_bill'], kde=True, bins=20)
plt.title('Total Bill Distribution')

plt.subplot(1, 2, 2)
sns.kdeplot(data=tips, x='total_bill', hue='sex', fill=True)
plt.title('Total Bill Distribution by Sex')

plt.tight_layout()
plt.show()

# Joint distribution plot (creates its own figure)
sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter', hue='sex')
plt.show()

# Pairwise relationship plot across the numeric columns
sns.pairplot(tips, hue='sex', diag_kind='kde')
plt.show()

2.5 Visualizing Relationships

# Scatter plot with a fitted regression line
plt.figure(figsize=(10, 6))
sns.regplot(data=tips, x='total_bill', y='tip', scatter_kws={'alpha':0.5})
plt.title('Total Bill vs. Tip')
plt.show()

# Multivariate scatter plot: color encodes sex, marker size encodes party size
plt.figure(figsize=(10, 6))
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex',
                size='size', sizes=(50, 300), alpha=0.7)
plt.title('Total Bill vs. Tip (Multiple Dimensions)')
plt.show()

# Heatmap of correlations between the numeric columns only
plt.figure(figsize=(10, 8))
corr = tips.select_dtypes(include=[np.number]).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0,
            square=True, fmt='.2f')
plt.title('Correlation Heatmap of Numeric Features')
plt.show()
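
A correlation matrix is symmetric, so the upper triangle repeats the lower one. A common refinement is to hide it with the `mask` argument of `sns.heatmap`. The sketch below uses a small synthetic DataFrame (an assumption, so it runs without downloading `tips`); the same two lines for `mask` would apply to the `corr` computed above.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=['a', 'b', 'c', 'd'])
df['b'] += df['a']            # introduce some correlation for illustration
corr = df.corr()

# True marks cells to hide: everything above the diagonal (k=1 keeps the diagonal)
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, square=True, ax=ax)
ax.set_title('Lower-triangle correlation heatmap')
# plt.show()  # uncomment to display the figure
```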

2.6 Advanced Charts

# FacetGrid: a grid of panels, one per combination of column and row variables
g = sns.FacetGrid(tips, col='sex', row='smoker', margin_titles=True)
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()
plt.show()

# catplot: figure-level interface to the categorical plot types
g = sns.catplot(data=tips, x='day', y='total_bill', hue='sex',
                kind='box', col='smoker')
plt.show()

# lmplot: faceted scatter plots, each with a fitted regression line
sns.lmplot(data=tips, x='total_bill', y='tip', col='sex', row='smoker')
plt.show()

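III. Case Study: A Complete Data Visualization Project

3.1 Background

We analyze sales data from an e-commerce platform and build a complete visualization report. The data is simulated.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style first, then the fonts (a style sheet can reset font settings)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('Set2')
plt.rcParams['font.sans-serif'] = ['SimHei']  # only needed for Chinese labels
plt.rcParams['axes.unicode_minus'] = False

# Generate simulated data
np.random.seed(42)
n_orders = 1000

data = pd.DataFrame({
    'order_id': range(1, n_orders + 1),
    # One order per hour; 'h' replaces the alias 'H', which pandas 2.2 deprecates.
    # 1,000 hours is about 42 days, so the dates run from Jan 1 to Feb 11, 2024.
    'date': pd.date_range('2024-01-01', periods=n_orders, freq='h'),
    'category': np.random.choice(['Electronics', 'Clothing', 'Food', 'Home'], n_orders),
    'region': np.random.choice(['East China', 'South China', 'North China', 'West China'], n_orders),
    'amount': np.random.uniform(50, 1000, n_orders),
    'quantity': np.random.randint(1, 10, n_orders),
    'customer_age': np.random.randint(18, 65, n_orders),
    'customer_gender': np.random.choice(['Male', 'Female'], n_orders)
})

# Derive time features from the order timestamp
data['month'] = data['date'].dt.month
data['day_of_week'] = data['date'].dt.day_name()
data['hour'] = data['date'].dt.hour

print(f"Dataset size: {len(data)} records")
print(data.head())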

3.2 Sales Trend Analysis

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Monthly sales trend (the simulated data covers only months 1 and 2)
monthly_sales = data.groupby('month')['amount'].sum()
axes[0, 0].plot(monthly_sales.index, monthly_sales.values,
                marker='o', linewidth=2, markersize=8)
axes[0, 0].set_title('Monthly Sales Trend', fontsize=14)
axes[0, 0].set_xlabel('Month')
axes[0, 0].set_ylabel('Sales')
axes[0, 0].grid(True, alpha=0.3)

# 2. Sales by category, sorted so the largest bar is on top
category_sales = data.groupby('category')['amount'].sum().sort_values(ascending=True)
axes[0, 1].barh(category_sales.index, category_sales.values, color='steelblue')
axes[0, 1].set_title('Sales by Category', fontsize=14)
axes[0, 1].set_xlabel('Sales')

# 3. Sales share by region
region_sales = data.groupby('region')['amount'].sum()
axes[1, 0].pie(region_sales.values, labels=region_sales.index,
               autopct='%1.1f%%', startangle=90)
axes[1, 0].set_title('Sales Share by Region', fontsize=14)

# 4. Heatmap of sales by weekday and hour.
# Reindex so rows follow calendar order instead of alphabetical order.
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']
hourly_sales = data.pivot_table(values='amount', index='day_of_week',
                                columns='hour', aggfunc='sum').reindex(weekday_order)
sns.heatmap(hourly_sales, cmap='YlOrRd', ax=axes[1, 1], cbar_kws={'label': 'Sales'})
axes[1, 1].set_title('Sales Heatmap by Weekday and Hour', fontsize=14)
axes[1, 1].set_xlabel('Hour')
axes[1, 1].set_ylabel('Day of week')

plt.tight_layout()
plt.show()
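
Since 1,000 hourly orders starting on 2024-01-01 span only about six weeks, a monthly trend line has just two points. A daily aggregation with a rolling mean may show the trend more clearly. The sketch below regenerates only the `date` and `amount` columns under that assumption, so its random amounts will not match the full dataset above; the variable names `orders`, `daily`, and `rolling` are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
n_orders = 1000
orders = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=n_orders, freq='h'),
    'amount': np.random.uniform(50, 1000, n_orders),
})

# Sum the amounts per calendar day, then smooth with a 7-day rolling mean
daily = orders.set_index('date')['amount'].resample('D').sum()
rolling = daily.rolling(window=7, min_periods=1).mean()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(daily.index, daily.values, alpha=0.4, label='Daily sales')
ax.plot(rolling.index, rolling.values, linewidth=2, label='7-day rolling mean')
ax.set_ylabel('Sales')
ax.legend()
fig.autofmt_xdate()           # tilt the date labels so they do not overlap
# plt.show()  # uncomment to display the figure
```

The last day contains fewer hours than the others, so its total is naturally lower; that is an artifact of the simulated date range, not a sales drop.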

3.3 Customer Profile Analysis

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Age distribution
axes[0, 0].hist(data['customer_age'], bins=20, color='skyblue', edgecolor='black')
axes[0, 0].set_title('Customer Age Distribution', fontsize=14)
axes[0, 0].set_xlabel('Age')
axes[0, 0].set_ylabel('Count')

# 2. Spending by gender: total on the left axis, average on the right axis
gender_sales = data.groupby('customer_gender')['amount'].agg(['sum', 'mean'])
x = np.arange(len(gender_sales.index))
width = 0.35
axes[0, 1].bar(x - width/2, gender_sales['sum']/1000, width, label='Total sales (thousands)')
ax2 = axes[0, 1].twinx()
ax2.bar(x + width/2, gender_sales['mean'], width, color='orange', label='Average order amount')
axes[0, 1].set_title('Spending by Gender', fontsize=14)
axes[0, 1].set_xticks(x)
axes[0, 1].set_xticklabels(gender_sales.index)
# Merge the legend entries from both axes so the labels are actually shown
h1, l1 = axes[0, 1].get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
axes[0, 1].legend(h1 + h2, l1 + l2, loc='upper center')

# 3. Age vs. order amount
axes[1, 0].scatter(data['customer_age'], data['amount'], alpha=0.5)
axes[1, 0].set_title('Age vs. Order Amount', fontsize=14)
axes[1, 0].set_xlabel('Age')
axes[1, 0].set_ylabel('Order amount')

# Add a least-squares regression line (degree-1 polynomial fit)
z = np.polyfit(data['customer_age'], data['amount'], 1)
p = np.poly1d(z)
axes[1, 0].plot(data['customer_age'], p(data['customer_age']), "r--", alpha=0.8)

# 4. Box plot: order amount distribution per category
sns.boxplot(data=data, x='category', y='amount', ax=axes[1, 1])
axes[1, 1].set_title('Order Amount Distribution by Category', fontsize=14)
axes[1, 1].set_xlabel('Category')
axes[1, 1].set_ylabel('Order amount')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
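
The age scatter plot above is hard to read with 1,000 points. Grouping customers into age bands is one common way to summarize it. The sketch below uses `pd.cut` on regenerated `customer_age` and `amount` columns; the band edges and the names `customers` and `summary` are assumptions chosen for illustration, and the random values differ from the full dataset because fewer columns are drawn.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n_orders = 1000
customers = pd.DataFrame({
    'customer_age': np.random.randint(18, 65, n_orders),   # ages 18 to 64
    'amount': np.random.uniform(50, 1000, n_orders),
})

# right=False makes each band include its lower edge: [18, 25), [25, 35), ...
bins = [18, 25, 35, 45, 55, 65]
labels = ['18-24', '25-34', '35-44', '45-54', '55-64']
customers['age_group'] = pd.cut(customers['customer_age'],
                                bins=bins, labels=labels, right=False)

# Order count and average order amount per age band
summary = customers.groupby('age_group', observed=False)['amount'].agg(['count', 'mean'])
print(summary)
```

The resulting `summary` table can be passed straight to `ax.bar(summary.index, summary['mean'])` to replace the scatter plot with a five-bar chart.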

3.4 Animated Charts (Using Matplotlib Animation)

from matplotlib.animation import FuncAnimation

# Animate the sales trend month by month.
# Note: the simulated data covers only months 1 and 2, so the animation has two frames.
fig, ax = plt.subplots(figsize=(12, 6))

months = sorted(data['month'].unique())
categories = data['category'].unique()

# One empty line per category; the data is filled in frame by frame
lines = []
for i, cat in enumerate(categories):
    line, = ax.plot([], [], marker='o', label=cat, linewidth=2)
    lines.append(line)

ax.set_xlim(1, 12)
ax.set_ylim(0, data.groupby(['month', 'category'])['amount'].sum().max() * 1.1)
ax.set_xlabel('Month')
ax.set_ylabel('Sales')
ax.set_title('Monthly Sales Trend by Category (Animated)')
ax.legend()
ax.grid(True, alpha=0.3)

def init():
    # Start every line empty
    for line in lines:
        line.set_data([], [])
    return lines

def animate(frame):
    # Show all months up to and including the current frame
    current_months = months[:frame+1]
    for i, cat in enumerate(categories):
        cat_data = data[data['category'] == cat].groupby('month')['amount'].sum()
        y_values = [cat_data.get(m, 0) for m in current_months]
        lines[i].set_data(current_months, y_values)
    return lines

anim = FuncAnimation(fig, animate, init_func=init, frames=len(months),
                     interval=500, blit=True, repeat=True)

plt.show()

# Save the animation (requires ffmpeg to be installed)
# anim.save('sales_animation.mp4', writer='ffmpeg', fps=2)
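
If ffmpeg is not available, saving to GIF with Matplotlib's `PillowWriter` is one alternative, since it needs only the Pillow package. The sketch below uses a small stand-alone sine-wave animation rather than the sales data, and the output path in the temp directory is an arbitrary choice for the example.

```python
import os
import tempfile
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

fig, ax = plt.subplots(figsize=(6, 4))
x = np.linspace(0, 2 * np.pi, 100)
line, = ax.plot(x, np.sin(x))

def update(frame):
    # Shift the sine wave a little on every frame
    line.set_ydata(np.sin(x + 0.3 * frame))
    return (line,)

anim = FuncAnimation(fig, update, frames=10, interval=200, blit=True)

# Write the frames to an animated GIF at 5 frames per second
gif_path = os.path.join(tempfile.gettempdir(), 'sine_demo.gif')
anim.save(gif_path, writer=PillowWriter(fps=5))
plt.close(fig)
```

For the sales animation above, the equivalent call would be `anim.save('sales_animation.gif', writer=PillowWriter(fps=2))`, placed before `plt.show()`.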

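IV. Matplotlib vs. Seaborn

Aspect	Matplotlib	Seaborn
Level of control	Full control	High-level abstraction
Learning curve	Steeper	Gentler
Default style	Plain	Polished
Statistical features	Basic	Rich
Pandas integration	Good	Excellent
Best suited for	Fine-grained control, publication-quality figures	Quick exploration, statistical analysis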
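
The difference in abstraction level is easiest to see side by side. The sketch below draws the same group means with both libraries on a tiny made-up DataFrame (the column names `group` and `value` are illustrative).

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'value': [10, 14, 20, 22, 15, 19],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(12, 4))

# Matplotlib: aggregate first, then draw the bars yourself
means = df.groupby('group')['value'].mean()
ax_mpl.bar(means.index, means.values, color='steelblue')
ax_mpl.set_title('Matplotlib: manual aggregation')

# Seaborn: pass the raw DataFrame; the mean and error bars are computed for you
sns.barplot(data=df, x='group', y='value', ax=ax_sns)
ax_sns.set_title('Seaborn: one call')

fig.tight_layout()
# plt.show()  # uncomment to display the figure
```

Because Seaborn draws onto ordinary Matplotlib axes, the two can be mixed freely: use Seaborn for the statistical layer and Matplotlib calls for the final adjustments.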

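V. Saving and Exporting Charts

# Save in different formats.
# Call savefig before plt.show(); after show() the figure may already be closed and save as blank.
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
plt.savefig('figure.pdf', bbox_inches='tight')
plt.savefig('figure.svg', bbox_inches='tight')

# Set global defaults
plt.rcParams['figure.figsize'] = (10, 6)   # default figure size in inches
plt.rcParams['figure.dpi'] = 100           # on-screen resolution
plt.rcParams['savefig.dpi'] = 300          # resolution of saved files
plt.rcParams['font.size'] = 12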
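
When charts are generated in a script or web service rather than a notebook, it can be useful to save through the figure object, write into memory instead of a file, and close the figure afterwards. A minimal sketch, with an arbitrary three-point line as the data:

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2], [0, 1, 4], marker='o')
ax.set_title('Export demo')

# Write the PNG into an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')
plt.close(fig)               # release the figure's memory once it is saved

png_bytes = buffer.getvalue()
print(f"PNG size: {len(png_bytes)} bytes")
```

Calling `fig.savefig` rather than `plt.savefig` avoids depending on which figure is currently active, which matters once a script creates more than one.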

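VI. Recommended Learning Resources

Official documentation: the Matplotlib and Seaborn documentation sites
《Python数据可视化》 (Python Data Visualization, book): a systematic introduction to data visualization
Kaggle Notebooks: a large collection of visualization examples
ColorBrewer: professionally designed color schemes

Coming next: [Part 8] Supervised Learning in Practice: Linear Regression and Logistic Regression Explained

This is Part 7 of the series, a systematic tour of the core techniques of data visualization in Python. Questions are welcome in the comments!

Tags: #Matplotlib #Seaborn #Python #DataVisualization #ArtificialIntelligence #DataAnalysis #Tutorial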