5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python 复制代码
import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this is chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python 复制代码
sns.histplot(x=df['Minutes'], kde=True)

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

python 复制代码
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = data=df[df['Genre'].isin(top_5_genres)]

Making scatter plots with Matplotlib and Seaborn

python 复制代码
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python 复制代码
sns.pairplot(data=df)

Making missing value plots

python 复制代码
import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row is a line across each column. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The spark line on the right side shows the total missing values across all columns for each row and shows the maximum and minimum number of complete values for the rows. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.

Using EDA Python packages

python 复制代码
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report

Using visualization best practices

Saving plots for sharing and reports

相关推荐
冬奇Lab3 分钟前
Agent 系列(15):Agent 记忆系统进阶——短期、长期、压缩,三层记忆架构
人工智能·llm·agent
大雨淅淅3 分钟前
【机器人】ROS2 机械臂控制(MoveIt2)从入门到实战
人工智能·python·神经网络·学习·算法·机器学习·机器人
m0_564876846 分钟前
怎么写好一个好的skill
人工智能·深度学习·职场和发展
星雨流星天的笔记本6 分钟前
英语介词学习
学习
zhangfeng11336 分钟前
把权重写死在芯片的架构 Taalas(HC1)芯片:车载 GPU / 智能驾驶 / 机器人 / 算力卡适配总结
人工智能·深度学习·语言模型·架构·机器人·gpu算力·芯片
芝士爱知识a7 分钟前
【2026量化新纪元】深度评测:以AlphaGBM为核心的顶级AI量化分析软件推荐及全维度选型指南
人工智能·机器学习·因子挖掘·ai量化·alphagbm·量化交易软件测评
kgduu8 分钟前
msi文件右键以管理员身份运行
笔记
OBiO20138 分钟前
精准靶向血管平滑肌AAV在心血管疾病研究中的应用
人工智能
ST——Jess8 分钟前
传统文化的数智化解构:当代专业命理师排盘工具与效能进化深度测评报告
人工智能
孟俊宇-MJY9 分钟前
CSDN AI数字营销全功能实测
大数据·人工智能