5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

```python
import matplotlib.pyplot as plt

# assumes df is already loaded and has a numeric 'Minutes' column
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # figure size in inches (width, height)
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins
plt.show()
```

Making histograms and violin plots

```python
import seaborn as sns

sns.histplot(x=df['Minutes'], kde=True)  # histogram with a kernel density estimate overlaid
```

Let's look at several groups of data at once with a violin plot. First, we select the top five genres by number of songs and create a separate DataFrame containing only those rows:

```python
# value_counts() sorts genres from most to least frequent
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = df[df['Genre'].isin(top_5_genres)]
```
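With the top-five subset in hand, Seaborn's `sns.violinplot` can draw one violin per genre. This is a minimal sketch on invented data, not necessarily the book's exact call. The genres `'A'` to `'G'` and their song lengths are hypothetical. Each violin is a mirrored kernel density estimate of `Minutes` for one genre, with a small boxplot drawn inside.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for the chapter's df (genres and lengths are invented)
rng = np.random.default_rng(0)
genres = list('ABCDEFG')
df = pd.DataFrame({
    'Genre': rng.choice(genres, size=500, p=[0.35, 0.2, 0.15, 0.12, 0.1, 0.05, 0.03]),
    'Minutes': rng.gamma(shape=4.0, scale=1.0, size=500),
})

# Same selection as above: keep only rows from the five most common genres
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = df[df['Genre'].isin(top_5_genres)]

# Horizontal violins (numeric x, categorical y), ordered from most to least common genre
ax = sns.violinplot(data=top_5_data, x='Minutes', y='Genre', order=list(top_5_genres))
plt.tight_layout()
```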

Making scatter plots with Matplotlib and Seaborn

```python
# Matplotlib scatter plot: Minutes on the x-axis, MB on the y-axis
plt.scatter(df['Minutes'], df['MB'])
```
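Seaborn's `sns.scatterplot` draws the same plot from the DataFrame and column names, and it labels the axes from those names automatically. The sketch below draws both versions side by side on hypothetical data in which `MB` grows roughly linearly with `Minutes`. The numbers are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: MB roughly proportional to Minutes, plus noise
rng = np.random.default_rng(1)
minutes = rng.gamma(shape=4.0, scale=1.0, size=300)
df = pd.DataFrame({'Minutes': minutes, 'MB': 2.0 * minutes + rng.normal(0, 0.5, size=300)})

fig, (ax_plt, ax_sns) = plt.subplots(1, 2, figsize=(9, 4))

# Matplotlib: pass the arrays directly and label the axes yourself
ax_plt.scatter(df['Minutes'], df['MB'])
ax_plt.set_xlabel('Minutes')
ax_plt.set_ylabel('MB')

# Seaborn: name the columns; axis labels come from the column names
sns.scatterplot(data=df, x='Minutes', y='MB', ax=ax_sns)
plt.tight_layout()
```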

Examining correlations and making correlograms

```python
# grid of scatter plots for every pair of numeric columns, with histograms on the diagonal
sns.pairplot(data=df)
```
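A pair plot shows each pairwise relationship visually. To put numbers on those relationships, one common approach is to compute the Pearson correlation matrix with `DataFrame.corr` and draw it as a heatmap correlogram with `sns.heatmap`. This is not necessarily the book's method. The toy data and the `Plays` column are invented for illustration. `numeric_only=True` (pandas 1.5 or later) skips non-numeric columns such as `Genre`; pandas 2.x raises an error on them instead of dropping them silently.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data with one non-numeric column, like the chapter's df
rng = np.random.default_rng(2)
minutes = rng.gamma(shape=4.0, scale=1.0, size=200)
df = pd.DataFrame({
    'Minutes': minutes,
    'MB': 2.0 * minutes + rng.normal(0, 0.5, size=200),
    'Plays': rng.integers(0, 100, size=200),  # hypothetical, independent of the others
    'Genre': rng.choice(['A', 'B', 'C'], size=200),
})

# Pearson correlations between the numeric columns only
corr = df.corr(numeric_only=True)

# Correlogram: annotated heatmap on a fixed -1..1 color scale so colors are comparable
ax = sns.heatmap(corr, annot=True, fmt='.2f', cmap='RdBu_r', vmin=-1, vmax=1)
plt.tight_layout()
```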

Making missing value plots

```python
import missingno as msno

# one column per DataFrame column; white gaps mark missing values
msno.matrix(df)
```

This shows a matrix of non-missing values in gray and missing values in white. Each row of the DataFrame is drawn as one horizontal line across all the columns. From this, we see that the Composer column has several missing values, but none of the other columns is missing any values. The sparkline on the right side traces the number of non-missing values in each row and labels the minimum and maximum of that count. In our case, the minimum is 7 and the maximum is 8. Because only Composer has gaps, every row is either complete or missing only its Composer value.
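You can check the same information numerically with plain pandas. `isna().sum()` counts the gaps in each column, and `notna().sum(axis=1)` gives the per-row count of non-missing values that the sparkline traces. The small frame below is hypothetical. In it, `Composer` has 2 gaps, and each row holds 3 or 4 non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame in which only 'Composer' has gaps, as in the text above
df = pd.DataFrame({
    'Track': ['Song 1', 'Song 2', 'Song 3', 'Song 4'],
    'Composer': ['Smith', np.nan, 'Jones', np.nan],
    'Genre': ['Rock', 'Jazz', 'Rock', 'Metal'],
    'Minutes': [3.5, 4.1, 5.2, 2.9],
})

missing_per_column = df.isna().sum()       # total white gaps in each column of the matrix
complete_per_row = df.notna().sum(axis=1)  # the per-row count the sparkline follows

print(missing_per_column)
print('min/max non-missing values per row:', complete_per_row.min(), complete_per_row.max())
```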

Using EDA Python packages

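```python
# pandas-profiling has been renamed ydata-profiling (pip install ydata-profiling);
# older installs use: from pandas_profiling import ProfileReport
from ydata_profiling import ProfileReport

report = ProfileReport(df)
report  # in Jupyter, evaluating the report renders it inline
# report.to_file('profile_report.html')  # or save it as a standalone HTML file
```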

Using visualization best practices

Saving plots for sharing and reports
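One way to save a figure for sharing is Matplotlib's `Figure.savefig`. This minimal sketch uses toy data and a hypothetical `figures/` output folder. It writes a 300-dpi PNG for slides or the web and an SVG for reports. SVG is a vector format, so it scales without pixelation. `bbox_inches='tight'` trims surplus whitespace around the axes.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the chapter's df
rng = np.random.default_rng(3)
df = pd.DataFrame({'Minutes': rng.gamma(shape=4.0, scale=1.0, size=500)})

fig, ax = plt.subplots(figsize=(5.5, 5.5))
fig.patch.set_facecolor('w')  # white background so the saved file isn't transparent/gray
df['Minutes'].plot.box(ax=ax)
ax.set_ylabel('Minutes')

out_dir = Path('figures')  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

# Raster copy: dpi sets the resolution of the saved image
png_path = out_dir / 'minutes_boxplot.png'
fig.savefig(png_path, dpi=300, bbox_inches='tight')

# Vector copy: the file extension selects the output format
svg_path = out_dir / 'minutes_boxplot.svg'
fig.savefig(svg_path, bbox_inches='tight')

plt.close(fig)  # free the figure once it has been written to disk
```

Because the extension selects the format, the same call with a `.pdf` path writes a PDF.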
