5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python
import matplotlib.pyplot as plt

# Box plot of the Minutes column
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # sets the figure size in inches; more on this in Chapter 5
f.patch.set_facecolor('w')  # sets a white background behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python
import seaborn as sns
sns.histplot(x=df['Minutes'], kde=True)  # histogram with a kernel density estimate (KDE) curve overlaid

A violin plot lets us compare several groups of data at once. First, select the top five genres by number of songs and create a separate DataFrame containing only those rows:

python
top_5_genres = df['Genre'].value_counts().index[:5]  # the five most common genres
top_5_data = df[df['Genre'].isin(top_5_genres)]  # keep only rows in those genres

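
The notes stop after selecting the data, so here is a sketch of the violin plot itself. It assumes a synthetic `demo_df` with made-up genre names and values (only the `Genre` and `Minutes` column names come from the notes), and repeats the selection step so the block runs on its own.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the song data: seven genres with uneven counts.
rng = np.random.default_rng(0)
genres = ["Rock", "Latin", "Metal", "Jazz", "Blues", "Pop", "Classical"]
demo_df = pd.DataFrame({
    "Genre": rng.choice(genres, size=500, p=[0.3, 0.2, 0.15, 0.1, 0.1, 0.1, 0.05]),
    "Minutes": rng.lognormal(mean=1.3, sigma=0.4, size=500),
})

# Same selection as in the notes: keep the five most common genres.
top_5_genres = demo_df["Genre"].value_counts().index[:5]
top_5_data = demo_df[demo_df["Genre"].isin(top_5_genres)]

# One violin per genre, showing the distribution of Minutes within each group.
ax = sns.violinplot(data=top_5_data, x="Minutes", y="Genre")
```

Putting the categorical variable on the y-axis keeps long genre names readable; swapping `x` and `y` gives vertical violins instead.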
Making scatter plots with Matplotlib and Seaborn

python
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python
sns.pairplot(data=df)
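
The pair plot shows the relationships visually. To examine the correlations numerically, one option is pandas' `DataFrame.corr` combined with a Seaborn heatmap. The sketch below assumes a synthetic `demo_df`; the `UnitPrice` column is a hypothetical extra column added only so the matrix has more than two entries.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the numeric columns of the song data.
rng = np.random.default_rng(1)
minutes = rng.lognormal(mean=1.3, sigma=0.4, size=300)
demo_df = pd.DataFrame({
    "Minutes": minutes,
    "MB": 1.9 * minutes + rng.normal(0, 0.5, size=300),
    "UnitPrice": rng.choice([0.99, 1.99], size=300),  # hypothetical column
})

# Pearson correlation between every pair of numeric columns.
corr = demo_df.corr()

# Annotated heatmap; fixing the color scale to [-1, 1] keeps colors comparable.
ax = sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, cmap="coolwarm")
```

`DataFrame.corr` computes Pearson correlation by default; passing `method="spearman"` is an alternative when the relationships are monotonic but not linear.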

Making missing value plots

python
import missingno as msno
msno.matrix(df)

This shows a matrix with non-missing values in gray and missing values in white. Each row of the matrix corresponds to one row of the DataFrame, and each column of the matrix to one DataFrame column. From this, we see that the Composer column has several missing values, while none of the other columns are missing any. The sparkline on the right side summarizes how complete each row is and is labeled with the minimum and maximum number of non-missing values per row. In our case, 7 is the minimum number of non-missing values in a row, and 8 is the maximum.
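
The numbers read off the plot can also be checked directly with pandas. This is a small sketch on a made-up three-column `demo_df` (an assumption, not the real data), in which only `Composer` has gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the song data: only Composer has missing entries.
demo_df = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Composer": ["X", np.nan, None, "Y"],
    "Minutes": [3.1, 4.2, 5.0, 2.7],
})

# Missing values per column (what the white gaps in the matrix represent).
missing_per_column = demo_df.isna().sum()

# Non-missing values per row (what the sparkline summarizes).
non_missing_per_row = demo_df.notna().sum(axis=1)

print(missing_per_column)
print(non_missing_per_row.min(), non_missing_per_row.max())  # prints "2 3"
```

On the real data, the same two lines applied to `df` should reproduce the minimum and maximum shown next to the sparkline.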

Using EDA Python packages

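python
from pandas_profiling import ProfileReport  # newer releases of this package are published as ydata-profiling

report = ProfileReport(df)
report  # in a Jupyter notebook, a bare expression at the end of a cell renders the report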

Using visualization best practices

Saving plots for sharing and reports
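
The notes give no code for this step. One common approach is Matplotlib's `savefig`, sketched below; the data points, the file name `minutes_vs_mb.png`, and the choice of the system temp directory are all assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: a few (Minutes, MB) pairs.
fig, ax = plt.subplots(figsize=(5.5, 5.5))
ax.scatter([1, 2, 3, 4], [2, 4, 6, 8])
ax.set_xlabel("Minutes")
ax.set_ylabel("MB")
fig.tight_layout()  # avoid clipped axis labels in the saved file

# Save as a 300-dpi PNG with a white background.
out_path = os.path.join(tempfile.gettempdir(), "minutes_vs_mb.png")
fig.savefig(out_path, dpi=300, facecolor="w")
plt.close(fig)
```

The output format follows the file extension, so a name ending in `.pdf` or `.svg` would produce a vector image, which tends to scale better in reports.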
