5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

```python
import matplotlib.pyplot as plt

# assumes df is the DataFrame loaded earlier, with a numeric 'Minutes' column
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image; more on this in chapter 5
f.patch.set_facecolor('w')  # sets the background color behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjusts the margins
plt.show()
```

Making histograms and violin plots

```python
import seaborn as sns
sns.histplot(x=df['Minutes'], kde=True)  # histogram with a kernel density estimate (KDE) curve overlaid
```

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

```python
top_5_genres = df['Genre'].value_counts().index[:5]  # the five genres with the most songs
top_5_data = df[df['Genre'].isin(top_5_genres)]  # keep only rows from those genres
```
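The notes stop before the violin plot itself. The following is a minimal sketch with `sns.violinplot`. It uses a hypothetical stand-in DataFrame with made-up genre counts and random track lengths so the block runs on its own.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# hypothetical stand-in data: 7 genres with different numbers of tracks
rng = np.random.default_rng(0)
genres = ['Rock', 'Latin', 'Metal', 'Jazz', 'Blues', 'Classical', 'Pop']
counts = [300, 200, 150, 120, 100, 50, 40]
df = pd.DataFrame({
    'Genre': np.repeat(genres, counts),
    'Minutes': rng.lognormal(mean=1.4, sigma=0.4, size=sum(counts)),
})

top_5_genres = df['Genre'].value_counts().index[:5]  # the five genres with the most songs
top_5_data = df[df['Genre'].isin(top_5_genres)]  # keep only rows from those genres

# one violin per genre: Minutes on the x-axis, genres on the y-axis
ax = sns.violinplot(data=top_5_data, x='Minutes', y='Genre')
```

Putting the numeric `Minutes` column on the x-axis and the categorical `Genre` column on the y-axis draws horizontal violins, which keeps long genre names readable.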

Making scatter plots with Matplotlib and Seaborn

```python
plt.scatter(df['Minutes'], df['MB'])  # one point per track: length in minutes vs. file size in MB
```
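The heading also covers Seaborn. The same scatter plot can be made with `sns.scatterplot`. This sketch assumes hypothetical data in which file size grows roughly linearly with track length.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# hypothetical stand-in data: MB is roughly proportional to Minutes, plus noise
rng = np.random.default_rng(0)
minutes = rng.uniform(1, 10, size=200)
df = pd.DataFrame({'Minutes': minutes, 'MB': minutes + rng.normal(0, 0.5, size=200)})

# passing column names (rather than arrays) lets Seaborn label the axes automatically
ax = sns.scatterplot(data=df, x='Minutes', y='MB')
```

Unlike `plt.scatter`, the Seaborn version accepts a `hue` column name, so it is easy to color points by a category later.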

Examining correlations and making correlograms

```python
sns.pairplot(data=df)  # scatter plots for every pair of numeric columns, histograms on the diagonal
```
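`pairplot` shows the relationships but not the correlation coefficients themselves. A common way to make a correlogram is to compute the correlation matrix with pandas and draw it as an annotated heatmap. The sketch below uses hypothetical data. The `numeric_only=True` argument, available in pandas 1.5 and later, skips text columns such as `Genre`.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# hypothetical stand-in data with three numeric columns and one text column
rng = np.random.default_rng(0)
minutes = rng.uniform(1, 10, size=200)
df = pd.DataFrame({
    'Minutes': minutes,
    'MB': minutes + rng.normal(0, 0.5, size=200),
    'UnitPrice': rng.choice([0.99, 1.99], size=200),
    'Genre': rng.choice(['Rock', 'Jazz'], size=200),
})

# pairwise Pearson correlations between the numeric columns only
corr = df.corr(numeric_only=True)

# annotated heatmap; vmin/vmax pin -1 and 1 to the ends of the colormap
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='coolwarm')
```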

Making missing value plots

```python
import missingno as msno
msno.matrix(df)
```

This shows a matrix of non-missing values in gray and missing values in white. Each row of the DataFrame is drawn as a horizontal line across all the columns. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The sparkline on the right side traces how many non-missing values each row has and labels the minimum and maximum of those counts. In our case, the labels 7 and 8 mean that the least complete row has 7 non-missing values and the most complete rows have 8.
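The numbers behind the matrix plot can also be checked directly with pandas. The following sketch uses a small hypothetical DataFrame where, as in the notes, only `Composer` has missing values.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in data: only 'Composer' has missing values
df = pd.DataFrame({
    'Track': ['t1', 't2', 't3', 't4'],
    'Composer': ['c1', np.nan, 'c2', np.nan],
    'Genre': ['Rock', 'Jazz', 'Rock', 'Metal'],
    'Minutes': [5.7, 3.4, 4.0, 6.1],
})

# missing values per column (the white gaps in each column of msno.matrix)
missing_per_column = df.isna().sum()

# non-missing values per row (what the sparkline traces)
complete_per_row = df.notna().sum(axis=1)

print(missing_per_column)
print('min:', complete_per_row.min(), 'max:', complete_per_row.max())
```

Here the per-row minimum is one less than the number of columns because each incomplete row is missing only its `Composer` value. This matches the 7-and-8 pattern described above.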

Using EDA Python packages

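```python
# pandas-profiling has since been renamed ydata-profiling; with newer
# versions, use: from ydata_profiling import ProfileReport
from pandas_profiling import ProfileReport

report = ProfileReport(df)
report  # in Jupyter, this renders the report inline
```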

Using visualization best practices

Saving plots for sharing and reports
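The notes stop at this heading. The following is a minimal sketch of saving a figure with Matplotlib's `savefig`, reusing the box plot settings from earlier on hypothetical data. The file names are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical stand-in data
rng = np.random.default_rng(0)
df = pd.DataFrame({'Minutes': rng.lognormal(mean=1.4, sigma=0.4, size=500)})

f = plt.figure(figsize=(5.5, 5.5))
f.patch.set_facecolor('w')  # sets the background color behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()

# raster format: dpi controls the resolution; bbox_inches='tight' trims surrounding whitespace
plt.savefig('minutes_boxplot.png', dpi=300, bbox_inches='tight')
# vector format: scales cleanly when embedded in reports or slides
plt.savefig('minutes_boxplot.svg', bbox_inches='tight')
plt.close(f)  # release the figure once it has been saved
```

PNG at a high `dpi` is a safe default for sharing. SVG or PDF keeps lines and text sharp at any size in written reports.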
