5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python 复制代码
import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this is chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python 复制代码
sns.histplot(x=df['Minutes'], kde=True)

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

python 复制代码
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = data=df[df['Genre'].isin(top_5_genres)]

Making scatter plots with Matplotlib and Seaborn

python 复制代码
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python 复制代码
sns.pairplot(data=df)

Making missing value plots

python 复制代码
import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row is a line across each column. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The spark line on the right side shows the total missing values across all columns for each row and shows the maximum and minimum number of complete values for the rows. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.

Using EDA Python packages

python 复制代码
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report

Using visualization best practices

Saving plots for sharing and reports

相关推荐
Nicole-----2 分钟前
Python - Union联合类型注解
开发语言·python
TDengine (老段)7 分钟前
从 ETL 到 Agentic AI:工业数据管理变革与 TDengine IDMP 的治理之道
数据库·数据仓库·人工智能·物联网·时序数据库·etl·tdengine
蓝桉80236 分钟前
如何进行神经网络的模型训练(视频代码中的知识点记录)
人工智能·深度学习·神经网络
liliangcsdn1 小时前
Leiden社区发现算法的学习和示例
学习·数据分析·知识图谱
程序员Xu1 小时前
【LeetCode热题100道笔记】二叉树的右视图
笔记·算法·leetcode
星期天要睡觉1 小时前
深度学习——数据增强(Data Augmentation)
人工智能·深度学习
程序员Xu2 小时前
【LeetCode热题100道笔记】二叉搜索树中第 K 小的元素
笔记·算法·leetcode
南山二毛2 小时前
机器人控制器开发(导航算法——导航栈关联坐标系)
人工智能·架构·机器人
DKPT2 小时前
JVM中如何调优新生代和老生代?
java·jvm·笔记·学习·spring
phltxy2 小时前
JVM——Java虚拟机学习
java·jvm·学习