5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python 复制代码
import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this is chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python 复制代码
sns.histplot(x=df['Minutes'], kde=True)

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

python 复制代码
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = data=df[df['Genre'].isin(top_5_genres)]

Making scatter plots with Matplotlib and Seaborn

python 复制代码
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python 复制代码
sns.pairplot(data=df)

Making missing value plots

python 复制代码
import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row is a line across each column. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The spark line on the right side shows the total missing values across all columns for each row and shows the maximum and minimum number of complete values for the rows. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.

Using EDA Python packages

python 复制代码
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report

Using visualization best practices

Saving plots for sharing and reports

相关推荐
Python×CATIA工业智造2 小时前
Frida RPC高级应用:动态模拟执行Android so文件实战指南
开发语言·python·pycharm
千宇宙航2 小时前
闲庭信步使用SV搭建图像测试平台:第三十一课——基于神经网络的手写数字识别
图像处理·人工智能·深度学习·神经网络·计算机视觉·fpga开发
onceco2 小时前
领域LLM九讲——第5讲 为什么选择OpenManus而不是QwenAgent(附LLM免费api邀请码)
人工智能·python·深度学习·语言模型·自然语言处理·自动化
天水幼麟3 小时前
动手学深度学习-学习笔记(总)
笔记·深度学习·学习
狐凄3 小时前
Python实例题:基于 Python 的简单聊天机器人
开发语言·python
悦悦子a啊4 小时前
Python之--基本知识
开发语言·前端·python
jndingxin5 小时前
OpenCV CUDA模块设备层-----高效地计算两个 uint 类型值的带权重平均值
人工智能·opencv·计算机视觉
天水幼麟5 小时前
动手学深度学习-学习笔记【二】(基础知识)
笔记·深度学习·学习
Sweet锦5 小时前
零基础保姆级本地化部署文心大模型4.5开源系列
人工智能·语言模型·文心一言
绿皮的猪猪侠5 小时前
算法笔记上机训练实战指南刷题
笔记·算法·pta·上机·浙大