5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python 复制代码
import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this is chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python 复制代码
sns.histplot(x=df['Minutes'], kde=True)

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

python 复制代码
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = data=df[df['Genre'].isin(top_5_genres)]

Making scatter plots with Matplotlib and Seaborn

python 复制代码
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python 复制代码
sns.pairplot(data=df)

Making missing value plots

python 复制代码
import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row is a line across each column. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The spark line on the right side shows the total missing values across all columns for each row and shows the maximum and minimum number of complete values for the rows. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.

Using EDA Python packages

python 复制代码
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report

Using visualization best practices

Saving plots for sharing and reports

相关推荐
亚马逊云开发者1 分钟前
相得益彰:Mem0 记忆框架与亚马逊云科技的企业级 AI 实践
人工智能
AAA修煤气灶刘哥10 分钟前
Y-Agent Studio :打破 DAG 的“无环”铁律?揭秘有向有环图如何让智能体真正“活”起来
人工智能·低代码·agent
趙卋傑15 分钟前
接口自动化测试
python·pycharm·pytest
WWZZ202516 分钟前
快速上手大模型:深度学习9(池化层、卷积神经网络1)
人工智能·深度学习·神经网络·算法·机器人·大模型·具身智能
__如果20 分钟前
Surgical Video Understanding LLM
人工智能
吴佳浩20 分钟前
LangChain 入门指南:核心概念与理论框架
人工智能
BoBoZz1921 分钟前
CellTypeSource
python·vtk·图形渲染·图形处理
('-')28 分钟前
《从根上理解MySQL是怎样运行的》第二张学习笔记
笔记·学习·mysql
q***57741 小时前
Python中的简单爬虫
爬虫·python·信息可视化
韩立学长1 小时前
【开题答辩实录分享】以《粤港澳大湾区活动数据可视化分析系统》为例进行答辩实录分享
python·信息可视化·flask