5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python 复制代码
import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this is chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python 复制代码
sns.histplot(x=df['Minutes'], kde=True)

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

python 复制代码
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = data=df[df['Genre'].isin(top_5_genres)]

Making scatter plots with Matplotlib and Seaborn

python 复制代码
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python 复制代码
sns.pairplot(data=df)

Making missing value plots

python 复制代码
import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row is a line across each column. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The spark line on the right side shows the total missing values across all columns for each row and shows the maximum and minimum number of complete values for the rows. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.

Using EDA Python packages

python 复制代码
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report

Using visualization best practices

Saving plots for sharing and reports

相关推荐
snow_star_dream12 小时前
(笔记)VSC python应用--函数补全注释添加
笔记·python
calvinpaean12 小时前
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos论文学习
学习
wubba lubba dub dub75012 小时前
第三十四周 学习周报
学习
songyuc12 小时前
【SAR】旋转框定义法学习笔记
笔记·学习
zilikew12 小时前
Flutter框架跨平台鸿蒙开发——小语种学习APP的开发流程
学习·flutter·华为·harmonyos·鸿蒙
无忧智库12 小时前
未来已来:深度解析城市空中交通(UAM)垂直起降场(Vertiport)智能化配套设施建设方案(WORD)
人工智能
郝学胜-神的一滴12 小时前
Python中的Mixin继承:灵活组合功能的强大模式
开发语言·python·程序人生
叫我:松哥12 小时前
基于python强化学习的自主迷宫求解,集成迷宫生成、智能体训练、模型评估等
开发语言·人工智能·python·机器学习·pygame
2501_9449347312 小时前
大专学历行政转型管理的必要性分析
人工智能
2301_7644413312 小时前
2025年YOLO算法案例应用领域应用趋势
python·yolo