5-AM Project, Day 8: Practical Data Science with Python (4)

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

```python
import matplotlib.pyplot as plt

# Assumes df was loaded earlier and has a numeric 'Minutes' column
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image; more on this in chapter 5
f.patch.set_facecolor('w')  # sets a white background behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins
plt.show()
```

Making histograms and violin plots

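```python
import seaborn as sns

# Histogram of song lengths with a kernel density estimate (KDE) curve overlaid
sns.histplot(x=df['Minutes'], kde=True)
```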

A violin plot lets us compare the distributions of several groups at once. First, let's select the top five genres by number of songs and create a separate DataFrame containing only those rows:

```python
# The five most common genres, ordered by song count
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = df[df['Genre'].isin(top_5_genres)]
```
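The notes stop before drawing the violin plot itself. Here is a minimal sketch using Seaborn's `violinplot`, with one horizontal violin per genre. It assumes a small made-up DataFrame in place of the chapter's `df`. The genres and lengths are invented for illustration. The selection step is repeated so that the block runs on its own.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for df: 15 songs across 6 genres
df = pd.DataFrame({
    'Genre': ['Rock'] * 4 + ['Jazz'] * 3 + ['Metal'] * 3
             + ['Latin'] * 2 + ['Blues'] * 2 + ['Pop'],
    'Minutes': [4.1, 5.3, 3.8, 6.0, 7.2, 5.9, 8.1, 6.4,
                5.5, 7.7, 3.9, 4.4, 5.0, 5.6, 3.1],
})

# Same selection as in the text: keep only the five most common genres
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = df[df['Genre'].isin(top_5_genres)]

# Categorical y and numeric x give horizontal violins, one per genre
ax = sns.violinplot(data=top_5_data, x='Minutes', y='Genre')
plt.tight_layout()
plt.show()
```

Putting the genre on the y-axis keeps long genre names readable.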

Making scatter plots with Matplotlib and Seaborn

```python
import matplotlib.pyplot as plt

# Song length vs. file size, one point per track
plt.scatter(df['Minutes'], df['MB'])
```
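The heading also mentions Seaborn. Its `scatterplot` accepts a DataFrame plus column names and labels the axes from those names automatically. The sketch below is minimal and assumes a small made-up DataFrame with `Minutes` and `MB` columns standing in for the chapter's data.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for df: song length (minutes) and file size (MB)
df = pd.DataFrame({
    'Minutes': [3.2, 4.0, 4.8, 5.5, 6.1, 7.3],
    'MB': [6.1, 7.9, 9.4, 10.8, 12.0, 14.6],
})

# Axis labels are taken from the column names passed as x and y
ax = sns.scatterplot(data=df, x='Minutes', y='MB')
plt.tight_layout()
plt.show()
```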

Examining correlations and making correlograms

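```python
import seaborn as sns

# Scatter plots for every pair of numeric columns, histograms on the diagonal
sns.pairplot(data=df)
```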
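A pair plot shows relationships visually. A correlogram puts numbers on them. A common approach is to compute the Pearson correlation matrix with pandas' `DataFrame.corr` and draw it as a heatmap. The sketch below assumes pandas 1.5 or later, which added the `numeric_only` argument used to skip text columns. It also uses a small made-up DataFrame in which `MB` is exactly proportional to `Minutes`.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for df; 'MB' is exactly proportional to 'Minutes'
df = pd.DataFrame({
    'Minutes': [3.0, 4.0, 5.0, 6.0],
    'MB': [6.0, 8.0, 10.0, 12.0],
    'UnitPrice': [0.99, 0.99, 1.99, 0.99],
    'Genre': ['Rock', 'Jazz', 'Rock', 'Metal'],  # non-numeric, skipped below
})

# Pearson correlation between the numeric columns only
corr = df.corr(numeric_only=True)

# Correlogram: color-coded correlation matrix with values printed in the cells
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='coolwarm')
plt.tight_layout()
plt.show()
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale symmetric, so the same color always means the same correlation strength.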

Making missing value plots

```python
import missingno as msno
msno.matrix(df)
```

This draws a matrix with non-missing values in gray and missing values in white. Each row of the DataFrame appears as a horizontal line across all of the columns. From this, we can see that the Composer column has several missing values, while none of the other columns are missing any. The sparkline on the right side summarizes how complete each row is and is labeled with the minimum and maximum number of non-missing values found in any row. In our case, the 7 means the least complete row has 7 non-missing values, and the 8 means the most complete row has 8.
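The same information is available as numbers using plain pandas. `isna().sum()` counts the gaps in each column, and `notna().sum(axis=1)` counts the non-missing values in each row. The minimum and maximum of that row count correspond to the two numbers on the sparkline. The sketch below is minimal and uses a small made-up DataFrame in which only `Composer` has gaps.

```python
import pandas as pd

# Hypothetical stand-in for df: only 'Composer' has missing values
df = pd.DataFrame({
    'Track': ['A', 'B', 'C', 'D'],
    'Composer': ['X', None, 'Y', None],
    'Minutes': [3.5, 4.2, 5.1, 2.9],
})

# Missing values per column (the white gaps in msno.matrix)
missing_per_column = df.isna().sum()

# Non-missing values per row (summarized by the sparkline)
complete_per_row = df.notna().sum(axis=1)

print(missing_per_column)
print(complete_per_row.min(), complete_per_row.max())  # prints "2 3"
```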

Using EDA Python packages

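```python
# pandas-profiling has since been renamed ydata-profiling; with newer releases use:
#   from ydata_profiling import ProfileReport
from pandas_profiling import ProfileReport

report = ProfileReport(df)
report  # in Jupyter, displaying the object renders the report inline
```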

Using visualization best practices

Saving plots for sharing and reports
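This heading has no body in these notes. Matplotlib's `savefig` is the usual way to write a figure to disk. The sketch below is minimal: the file name, DPI, and plotted data are illustrative choices rather than the book's. It reuses the white-background setting from the box plot example so the saved image doesn't have a transparent or dark background.

```python
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5.5, 5.5))
fig.patch.set_facecolor('w')  # white background behind the axis labels
ax.plot([1, 2, 3], [2, 4, 3])  # placeholder data for illustration
ax.set_xlabel('x')
ax.set_ylabel('y')

# Hypothetical output file; use .svg or .pdf for vector graphics instead
out = Path('example_plot.png')

# Save before calling plt.show(): with the pyplot interface,
# saving afterwards can produce a blank image
fig.savefig(out, dpi=300, bbox_inches='tight', facecolor='w')
```

A high `dpi` suits raster images in print reports. `bbox_inches='tight'` trims extra whitespace around the axes.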
