5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

python 复制代码
import matplotlib.pyplot as plt
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this is chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

python 复制代码
sns.histplot(x=df['Minutes'], kde=True)

Let's look at a few groups of data at once with a violin plot. Let's first select the top five genres by number of songs and create a separate DataFrame with only this data:

python 复制代码
top_5_genres = df['Genre'].value_counts().index[:5]
top_5_data = data=df[df['Genre'].isin(top_5_genres)]

Making scatter plots with Matplotlib and Seaborn

python 复制代码
plt.scatter(df['Minutes'], df['MB'])

Examining correlations and making correlograms

python 复制代码
sns.pairplot(data=df)

Making missing value plots

python 复制代码
import missingno as msno
msno.matrix(df)

This shows a matrix of non-missing values in gray and missing values in white. Each row is a line across each column. From this, we see that the Composer column has several missing values, but none of the other columns are missing any values. The spark line on the right side shows the total missing values across all columns for each row and shows the maximum and minimum number of complete values for the rows. In our case, 7 means the minimum number of non-missing values in a row is 7, and the maximum number of non-missing values in a row is 8.

Using EDA Python packages

python 复制代码
from pandas_profiling import ProfileReport

report = ProfileReport(df)

report

Using visualization best practices

Saving plots for sharing and reports

相关推荐
CaracalTiger1 分钟前
什么是Clawdbot?Clawdbot下载、安装、配置教程(最新版Moltbot)
python·编辑器·aigc·idea·ai编程·intellij idea·agi
企业老板ai培训3 分钟前
从九尾狐AI案例拆解企业AI培训的技术实现与降本增效架构
人工智能
星火开发设计5 分钟前
枚举类 enum class:强类型枚举的优势
linux·开发语言·c++·学习·算法·知识
WJX_KOI5 小时前
Open Notebook 一个开源的结合AI的记笔记软件
python
程序员清洒5 小时前
Flutter for OpenHarmony:GridView — 网格布局实现
android·前端·学习·flutter·华为
喜欢吃燃面5 小时前
Linux:环境变量
linux·开发语言·学习
代码游侠5 小时前
ARM开发——阶段问题综述(二)
运维·arm开发·笔记·单片机·嵌入式硬件·学习
0思必得05 小时前
[Web自动化] 反爬虫
前端·爬虫·python·selenium·自动化
Elastic 中国社区官方博客6 小时前
使用 Discord 和 Elastic Agent Builder A2A 构建游戏社区支持机器人
人工智能·elasticsearch·游戏·搜索引擎·ai·机器人·全文检索
2301_822382766 小时前
Python上下文管理器(with语句)的原理与实践
jvm·数据库·python