Contents

5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

import matplotlib.pyplot as plt

# df is assumed to be the songs DataFrame loaded earlier
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # sets the figure size in inches -- more on this in chapter 5
f.patch.set_facecolor('w')  # sets a white background behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins

Making histograms and violin plots

import seaborn as sns

# histogram of song lengths with a kernel density estimate overlaid
sns.histplot(x=df['Minutes'], kde=True)

Next, let's compare a few groups of data at once with a violin plot. First, select the top five genres by number of songs and create a separate DataFrame containing only those rows:

# the five most common genres, ordered by number of songs
top_5_genres = df['Genre'].value_counts().index[:5]
# keep only the rows whose genre is one of those five
top_5_data = df[df['Genre'].isin(top_5_genres)]
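
The notes stop before the violin plot itself is drawn. A minimal sketch of that step, assuming a toy DataFrame in place of the real songs data (the genres and durations below are invented, and only the top two genres are kept because the toy data is so small):

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the songs DataFrame (values invented for illustration)
toy_df = pd.DataFrame({
    "Genre": ["Rock", "Rock", "Rock", "Jazz", "Jazz", "Jazz", "Latin", "Latin", "Pop"],
    "Minutes": [3.5, 4.2, 5.1, 6.0, 7.3, 5.5, 3.9, 4.4, 3.0],
})

# Keep only the two most common genres (the text uses five on the full dataset)
top_genres = toy_df["Genre"].value_counts().index[:2]
top_data = toy_df[toy_df["Genre"].isin(top_genres)]

# One violin per genre, showing the distribution of song lengths
ax = sns.violinplot(data=top_data, x="Minutes", y="Genre")
```

On the full dataset, the same call with `data=top_5_data` should produce one violin for each of the five genres.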

Making scatter plots with Matplotlib and Seaborn

plt.scatter(df['Minutes'], df['MB'])
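
The heading also promises a Seaborn version. One way it might look, sketched on a toy DataFrame (the rows are invented, and coloring by `Genre` is an illustrative choice rather than something taken from the original notes):

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the songs DataFrame (values invented for illustration)
toy_df = pd.DataFrame({
    "Minutes": [2.5, 3.0, 4.1, 5.2, 6.8],
    "MB": [2.4, 2.9, 4.0, 5.1, 6.5],
    "Genre": ["Rock", "Jazz", "Rock", "Latin", "Jazz"],
})

# Seaborn takes column names and labels the axes automatically;
# hue colors each point by its genre
ax = sns.scatterplot(data=toy_df, x="Minutes", y="MB", hue="Genre")
ax.set_title("Song length vs. file size")
```

Compared with `plt.scatter`, the Seaborn call adds axis labels and a legend without extra code.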

Examining correlations and making correlograms

sns.pairplot(data=df)
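
The pair plot shows the relationships visually. To examine the correlation values themselves, one option is a correlation matrix drawn as a heatmap. This is a sketch on toy data: the rows are invented, and the `UnitPrice` column is a hypothetical extra numeric column added only so the matrix has more than two entries.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the songs DataFrame (values invented for illustration)
toy_df = pd.DataFrame({
    "Minutes": [2.5, 3.0, 4.1, 5.2, 6.8],
    "MB": [2.4, 2.9, 4.0, 5.1, 6.5],
    "UnitPrice": [0.99, 0.99, 0.99, 1.99, 0.99],
    "Genre": ["Rock", "Jazz", "Rock", "Latin", "Jazz"],
})

# Pearson correlations between the numeric columns only
corr = toy_df.select_dtypes("number").corr()

# Fix the color scale to [-1, 1] so colors are comparable across plots
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Selecting the numeric columns first avoids errors from text columns such as `Genre`.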

Making missing value plots

import missingno as msno
msno.matrix(df)

This shows a matrix with non-missing values in gray and missing values in white. Each row of the matrix corresponds to one row of the DataFrame, drawn as a line across all columns. From this, we see that the Composer column has several missing values, while none of the other columns are missing any. The sparkline on the right shows how many values are present in each row and is labeled with the minimum and maximum number of non-missing values per row. In our case, the minimum is 7 and the maximum is 8.
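
The same information can be checked numerically with plain pandas. A minimal sketch, assuming a toy DataFrame with a partly missing `Composer` column (the rows are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the songs DataFrame (values invented for illustration)
toy_df = pd.DataFrame({
    "Track": ["A", "B", "C", "D"],
    "Composer": ["X", np.nan, "Y", np.nan],
    "Minutes": [3.1, 4.0, 5.2, 2.8],
})

# Number of missing values in each column
missing_per_column = toy_df.isna().sum()

# Number of non-missing values in each row; the min and max of this
# correspond to the two labels on the missingno sparkline
complete_per_row = toy_df.notna().sum(axis=1)

print(missing_per_column)
print(complete_per_row.min(), complete_per_row.max())
```

Here only `Composer` has missing entries, so the rows contain either 2 or 3 non-missing values.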

Using EDA Python packages

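# newer releases of this package are published as ydata-profiling (import ydata_profiling)
from pandas_profiling import ProfileReport

report = ProfileReport(df)
report  # in a Jupyter notebook, this renders the interactive report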

Using visualization best practices

Saving plots for sharing and reports
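
The notes give no code for this step. A minimal sketch of saving a Matplotlib figure to disk, assuming toy data, a hypothetical file name, and the system temp directory as the output location:

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Toy data standing in for the Minutes and MB columns (values invented)
fig, ax = plt.subplots(figsize=(5.5, 5.5))
ax.scatter([2.5, 3.0, 4.1, 5.2], [2.4, 2.9, 4.0, 5.1])
ax.set_xlabel("Minutes")
ax.set_ylabel("MB")
fig.tight_layout()

# Hypothetical output path; dpi=300 gives a print-quality image and
# facecolor="w" keeps the background white around the axis labels
out_path = os.path.join(tempfile.gettempdir(), "minutes_vs_mb.png")
fig.savefig(out_path, dpi=300, facecolor="w")
```

The file extension picks the format, so a name ending in `.pdf` or `.svg` would save a vector image instead.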
