5-AM Project: day8 Practical data science with Python 4

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

```python
import matplotlib.pyplot as plt

# Quick boxplot of the Minutes column of the songs DataFrame, df
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # this changes the size of the image -- more on this in chapter 5
f.patch.set_facecolor('w')  # sets background color behind axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins
```

Making histograms and violin plots

```python
import seaborn as sns

sns.histplot(x=df['Minutes'], kde=True)  # histogram with a kernel density estimate overlaid
```

A violin plot lets us compare several groups of data at once. First, select the top five genres by number of songs and create a separate DataFrame containing only those rows:

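```python
# The five most frequent genres (value_counts sorts in descending order)
top_5_genres = df['Genre'].value_counts().index[:5]
# Keep only the rows whose genre is one of those five
top_5_data = df[df['Genre'].isin(top_5_genres)]
```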
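
The filtered DataFrame can then be passed to Seaborn's `violinplot`. The sketch below repeats the selection on synthetic data so that it runs on its own. The genre names, their frequencies, and `demo_df` are assumptions for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the songs data (assumption): seven genres, uneven counts.
rng = np.random.default_rng(0)
genres = ['Rock', 'Latin', 'Metal', 'Jazz', 'Blues', 'Pop', 'Classical']
demo_df = pd.DataFrame({
    'Genre': rng.choice(genres, size=1000, p=[0.35, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05]),
    'Minutes': rng.lognormal(mean=1.3, sigma=0.4, size=1000),
})

# Same selection as above: keep only the five most common genres.
top_5_genres = demo_df['Genre'].value_counts().index[:5]
top_5_data = demo_df[demo_df['Genre'].isin(top_5_genres)]

# One violin per genre, showing the distribution of song length.
ax = sns.violinplot(data=top_5_data, x='Minutes', y='Genre')
```

Putting the categorical variable on the y-axis keeps long genre names readable.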

Making scatter plots with Matplotlib and Seaborn

```python
import matplotlib.pyplot as plt

plt.scatter(df['Minutes'], df['MB'])  # Minutes on the x-axis, MB on the y-axis
```
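
The heading covers Seaborn as well, so here is a rough equivalent using `sns.scatterplot`. The data is synthetic, and the roughly linear relation between `Minutes` and `MB` is an assumption made only so the example has something to plot.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the songs data (assumption).
rng = np.random.default_rng(1)
minutes = rng.lognormal(mean=1.3, sigma=0.4, size=500)
demo_df = pd.DataFrame({
    'Minutes': minutes,
    # Assume file size grows roughly linearly with length, plus some noise.
    'MB': minutes * 2.0 + rng.normal(scale=0.5, size=500),
})

# Seaborn takes column names and labels the axes automatically;
# alpha makes overlapping points easier to see.
ax = sns.scatterplot(data=demo_df, x='Minutes', y='MB', alpha=0.5)
```

Compared with `plt.scatter`, the Seaborn call sets the axis labels from the column names, which saves a couple of lines in exploratory work.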

Examining correlations and making correlograms

```python
import seaborn as sns

sns.pairplot(data=df)  # scatter plots for each pair of numeric columns
```
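
A pair plot shows the relationships visually. To put numbers on them, one option is to compute a correlation matrix with pandas and draw it as a heatmap. This is a sketch on synthetic data. The columns, including the unrelated `Other` column, are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric data (assumption): MB depends on Minutes, Other does not.
rng = np.random.default_rng(2)
minutes = rng.lognormal(mean=1.3, sigma=0.4, size=500)
demo_df = pd.DataFrame({
    'Minutes': minutes,
    'MB': minutes * 2.0 + rng.normal(scale=0.5, size=500),
    'Other': rng.normal(size=500),
})

# Pearson correlation between every pair of columns (the pandas default).
corr = demo_df.corr()

# Fix the color scale to [-1, 1] so that 0 sits in the middle of the colormap.
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='coolwarm')
```

Fixing `vmin` and `vmax` keeps the colors comparable between different datasets, since the scale no longer depends on the values that happen to be present.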

Making missing value plots

```python
import missingno as msno

msno.matrix(df)
```

This shows a matrix with non-missing values in gray and missing values in white. Each row of the DataFrame is drawn as a line across all columns. From this, we see that the Composer column has several missing values, while none of the other columns are missing any. The sparkline on the right side shows the number of non-missing values in each row and is labeled with the minimum and maximum. In our case, the minimum number of non-missing values in a row is 7 and the maximum is 8.
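
The same numbers can be checked directly in pandas. The sketch below uses a tiny made-up DataFrame (an assumption, not the real data) to count missing values per column and non-missing values per row, which is the quantity the sparkline summarizes.

```python
import pandas as pd

# Tiny made-up table (assumption): only Composer has gaps.
demo_df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Composer': ['x', None, 'y', None],
    'Minutes': [3.1, 4.2, 5.0, 2.7],
})

# Missing values per column.
missing_per_column = demo_df.isna().sum()

# Non-missing values per row; the min and max correspond to the sparkline labels.
non_missing_per_row = demo_df.notna().sum(axis=1)

print(missing_per_column)
print(non_missing_per_row.min(), non_missing_per_row.max())
```

On a large table, these counts are often quicker to read than the plot, while the plot is better at showing whether the gaps cluster in particular rows.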

Using EDA Python packages

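```python
# Note: newer releases of this package are published under the name ydata-profiling
from pandas_profiling import ProfileReport

report = ProfileReport(df)  # build an automated EDA report for the DataFrame

report  # in a Jupyter notebook, evaluating the object displays the report
```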

Using visualization best practices

Saving plots for sharing and reports
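
As a minimal sketch of saving a figure with Matplotlib's `savefig`: the sample values, the file name, and the temporary output directory below are assumptions chosen only so that the example runs anywhere.

```python
import os
import tempfile

import matplotlib.pyplot as plt

# Small made-up sample of song lengths in minutes (assumption).
fig, ax = plt.subplots(figsize=(5.5, 5.5))
ax.boxplot([2.5, 3.1, 3.8, 4.0, 4.4, 5.2, 9.7])
ax.set_ylabel('Minutes')
fig.tight_layout()  # auto-adjust margins before saving

# Write a PNG to the system temp directory; a higher dpi gives a sharper image.
out_path = os.path.join(tempfile.gettempdir(), 'minutes_boxplot.png')
fig.savefig(out_path, dpi=300, facecolor='w')
```

The file extension determines the output format, so changing `.png` to `.pdf` or `.svg` produces a vector image that scales cleanly in reports.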
