5-AM Project: Day 8, Practical Data Science with Python (4)

Chapter 5 Exploratory Data Analysis and Visualization

  • EDA and visualization libraries in Python
  • Performing EDA with Seaborn and pandas
  • Using EDA Python packages
  • Using visualization best practices
  • Making plots with Plotly

Performing EDA with Seaborn and pandas

Making boxplots and letter-value plots

```python
import matplotlib.pyplot as plt

# df is the song DataFrame loaded earlier in the chapter ('Minutes' = song length)
df['Minutes'].plot.box()
plt.show()

f = plt.figure(figsize=(5.5, 5.5))  # set the figure size in inches; more on this later in this chapter
f.patch.set_facecolor('w')  # white background behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()  # auto-adjust margins
plt.show()
```

Making histograms and violin plots

```python
import seaborn as sns
sns.histplot(x=df['Minutes'], kde=True)  # histogram with a KDE curve overlaid
```

To compare several groups of data at once, we can use a violin plot. First, select the top five genres by number of songs and create a separate DataFrame containing only those songs:

```python
# the five genres with the most songs
top_5_genres = df['Genre'].value_counts().index[:5]
# keep only the rows that belong to those genres
top_5_data = df[df['Genre'].isin(top_5_genres)]
```
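With the filtered data in hand, the violin plot itself can be drawn with `sns.violinplot`. This is a minimal sketch; the small `top_5_data` DataFrame below is hypothetical (three made-up genres) so that the block runs without the chapter's dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for top_5_data: song lengths for a few genres
top_5_data = pd.DataFrame({
    'Genre': ['Rock'] * 4 + ['Jazz'] * 4 + ['Metal'] * 4,
    'Minutes': [3.9, 4.2, 5.1, 3.5, 6.0, 7.2, 5.5, 8.1, 5.0, 6.3, 4.8, 7.0],
})

# One violin per genre: a mirrored KDE of Minutes with a small boxplot inside
ax = sns.violinplot(data=top_5_data, x='Minutes', y='Genre')
plt.tight_layout()
plt.show()
```

Putting the categorical column on the y-axis keeps long genre names readable.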

Making scatter plots with Matplotlib and Seaborn

```python
import matplotlib.pyplot as plt
plt.scatter(df['Minutes'], df['MB'])  # song length vs. file size
```
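The heading also names Seaborn; its equivalent is `sns.scatterplot`, which labels the axes from the column names automatically. A minimal sketch, using a small hypothetical DataFrame in place of the chapter's data:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the chapter's df: longer songs tend to be larger files
df = pd.DataFrame({
    'Minutes': [2.5, 3.0, 3.8, 4.2, 5.0, 6.1],
    'MB': [2.4, 2.9, 3.7, 4.0, 4.9, 5.8],
})

# Seaborn version of the Matplotlib scatter plot; axis labels come from the column names
ax = sns.scatterplot(data=df, x='Minutes', y='MB')
plt.show()
```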

Examining correlations and making correlograms

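```python
import seaborn as sns
sns.pairplot(data=df)  # scatter plots for each pair of numeric columns, histograms on the diagonal
```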
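A pairplot shows relationships visually; a correlogram puts numbers on them. One common approach, sketched here under the assumption of pandas 1.5 or newer (for `numeric_only`), is to compute the Pearson correlation matrix with `DataFrame.corr` and draw it as an annotated `sns.heatmap`. The data below is randomly generated and hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: MB is strongly tied to Minutes; Genre is non-numeric
rng = np.random.default_rng(42)
minutes = rng.uniform(2, 8, size=100)
df = pd.DataFrame({
    'Minutes': minutes,
    'MB': minutes + rng.normal(0, 0.3, size=100),
    'Genre': rng.choice(['Rock', 'Jazz'], size=100),
})

# Pearson correlations between numeric columns only (numeric_only needs pandas >= 1.5)
corr = df.corr(numeric_only=True)
print(corr)

# Correlogram: annotated heatmap on a fixed -1..1 color scale
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap='coolwarm')
plt.show()
```

Fixing `vmin` and `vmax` keeps the colors comparable across different datasets, so a weak correlation never looks deceptively strong.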

Making missing value plots

```python
import missingno as msno
msno.matrix(df)  # dark bars = present values, white gaps = missing values
```

This shows a matrix with non-missing values in gray and missing values in white. Each row of the DataFrame is drawn as a horizontal line across all of the columns. From this, we see that the Composer column has several missing values, while none of the other columns are missing any values. The sparkline on the right side summarizes data completeness for each row (the number of non-missing values across all columns) and is labeled with the minimum and maximum of that count. In our case, the labels 7 and 8 mean that every row has at least 7 and at most 8 non-missing values.
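The same numbers can be checked without a plot. A minimal sketch with plain pandas: `isna().sum()` counts missing values per column, and `notna().sum(axis=1)` counts non-missing values per row, whose minimum and maximum correspond to the sparkline labels. The toy DataFrame is hypothetical:

```python
import pandas as pd

# Hypothetical toy data standing in for the chapter's song DataFrame
df = pd.DataFrame({
    'Track': ['a', 'b', 'c', 'd'],
    'Composer': ['X', None, 'Y', None],
    'Minutes': [3.5, 4.0, 2.8, 5.1],
})

missing_per_column = df.isna().sum()       # missing values in each column
complete_per_row = df.notna().sum(axis=1)  # non-missing values in each row

print(missing_per_column)
print('min non-missing per row:', complete_per_row.min())
print('max non-missing per row:', complete_per_row.max())
```

Here Composer has 2 missing values, and rows have between 2 and 3 non-missing values; on the chapter's data the row counts would be the 7 and 8 shown in the sparkline.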

Using EDA Python packages

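```python
try:
    from ydata_profiling import ProfileReport   # current name of the package
except ImportError:
    from pandas_profiling import ProfileReport  # older, deprecated package name

report = ProfileReport(df)
report  # in Jupyter, this renders the interactive report inline
# report.to_file('eda_report.html')  # optionally save as a standalone HTML file
```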

Using visualization best practices

Saving plots for sharing and reports
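A minimal sketch of saving a figure with Matplotlib's `plt.savefig`, reusing the boxplot from earlier. The DataFrame and the file name `minutes_boxplot.png` are hypothetical; `dpi` sets the resolution and `bbox_inches='tight'` trims surrounding whitespace:

```python
import os

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the chapter's df
df = pd.DataFrame({'Minutes': [2.5, 3.1, 3.8, 4.2, 4.9, 7.5]})

f = plt.figure(figsize=(5.5, 5.5))
f.patch.set_facecolor('w')  # white background behind the axis labels
df['Minutes'].plot.box()
plt.tight_layout()

# Save at 300 dpi, trim extra whitespace, and keep the white background
plt.savefig('minutes_boxplot.png', dpi=300, bbox_inches='tight', facecolor='w')
print(os.path.exists('minutes_boxplot.png'))
```

Changing the extension to `.svg` or `.pdf` produces a vector image instead, which stays sharp at any size in reports.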
