A Brief Introduction of the Violin Plot and Box Plot

Date | Author | Version | Note
--- | --- | --- | ---
2024.03.03 | Dog Tao | V1.0 | Release the note.

Contents

  • A Brief Introduction of the Violin Plot and Box Plot
    • Box Plot
    • Violin Plot
    • Histogram with Error Bar
    • Comparison
    • Example 1
    • Example 2


Box Plot

A Box Plot, also known as a Box-and-Whisker Plot, provides a visual summary of a data set's central tendency, variability, and skewness. The "box" spans the interquartile range (IQR), from the first quartile (Q1) to the third quartile (Q3), so it contains the middle 50% of the data; a line inside the box marks the median. The "whiskers" extend from the box to the most extreme data points that lie within a set distance of the quartiles, most commonly 1.5 × IQR, although other conventions exist. Data points beyond the whiskers are usually drawn individually and treated as potential outliers.
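The quantities described above can be computed directly. The sketch below assumes the common 1.5 × IQR convention (matplotlib's `boxplot` uses `whis=1.5` by default) and NumPy's default linear interpolation for percentiles; `box_plot_stats` is a hypothetical helper for illustration, not part of any plotting library.

```python
import numpy as np


def box_plot_stats(values, whis=1.5):
    """Compute what a Tukey-style box plot draws (illustrative sketch, not seaborn's internals)."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    # Fences: points beyond these limits are flagged as potential outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme data points that still lie within the fences
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": x[(x < lower_fence) | (x > upper_fence)],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats["median"], stats["iqr"], stats["outliers"])  # → 5.5 4.5 [40.]
```

Here the single value 40 lies beyond Q3 + 1.5 × IQR = 14.5, so the upper whisker stops at 9 and 40 is reported as an outlier.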

Violin Plot

A Violin Plot combines features of the Box Plot with a kernel density plot, which shows the shape of the data's distribution. The width of the violin at each value is proportional to the kernel density estimate (KDE) of the data at that value, which reveals features a box plot hides, such as multimodality (multiple peaks). Violin plots usually also mark the median, and often draw a miniature box plot inside the violin.
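To make "kernel density estimate" concrete, here is a minimal Gaussian KDE written with NumPy alone. It assumes Scott's rule of thumb for the bandwidth (h = σ · n^(-1/5)); seaborn and SciPy offer this and other bandwidth choices, so their curves may differ slightly. The helper name `gaussian_kde_1d` is hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid` (illustrative sketch)."""
    x = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = x.size
    if bandwidth is None:
        # Scott's rule of thumb (an assumption here): h = sample SD * n^(-1/5)
        bandwidth = x.std(ddof=1) * n ** (-1 / 5)
    # Place one Gaussian bump on every sample, then average the bumps at each grid point
    z = (grid[:, None] - x[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)


rng = np.random.default_rng(0)
# Bimodal sample: a KDE (and therefore a violin) shows two bulges where a box plot cannot
samples = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
grid = np.linspace(-5, 5, 501)
density = gaussian_kde_1d(samples, grid)
```

Plotting `density` against `grid`, mirrored about a center line, gives the outline of one violin; the dip near 0 is what the box plot of the same sample would not show.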

Histogram with Error Bar

A Histogram is a graphical representation of the distribution of numerical data, where the data is divided into bins, and the frequency of data points within each bin is depicted. An Error Bar can be added to a histogram to represent the variability of the data. The error bars typically represent the standard deviation, standard error, or confidence interval for the data.
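The three quantities mentioned above describe different things: the spread of the observations, the uncertainty of the mean, and an interval for the mean. A minimal sketch with NumPy, assuming the sample standard deviation (`ddof=1`) and a normal approximation (z = 1.96) for the 95% confidence interval; `error_bar_sizes` is a hypothetical helper. Note that the examples below call `np.std` with its default `ddof=0`.

```python
import numpy as np


def error_bar_sizes(values, z=1.96):
    """Return three common error-bar half-widths for a sample (sketch; z=1.96 assumes normality)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    sd = x.std(ddof=1)    # sample standard deviation: spread of individual observations
    se = sd / np.sqrt(n)  # standard error of the mean: uncertainty in the mean itself
    ci95 = z * se         # half-width of an approximate 95% confidence interval for the mean
    return sd, se, ci95


sd, se, ci = error_bar_sizes([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The standard error shrinks as the sample grows while the standard deviation does not, which is why bars labelled "SD" and "SE" can look very different for the same data.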

Comparison

  • Violin Plot: This plot provides a visual summary of the data distribution along with its probability density. The width of the plot at different values indicates the density of the data at that point, showing where the data is more concentrated.

  • Box Plot: This plot shows the median (central line), interquartile range (edges of the box), and potential outliers (dots outside the 'whiskers'). It's useful for identifying the central tendency and spread of the data, as well as outliers.

  • Histogram with Error Bar: The histogram shows the frequency distribution of the data across bins. The error bars in the examples below borrow the form of the standard error of the mean, dividing the dataset's overall standard deviation by the square root of each bin's count. Treat them as a rough visual cue only: the result is expressed in the data's units rather than in counts, so it is not a formal uncertainty on the bin height.
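If the goal is to show uncertainty in the bin counts themselves, one common alternative, assumed here rather than taken from the examples, treats each count as Poisson-distributed, so its error is sqrt(count) and has the same units as the bar height. The helper `histogram_with_count_errors` is hypothetical.

```python
import numpy as np


def histogram_with_count_errors(values, bins=10):
    """Histogram counts plus a Poisson-style sqrt(count) error per bin (assumed alternative)."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # For a count n, the Poisson standard deviation is sqrt(n), in the same units as the count
    errors = np.sqrt(counts)
    return centers, counts, errors


rng = np.random.default_rng(10)
centers, counts, errors = histogram_with_count_errors(rng.normal(size=100))
# With matplotlib, these could be drawn the same way as in the examples:
# plt.errorbar(centers, counts, yerr=errors, fmt='o')
```

Empty bins get an error of 0 here, which avoids the division by zero that a `std / np.sqrt(count)` formula runs into.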

Example 1

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Generate a reproducible random dataset: 100 draws from a standard normal distribution
np.random.seed(10)
data = np.random.normal(loc=0, scale=1, size=100)

# Set up a wide figure with three side-by-side panels
plt.figure(figsize=(14, 6))

# Panel 1: Violin Plot with the quartiles drawn as lines inside the density outline
plt.subplot(1, 3, 1)
sns.violinplot(data=data, inner="quartile", color="lightgray")
plt.title('Violin Plot')

# Panel 2: Box Plot of the same data
plt.subplot(1, 3, 2)
sns.boxplot(data=data, width=0.3, color="skyblue")
plt.title('Box Plot')

# Panel 3: Histogram with an error bar on each bin
plt.subplot(1, 3, 3)
std = np.std(data)
count, bins, _ = plt.hist(data, bins=10, color="pink", edgecolor='black', alpha=0.7)
bin_centers = (bins[:-1] + bins[1:]) / 2
# Error bar = overall standard deviation / sqrt(bin count); empty bins get 0 instead of dividing by zero
yerr = np.divide(std, np.sqrt(count), out=np.zeros_like(count), where=count > 0)
plt.errorbar(bin_centers, count, yerr=yerr, fmt='o', color='red', ecolor='lightgray', elinewidth=3, capsize=0)
plt.title('Histogram with Error Bar')

plt.tight_layout()
plt.show()

Example 2

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Generate two random datasets for comparison
np.random.seed(10)
data1 = np.random.normal(loc=0, scale=1, size=100)    # Dataset 1: mean 0, SD 1
data2 = np.random.normal(loc=1, scale=1.5, size=100)  # Dataset 2: mean 1, SD 1.5

# Long-form table: a split violin needs a two-level `hue` variable (required before seaborn 0.13)
df = pd.DataFrame({
    "value": np.concatenate([data1, data2]),
    "dataset": ["Dataset 1"] * len(data1) + ["Dataset 2"] * len(data2),
    "group": "Both",  # a single category, so the two halves share one violin
})

# Set up a wide figure with three side-by-side panels
plt.figure(figsize=(14, 6))

# Panel 1: split Violin Plot, Dataset 1 on one side and Dataset 2 on the other (horizontal)
plt.subplot(1, 3, 1)
sns.violinplot(data=df, x="value", y="group", hue="dataset", split=True,
               inner="quartile", palette=["lightblue", "lightgreen"])
plt.title('Customized Violin Plot')

# Panel 2: horizontal notched Box Plots with the mean shown as a red dot
plt.subplot(1, 3, 2)
sns.boxplot(data=[data1, data2], width=0.5, palette=["skyblue", "lightgreen"], orient="h",
            showmeans=True, notch=True,
            meanprops={"marker": "o", "markerfacecolor": "red", "markeredgecolor": "black"})
plt.yticks([0, 1], ["Dataset 1", "Dataset 2"])
plt.title('Customized Box Plot')

# Panel 3: overlaid Histograms with Error Bars
plt.subplot(1, 3, 3)

# Histograms for Dataset 1 and Dataset 2, semi-transparent so overlaps stay visible
count1, bins1, _ = plt.hist(data1, bins=10, color="skyblue", edgecolor='black', alpha=0.5, label='Dataset 1')
count2, bins2, _ = plt.hist(data2, bins=10, color="lightgreen", edgecolor='black', alpha=0.5, label='Dataset 2')


def bin_error(std, count):
    """Overall standard deviation / sqrt(bin count); empty bins get 0 instead of dividing by zero."""
    return np.divide(std, np.sqrt(count), out=np.zeros_like(count), where=count > 0)


# Error bars at each dataset's bin centres
plt.errorbar((bins1[:-1] + bins1[1:]) / 2, count1, yerr=bin_error(np.std(data1), count1),
             fmt='o', color='blue', ecolor='lightgray', elinewidth=3, capsize=0)
plt.errorbar((bins2[:-1] + bins2[1:]) / 2, count2, yerr=bin_error(np.std(data2), count2),
             fmt='o', color='green', ecolor='lightgray', elinewidth=3, capsize=0)

# Dashed vertical lines at each dataset's mean, then the legend
plt.axvline(np.mean(data1), color='blue', linestyle='dashed', linewidth=1, label='Mean of Dataset 1')
plt.axvline(np.mean(data2), color='green', linestyle='dashed', linewidth=1, label='Mean of Dataset 2')
plt.legend()

plt.title('Customized Histogram with Error Bars')

plt.tight_layout()
plt.show()