A Brief Introduction of the Violin Plot and Box Plot

Date Author Version Note
2024.03.03 Dog Tao V1.0 Release the note.

文章目录

  • [A Brief Introduction of the Violin Plot and Box Plot](#A Brief Introduction of the Violin Plot and Box Plot)
    • [Box Plot](#Box Plot)
    • [Violin Plot](#Violin Plot)
    • [Histogram with Error Bar](#Histogram with Error Bar)
    • Comparison
    • [Example 1](#Example 1)
    • [Example 2](#Example 2)

A Brief Introduction of the Violin Plot and Box Plot

Box Plot

A Box Plot, also known as a Box-and-Whisker Plot, provides a visual summary of a data set's central tendency, variability, and skewness. The "box" represents the interquartile range (IQR) where the middle 50% of data points lie, with a line inside the box indicating the median value. The "whiskers" extend from the box to show the range of the data, typically to 1.5 * IQR beyond the quartiles, though this can vary. Data points outside of the whiskers are often considered outliers.

Violin Plot

A Violin Plot combines features of the Box Plot with a kernel density plot, which shows the distribution shape of the data. The width of the violin at different values indicates the kernel density estimation of the data at that value, providing a deeper insight into the distribution of the data, including multimodality (multiple peaks). It includes a marker for the median of the data and often includes a box plot inside the violin.

Histogram with Error Bar

A Histogram is a graphical representation of the distribution of numerical data, where the data is divided into bins, and the frequency of data points within each bin is depicted. An Error Bar can be added to a histogram to represent the variability of the data. The error bars typically represent the standard deviation, standard error, or confidence interval for the data.

Comparison

  • Violin Plot: This plot provides a visual summary of the data distribution along with its probability density. The width of the plot at different values indicates the density of the data at that point, showing where the data is more concentrated.

  • Box Plot: This plot shows the median (central line), interquartile range (edges of the box), and potential outliers (dots outside the 'whiskers'). It's useful for identifying the central tendency and spread of the data, as well as outliers.

  • Histogram with Error Bar: The histogram shows the frequency distribution of the data across different bins. The error bars on each bin represent the variability of the data within that bin, using the standard error of the mean to give an idea of the uncertainty around the count in each bin.

Example 1

python 复制代码
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Generating a random dataset
np.random.seed(10)
data = np.random.normal(loc=0, scale=1, size=100)

# Setting up the matplotlib figure
plt.figure(figsize=(14, 6))

# Creating a subplot for the Violin Plot
plt.subplot(1, 3, 1)
sns.violinplot(data=data, inner="quartile", color="lightgray")
plt.title('Violin Plot')

# Creating a subplot for the Box Plot
plt.subplot(1, 3, 2)
sns.boxplot(data=data, width=0.3, color="skyblue")
plt.title('Box Plot')

# Creating a subplot for the Histogram with Error Bar
plt.subplot(1, 3, 3)
mean = np.mean(data)
std = np.std(data)
count, bins, ignored = plt.hist(data, bins=10, color="pink", edgecolor='black', alpha=0.7)
plt.errorbar((bins[:-1] + bins[1:]) / 2, count, yerr=std / np.sqrt(count), fmt='o', color='red', ecolor='lightgray', elinewidth=3, capsize=0)
plt.title('Histogram with Error Bar')

plt.tight_layout()
plt.show()

Example 2

python 复制代码
# Generating two random datasets for comparison
np.random.seed(10)
data1 = np.random.normal(loc=0, scale=1, size=100)  # Dataset 1
data2 = np.random.normal(loc=1, scale=1.5, size=100)  # Dataset 2

# Setting up the matplotlib figure
plt.figure(figsize=(14, 6))

### Creating a customized Violin Plot
plt.subplot(1, 3, 1)
sns.violinplot(data=[data1, data2], inner="quartile", split=True, palette=["lightblue", "lightgreen"], orient="h")
plt.title('Customized Violin Plot')

### Creating a customized Box Plot
plt.subplot(1, 3, 2)
sns.boxplot(data=[data1, data2], width=0.5, palette=["skyblue", "lightgreen"], orient="h", showmeans=True, notch=True, meanprops={"marker":"o", "markerfacecolor":"red", "markeredgecolor":"black"})
plt.title('Customized Box Plot')

### Creating a customized Histogram with Error Bars
plt.subplot(1, 3, 3)

# Histogram for Dataset 1
count1, bins1, ignored1 = plt.hist(data1, bins=10, color="skyblue", edgecolor='black', alpha=0.5, label='Dataset 1')

# Histogram for Dataset 2
count2, bins2, ignored2 = plt.hist(data2, bins=10, color="lightgreen", edgecolor='black', alpha=0.5, label='Dataset 2')

# Error Bars for Dataset 1
std1 = np.std(data1)
plt.errorbar((bins1[:-1] + bins1[1:]) / 2, count1, yerr=std1 / np.sqrt(count1), fmt='o', color='blue', ecolor='lightgray', elinewidth=3, capsize=0)

# Error Bars for Dataset 2
std2 = np.std(data2)
plt.errorbar((bins2[:-1] + bins2[1:]) / 2, count2, yerr=std2 / np.sqrt(count2), fmt='o', color='green', ecolor='lightgray', elinewidth=3, capsize=0)

# Mean Lines and Legend
plt.axvline(np.mean(data1), color='blue', linestyle='dashed', linewidth=1)
plt.axvline(np.mean(data2), color='green', linestyle='dashed', linewidth=1)
plt.legend()

plt.title('Customized Histogram with Error Bars')

plt.tight_layout()
plt.show()
相关推荐
美狐美颜sdk35 分钟前
跨平台直播美颜SDK集成实录:Android/iOS如何适配贴纸功能
android·人工智能·ios·架构·音视频·美颜sdk·第三方美颜sdk
DeepSeek-大模型系统教程1 小时前
推荐 7 个本周 yyds 的 GitHub 项目。
人工智能·ai·语言模型·大模型·github·ai大模型·大模型学习
郭庆汝1 小时前
pytorch、torchvision与python版本对应关系
人工智能·pytorch·python
小雷FansUnion3 小时前
深入理解MCP架构:智能服务编排、上下文管理与动态路由实战
人工智能·架构·大模型·mcp
资讯分享周3 小时前
扣子空间PPT生产力升级:AI智能生成与多模态创作新时代
人工智能·powerpoint
思则变4 小时前
[Pytest] [Part 2]增加 log功能
开发语言·python·pytest
叶子爱分享4 小时前
计算机视觉与图像处理的关系
图像处理·人工智能·计算机视觉
鱼摆摆拜拜4 小时前
第 3 章:神经网络如何学习
人工智能·神经网络·学习
一只鹿鹿鹿4 小时前
信息化项目验收,软件工程评审和检查表单
大数据·人工智能·后端·智慧城市·软件工程
张较瘦_4 小时前
[论文阅读] 人工智能 | 深度学习系统崩溃恢复新方案:DaiFu框架的原位修复技术
论文阅读·人工智能·深度学习