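What the classification_report Output Means

Basics

Confusion Matrix

A confusion matrix shows how samples of each class are misclassified as other classes, so you can see whether particular classes are being confused with each other. This helps guide later model adjustments, for example by assigning different weights to some classes.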
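To make the idea concrete, here is a minimal pure-Python sketch that counts (true label, predicted label) pairs into a matrix. The function name `confusion_matrix_rows` and the toy animal labels are hypothetical and only for illustration; this is not scikit-learn's implementation.

```python
from collections import Counter


def confusion_matrix_rows(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts samples whose
    true label is labels[i] and whose predicted label is labels[j]."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical toy data, not taken from the examples in this article
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]

labels, matrix = confusion_matrix_rows(y_true, y_pred)
print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal entries are the misclassifications: here one "bird" is predicted as "cat" and one "cat" is predicted as "dog".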

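Precision

Precision is also known as the positive predictive value. It should not be confused with accuracy, which is defined below.

Precision is the fraction of samples predicted as positive that are actually positive: TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.

Accuracy

Accuracy is the fraction of all samples that are classified correctly.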
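Because precision and accuracy are easy to mix up, the following sketch computes both on the same predictions. The helper name `precision_and_accuracy`, the toy labels, and the choice to return 0.0 when nothing is predicted positive are assumptions made for this illustration.

```python
def precision_and_accuracy(y_true, y_pred, positive=1):
    """Compute precision for the positive class and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Assumption: report 0.0 when no sample is predicted positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(pairs)
    return precision, accuracy


# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

precision, accuracy = precision_and_accuracy(y_true, y_pred)
print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

Only two samples are predicted positive and one of them is wrong, so precision is 0.50, while 7 of the 10 samples are classified correctly, so accuracy is 0.70.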

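Recall

Recall is also known as sensitivity or the true positive rate.

Recall is the fraction of actual positive samples that the model correctly predicts as positive: TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

High recall means the model captures most of the samples that are actually positive.

Relationship with precision: there is usually a trade-off. For a given model, changing the decision threshold to raise recall tends to lower precision, and vice versa.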
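The trade-off can be seen by sweeping a decision threshold over model scores. This is a small sketch under stated assumptions: the scores and labels are made up, and `precision_recall_at` is a hypothetical helper, not a library function.

```python
def precision_recall_at(scores, y_true, threshold):
    """Predict positive when score >= threshold, then compute precision and recall."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores and ground-truth labels
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]

results = {}
for threshold in (0.8, 0.5, 0.25):
    results[threshold] = precision_recall_at(scores, y_true, threshold)
    p, r = results[threshold]
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.8 to 0.25 raises recall from 0.50 to 1.00 while precision drops from 1.00 to 0.67. Real data need not behave this monotonically.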

F1-score

The F1-score is the harmonic mean of precision and recall. Its range is [0, 1]; the closer it is to 1, the better the model balances precision and recall.
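A short sketch of why the harmonic mean rewards balance: the F1-score stays low when either precision or recall is low, unlike a plain arithmetic mean. The function name `f1_from` and the input values are hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_from(0.8, 0.8)   # precision and recall agree
lopsided = f1_from(1.0, 0.1)   # perfect precision, very poor recall
arithmetic = (1.0 + 0.1) / 2   # plain average of the lopsided pair

print(f"balanced F1   = {balanced:.2f}")
print(f"lopsided F1   = {lopsided:.2f}")
print(f"lopsided mean = {arithmetic:.2f}")
```

The lopsided pair averages to 0.55 arithmetically, but its F1-score is only about 0.18, which reflects the poor recall.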

Example 1:

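from sklearn.metrics import classification_report

# Ground-truth labels and model predictions for three classes (1, 2, 3)
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_predicted = [1, 2, 3, 3, 2, 1, 3, 2, 3]

# Print per-class precision, recall, F1-score, and support
print(classification_report(y_true, y_predicted))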

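This prints a report with one row per class (1, 2, 3), followed by the accuracy, macro avg, and weighted avg rows.

You can also pass the target_names parameter to give the classes readable names: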

print(classification_report(y_true, y_predicted, target_names=['class a', 'class b', 'class c']))

The leftmost column of the report now shows the label names that were passed in.
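The per-class figures for Example 1 can be cross-checked by hand. The sketch below recomputes precision, recall, and support directly from the two lists in pure Python; `per_class_metrics` is a hypothetical helper written for this check, not part of scikit-learn.

```python
y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y_predicted = [1, 2, 3, 3, 2, 1, 3, 2, 3]


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, support)} for every true label."""
    rows = {}
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)  # samples predicted as label
        support = sum(1 for t in y_true if t == label)    # samples that truly are label
        precision = tp / predicted if predicted else 0.0
        recall = tp / support
        rows[label] = (precision, recall, support)
    return rows


rows = per_class_metrics(y_true, y_predicted)
for label, (precision, recall, support) in rows.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} support={support}")
```

Class 2 is predicted perfectly, while classes 1 and 3 are confused with each other, which is why their precision is only 0.50.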

Example 2:

from sklearn.metrics import classification_report

# Ten samples of a binary classification problem
Y_test = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Y_prediction = [0, 1, 0, 0, 0, 1, 1, 0, 0, 1]
print(classification_report(Y_test, Y_prediction))

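This prints the classification report for the binary classification of these 10 samples.

Start with the confusion matrix (rows are the true class, columns are the predicted class):

            predicted 0   predicted 1
true 0           4             1
true 1           2             3

The report gives, for each class, the precision, recall, F1-score, and support (the number of true samples of that class), plus the overall accuracy, the macro average (macro avg), and the weighted average (weighted avg).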

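Precision: of the samples predicted as x, the fraction that actually are x.

Precision_0 = 4/(4+2) = 0.67

Precision_1 = 3/(3+1) = 0.75

Recall: of the samples that actually are x, the fraction predicted as x.

Recall_0 = 4/5 = 0.80

Recall_1 = 3/5 = 0.60

F1-score: 2 × Precision × Recall / (Precision + Recall).

F1_0 = 2 × 0.67 × 0.80 / (0.67 + 0.80) = 0.73

F1_1 = 2 × 0.75 × 0.60 / (0.75 + 0.60) = 0.67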
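The hand calculations for Example 2 can be verified by building the confusion matrix from the two lists and reading precision from column sums and recall from row sums. This is a minimal pure-Python sketch; `metrics_from_matrix` is a hypothetical helper and not how scikit-learn is implemented.

```python
Y_test = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
Y_prediction = [0, 1, 0, 0, 0, 1, 1, 0, 0, 1]

# matrix[t][p] counts samples with true class t and predicted class p
matrix = [[0, 0], [0, 0]]
for t, p in zip(Y_test, Y_prediction):
    matrix[t][p] += 1


def metrics_from_matrix(matrix):
    """Derive per-class precision, recall, F1, and support from a confusion matrix."""
    n = len(matrix)
    result = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        result[k] = {"precision": precision, "recall": recall,
                     "f1": f1, "support": actual_k}
    return result


metrics = metrics_from_matrix(matrix)
for k, m in metrics.items():
    print(k, {name: round(value, 2) for name, value in m.items()})
```

Running it gives recall 0.80 for class 0 and 0.60 for class 1, matching the row sums of the matrix.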

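Accuracy: the fraction of all samples that are classified correctly.

Accuracy = 7/10 = 0.70

macro avg: the unweighted mean of the per-class scores above.

macro avg_precision = (0.67 + 0.75)/2 = 0.71

weighted avg: the mean of the per-class scores above, weighted by support.

weighted avg_precision = (0.67 × 5 + 0.75 × 5)/10 = 0.71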
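In Example 2 both classes have a support of 5, so the macro and weighted averages coincide. The sketch below reproduces the two averages for precision and then repeats the weighted average with a hypothetical, imbalanced support (9 and 1) to show where they would differ. The imbalanced numbers are invented for illustration only.

```python
# Per-class precision and support from Example 2
precision = {0: 4 / 6, 1: 3 / 4}
support = {0: 5, 1: 5}


def macro_average(scores):
    """Unweighted mean of the per-class scores."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Mean of the per-class scores, weighted by each class's support."""
    total = sum(support.values())
    return sum(scores[k] * support[k] for k in scores) / total


macro = macro_average(precision)
weighted = weighted_average(precision, support)

# Hypothetical imbalanced support, NOT from Example 2
imbalanced_support = {0: 9, 1: 1}
weighted_imbalanced = weighted_average(precision, imbalanced_support)

print(f"macro avg precision                 = {macro:.2f}")
print(f"weighted avg precision              = {weighted:.2f}")
print(f"weighted avg (hypothetical support) = {weighted_imbalanced:.3f}")
```

With the hypothetical 9-to-1 support, the weighted average moves toward the precision of the larger class, while the macro average is unaffected by class sizes.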