一、tf2中常用的损失函数介绍
在 TensorFlow 2 中,编译模型时可以选择不同的损失函数来定义模型的目标函数。不同的损失函数适用于不同的问题类型和模型架构。下面是几种常见的损失函数以及它们的作用和适用场景:
1.均方误差(Mean Squared Error, MSE):MSE 是回归问题中常用的损失函数,用于衡量预测值与真实值之间的平均平方差。较大的误差会得到更大的惩罚,适用于回归任务。
python
model.compile(loss='mse', ...)
**2.二进制交叉熵(Binary Cross Entropy):**二进制交叉熵是二分类问题中常用的损失函数,用于衡量两个类别之间的差异性。适用于二分类问题,输出为一个概率值的 sigmoid 激活的模型。
python
model.compile(loss='binary_crossentropy', ...)
3.多类交叉熵(Categorical Cross Entropy):多类交叉熵是多分类问题中常用的损失函数,用于衡量多个类别之间的差异性。适用于多分类问题,输出为每个类别的概率分布的 softmax 激活的模型。
python
model.compile(loss='categorical_crossentropy', ...)
4.稀疏分类交叉熵(Sparse Categorical Cross Entropy):类似于多类交叉熵,但适用于标签以整数形式表示的多分类问题,而不是 one-hot 编码。
python
model.compile(loss='sparse_categorical_crossentropy', ...)
5.KL 散度损失(Kullback-Leibler Divergence):KL 散度用于衡量两个概率分布的差异性。在生成模型中,常与自动编码器等模型结合使用,促使模型输出接近于预定义的概率分布。
python
model.compile(loss='kullback_leibler_divergence', ...)
除了上述常见的损失函数之外,还有其他一些定制化的损失函数,可以根据具体任务和需求来自定义。通过 tf.keras.losses 模块,您可以查看更多可用的损失函数,并选择适合自己模型的损失函数。在选择损失函数时,需要根据任务类型、数据分布以及模型设计进行合理选择,以获得最佳的训练效果。
二、两种损失函数的比较分析
多类交叉熵(Categorical Cross Entropy)和稀疏分类交叉熵(Sparse Categorical Cross Entropy)
相同点:都可用于数据多分类任务。
不同点:对数据的输入要求不一样,多类交叉熵(Categorical Cross Entropy)要求数据为 one-hot 编码,这个主要是针对数据的标签数据,比如我们的数据标签数据读取的时候,其类别是0-9,这个数据可以是一列数据,这个时候我们可以使用稀疏分类交叉熵(Sparse Categorical Cross Entropy)函数直接进行编译。
one_hot编码(独热编码)说明:
一种将每个元素表示为二进制向量的编码方式,其中只有一个元素为1,其余元素都为0。例如,如果我们有一个长度为N的列表,那么它的one-hot编码将是一个NxN的矩阵,其中第i行表示第i个元素的编码。例如,如果我们有一个包含3种颜色的列表["红","蓝","绿"],那么它们的one-hot编码将是:
红:[1,0,0] 蓝:[0,1,0] 绿:[0,0,1]
这种编码方式常用于机器学习中,可以将每个类别标签转换为one-hot向量以便进行训练。
如果是使用多类交叉熵(Categorical Cross Entropy)作为损失函数,那么我们对数据进行one-hot编码,代码有的地方使用:
python
y_train=tf.keras.utils.to_categorical(y_train) #报错
y_test=tf.keras.utils.to_categorical(y_test)
在tensorflow2.5环境下报错:
python
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [12,16] and labels shape [204]
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_Classifier_tf_imgVali.py:286) ]] [Op:__inference_train_function_762]
Function call stack:
train_function
这里我们可以使用以下代码替代:
python
y_train_one_hot = tf.one_hot(y_train, depth=num_classes)
y_test_one_hot = tf.one_hot(y_test, depth=num_classes)
三、示例代码分析
Sparse Categorical Cross Entropy和Categorical Cross Entropy对应的损失函数围为:
python
loss='sparse_categorical_crossentropy' loss='categorical_crossentropy'
使用minist数据做一个简单的MLP模型分类,这里先使用Sparse Categorical Cross Entropy损失函数。代码如下:
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# 准备数据集
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0
# 构建模型
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# 编译模型
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# 训练模型
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))
# 评估模型
loss, accuracy = model.evaluate(x_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)
# 使用模型进行预测
predictions = model.predict(x_test[:5])
print('Predictions:', tf.argmax(predictions, axis=1))
print('Labels:', y_test[:5])
运行结果如下:
python
D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
2023-10-14 22:28:27.465600: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 22:28:30.610122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-10-14 22:28:30.637119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 22:28:30.637445: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 22:28:30.648571: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 22:28:30.648748: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-14 22:28:30.652682: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-10-14 22:28:30.654729: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-10-14 22:28:30.657643: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-10-14 22:28:30.661178: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-10-14 22:28:30.662311: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-14 22:28:30.662510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 22:28:30.662864: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 22:28:30.663583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 22:28:30.663941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 22:28:31.130464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-14 22:28:31.130645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-10-14 22:28:31.130748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-10-14 22:28:31.130967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-14 22:28:31.709522: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2023-10-14 22:28:31.920032: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 22:28:32.369951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1875/1875 [==============================] - 5s 2ms/step - loss: 0.2845 - accuracy: 0.9174 - val_loss: 0.1443 - val_accuracy: 0.9547
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1261 - accuracy: 0.9633 - val_loss: 0.1085 - val_accuracy: 0.9646
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0937 - accuracy: 0.9716 - val_loss: 0.1034 - val_accuracy: 0.9690
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0731 - accuracy: 0.9772 - val_loss: 0.0987 - val_accuracy: 0.9714
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0612 - accuracy: 0.9810 - val_loss: 0.0828 - val_accuracy: 0.9749
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0507 - accuracy: 0.9835 - val_loss: 0.0955 - val_accuracy: 0.9702
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0430 - accuracy: 0.9859 - val_loss: 0.0863 - val_accuracy: 0.9746
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0374 - accuracy: 0.9874 - val_loss: 0.0935 - val_accuracy: 0.9737
Epoch 9/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0328 - accuracy: 0.9894 - val_loss: 0.0902 - val_accuracy: 0.9754
Epoch 10/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0287 - accuracy: 0.9900 - val_loss: 0.0902 - val_accuracy: 0.9771
313/313 [==============================] - 1s 2ms/step - loss: 0.0902 - accuracy: 0.9771
Test Loss: 0.09022707492113113
Test Accuracy: 0.9771000146865845
Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
Labels: [7 2 1 0 4]
Process finished with exit code 0
我们将损失函数修改为Categorical Cross Entropy运行代码就会报错:
python
ValueError: Shapes (32, 1) and (32, 10) are incompatible
这是因为我们没有将标签数据转化为独热编码,我们转换一下,,在model.fit()函数前加上:
python
y_train = tf.one_hot(y_train, depth=10)
y_test= tf.one_hot(y_test, depth=10)
运行结果如下:
python
D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
2023-10-14 23:20:04.708405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 23:20:07.803493: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-10-14 23:20:07.833164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 23:20:07.833480: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 23:20:07.840527: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 23:20:07.840689: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-14 23:20:07.844132: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-10-14 23:20:07.845657: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-10-14 23:20:07.848488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-10-14 23:20:07.852061: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-10-14 23:20:07.853130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-14 23:20:07.853317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 23:20:07.853652: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 23:20:07.854467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 23:20:07.854879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 23:20:08.326771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-14 23:20:08.326942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-10-14 23:20:08.327041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-10-14 23:20:08.327252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-14 23:20:08.914697: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2023-10-14 23:20:09.138669: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
1/1875 [..............................] - ETA: 21:57 - loss: 2.4066 - accuracy: 0.12502023-10-14 23:20:09.626287: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2784 - accuracy: 0.9182 - val_loss: 0.1517 - val_accuracy: 0.9510
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1217 - accuracy: 0.9633 - val_loss: 0.1258 - val_accuracy: 0.9611
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0874 - accuracy: 0.9731 - val_loss: 0.1045 - val_accuracy: 0.9666
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0701 - accuracy: 0.9778 - val_loss: 0.0929 - val_accuracy: 0.9718
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0569 - accuracy: 0.9821 - val_loss: 0.0853 - val_accuracy: 0.9751
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0488 - accuracy: 0.9844 - val_loss: 0.0911 - val_accuracy: 0.9706
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0402 - accuracy: 0.9868 - val_loss: 0.0847 - val_accuracy: 0.9748
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0355 - accuracy: 0.9882 - val_loss: 0.0975 - val_accuracy: 0.9723
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0307 - accuracy: 0.9895 - val_loss: 0.1027 - val_accuracy: 0.9743
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss: 0.1004 - val_accuracy: 0.9734
313/313 [==============================] - 1s 2ms/step - loss: 0.1004 - accuracy: 0.9734
Test Loss: 0.10037881135940552
Test Accuracy: 0.9733999967575073
Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
Labels: tf.Tensor(
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]], shape=(5, 10), dtype=float32)
Process finished with exit code 0
注意事项:
使用稀疏交叉熵损失函数编译的模型的预测结果标签从0开始。如果自己的数据是从1开始的,那么后面做验证分析的时候需要注意两者应该保持一致。
在使用稀疏交叉熵损失函数进行多分类问题训练时,标签通常使用整数表示,并且标签值的范围是从0到类别数量减1。模型的输出也应该是每个类别的概率分布。
例如,如果有3个类别,标签将被编码为0、1和2,并且模型的输出将是一个长度为3的概率分布向量,表示对每个类别的预测概率。
在预测时,模型会返回对每个类别的预测概率,通过取最大概率对应的索引,就可以得到预测的类别。这个索引范围是从0到类别数量减1。与稀疏交叉熵不同,使用普通的(非稀疏)交叉熵损失函数时,标签通常使用 one-hot 编码,其中每个类别都由一个向量表示,只有真实标签对应的位置为1,其余都为0。在这种情况下,预测结果的标签也是从0开始的。