Compiling models in TensorFlow 2: how the required label format changes with the choice of loss function

I. Common loss functions in TF2

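When compiling a model in TensorFlow 2, you choose a loss function that defines the model's training objective. Different loss functions suit different problem types and model architectures. Several common loss functions, what they measure, and when to use them are described below: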

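1. Mean Squared Error (MSE): MSE is the loss most commonly used for regression. It measures the average of the squared differences between predictions and true values. Because errors are squared, large errors are penalized much more heavily than small ones.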

```python
model.compile(loss='mse', ...)
```
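To make the definition concrete, here is a minimal NumPy sketch of the quantity that `loss='mse'` averages. The function name `mse` and the sample numbers are illustrative only; this is not TensorFlow's own implementation.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Errors are 0.5, 0.0, -1.0, 2.0 -> squares 0.25, 0, 1, 4 -> mean of 5.25 over 4 values
print(mse([3.0, 1.0, 2.0, 7.0], [2.5, 1.0, 3.0, 5.0]))  # 1.3125
```

The single error of 2.0 contributes 4.0 of the 5.25 total, which is the "larger errors are penalized more" behaviour described above.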

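2. Binary Cross Entropy: the standard loss for binary classification. It measures how far the predicted probability of the positive class is from the true 0/1 label. It suits binary classifiers whose output is a single probability produced by a sigmoid activation.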

```python
model.compile(loss='binary_crossentropy', ...)
```
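A minimal NumPy sketch of binary cross entropy for 0/1 labels and sigmoid probabilities follows. The helper name and the clipping value `eps` are illustrative assumptions (Keras applies its own small epsilon internally), so treat this as a picture of the formula rather than the library code.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross entropy for 0/1 labels and predicted probabilities p."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.8, 0.3]  # sigmoid outputs: probability of class 1
print(round(binary_crossentropy(y, p), 4))  # 0.1976
```

A confident wrong prediction, such as p = 0.1 for a true label of 1, costs -log(0.1) ≈ 2.30, far more than the ≈ 0.105 paid for p = 0.9.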

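3. Categorical Cross Entropy: the standard loss for multi-class classification. It measures the difference between the predicted probability distribution over the classes and the true (one-hot) label distribution. It suits multi-class models whose softmax output is a probability distribution over all classes.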

```python
model.compile(loss='categorical_crossentropy', ...)
```
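The sketch below computes categorical cross entropy by hand, assuming one-hot labels and softmax probabilities stored as NumPy arrays of the same shape (helper name and data are illustrative). Because the labels are one-hot, only the log-probability of the true class survives the sum.

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Mean over samples of -sum_k y_k * log(p_k); labels must be one-hot rows."""
    y_onehot = np.asarray(y_onehot, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=-1)))

# Two samples, three classes; each row of probs is a softmax output summing to 1
y_onehot = np.array([[0, 1, 0],
                     [1, 0, 0]])
probs = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1]])
print(round(categorical_crossentropy(y_onehot, probs), 4))  # 0.367
```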

4. Sparse Categorical Cross Entropy: computes the same quantity as categorical cross entropy, but expects the labels as integer class indices rather than one-hot vectors.

```python
model.compile(loss='sparse_categorical_crossentropy', ...)
```
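Sparse categorical cross entropy picks the true class's probability by index instead of multiplying by a one-hot vector. This NumPy sketch (illustrative names, not the Keras code) shows that for the same labels the two forms give the same loss; only the label layout differs.

```python
import numpy as np

def sparse_categorical_crossentropy(y_int, probs, eps=1e-7):
    """Mean of -log(p[y]) where y holds integer class indices 0..K-1."""
    y_int = np.asarray(y_int, dtype=int).reshape(-1)  # accepts shape (N,) or (N, 1)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    picked = probs[np.arange(len(y_int)), y_int]  # probability of each true class
    return float(np.mean(-np.log(picked)))

probs = np.array([[0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1]])
y_int = np.array([1, 0])        # integer labels
y_onehot = np.eye(3)[y_int]     # the same labels, one-hot encoded

sparse = sparse_categorical_crossentropy(y_int, probs)
dense = float(np.mean(-np.sum(y_onehot * np.log(probs), axis=-1)))
print(sparse, dense)  # the two values agree
```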

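5. Kullback-Leibler (KL) divergence loss: KL divergence measures how one probability distribution differs from another. In generative models it is often used with autoencoders (for example, variational autoencoders) to push the learned distribution toward a predefined one.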

```python
model.compile(loss='kullback_leibler_divergence', ...)
```
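As a rough illustration, this NumPy sketch computes KL(P || Q) = sum of p * log(p / q) for a single pair of distributions, clipping to avoid log(0). Keras's `kullback_leibler_divergence` applies the same formula along the last axis, with `y_true` as P and `y_pred` as Q; the helper name and data here are illustrative.

```python
import numpy as np

def kl_divergence(p_true, q_pred, eps=1e-7):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i) for one pair of distributions."""
    p = np.clip(np.asarray(p_true, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q_pred, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.5]
print(kl_divergence(p, p))           # 0.0 for identical distributions
print(kl_divergence(p, [0.9, 0.1]))  # about 0.5108: positive when they differ
```

KL divergence is not symmetric: swapping the two arguments generally changes the value, so the order of `y_true` and `y_pred` matters.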

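Beyond these common losses, you can define custom loss functions for a specific task. The tf.keras.losses module lists the other built-in losses. Choose the loss according to the task type, the data distribution, and the model design to get the best training results.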

II. Comparing two loss functions

Categorical Cross Entropy and Sparse Categorical Cross Entropy

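Similarity: both are used for multi-class classification, and both expect the model to output a probability distribution over the classes.

Difference: they expect the labels in different formats. Categorical Cross Entropy requires the labels to be one-hot encoded. Sparse Categorical Cross Entropy accepts integer labels directly: if, for example, the labels read from the dataset are the class numbers 0-9 stored in a single column, the model can be compiled with Sparse Categorical Cross Entropy without any conversion.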

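One-hot encoding explained:

One-hot encoding represents each category as a binary vector in which exactly one element is 1 and all others are 0. With N categories, each label becomes a vector of length N, so encoding M samples produces an M×N matrix whose i-th row is the encoding of the i-th sample. For example, the three colors ["red", "blue", "green"] are encoded as:

red: [1,0,0]  blue: [0,1,0]  green: [0,0,1]

This encoding is widely used in machine learning to turn class labels into vectors for training.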
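The color example can be reproduced in a few lines of NumPy: map each category name to an integer index (the format sparse categorical cross entropy expects), then index the rows of an identity matrix to get the one-hot matrix (the format categorical cross entropy expects). The sample list is made up for illustration; `tf.one_hot(labels, depth=3)` would produce the same matrix as a float32 tensor.

```python
import numpy as np

categories = ["red", "blue", "green"]
index = {name: i for i, name in enumerate(categories)}  # red->0, blue->1, green->2

samples = ["green", "red", "red", "blue"]                # M = 4 samples
labels = np.array([index[s] for s in samples])           # integer labels: [2 0 0 1]
one_hot = np.eye(len(categories), dtype=int)[labels]     # M x N matrix, shape (4, 3)

print(labels)
print(one_hot)
```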

If Categorical Cross Entropy is the loss, the labels must be one-hot encoded. Some code does this with to_categorical:

```python
y_train = tf.keras.utils.to_categorical(y_train)  # this step was followed by the error below
y_test = tf.keras.utils.to_categorical(y_test)
```

Under TensorFlow 2.5, training then failed with:

```text
tensorflow.python.framework.errors_impl.InvalidArgumentError:  logits and labels must have the same first dimension, got logits shape [12,16] and labels shape [204]
	 [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_Classifier_tf_imgVali.py:286) ]] [Op:__inference_train_function_762]

Function call stack:
train_function
```

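Judging from the node name in this message (sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits), the error is raised by the sparse loss: the model was still compiled with sparse_categorical_crossentropy while being fed one-hot labels, so to_categorical itself is not what fails. The sizes also disagree: the flattened labels have 204 = 12 × 17 elements, i.e. 17 one-hot columns, while the model outputs 16 classes. When num_classes is not given, to_categorical uses max(y) + 1 columns, which yields 17 columns if, for example, the labels are numbered 1 to 16. Compile with loss='categorical_crossentropy' and set the number of columns explicitly, for example with tf.one_hot (tf.keras.utils.to_categorical(y_train, num_classes=num_classes) is equivalent):

```python
# depth must equal the number of units in the output layer;
# labels must lie in 0..num_classes-1 (out-of-range labels become all-zero rows)
y_train_one_hot = tf.one_hot(y_train, depth=num_classes)
y_test_one_hot = tf.one_hot(y_test, depth=num_classes)
```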
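The width mismatch can be reproduced with plain NumPy. By default `to_categorical` infers the number of classes as `max(y) + 1`, and `tf.one_hot` fills rows with zeros for indices outside `[0, depth)`. The two helpers below only mimic those rules for illustration; they are hypothetical stand-ins, not TensorFlow code, and the labels are made up.

```python
import numpy as np

def infer_width_one_hot(y):
    """Mimics to_categorical(y) with num_classes=None: width = max(y) + 1."""
    y = np.asarray(y, dtype=int)
    return np.eye(y.max() + 1)[y]

def fixed_depth_one_hot(y, depth):
    """Mimics tf.one_hot(y, depth): out-of-range indices give all-zero rows."""
    y = np.asarray(y, dtype=int)
    out = np.zeros((len(y), depth))
    valid = (y >= 0) & (y < depth)
    out[np.arange(len(y))[valid], y[valid]] = 1.0
    return out

y = np.array([1, 5, 16, 3])  # hypothetical labels numbered 1..16 (16 classes)

print(infer_width_one_hot(y).shape)          # (4, 17): one column too many
print(fixed_depth_one_hot(y, 16)[2].sum())   # 0.0: label 16 silently became an all-zero row
print(fixed_depth_one_hot(y - 1, 16).shape)  # (4, 16) after shifting labels to 0..15
```

Shifting the labels to start at 0 before encoding avoids both problems at once.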

III. Worked example

In compile(), Sparse Categorical Cross Entropy and Categorical Cross Entropy are selected with the following loss strings, respectively:

```python
loss='sparse_categorical_crossentropy'  # integer labels
loss='categorical_crossentropy'         # one-hot labels
```
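The strings above are shorthand for loss classes in `tf.keras.losses`. The sketch below, assuming TensorFlow 2.x, compiles a tiny stand-in model with the class forms instead; the class form is handy when you need options such as `from_logits=True`. The model itself is arbitrary and only there to have something to compile.

```python
import tensorflow as tf

# Minimal stand-in model: 784 inputs -> 10 softmax outputs (sizes chosen to match MNIST)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Same as loss='sparse_categorical_crossentropy': expects integer labels of shape (N,)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=["accuracy"])

# Same as loss='categorical_crossentropy': expects one-hot labels of shape (N, 10).
# If the last layer had no softmax, pass from_logits=True to either loss class.
model.compile(optimizer="adam",
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=["accuracy"])
```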

The example below trains a simple MLP classifier on the MNIST dataset, starting with Sparse Categorical Cross Entropy:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 准备数据集
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# 构建模型
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# 编译模型
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 训练模型
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))

# 评估模型
loss, accuracy = model.evaluate(x_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

# 使用模型进行预测
predictions = model.predict(x_test[:5])
print('Predictions:', tf.argmax(predictions, axis=1))
print('Labels:', y_test[:5])
```

The output is:

```text
D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
2023-10-14 22:28:27.465600: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 22:28:30.610122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-10-14 22:28:30.637119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 22:28:30.637445: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 22:28:30.648571: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 22:28:30.648748: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-14 22:28:30.652682: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-10-14 22:28:30.654729: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-10-14 22:28:30.657643: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-10-14 22:28:30.661178: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-10-14 22:28:30.662311: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-14 22:28:30.662510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 22:28:30.662864: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 22:28:30.663583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 22:28:30.663941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 22:28:31.130464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-14 22:28:31.130645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2023-10-14 22:28:31.130748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2023-10-14 22:28:31.130967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-14 22:28:31.709522: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2023-10-14 22:28:31.920032: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 22:28:32.369951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1875/1875 [==============================] - 5s 2ms/step - loss: 0.2845 - accuracy: 0.9174 - val_loss: 0.1443 - val_accuracy: 0.9547
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1261 - accuracy: 0.9633 - val_loss: 0.1085 - val_accuracy: 0.9646
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0937 - accuracy: 0.9716 - val_loss: 0.1034 - val_accuracy: 0.9690
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0731 - accuracy: 0.9772 - val_loss: 0.0987 - val_accuracy: 0.9714
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0612 - accuracy: 0.9810 - val_loss: 0.0828 - val_accuracy: 0.9749
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0507 - accuracy: 0.9835 - val_loss: 0.0955 - val_accuracy: 0.9702
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0430 - accuracy: 0.9859 - val_loss: 0.0863 - val_accuracy: 0.9746
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0374 - accuracy: 0.9874 - val_loss: 0.0935 - val_accuracy: 0.9737
Epoch 9/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0328 - accuracy: 0.9894 - val_loss: 0.0902 - val_accuracy: 0.9754
Epoch 10/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0287 - accuracy: 0.9900 - val_loss: 0.0902 - val_accuracy: 0.9771
313/313 [==============================] - 1s 2ms/step - loss: 0.0902 - accuracy: 0.9771
Test Loss: 0.09022707492113113
Test Accuracy: 0.9771000146865845
Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
Labels: [7 2 1 0 4]

Process finished with exit code 0
```

If we change the loss to Categorical Cross Entropy and leave everything else unchanged, the code fails with:

```text
ValueError: Shapes (32, 1) and (32, 10) are incompatible
```

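This is because the labels are still integers: each batch of labels has shape (32, 1), while the model outputs probabilities of shape (32, 10). To convert the labels to one-hot, add the following before model.fit():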

```python
y_train = tf.one_hot(y_train, depth=10)
y_test = tf.one_hot(y_test, depth=10)
```
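Both error messages in this article follow one rule: the label layout must match the loss. A hypothetical helper like the one below (not part of Keras) makes the rule explicit by inspecting the shape of the label array: integer labels of shape (N,) or (N, 1) go with the sparse loss, and one-hot labels of shape (N, num_classes) go with the categorical loss.

```python
import numpy as np

def loss_name_for_labels(y, num_classes):
    """Hypothetical helper: pick the Keras loss string that matches the label layout."""
    y = np.asarray(y)
    if y.ndim == 1 or (y.ndim == 2 and y.shape[1] == 1):
        return "sparse_categorical_crossentropy"  # integer class indices
    if y.ndim == 2 and y.shape[1] == num_classes:
        return "categorical_crossentropy"         # one-hot rows
    raise ValueError(f"unexpected label shape {y.shape} for {num_classes} classes")

y_int = np.array([7, 2, 1, 0, 4])  # like MNIST y_test[:5]
y_onehot = np.eye(10)[y_int]       # the same labels after one-hot encoding

print(loss_name_for_labels(y_int, 10))     # sparse_categorical_crossentropy
print(loss_name_for_labels(y_onehot, 10))  # categorical_crossentropy
```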

The output is:

```text
D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
2023-10-14 23:20:04.708405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 23:20:07.803493: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-10-14 23:20:07.833164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 23:20:07.833480: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 23:20:07.840527: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 23:20:07.840689: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-14 23:20:07.844132: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-10-14 23:20:07.845657: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-10-14 23:20:07.848488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-10-14 23:20:07.852061: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-10-14 23:20:07.853130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-14 23:20:07.853317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 23:20:07.853652: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 23:20:07.854467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 23:20:07.854879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 23:20:08.326771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-14 23:20:08.326942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 
2023-10-14 23:20:08.327041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N 
2023-10-14 23:20:08.327252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-14 23:20:08.914697: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2023-10-14 23:20:09.138669: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
   1/1875 [..............................] - ETA: 21:57 - loss: 2.4066 - accuracy: 0.1250
2023-10-14 23:20:09.626287: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2784 - accuracy: 0.9182 - val_loss: 0.1517 - val_accuracy: 0.9510
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1217 - accuracy: 0.9633 - val_loss: 0.1258 - val_accuracy: 0.9611
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0874 - accuracy: 0.9731 - val_loss: 0.1045 - val_accuracy: 0.9666
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0701 - accuracy: 0.9778 - val_loss: 0.0929 - val_accuracy: 0.9718
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0569 - accuracy: 0.9821 - val_loss: 0.0853 - val_accuracy: 0.9751
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0488 - accuracy: 0.9844 - val_loss: 0.0911 - val_accuracy: 0.9706
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0402 - accuracy: 0.9868 - val_loss: 0.0847 - val_accuracy: 0.9748
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0355 - accuracy: 0.9882 - val_loss: 0.0975 - val_accuracy: 0.9723
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0307 - accuracy: 0.9895 - val_loss: 0.1027 - val_accuracy: 0.9743
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss: 0.1004 - val_accuracy: 0.9734
313/313 [==============================] - 1s 2ms/step - loss: 0.1004 - accuracy: 0.9734
Test Loss: 0.10037881135940552
Test Accuracy: 0.9733999967575073
Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
Labels: tf.Tensor(
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]], shape=(5, 10), dtype=float32)

Process finished with exit code 0
```

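Both runs reach similar test accuracy (0.9771 with the sparse loss, 0.9734 with the categorical loss). For equivalent labels the two losses compute the same value, so the small gap reflects run-to-run randomness such as weight initialization and data shuffling. In the second run the labels print as one-hot rows because y_test was converted; tf.argmax(y_test[:5], axis=1) would recover [7 2 1 0 4].

Notes:

The class indices predicted by a model trained with sparse categorical cross entropy start at 0. If your own labels start at 1, make sure the two conventions agree during later validation and analysis, typically by subtracting 1 from the labels before training. Labels outside [0, number of classes) are invalid for this loss: according to the documentation of the underlying TensorFlow op, they raise an error on CPU and produce NaN losses on GPU.

When training a multi-class model with sparse categorical cross entropy, the labels are integers ranging from 0 to the number of classes minus 1, and the model output should be a probability distribution over the classes.

For example, with 3 classes the labels are 0, 1, and 2, and the model outputs a probability vector of length 3 giving the predicted probability of each class.

At prediction time the model returns a probability for each class; the index of the largest probability is the predicted class, again in the range 0 to the number of classes minus 1.

By contrast, with ordinary (non-sparse) categorical cross entropy the labels are usually one-hot encoded: each class is a vector with 1 at the true class's position and 0 everywhere else. Here too, the predicted class index obtained with argmax starts at 0.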
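For labels numbered from 1, a common approach, shown here as a NumPy sketch with made-up labels and stand-in predictions, is to shift the labels to 0-based before training and shift the argmax of the predictions back afterwards, so that validation compares like with like.

```python
import numpy as np

raw_labels = np.array([1, 3, 2, 3])  # hypothetical labels numbered 1..3
y = raw_labels - 1                   # 0-based labels for sparse_categorical_crossentropy: [0 2 1 2]

# Stand-in for model.predict() output: one softmax row per sample
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.2, 0.6, 0.2],
                  [0.3, 0.3, 0.4]])
pred_index = probs.argmax(axis=1)    # 0-based class indices: [0 2 1 2]
pred_labels = pred_index + 1         # back to the original 1-based numbering

print(pred_labels)                         # [1 3 2 3]
print((pred_labels == raw_labels).mean())  # 1.0
```

Comparing `pred_index` with `y`, or `pred_labels` with `raw_labels`, gives the same accuracy; mixing the two conventions is what produces misleading results.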