A Complete Guide to Tensor Operations in Trae: From Basic Operations to Broadcasting

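1. Tensor Fundamentals

1.1 What Is a Tensor?

A tensor is the basic data representation in deep learning and can be thought of as a multidimensional array. Tensors are classified by their number of dimensions:

Scalar (0-D tensor): a single value, such as 5.
Vector (1-D tensor): a one-dimensional array, such as [1, 2, 3].
Matrix (2-D tensor): a two-dimensional array, such as [[1, 2], [3, 4]].
Higher-dimensional tensor: three or more dimensions, such as [[[1, 2], [3, 4]], [[5, 6], [7, 8]]].

The number of dimensions of a tensor is its rank, and the tuple of sizes along each dimension is its shape. For example, a tensor of shape (3, 4) has two dimensions, so its rank is 2.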
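To make the rank and shape definitions concrete, here is a minimal pure-Python sketch that derives both from a nested list. The helper names shape_of and rank_of are illustrative only and are not part of any library; the sketch assumes the nested list is regular (every sub-list at the same depth has the same length).

```python
def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    current = nested
    # Walk down the first element at each nesting level and record its length.
    while isinstance(current, list):
        shape.append(len(current))
        if not current:
            break
        current = current[0]
    return tuple(shape)


def rank_of(nested):
    """The rank is simply the number of entries in the shape."""
    return len(shape_of(nested))


print(shape_of(5), rank_of(5))                          # () 0
print(shape_of([1, 2, 3]), rank_of([1, 2, 3]))          # (3,) 1
print(shape_of([[1, 2], [3, 4]]))                       # (2, 2)
print(shape_of([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))   # (2, 2, 2)
```

A scalar has the empty shape () and rank 0, which matches the first row of the classification above.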

Tensor type	Dimensions	Example
Scalar	0	`5`
Vector	1	`[1, 2, 3]`
Matrix	2	`[[1, 2], [3, 4]]`
3-D tensor	3	`[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]`

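1.2 Creating Tensors

In Trae, tensors are created with tf.constant() and tf.Variable(). The tf prefix refers to the TensorFlow package imported in the first line of the example.

import tensorflow as tf

# Create a scalar (0-D tensor)
scalar = tf.constant(5)
print(f"Scalar: {scalar}, dtype: {scalar.dtype}, shape: {scalar.shape}")

# Create a vector (1-D tensor)
vector = tf.constant([1, 2, 3])
print(f"Vector: {vector}, dtype: {vector.dtype}, shape: {vector.shape}")

# Create a matrix (2-D tensor)
matrix = tf.constant([[1, 2], [3, 4]])
print(f"Matrix: {matrix}, dtype: {matrix.dtype}, shape: {matrix.shape}")

# Create a 3-D tensor
tensor_3d = tf.constant([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(f"3-D tensor: {tensor_3d}, dtype: {tensor_3d.dtype}, shape: {tensor_3d.shape}")

Code explanation:

tf.constant(): creates an immutable tensor. The dtype is inferred from the value (int32 for Python integers, float32 for Python floats) unless dtype is passed explicitly.
scalar: a scalar with an empty shape, ().
vector: a vector of shape (3,).
matrix: a matrix of shape (2, 2).
tensor_3d: a 3-D tensor of shape (2, 2, 2).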
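The text mentions tf.Variable() but only demonstrates tf.constant(). The following is a small sketch of the difference, assuming TensorFlow 2.x with eager execution; the variable names are illustrative. It also shows that a tensor built from Python floats is inferred as float32 rather than int32.

```python
import tensorflow as tf

# dtype is inferred from the values: Python floats give float32.
float_tensor = tf.constant([1.0, 2.0, 3.0])
print(float_tensor.dtype)

# An explicit dtype overrides the inference.
explicit = tf.constant([1, 2, 3], dtype=tf.float32)
print(explicit.dtype)

# A Variable holds mutable state (for example, trainable parameters).
counter = tf.Variable([1, 2, 3])
counter.assign([4, 5, 6])        # overwrite the stored values
counter.assign_add([1, 1, 1])    # add in place
print(counter.numpy())           # [5 6 7]
```

A constant has no assign methods; a Variable keeps the same shape and dtype while its values change, which is why model parameters are usually stored as variables.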

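1.3 Tensor Attributes

Tensors expose the following key attributes:

dtype: the data type, such as int32 or float32.
shape: the size of each dimension.
ndim: the number of dimensions (the rank).

# Inspect tensor attributes (matrix is the 2x2 tensor created in section 1.2)
print(f"dtype: {matrix.dtype}")
print(f"shape: {matrix.shape}")
print(f"ndim: {matrix.ndim}")

Sample output:

dtype: <dtype: 'int32'>
shape: (2, 2)
ndim: 2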

1.4 Summary of Tensor Fundamentals (mermaid mind map)

graph TD
    A[Tensor fundamentals] --> B[What is a tensor]
    A --> C[Creating tensors]
    A --> D[Tensor attributes]
    B --> E[Scalar]
    B --> F[Vector]
    B --> G[Matrix]
    B --> H[Higher-dimensional tensor]
    C --> I["tf.constant()"]
    C --> J["tf.Variable()"]
    D --> K[dtype]
    D --> L[shape]
    D --> M[ndim]

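2. Basic Tensor Operations

2.1 Arithmetic Operations

Tensors support the basic arithmetic operations: addition, subtraction, multiplication, and division. All four are applied element by element.

# Tensor arithmetic
a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])

# Addition
add_result = tf.add(a, b)  # or a + b
print(f"Addition: {add_result}")

# Subtraction
sub_result = tf.subtract(a, b)  # or a - b
print(f"Subtraction: {sub_result}")

# Multiplication
mul_result = tf.multiply(a, b)  # or a * b (element-wise)
print(f"Multiplication: {mul_result}")

# Division
div_result = tf.divide(a, b)  # or a / b
print(f"Division: {div_result}")

Code explanation:

tf.add(): element-wise addition.
tf.subtract(): element-wise subtraction.
tf.multiply(): element-wise multiplication (not matrix multiplication).
tf.divide(): element-wise true division; for int32 inputs the result is float64.

Sample output:

Addition: tf.Tensor([5 7 9], shape=(3,), dtype=int32)
Subtraction: tf.Tensor([-3 -3 -3], shape=(3,), dtype=int32)
Multiplication: tf.Tensor([ 4 10 18], shape=(3,), dtype=int32)
Division: tf.Tensor([0.25 0.4  0.5 ], shape=(3,), dtype=float64)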
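The division result above changes dtype from int32 to float64, which is easy to miss. The sketch below, assuming TensorFlow 2.x, contrasts true division, floor division, and an explicit cast to float32; the input values are chosen only for illustration.

```python
import tensorflow as tf

a = tf.constant([7, 8, 9])
b = tf.constant([2, 2, 2])

# True division of int32 tensors produces float64.
true_div = a / b
print(true_div)

# Floor division keeps the integer dtype.
floor_div = a // b
print(floor_div)

# Cast first if a float32 result is wanted (the usual dtype for model inputs).
as_float32 = tf.cast(a, tf.float32) / tf.cast(b, tf.float32)
print(as_float32)
```

Casting explicitly is often the safer choice, because TensorFlow does not silently mix tensors of different dtypes in one operation.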

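2.2 Matrix Operations

Matrix operations include matrix multiplication, transposition, and inversion.

# Matrix operations
matrix1 = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
matrix2 = tf.constant([[5, 6], [7, 8]], dtype=tf.float32)

# Matrix multiplication
matmul_result = tf.matmul(matrix1, matrix2)  # or matrix1 @ matrix2
print(f"Matrix product:\n{matmul_result}")

# Matrix transpose
transpose_result = tf.transpose(matrix1)
print(f"Transpose:\n{transpose_result}")

# Matrix inverse (invertible matrices only)
inverse_result = tf.linalg.inv(matrix1)
print(f"Inverse:\n{inverse_result}")

Code explanation:

tf.matmul(): matrix multiplication; the number of columns of the first matrix must equal the number of rows of the second.
tf.transpose(): transposes a matrix by swapping its rows and columns.
tf.linalg.inv(): matrix inversion; it requires a floating-point square matrix with a nonzero determinant. Because the computation runs in float32, the printed entries may differ from the exact inverse in the last digits.

Sample output:

Matrix product:
tf.Tensor(
[[19. 22.]
 [43. 50.]], shape=(2, 2), dtype=float32)
Transpose:
tf.Tensor(
[[1. 3.]
 [2. 4.]], shape=(2, 2), dtype=float32)
Inverse:
tf.Tensor(
[[-2.   1. ]
 [ 1.5 -0.5]], shape=(2, 2), dtype=float32)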
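To show what the matrix product above actually computes, here is a pure-Python sketch that builds each output entry as the dot product of a row and a column. The helpers matmul and transpose are illustrative stand-ins written for this example, not the TensorFlow implementations. The last step checks that multiplying the matrix by the inverse shown above gives the identity matrix.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must equal rows of b")
    cols = len(b[0])
    # Entry (i, j) is the dot product of row i of a and column j of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(column) for column in zip(*m)]


m1 = [[1.0, 2.0], [3.0, 4.0]]
m2 = [[5.0, 6.0], [7.0, 8.0]]

product = matmul(m1, m2)
print(product)            # [[19.0, 22.0], [43.0, 50.0]]
print(transpose(m1))      # [[1.0, 3.0], [2.0, 4.0]]

# A matrix times its inverse is the identity matrix.
inverse = [[-2.0, 1.0], [1.5, -0.5]]
identity = matmul(m1, inverse)
print(identity)           # [[1.0, 0.0], [0.0, 1.0]]
```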

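2.3 Logical Operations

Tensors support logical operations, namely comparisons and Boolean operations. Both are applied element by element.

# Logical operations
a = tf.constant([1, 2, 3])
b = tf.constant([2, 2, 2])

# Greater-than comparison
greater_result = tf.greater(a, b)  # or a > b
print(f"Greater than: {greater_result}")

# Less-than comparison
less_result = tf.less(a, b)  # or a < b
print(f"Less than: {less_result}")

# Boolean operations (element-wise AND, OR, NOT)
bool_a = tf.constant([True, False, True])
bool_b = tf.constant([False, True, True])

and_result = tf.logical_and(bool_a, bool_b)  # or bool_a & bool_b
or_result = tf.logical_or(bool_a, bool_b)  # or bool_a | bool_b
not_result = tf.logical_not(bool_a)  # or ~bool_a

Code explanation:

tf.greater(): compares two tensors and returns a Boolean tensor indicating, for each position, whether the first element is greater than the second.
tf.less(): compares two tensors and returns a Boolean tensor indicating, for each position, whether the first element is less than the second.
tf.logical_and(): element-wise logical AND of two Boolean tensors.
tf.logical_or(): element-wise logical OR of two Boolean tensors.
tf.logical_not(): element-wise logical NOT of a Boolean tensor.

Sample output:

Greater than: tf.Tensor([False False  True], shape=(3,), dtype=bool)
Less than: tf.Tensor([ True False False], shape=(3,), dtype=bool)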
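Boolean tensors are mostly useful as masks. The sketch below, assuming TensorFlow 2.x, shows three common follow-ups to a comparison: selecting the elements where the mask is True, choosing between two tensors per element, and counting matches. The variable names are illustrative.

```python
import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([2, 2, 2])

mask = a > b                          # [False, False, True]

# Keep only the elements of a where the mask is True.
selected = tf.boolean_mask(a, mask)
print(selected)

# Per element: take b where the mask is True, otherwise a (caps a at b).
capped = tf.where(mask, b, a)
print(capped)

# Count how many elements satisfy the condition.
count = tf.reduce_sum(tf.cast(mask, tf.int32))
print(count)
```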

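2.4 Summary of Basic Tensor Operations (mermaid flowchart)

graph TD
    A[Basic tensor operations] --> B[Arithmetic operations]
    A --> C[Matrix operations]
    A --> D[Logical operations]
    B --> E[Addition]
    B --> F[Subtraction]
    B --> G[Multiplication]
    B --> H[Division]
    C --> I[Matrix multiplication]
    C --> J[Matrix transpose]
    C --> K[Matrix inverse]
    D --> L[Greater-than comparison]
    D --> M[Less-than comparison]
    D --> N[Boolean operations]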

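3. Tensor Indexing and Slicing

3.1 Tensor Indexing

Tensor indexing works like Python list indexing and is used to access a single element of a tensor. The result is a 0-D tensor, not a plain Python number.

# Tensor indexing
tensor = tf.constant([1, 2, 3, 4, 5])

# Access the first element
first_element = tensor[0]
print(f"First element: {first_element}")

# Access the last element
last_element = tensor[-1]
print(f"Last element: {last_element}")

# Access the element at a given position
element = tensor[2]
print(f"Third element: {element}")

Code explanation:

tensor[0]: accesses the element at index 0 (the first element).
tensor[-1]: accesses the last element; negative indices count from the end.
tensor[2]: accesses the element at index 2 (the third element).

Sample output:

First element: tf.Tensor(1, shape=(), dtype=int32)
Last element: tf.Tensor(5, shape=(), dtype=int32)
Third element: tf.Tensor(3, shape=(), dtype=int32)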

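3.2 Tensor Slicing

Slicing extracts part of a tensor using the syntax tensor[start:end:step]. The start index is included and the end index is excluded.

# Tensor slicing
tensor = tf.constant([1, 2, 3, 4, 5, 6])

# Get the first 3 elements
first_three = tensor[:3]
print(f"First 3 elements: {first_three}")

# Get the elements from index 2 up to, but not including, index 5
middle_elements = tensor[2:5]
print(f"Middle elements: {middle_elements}")

# Get the elements at even indices
even_indices = tensor[::2]
print(f"Elements at even indices: {even_indices}")

# Get the elements at odd indices
odd_indices = tensor[1::2]
print(f"Elements at odd indices: {odd_indices}")

Code explanation:

tensor[:3]: indices 0 through 2 (3 is excluded), that is, the first 3 elements.
tensor[2:5]: indices 2 through 4 (5 is excluded).
tensor[::2]: step 2 from the start, which selects the elements at even indices.
tensor[1::2]: step 2 starting at index 1, which selects the elements at odd indices.

Sample output:

First 3 elements: tf.Tensor([1 2 3], shape=(3,), dtype=int32)
Middle elements: tf.Tensor([3 4 5], shape=(3,), dtype=int32)
Elements at even indices: tf.Tensor([1 3 5], shape=(3,), dtype=int32)
Elements at odd indices: tf.Tensor([2 4 6], shape=(3,), dtype=int32)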
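Since the slicing syntax mirrors Python's, the standard slice object can be used to see exactly which indices a start:end:step expression selects. This is a pure-Python sketch; resolved_indices is a hypothetical helper written for this example, and it illustrates the convention rather than TensorFlow's internal implementation.

```python
def resolved_indices(length, start=None, stop=None, step=None):
    """List the indices that [start:stop:step] selects from a sequence of the given length."""
    return list(range(*slice(start, stop, step).indices(length)))


values = [1, 2, 3, 4, 5, 6]

print(resolved_indices(6, None, 3))            # [0, 1, 2]
print(resolved_indices(6, 2, 5))               # [2, 3, 4]
print(resolved_indices(6, None, None, 2))      # [0, 2, 4]
print(resolved_indices(6, 1, None, 2))         # [1, 3, 5]
print(resolved_indices(6, -2))                 # [4, 5]

# Picking those indices reproduces the slice itself.
picked = [values[i] for i in resolved_indices(6, 1, None, 2)]
print(picked)                                  # [2, 4, 6]
```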

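3.3 Indexing and Slicing Multidimensional Tensors

For multidimensional tensors, indexing and slicing extend to several dimensions, with one index or slice per dimension separated by commas.

# Indexing and slicing a multidimensional tensor
matrix = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Access the element in the second row, third column
element = matrix[1, 2]
print(f"Second row, third column: {element}")

# Get all columns of the first row
first_row = matrix[0, :]
print(f"First row: {first_row}")

# Get the second column of every row
second_column = matrix[:, 1]
print(f"Second column: {second_column}")

# Get a submatrix (first two rows, last two columns)
sub_matrix = matrix[:2, 1:]
print(f"Submatrix:\n{sub_matrix}")

Code explanation:

matrix[1, 2]: accesses the element in the second row (index 1) and third column (index 2).
matrix[0, :]: gets every column of the first row (index 0).
matrix[:, 1]: gets the second column (index 1) of every row.
matrix[:2, 1:]: gets the first two rows (indices 0 and 1), from the second column through the last.

Sample output:

Second row, third column: tf.Tensor(6, shape=(), dtype=int32)
First row: tf.Tensor([1 2 3], shape=(3,), dtype=int32)
Second column: tf.Tensor([2 5 8], shape=(3,), dtype=int32)
Submatrix:
tf.Tensor(
[[2 3]
 [5 6]], shape=(2, 2), dtype=int32)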

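3.4 Summary of Tensor Indexing and Slicing (mermaid mind map)

graph TD
    A[Tensor indexing and slicing] --> B[Tensor indexing]
    A --> C[Tensor slicing]
    A --> D[Multidimensional indexing and slicing]
    B --> E[Access a single element]
    C --> F[Extract part of a tensor]
    C --> G[Step-based selection]
    D --> H[Multidimensional indexing]
    D --> I[Multidimensional slicing]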

4. Reshaping Tensors

4.1 Changing a Tensor's Shape

tf.reshape() changes the shape of a tensor. The total number of elements must stay the same, and the elements keep their row-major order.

# Reshape a tensor
tensor = tf.constant([1, 2, 3, 4, 5, 6])

# Reshape to (2, 3)
reshaped_tensor = tf.reshape(tensor, (2, 3))
print(f"Reshaped to (2, 3):\n{reshaped_tensor}")

# Reshape to (3, 2)
reshaped_tensor = tf.reshape(tensor, (3, 2))
print(f"Reshaped to (3, 2):\n{reshaped_tensor}")

# Use -1 to infer a dimension automatically
reshaped_tensor = tf.reshape(tensor, (2, -1))
print(f"Reshaped with -1:\n{reshaped_tensor}")

Code explanation:

tf.reshape(tensor, shape): returns a tensor with the given shape.
At most one entry of the shape may be -1; its size is computed from the total number of elements and the other dimensions.

Sample output:

Reshaped to (2, 3):
tf.Tensor(
[[1 2 3]
 [4 5 6]], shape=(2, 3), dtype=int32)
Reshaped to (3, 2):
tf.Tensor(
[[1 2]
 [3 4]
 [5 6]], shape=(3, 2), dtype=int32)
Reshaped with -1:
tf.Tensor(
[[1 2 3]
 [4 5 6]], shape=(2, 3), dtype=int32)
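How the -1 placeholder is resolved can be sketched in a few lines of pure Python. The function resolve_shape below is a hypothetical helper written for this example; it mirrors the rule described above (element count divided by the product of the known dimensions) and is not TensorFlow's own code.

```python
from math import prod


def resolve_shape(num_elements, shape):
    """Replace a single -1 in shape with the size implied by num_elements."""
    unknown = [i for i, dim in enumerate(shape) if dim == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension may be -1")
    known = prod(dim for dim in shape if dim != -1)
    if unknown:
        if known == 0 or num_elements % known != 0:
            raise ValueError(f"cannot reshape {num_elements} elements into {shape}")
        resolved = list(shape)
        resolved[unknown[0]] = num_elements // known
        return tuple(resolved)
    if known != num_elements:
        raise ValueError(f"cannot reshape {num_elements} elements into {shape}")
    return tuple(shape)


print(resolve_shape(6, (2, -1)))    # (2, 3)
print(resolve_shape(6, (-1, 2)))    # (3, 2)
print(resolve_shape(6, (-1,)))      # (6,)
```

A request such as resolve_shape(6, (4, -1)) raises ValueError, because 6 elements cannot be split into rows of a whole-number length when there are 4 rows.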

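4.2 Concatenating Tensors

tf.concat() joins several tensors along an existing axis. The tensors must have the same shape in every other dimension.

# Tensor concatenation
tensor1 = tf.constant([[1, 2], [3, 4]])
tensor2 = tf.constant([[5, 6], [7, 8]])

# Concatenate along axis 0 (stack the rows vertically)
concat_row = tf.concat([tensor1, tensor2], axis=0)
print(f"Concatenated along axis 0:\n{concat_row}")

# Concatenate along axis 1 (place the columns side by side)
concat_col = tf.concat([tensor1, tensor2], axis=1)
print(f"Concatenated along axis 1:\n{concat_col}")

Code explanation:

tf.concat([tensor1, tensor2], axis): concatenates the tensors along the given axis.
axis=0: appends rows, so the number of rows grows.
axis=1: appends columns, so the number of columns grows.

Sample output:

Concatenated along axis 0:
tf.Tensor(
[[1 2]
 [3 4]
 [5 6]
 [7 8]], shape=(4, 2), dtype=int32)
Concatenated along axis 1:
tf.Tensor(
[[1 2 5 6]
 [3 4 7 8]], shape=(2, 4), dtype=int32)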

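4.3 Stacking Tensors

tf.stack() joins several tensors of identical shape along a new axis, so the result has one more dimension than the inputs.

# Tensor stacking
tensor1 = tf.constant([1, 2, 3])
tensor2 = tf.constant([4, 5, 6])

# Stack along a new axis at position 0
stacked_tensor = tf.stack([tensor1, tensor2], axis=0)
print(f"Stacked along new axis 0:\n{stacked_tensor}")

# Stack along a new axis at position 1
stacked_tensor = tf.stack([tensor1, tensor2], axis=1)
print(f"Stacked along new axis 1:\n{stacked_tensor}")

Code explanation:

tf.stack([tensor1, tensor2], axis): stacks the tensors along a new axis inserted at the given position.
axis=0: the new axis becomes the first dimension of the result, so each input becomes a row.
axis=1: the new axis becomes the second dimension of the result, so each input becomes a column.

Sample output:

Stacked along new axis 0:
tf.Tensor(
[[1 2 3]
 [4 5 6]], shape=(2, 3), dtype=int32)
Stacked along new axis 1:
tf.Tensor(
[[1 4]
 [2 5]
 [3 6]], shape=(3, 2), dtype=int32)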
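The difference between concatenating and stacking is easiest to see at the level of shapes. The pure-Python sketch below computes the result shape for each operation; concat_shape and stack_shape are hypothetical helpers written for this example, and they assume a non-negative axis for simplicity.

```python
def concat_shape(shapes, axis):
    """Result shape of concatenating tensors with the given shapes along an existing axis."""
    first = shapes[0]
    for shape in shapes[1:]:
        same_rank = len(shape) == len(first)
        if not same_rank or any(
            shape[i] != first[i] for i in range(len(first)) if i != axis
        ):
            raise ValueError("all dimensions except the concat axis must match")
    result = list(first)
    result[axis] = sum(shape[axis] for shape in shapes)   # sizes add up along the axis
    return tuple(result)


def stack_shape(shapes, axis):
    """Result shape of stacking tensors with identical shapes along a new axis."""
    first = shapes[0]
    if any(shape != first for shape in shapes):
        raise ValueError("all shapes must be identical")
    result = list(first)
    result.insert(axis, len(shapes))                      # a new axis is inserted
    return tuple(result)


print(concat_shape([(2, 2), (2, 2)], axis=0))   # (4, 2)
print(concat_shape([(2, 2), (2, 2)], axis=1))   # (2, 4)
print(stack_shape([(3,), (3,)], axis=0))        # (2, 3)
print(stack_shape([(3,), (3,)], axis=1))        # (3, 2)
print(stack_shape([(2, 2), (2, 2)], axis=0))    # (2, 2, 2)
```

Concatenation keeps the rank and grows one dimension; stacking raises the rank by one and leaves the input dimensions untouched.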

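4.4 Expanding and Squeezing Dimensions

tf.expand_dims() inserts a new axis of size 1, and tf.squeeze() removes axes of size 1. Neither operation changes the data.

# Expand tensor dimensions
tensor = tf.constant([[1, 2, 3]])  # shape (1, 3)

# Insert a new axis at position 0
expanded_tensor = tf.expand_dims(tensor, axis=0)
print(f"Expanded at axis 0:\n{expanded_tensor}")

# Insert a new axis at position 2
# (axis=1 would give (1, 1, 3) again, because the input already starts with a size-1 axis)
expanded_tensor = tf.expand_dims(tensor, axis=2)
print(f"Expanded at axis 2:\n{expanded_tensor}")

# Squeeze tensor dimensions
tensor = tf.constant([[[1], [2], [3]]])  # shape (1, 3, 1)

# Remove axis 0, which has size 1 (axis 1 has size 3 and cannot be squeezed)
squeezed_tensor = tf.squeeze(tensor, axis=0)
print(f"After squeezing axis 0:\n{squeezed_tensor}")

Code explanation:

tf.expand_dims(tensor, axis): inserts an axis of size 1 at the given position.
tf.squeeze(tensor, axis): removes the given axis, which must have size 1; squeezing an axis of any other size raises an error. Without the axis argument, every axis of size 1 is removed.

Sample output:

Expanded at axis 0:
tf.Tensor([[[1 2 3]]], shape=(1, 1, 3), dtype=int32)
Expanded at axis 2:
tf.Tensor(
[[[1]
  [2]
  [3]]], shape=(1, 3, 1), dtype=int32)
After squeezing axis 0:
tf.Tensor(
[[1]
 [2]
 [3]], shape=(3, 1), dtype=int32)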

4.5 Summary of Tensor Reshaping (mermaid flowchart)

graph TD
    A[Reshaping tensors] --> B[Changing shape]
    A --> C[Concatenation]
    A --> D[Stacking]
    A --> E[Expanding and squeezing]
    B --> F["tf.reshape()"]
    C --> G["tf.concat()"]
    D --> H["tf.stack()"]
    E --> I["tf.expand_dims()"]
    E --> J["tf.squeeze()"]

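5. Broadcasting in Detail

5.1 What Is Broadcasting?

Broadcasting lets tensors of different shapes take part in the same element-wise operation by automatically expanding the smaller tensor to match the shape of the larger one.

Broadcasting rules:

Align dimensions: line up the two shapes from right to left. If one shape has fewer dimensions, its missing leading dimensions are treated as size 1.
Match dimensions:
- If the two sizes along a dimension are equal, the dimension is compatible.
- If one of the sizes is 1, that tensor is repeated along the dimension to match the other.
- If the sizes differ and neither is 1, the tensors cannot be broadcast and an error is raised.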
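The rules above translate almost line by line into code. The following pure-Python sketch computes the shape that results from broadcasting two shapes together; broadcast_shape is a hypothetical helper written for this example, not a TensorFlow function.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Align from the right; missing leading dimensions count as size 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)          # equal sizes, or b is repeated
        elif dim_a == 1:
            result.append(dim_b)          # a is repeated
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))


print(broadcast_shape((3,), ()))          # (3,)    scalar with vector
print(broadcast_shape((2, 3), (3,)))      # (2, 3)  vector with matrix
print(broadcast_shape((2, 2), (2, 1)))    # (2, 2)  column with matrix
print(broadcast_shape((3, 1), (1, 4)))    # (3, 4)  both sides expand
```

A pair such as (2, 3) and (2,) fails: aligned from the right, 3 is compared with 2, and neither is 1.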

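5.2 Broadcasting Examples

5.2.1 Scalar and Vector

When a scalar is combined with a vector, the scalar is broadcast to the shape of the vector.

# Broadcasting a scalar against a vector
vector = tf.constant([1, 2, 3])
scalar = tf.constant(2)

# Add the scalar to the vector
result = vector + scalar
print(f"Scalar plus vector: {result}")

Code explanation:

The scalar 2 is broadcast to the vector [2, 2, 2] and then added to the original vector [1, 2, 3].

Sample output:

Scalar plus vector: tf.Tensor([3 4 5], shape=(3,), dtype=int32)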

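5.2.2 Vector and Matrix

When a vector is combined with a matrix, the vector is broadcast to match the shape of the matrix. The length of the vector must equal the size of the matrix's last dimension.

# Broadcasting a vector against a matrix
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])
vector = tf.constant([1, 0, 1])

# Add the vector to the matrix
result = matrix + vector
print(f"Vector plus matrix:\n{result}")

Code explanation:

The vector [1, 0, 1] of shape (3,) is broadcast to the matrix shape (2, 3) by repeating it for each row, and the two are then added element by element.

Sample output:

Vector plus matrix:
tf.Tensor(
[[2 2 4]
 [5 5 7]], shape=(2, 3), dtype=int32)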

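5.2.3 Matrix and Matrix

When two matrices have different shapes that still satisfy the broadcasting rules, they are broadcast to a common compatible shape.

# Broadcasting a matrix against a matrix
matrix1 = tf.constant([[1, 2], [3, 4]])
matrix2 = tf.constant([[1], [2]])

# Matrix addition
result = matrix1 + matrix2
print(f"Matrix sum:\n{result}")

Code explanation:

matrix2 has shape (2, 1). It is broadcast along its second dimension (the columns) to shape (2, 2), that is, [[1, 1], [2, 2]], and then added to matrix1.

Sample output:

Matrix sum:
tf.Tensor(
[[2 3]
 [5 6]], shape=(2, 2), dtype=int32)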

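5.3 Summary of Broadcasting (mermaid mind map)

graph TD
    A[Broadcasting in detail] --> B[Broadcasting rules]
    A --> C[Broadcasting examples]
    B --> D[Align dimensions]
    B --> E[Match dimensions]
    C --> F[Scalar and vector]
    C --> G[Vector and matrix]
    C --> H[Matrix and matrix]
    E --> I[Sizes are equal]
    E --> J[One size is 1]
    E --> K["Sizes differ and neither is 1 (error)"]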

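6. Practical Examples

6.1 Data Preprocessing: Standardization and Normalization

Data preprocessing is an important step in machine learning and deep learning. Tensor operations can implement both standardization and min-max normalization, and broadcasting applies the per-column statistics to every row.

# Data preprocessing: standardization and normalization
import tensorflow as tf

# Generate random data: 100 samples with 5 features each
data = tf.random.normal([100, 5], mean=0, stddev=1.0)

# Standardization: rescale each column to mean 0 and standard deviation 1
mean = tf.reduce_mean(data, axis=0)
std = tf.math.reduce_std(data, axis=0)
standardized_data = (data - mean) / std

# Normalization: rescale each column to the range [0, 1]
min_val = tf.reduce_min(data, axis=0)
max_val = tf.reduce_max(data, axis=0)
normalized_data = (data - min_val) / (max_val - min_val)

Code explanation:

tf.reduce_mean(data, axis=0): computes the mean of each column.
tf.math.reduce_std(data, axis=0): computes the standard deviation of each column.
tf.reduce_min(data, axis=0) and tf.reduce_max(data, axis=0): compute the minimum and maximum of each column.
Standardization formula: (data - mean) / std
Normalization formula: (data - min) / (max - min)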
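Both formulas can be checked by hand on a single column. The pure-Python sketch below uses the standard library's population standard deviation, which divides by N in the same way as tf.math.reduce_std. The small eps term is an addition made for this example, not part of the code above: it guards against division by zero when a column is constant. The function names are illustrative.

```python
from statistics import fmean, pstdev


def standardize(column, eps=1e-8):
    """(x - mean) / std, with eps guarding against a zero standard deviation."""
    mean = fmean(column)
    std = pstdev(column)              # population std (divides by N)
    return [(x - mean) / (std + eps) for x in column]


def min_max_normalize(column, eps=1e-8):
    """(x - min) / (max - min), with eps guarding against a zero range."""
    low, high = min(column), max(column)
    return [(x - low) / (high - low + eps) for x in column]


column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # mean 5, population std 2

z = standardize(column)
scaled = min_max_normalize(column)
constant = standardize([3.0, 3.0, 3.0])             # zero std: eps avoids a crash

print(fmean(z), pstdev(z))     # close to 0 and 1
print(min(scaled), max(scaled))  # 0.0 and just under 1
print(constant)                # [0.0, 0.0, 0.0]
```

Without such a guard, a constant feature column would produce a division by zero in either formula, which is worth checking for before preprocessing real data.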

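6.2 Image Operations: Resizing and Normalization

In image processing tasks, tensor operations can resize images and normalize pixel values.

# Image operations: resizing and normalization
import tensorflow as tf

# Simulate a 28x28x3 image with random integer pixel values in [0, 255]
# (maxval is exclusive, so 256 is needed to include 255)
image = tf.random.uniform([28, 28, 3], minval=0, maxval=256, dtype=tf.int32)

# Resize the image to 64x64x3
resized_image = tf.image.resize(image, [64, 64])

# Normalize pixel values to [0, 1]
normalized_image = tf.cast(resized_image, tf.float32) / 255.0

Code explanation:

tf.image.resize(image, [64, 64]): resizes the height and width of the image; the channel dimension is unchanged. With the default bilinear method the result is a float32 tensor.
tf.cast(resized_image, tf.float32): converts the image data to float32. The resized image is already float32 here, so the cast only makes the dtype explicit.
Normalization: dividing the pixel values by 255 maps them to the range [0, 1].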

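6.3 Custom Loss Functions and Gradient Computation

When training a deep learning model, tensor operations can be used to build a custom loss function and compute its gradients.

# Custom loss function and gradient computation
import tensorflow as tf

# Define the model parameters
weights = tf.Variable([1.0, 2.0], dtype=tf.float32)
bias = tf.Variable(0.0, dtype=tf.float32)

# Define the inputs (one sample per row) and the target outputs
inputs = tf.constant([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float32)
targets = tf.constant([3.0, 7.0], dtype=tf.float32)

# Define the forward pass: one prediction per sample
def forward_pass(inputs):
    # Sum over the feature axis only, so the result has one value per row
    return tf.reduce_sum(inputs * weights, axis=1) + bias

# Define the custom loss function
def custom_loss(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))  # mean squared error

# Record the forward pass and the loss on the tape
with tf.GradientTape() as tape:
    predictions = forward_pass(inputs)
    loss = custom_loss(targets, predictions)

# Compute the gradients
gradients = tape.gradient(loss, [weights, bias])

# Update the model parameters (plain gradient descent)
learning_rate = 0.01
weights.assign_sub(learning_rate * gradients[0])
bias.assign_sub(learning_rate * gradients[1])

print(f"Updated weights: {weights.numpy()}")
print(f"Updated bias: {bias.numpy()}")

Code explanation:

forward_pass(): multiplies each input row by the weights through broadcasting and sums over axis=1, giving one prediction per sample. Without the axis argument, tf.reduce_sum would collapse all samples into a single scalar.
tf.GradientTape(): records tensor operations so that gradients can be computed automatically.
tape.gradient(loss, [weights, bias]): computes the gradients of the loss with respect to the parameters.
assign_sub(): subtracts the gradient scaled by the learning rate from the parameter, which performs one gradient descent update.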
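The gradients that the tape computes can be cross-checked by hand. The pure-Python sketch below assumes a per-sample linear model, prediction_i = x_i · w + b, with a mean squared error loss, and applies the closed-form gradients directly. The function mse_linear_step is a hypothetical helper written for this check; it uses the same inputs, targets, initial parameters, and learning rate as the example above.

```python
def mse_linear_step(inputs, targets, weights, bias, learning_rate):
    """One gradient-descent step for a linear model with mean squared error."""
    n = len(inputs)
    predictions = [sum(x * w for x, w in zip(row, weights)) + bias for row in inputs]
    errors = [p - t for p, t in zip(predictions, targets)]
    loss = sum(e * e for e in errors) / n
    # d(loss)/d(w_j) = (2 / n) * sum_i error_i * x_ij
    grad_w = [
        2.0 / n * sum(e * row[j] for e, row in zip(errors, inputs))
        for j in range(len(weights))
    ]
    # d(loss)/d(b) = (2 / n) * sum_i error_i
    grad_b = 2.0 / n * sum(errors)
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return loss, grad_w, grad_b, new_weights, new_bias


loss, grad_w, grad_b, new_weights, new_bias = mse_linear_step(
    inputs=[[1.0, 2.0], [3.0, 4.0]],
    targets=[3.0, 7.0],
    weights=[1.0, 2.0],
    bias=0.0,
    learning_rate=0.01,
)

print(loss)          # 10.0
print(grad_w)        # [14.0, 20.0]
print(grad_b)        # 6.0
print(new_weights)   # approximately [0.86, 1.8]
print(new_bias)      # approximately -0.06
```

The initial predictions are 5 and 11 against targets 3 and 7, so the errors are 2 and 4 and the loss is (4 + 16) / 2 = 10. Repeating the step in a loop would drive the loss down over successive iterations.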