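Building Neural Networks with TensorFlow

Basics

Tensor: a multi-dimensional array

Creating a tensor:

import tensorflow as tf
tf.constant(value, dtype=dtype)

Converting existing data to a tensor:

tf.convert_to_tensor(data, dtype=dtype)

Filling with all zeros, all ones, or a given value:

tf.zeros(shape)
tf.ones(shape)
tf.fill(shape, value)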

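Generating normally distributed random numbers:

tf.random.normal(shape, mean=mean, stddev=stddev)

Generating random numbers from a truncated normal distribution, with values kept within (μ-2σ, μ+2σ):

tf.random.truncated_normal(shape, mean=mean, stddev=stddev)

Generating uniformly distributed random numbers:

tf.random.uniform(shape, minval=minval, maxval=maxval)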
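As a quick check of the tensor-creation functions, here is a minimal sketch that builds a few small tensors and inspects them. The variable names, shapes, and values are illustrative assumptions, not part of the original notes.

```python
import numpy as np
import tensorflow as tf

# Build a tensor directly from Python values.
a = tf.constant([1, 5], dtype=tf.int64)

# Convert an existing NumPy array into a tensor.
b = tf.convert_to_tensor(np.arange(0, 5, dtype=np.int64), dtype=tf.int64)

# Fill tensors with zeros, ones, or a chosen value.
zeros = tf.zeros([2, 3])
ones = tf.ones(4)
nines = tf.fill([2, 2], 9)

# Random tensors: normal, truncated normal, and uniform.
n = tf.random.normal([2, 2], mean=0.5, stddev=1.0)
t = tf.random.truncated_normal([2, 2], mean=0.5, stddev=1.0)
u = tf.random.uniform([2, 2], minval=0, maxval=1)

print(a.shape, a.dtype)
print(zeros.shape, nines.numpy().tolist())
```

The truncated normal samples in `t` should all lie within two standard deviations of the mean, which is the (μ-2σ, μ+2σ) range mentioned above.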

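Common functions

import numpy as np
import tensorflow as tf

# cast a tensor to another data type
tf.cast(tensor, dtype)
# minimum and maximum
tf.reduce_min(tensor, axis=axis)
tf.reduce_max(tensor, axis=axis)
# axis=0 reduces down the rows (one result per column); axis=1 reduces across the columns (one result per row)
# mark a value as a trainable variable
tf.Variable(initial_value)
# element-wise arithmetic; the two tensors need matching (or broadcastable) shapes
tf.add(t1, t2)
tf.subtract(t1, t2)
tf.multiply(t1, t2)
tf.divide(t1, t2)
# square, power, square root
tf.square(t)
tf.pow(t, n)
tf.sqrt(t)
# matrix multiplication
tf.matmul(matrix1, matrix2)
# slice the inputs along their first dimension into (feature, label) pairs to build a dataset
data = tf.data.Dataset.from_tensor_slices((features, labels))
# compute gradients
with tf.GradientTape() as tape:
    ...
    ...
grad = tape.gradient(target, sources)  # differentiate target with respect to sources
# iterate over the elements; enumerate yields (index, element)
for step, (imgs, labels) in enumerate(data):
    print(imgs)
    print(labels)
# one-hot encoding
tf.one_hot(indices, depth=num_classes)
# softmax turns the n outputs of an n-class model into a probability distribution
tf.nn.softmax(x)
# in-place assignment
w = tf.Variable(4)
w.assign_sub(1)  # w -= 1
# index of the maximum value
tf.argmax(tensor, axis=axis)
# conditional selection: element-wise, a where the condition is true, b where it is false
tf.where(condition, a, b)
# random number generator; with a fixed seed it produces the same numbers on every run
np.random.RandomState(seed)
# stack two arrays vertically
np.vstack((a, b))
# a few more functions
np.mgrid[start:stop:step, start:stop:step, ...]  # grid coordinates; stop is excluded
x.ravel()  # flatten x into a 1-D array
np.c_[array1, array2, ...]  # pair up the flattened grid points column by column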
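To make the gradient and update functions concrete, here is a small sketch that differentiates a toy loss with `tf.GradientTape`, applies one update with `assign_sub`, and round-trips labels through `tf.one_hot` and `tf.argmax`. The loss function, learning rate, and labels are made-up values for illustration only.

```python
import tensorflow as tf

# A trainable scalar parameter.
w = tf.Variable(3.0)

# Record the forward computation so it can be differentiated.
with tf.GradientTape() as tape:
    loss = tf.square(w + 1.0)  # loss = (w + 1)^2

# d(loss)/dw = 2 * (w + 1), which is 8 at w = 3.
grad = tape.gradient(loss, w)

# One gradient-descent step: w <- w - lr * grad.
lr = 0.1
w.assign_sub(lr * grad)

# One-hot encode class labels, then recover the class index with argmax.
labels = tf.constant([1, 0, 2])
one_hot = tf.one_hot(labels, depth=3)
recovered = tf.argmax(one_hot, axis=1)

print(float(grad), float(w.numpy()))
print(one_hot.numpy().tolist(), recovered.numpy().tolist())
```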

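Neural network optimization

Neural network (NN) complexity
1. Number of layers = number of hidden layers + 1 output layer (the input layer is not counted)
2. Space complexity = total number of weights + total number of biases
3. Time complexity = number of multiply-add operations, which for a fully connected network equals the total number of weights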
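The three counts can be worked out by hand for a small fully connected network. The sketch below assumes a hypothetical network with 3 inputs, one hidden layer of 4 units, and 2 outputs; the helper name `count_parameters` is made up for this example.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    weights = 0
    biases = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights += n_in * n_out  # one weight per connection
        biases += n_out          # one bias per neuron in the layer
    return weights, biases


# Hypothetical network: 3 inputs, one hidden layer of 4 units, 2 outputs.
sizes = [3, 4, 2]
weights, biases = count_parameters(sizes)

num_layers = len(sizes) - 1          # the input layer is not counted
space_complexity = weights + biases  # total trainable parameters
time_complexity = weights            # one multiply-add per weight

print(num_layers, space_complexity, time_complexity)  # 2 26 20
```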
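Learning rate (lr)

Learning rate too small: parameters update slowly and the model converges slowly.

Learning rate too large: the parameters oscillate around the minimum of the loss and may fail to settle on it.

Strategy:

Exponentially decaying learning rate: start with a relatively large learning rate to approach a good solution quickly, then reduce it gradually so that the model is stable in the later stages of training.

decayed learning rate = initial learning rate * decay rate ^ (current epoch / number of epochs between decays)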
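The decay formula translates directly into code. In this sketch the function name and the sample values (initial rate 0.2, decay rate 0.99, decay every epoch) are assumptions chosen only to show the shape of the schedule.

```python
def decayed_lr(lr_base, lr_decay, epoch, lr_step):
    """Exponentially decayed learning rate for a given epoch."""
    return lr_base * lr_decay ** (epoch / lr_step)


# Illustrative settings: start at 0.2, multiply by 0.99 every epoch.
LR_BASE = 0.2
LR_DECAY = 0.99
LR_STEP = 1

schedule = [decayed_lr(LR_BASE, LR_DECAY, epoch, LR_STEP) for epoch in range(5)]
for epoch, lr in enumerate(schedule):
    print(epoch, round(lr, 6))
```

Inside a training loop, the value returned for the current epoch would replace the fixed `lr` in the parameter update.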
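Activation functions

Activation functions are nonlinear functions.

They introduce nonlinearity, which increases the expressive power of the model; without them, stacked layers would collapse into a single linear transformation.

Common choices:
1. sigmoid()
  
  f(x) = 1/(1+exp(-x))
2. Tanh()
  
  f(x) = (1-exp(-2x))/(1+exp(-2x))
3. Relu()
  
  f(x) = 0 if x < 0; x if x >= 0
4. Leaky Relu()
  
  f(x) = max(ax, x), where a is a small positive slope
5. Recommendations for beginners:
  
  Use relu() as the first choice.
  
  Set the learning rate to a small value.
  
  Standardize the input features to mean 0 and standard deviation 1.
  
  Initialize parameters from a normal distribution with mean 0 and standard deviation sqrt(2 / number of input features of the current layer).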
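The four activation functions listed above can be sketched in plain Python for scalar inputs. Leaky ReLU is written here in its usual form max(a*x, x); the slope a = 0.01 is an assumed example value.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))


def relu(x):
    return max(0.0, x)


def leaky_relu(x, a=0.01):
    # For 0 < a < 1 this keeps x when x >= 0 and returns a * x when x < 0.
    return max(a * x, x)


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```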
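Loss functions

The loss measures the gap between the prediction y and the ground truth y_.
1. Mean squared error (MSE)
```
loss_mse = tf.reduce_mean(tf.square(y_ - y))
```
2. Cross-entropy (CE) loss
```
tf.losses.categorical_crossentropy(y_, y)
```
3. softmax combined with CE
```
# y must be the raw outputs (logits) here; softmax is applied internally
tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
```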
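To see what these losses compute, here is a plain-Python sketch of MSE, softmax, and categorical cross-entropy on tiny hand-made inputs. The helper names and sample numbers are assumptions for illustration; the TensorFlow functions above remain the ones to use in practice.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_prob):
    """Categorical cross-entropy for a one-hot (or soft) label vector."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_prob) if t > 0)


y_true = [1.0, 0.0]
print(mse(y_true, [0.6, 0.4]))
print(cross_entropy(y_true, [0.6, 0.4]))
print(cross_entropy(y_true, [0.8, 0.2]))  # closer prediction, smaller loss

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```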
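Underfitting & overfitting
1. Underfitting: the model fits the training set poorly.
  
  Remedies: add input features, add network parameters, reduce the regularization coefficient.
2. Overfitting: the model fits the training set too closely and generalizes poorly, so it performs badly on the validation and test sets. Having too many trainable parameters makes overfitting more likely.
  
  Remedies: clean the data, enlarge the training set, apply regularization, increase the regularization coefficient.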
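Regularization

Regularization adds a model-complexity term to the loss function. It penalizes the weights, which weakens the effect of noise in the dataset.
1. L1 regularization
  
  Drives many parameters to exactly 0, so L1 regularization makes the parameters sparse, which reduces the number of effective parameters and lowers complexity.
2. L2 regularization
  
  Drives many parameters close to 0, so L2 regularization lowers complexity by shrinking the parameter values.
```
tf.nn.l2_loss(weight)  # computes sum(weight ** 2) / 2
```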
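The two penalties, and how one of them is added to the data loss, can be sketched without TensorFlow. The L2 helper follows the sum(w ** 2) / 2 convention of `tf.nn.l2_loss`; the weights, the data loss of 1.0, and the coefficient 0.03 are made-up example values.

```python
def l1_penalty(weights):
    """Sum of absolute values; pushes weights to exactly zero."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Half the sum of squares, the same convention as tf.nn.l2_loss."""
    return sum(w * w for w in weights) / 2.0


def regularized_loss(data_loss, weights, coeff, kind="l2"):
    """Data loss plus the weighted complexity penalty."""
    penalty = l1_penalty(weights) if kind == "l1" else l2_penalty(weights)
    return data_loss + coeff * penalty


w = [0.5, -1.0, 2.0]
print(l1_penalty(w))                                # 3.5
print(l2_penalty(w))                                # 2.625
print(regularized_loss(1.0, w, coeff=0.03))
print(regularized_loss(1.0, w, coeff=0.03, kind="l1"))
```

A larger coefficient makes the penalty dominate, which is why raising it counters overfitting and lowering it counters underfitting.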

Optimizers

SGD: stochastic gradient descent

w1.assign_sub(lr * grads[0])
b1.assign_sub(lr * grads[1])

SGDM: SGD with first-order momentum

m_w = beta * m_w + (1 - beta) * grads[0]
w1.assign_sub(lr * m_w)

Adagrad: adds second-order momentum (accumulated sum of squared gradients)

v_w += tf.square(grads[0])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))

RMSProp: adds second-order momentum (exponential moving average of squared gradients)

v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))

Adam: combines first-order and second-order momentum
```
# omitted
```
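Since the Adam code is omitted above, here is a scalar sketch of the commonly published Adam update, not the author's own code. It combines the first-order momentum of SGDM with the second-order momentum of RMSProp and adds bias correction. The hyperparameter defaults and the toy objective f(w) = (w - 3)^2 are assumptions.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # first-order momentum
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-order momentum
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(w)  # should end up close to 3
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.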

Building a neural network

Building a neural network with TensorFlow

import: import the required packages
train / test: prepare the training set and the test set
model = tf.keras.models.Sequential: define the network structure
model.compile: choose the optimizer, the loss function, and the metrics
model.fit: train the model
model.summary: print the network structure and parameter counts (I think of it as a text-mode visualization)
code:

import tensorflow as tf
from sklearn import datasets
import numpy as np

x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

# shuffle features and labels with the same seed so the pairs stay aligned
# (116 is an arbitrary value; any fixed integer works)
np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(3, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=500, validation_split=0.2, validation_freq=20)

model.summary()

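Model

The model can also be defined as a class:

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense

class MyModel(Model):
    def __init__(self):
        super(MyModel, self).__init__()
        # define the network structure (a single Dense layer here, as an illustration)
        self.d1 = Dense(3, activation='softmax')

    # forward pass
    def call(self, x):
        # apply the layers defined above
        result = self.d1(x)
        return result

myModel = MyModel()
result = myModel(x)  # x: a batch of input features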

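Extending the neural network workflow

Custom datasets

Data augmentation

# the argument values below are illustrative placeholders; tune them for your data
image_gen_train = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1. / 255,         # every pixel value is multiplied by this factor
    rotation_range=45,        # range of random rotation, in degrees
    width_shift_range=0.15,   # range of random horizontal shift
    height_shift_range=0.15,  # range of random vertical shift
    horizontal_flip=True,     # whether to randomly flip images horizontally
    zoom_range=0.5            # random zoom range [1-n, 1+n]
)
# x_train must be 4-D here: (samples, height, width, channels)
image_gen_train.fit(x_train)
...
...
model.fit(image_gen_train.flow(x_train, y_train, batch_size=32), ....)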

Resuming training: saving and loading the model

import os

# load the weights if a checkpoint already exists
checkpoint_save_path = '<save path>'  # placeholder: replace with your checkpoint path
if os.path.exists(checkpoint_save_path + '.index'):
    print('-----load the model-----')
    model.load_weights(checkpoint_save_path)
# save the weights during training
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_save_path,
    save_weights_only=True,
    save_best_only=True
)
history = model.fit(x_train, y_train, batch_size=32, epochs=5,
                    validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback]
)

Extracting parameters and writing them to a text file

# threshold: number of elements above which NumPy abbreviates the printout
np.set_printoptions(threshold=np.inf)  # np.inf: never abbreviate
...
...
print(model.trainable_variables)
file = open('./weight.txt', 'w')
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

Visualizing training metrics to check how training went

Accuracy and loss:

acc = history.history['sparse_categorical_accuracy']
val_acc = history.history['val_sparse_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.........

Applying the model

# rebuild the model for the forward pass
model = tf.keras.models.Sequential([
    # flatten layer
    tf.keras.layers.Flatten(),
    # fully connected layers
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
# load the saved weights
model.load_weights(model_save_path)
# predict
result = model.predict(x_predict)

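Convolutional neural networks (CNN)

Convolutional layer

Convolution: extracts features from the image.
Pooling layer

Pooling: downsamples to reduce the number of features.
Fully connected layer

Fully connected NN: every neuron is connected to every neuron in the adjacent layers before and after it; the input is the features and the output is the prediction.

In practice, features are first extracted from the raw image, and the extracted features are then fed to the fully connected network.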
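CNN
1. CNN: convolutional neural network
  
  Features are extracted with convolution kernels and then fed into a fully connected network.
  
  Convolution acts as the feature extractor.
  1. Main CNN building blocks, in order
    1. Convolutional layer: Conv2D
    2. Batch normalization layer: BatchNormalization
    3. Activation layer: Activation
    4. Pooling layer: MaxPool2D
    5. Dropout layer: Dropout
    6. Flatten layer: Flatten
    7. Fully connected layer: Dense
  2. Notes
    
    Write the code according to the specific network architecture.
    
    Some architectures stack several convolutional and pooling layers in order to extract features better.
2. Receptive field: the size of the region in the original input image that each pixel of a CNN output feature map maps back to.
3. padding
  
  Zero padding: keeps the output feature map the same size as the input image (when the stride is 1).
  
  padding = SAME: zero padding is used; output size = ceil(input size / stride)
  
  padding = VALID: no zero padding; output size = ceil((input size - kernel size + 1) / stride)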
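4. Convolutional layer (conv)
```
tf.keras.layers.Conv2D(filters, kernel_size, strides=strides, padding=padding, activation=activation, input_shape=input_shape)
```
  filters = number of convolution kernels
  
  kernel_size = size of the convolution kernel
  
  strides = stride
  
  padding = 'same' or 'valid'
  
  activation = activation function (leave it out here when a BN layer follows)
  
  input_shape = (height, width, channels); optional
5. Batch normalization (BN)
  
  Standardization: transform the data to mean 0 and standard deviation 1.
  
  Batch normalization: standardize over one mini-batch of data.
  
  The BN layer sits after the convolutional layer and before the activation layer.
```
tf.keras.layers.BatchNormalization()
```
6. Pooling
  
  Max pooling: picks out image texture.
  
  Average pooling: preserves background features.
```
tf.keras.layers.MaxPool2D(pool_size=pool_size, strides=strides, padding=padding)
tf.keras.layers.AveragePooling2D(pool_size=pool_size, strides=strides, padding=padding)
```
7. Dropout
  
  During training, a fraction of the neurons is temporarily dropped from the network with a given probability; when the network is used for inference, the dropped neurons are reconnected.
```
tf.keras.layers.Dropout(rate)  # rate = probability of dropping a unit
```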
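The two padding rules from the list above reduce to a pair of formulas, sketched here for one spatial dimension. The function name and the sample sizes are assumptions made for illustration.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one dimension for 'same' or 'valid' padding."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


# 5x5 input, 3x3 kernel, stride 1.
print(conv_output_size(5, 3, 1, "same"))    # 5
print(conv_output_size(5, 3, 1, "valid"))   # 3

# 32x32 input, 5x5 kernel, stride 2.
print(conv_output_size(32, 5, 2, "same"))   # 16
print(conv_output_size(32, 5, 2, "valid"))  # 14
```

With stride 1, 'same' padding preserves the input size, while 'valid' shrinks it by kernel size minus 1.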
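Classic CNNs
1. LeNet
  
  Shares convolution kernels to reduce the number of network parameters.
2. AlexNet
  
  Uses the relu activation function and dropout.
3. VGGNet
  
  Small convolution kernels reduce the parameter count; the regular network structure is well suited to parallel acceleration.
4. Inception
  
  Uses convolution kernels of different sizes within one layer to improve perception; uses BN to alleviate vanishing gradients.
5. ResNet
  
  Residual skip connections between layers bring in information from earlier layers, alleviating model degradation and making much deeper networks feasible.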

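Recurrent neural networks (RNN)

Network structure
Recurrent kernel

Its parameters are shared across time steps; the recurrent layer extracts temporal information.

h_t = tanh(x_t × W_xh + h_{t-1} × W_hh + b_h)

y_t = softmax(h_t × W_hy + b_y)

Forward pass: the state h_t stored in the memory cell is refreshed at every time step, while the three parameter matrices W_xh, W_hh, and W_hy stay fixed.

Backward pass: the three parameter matrices are updated by gradient descent.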
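The forward pass of a recurrent kernel can be traced in plain Python. This sketch follows the two formulas above, h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h) and y_t = softmax(h_t W_hy + b_y), with a zero initial state; all weights and inputs are made-up toy values, and the helper names are assumptions.

```python
import math


def vec_mat(vec, mat):
    """Row vector (n) times matrix (n x m), giving a vector of length m."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def softmax(z):
    m = max(z)
    exps = [math.exp(s - m) for s in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h)."""
    from_input = vec_mat(x_t, W_xh)
    from_state = vec_mat(h_prev, W_hh)
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_state, b_h)]


def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run all time steps with shared weights, then compute the output."""
    h = [0.0] * len(b_h)  # the memory starts at zero
    for x_t in xs:        # the same matrices are reused at every step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    y = softmax([s + c for s, c in zip(vec_mat(h, W_hy), b_y)])
    return h, y


# Toy sizes: 2 input features, 3 memory units, 2 output classes.
W_xh = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]
W_hh = [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1], [-0.2, 0.1, 0.2]]
W_hy = [[0.5, -0.5], [0.3, 0.2], [-0.4, 0.6]]
b_h = [0.0, 0.1, -0.1]
b_y = [0.0, 0.0]

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three time steps
h, y = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
print(h)
print(y)
```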

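Recurrent layers

Number of layers: the number of recurrent kernels (each kernel forms one recurrent layer)

tf.keras.layers.SimpleRNN(units, activation=activation, return_sequences=return_sequences)
# units: number of memory units
# return_sequences: whether to pass h_t to the next layer at every time step
# return_sequences = True: output h_t at every time step
# return_sequences = False: output h_t only at the last time step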

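Embedding (an alternative to one-hot encoding)

Embedding encodes each word as a low-dimensional vector. The encoding is optimized as the network trains and can express the correlation between words.
```
tf.keras.layers.Embedding(vocabulary_size, embedding_dim)
```
Embedding dimension: how many numbers are used to represent one word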
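Conceptually, an embedding layer is a lookup table with one trainable row per word. The sketch below uses a made-up 4-word vocabulary and a fixed table with embedding dimension 2 to show that looking up a row gives the same result as multiplying the one-hot vector by the table, while using a much shorter vector. In a real layer the table values are learned during training.

```python
# Hypothetical toy vocabulary and embedding table (values are made up).
vocab = {"a": 0, "b": 1, "c": 2, "d": 3}
embedding_table = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]  # shape: vocabulary size x embedding dimension


def one_hot(index, depth):
    return [1.0 if i == index else 0.0 for i in range(depth)]


def embed(index):
    """Direct row lookup, which is what an embedding layer does."""
    return embedding_table[index]


def embed_via_one_hot(index):
    """Same result computed as one-hot vector times the table."""
    hot = one_hot(index, len(embedding_table))
    dim = len(embedding_table[0])
    return [sum(hot[i] * embedding_table[i][j] for i in range(len(hot))) for j in range(dim)]


idx = vocab["c"]
print(one_hot(idx, len(vocab)))  # 4 numbers, one per vocabulary entry
print(embed(idx))                # 2 numbers, the embedding dimension
print(embed_via_one_hot(idx))
```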
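RNN

Temporal features are extracted with recurrent kernels and then fed into a fully connected network.
1. Main RNN building blocks, in order
  1. Recurrent layer: SimpleRNN / LSTM / GRU
  2. Batch normalization layer: BatchNormalization
  3. Activation layer: Activation
  4. Pooling layer: GlobalMaxPool1D / GlobalAvgPool1D
  5. Dropout layer: Dropout
  6. Reshape layer: Reshape
  7. Fully connected layer: Dense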
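Classic RNNs
1. Simple RNN
2. LSTM
  
  Adds gates (including the forget gate), a cell state (long-term memory), a memory cell (short-term memory), and a candidate state (newly summarized knowledge).
```
tf.keras.layers.LSTM(units, return_sequences=return_sequences)
```
3. GRU
  
  Uses an update gate, a reset gate, a memory cell, and a candidate hidden state.
```
tf.keras.layers.GRU(units, return_sequences=return_sequences)
```
4. The core of an RNN is the recurrent layer (LSTM and GRU are improved recurrent layers that address long-range dependencies in long sequences).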