Common Neural Network Activation Functions 2: The tanh Function (Hyperbolic Tangent)

tanh

Function and derivative

The tanh function
$$
\begin{aligned}
\tanh(x) &= \frac{e^x - e^{-x}}{e^x + e^{-x}} \\
&= 2\cdot\mathrm{sigmoid}(2x) - 1
\end{aligned}
$$
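
The second equality follows from dividing the numerator and denominator by e^x, which gives (1 - e^{-2x}) / (1 + e^{-2x}) = 2 / (1 + e^{-2x}) - 1. As a quick sanity check, here is a minimal sketch that uses only Python's standard library; the helper names `sigmoid` and `tanh_via_sigmoid` are introduced here for illustration and are not part of the original code.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh written in terms of the sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Compare against the standard-library tanh at a few sample points.
for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert math.isclose(tanh_via_sigmoid(x), math.tanh(x), abs_tol=1e-12)
```

The loop finishes silently when the two expressions agree to within floating-point error.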
Derivative of tanh
$$
\begin{aligned}
\frac{d}{dx}\tanh(x) &= \frac{(e^x-e^{-x})'\cdot(e^x+e^{-x}) - (e^x-e^{-x})\cdot(e^x+e^{-x})'}{(e^x+e^{-x})^2} \\
&= \frac{(e^x+e^{-x})\cdot(e^x+e^{-x}) - (e^x-e^{-x})\cdot(e^x-e^{-x})}{(e^x+e^{-x})^2} \\
&= 1 - \frac{(e^x-e^{-x})^2}{(e^x+e^{-x})^2} \\
&= 1 - \tanh^2(x)
\end{aligned}
$$
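
One way to gain confidence in the closed form 1 - tanh^2(x) is to compare it with a numerical derivative. The sketch below does this with a central difference; the helper `central_difference` and the step size `h=1e-5` are illustrative choices, not something taken from the original article.

```python
import math


def tanh_derivative(x):
    """Closed-form derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def central_difference(f, x, h=1e-5):
    """Numerical derivative estimate: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# The two values should agree up to the error of the finite difference.
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    numeric = central_difference(math.tanh, x)
    assert abs(numeric - tanh_derivative(x)) < 1e-8
```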

Plots of the function and its derivative

Plotting code

import numpy as np
from matplotlib import pyplot as plt


def tanh(x):
    """tanh from its definition; works elementwise on NumPy arrays."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


def tanh_derivative(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1 - tanh(x) ** 2


x = np.linspace(-6, 6, 1000)
y = tanh(x)               # vectorized, no Python loop needed
y1 = tanh_derivative(x)

plt.figure(figsize=(12, 8))
ax = plt.gca()
plt.plot(x, y, label='tanh')
plt.plot(x, y1, label='Derivative')
plt.title('tanh and its derivative')

# Hide the top and right borders
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
# Put the x-axis tick labels on the bottom spine
ax.xaxis.set_ticks_position('bottom')
# Move the bottom and left spines so that they cross at the origin
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

plt.legend()
plt.show()
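
The explicit formula in the plotting code is fine on [-6, 6], but it is not safe for inputs of large magnitude: in float64, `np.exp` overflows to `inf` once its argument exceeds roughly 709, and the quotient then becomes `inf / inf = nan`. In practice a library routine such as `np.tanh` avoids this. As a hedged illustration of one way to sidestep the overflow by hand, the sketch below only ever exponentiates a non-positive number. It uses the standard-library `math` module on scalars to stay dependency-free, and the name `stable_tanh` is hypothetical.

```python
import math


def stable_tanh(x):
    """tanh(x) computed with exp of a non-positive argument only, so it cannot overflow."""
    t = math.exp(-2.0 * abs(x))   # always in [0, 1]; underflows to 0.0 for large |x|
    y = (1.0 - t) / (1.0 + t)     # equals tanh(|x|)
    return math.copysign(y, x)    # restore the sign, since tanh is an odd function


assert abs(stable_tanh(0.5) - math.tanh(0.5)) < 1e-12
assert stable_tanh(1000.0) == 1.0   # exp(-2000) underflows to 0.0, so the result saturates at 1
```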

Advantages and disadvantages

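Advantages of tanh
1. In classification tasks, the hyperbolic tangent (tanh) has gradually replaced sigmoid as the standard activation function because it has several properties that neural networks benefit from: it is differentiable everywhere and it is an odd function, that is, antisymmetric about the origin.
2. Its output is an S-shaped curve, which introduces nonlinearity so that stacked layers do not collapse into a single linear map. Negative inputs map to negative outputs and a zero input maps exactly to zero; the output range is the open interval (-1, 1), centered on 0.
3. In typical binary classification problems, tanh is often used in the hidden layers and sigmoid in the output layer. This is not a fixed rule and should be adjusted to the problem at hand.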
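
The zero-centered property from point 2 can be seen with a small experiment. The sketch below feeds the same symmetric set of inputs through tanh and sigmoid and compares the mean output; the input grid and the helper `sigmoid` are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Inputs placed symmetrically around zero (an illustrative choice).
xs = [i / 10.0 for i in range(-30, 31)]

mean_tanh = sum(math.tanh(x) for x in xs) / len(xs)
mean_sigmoid = sum(sigmoid(x) for x in xs) / len(xs)

# tanh outputs average out to about 0, sigmoid outputs to about 0.5.
print(f"mean tanh output:    {mean_tanh:.3f}")
print(f"mean sigmoid output: {mean_sigmoid:.3f}")
```

For inputs that are symmetric around zero, tanh activations average to approximately zero, while sigmoid activations are shifted toward 0.5.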
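Disadvantages of tanh
1. When the input is large in magnitude (strongly positive or strongly negative), the output is nearly flat and the gradient is close to zero, which hampers weight updates.
2. tanh requires evaluating exponentials, so it is relatively expensive to compute.
3. As the number of layers grows, backpropagation multiplies the local derivatives layer by layer through the chain rule. Once units enter the saturation region (where the gradient is close to zero), the gradient shrinks at every layer and the vanishing gradient problem appears.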
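
Points 1 and 3 can be made concrete with a few numbers. The following sketch is a deliberately simplified model: it ignores the weight matrices and assumes, hypothetically, that every layer sees the same pre-activation value, so the backpropagated factor is just the local tanh derivative raised to the number of layers. The function `chained_gradient` is introduced here for illustration.

```python
import math


def tanh_derivative(x):
    """Local gradient of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# The local gradient peaks at 1 for x = 0 and shrinks quickly as |x| grows
# (the saturation region).
for x in [0.0, 1.0, 2.0, 5.0]:
    print(f"x = {x:>4}: tanh'(x) = {tanh_derivative(x):.6f}")


def chained_gradient(pre_activation, num_layers):
    """Product of num_layers identical local tanh gradients (weights ignored)."""
    return tanh_derivative(pre_activation) ** num_layers


# Hypothetical setting: every layer sees a pre-activation of 2.0.
for depth in [1, 5, 10, 20]:
    print(f"{depth:>2} layers: gradient factor = {chained_gradient(2.0, depth):.3e}")
```

Even at a moderate pre-activation of 2.0 the local derivative is below 0.1, so the product over ten layers is already smaller than 1e-10 in this toy setting.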

The tanh function in PyTorch

Code

import torch

f = torch.nn.Tanh()
x = torch.randn(2)
tanh_x = f(x)

print(f"x: \n{x}")
print(f"tanh_x:\n{tanh_x}")

"""Output (values vary from run to run because x is random)"""
x: 
tensor([-0.2992,  0.5793])
tanh_x:
tensor([-0.2905,  0.5222])

The tanh function in TensorFlow

Code

python: 3.10.9

tensorflow: 2.18.0

import tensorflow as tf

f = tf.nn.tanh
x = tf.random.normal([2])

tanh_x = f(x)

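# .numpy() converts the eager tensor to a NumPy array so that only the values are printed
print(f"x: \n{x.numpy()}")
print(f"tanh_x:\n{tanh_x.numpy()}")

"""Output (values vary from run to run because x is random)"""
x: 
[-0.26015657 -0.9198781 ]
tanh_x:
[-0.25444195 -0.72583973]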