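Logistic Regression with TensorFlow (Part 2)

A logistic regression example

Logistic regression is a classic classification algorithm. To keep things simple, we consider binary classification: we have to distinguish between two classes, and our labels are 0 or 1. Compared with linear regression, we need a different activation function and a different loss function, so the neuron's output is slightly different. Our goal is to build a model that can predict which of the two classes a new observation belongs to. For an input x, the neuron should output the probability P(y = 1 | x) that the observation belongs to class 1. We assign the observation to class 1 if P(y = 1 | x) > 0.5, and to class 0 if P(y = 1 | x) < 0.5.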

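Loss function

As the loss function we use the cross-entropy. For a single observation it is

$$L(\hat{y}^{(i)}, y^{(i)}) = -\left( y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right)$$

where $\hat{y}^{(i)} = P(y^{(i)} = 1 \mid x^{(i)})$ is the neuron's output. For more than one observation, the loss is combined over all m observations. Matching the tf.reduce_mean used in the code later on, we take the average:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$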
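The cross-entropy described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name cross_entropy and the example labels and probabilities are made up here, and the loss is averaged over the observations, as the TensorFlow code later in this article does.

```python
import numpy as np

def cross_entropy(y_true, y_prob):
    """Mean binary cross-entropy; y_prob holds the predicted P(y = 1 | x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Per-observation loss: -(y log(p) + (1 - y) log(1 - p))
    per_observation = -(y_true * np.log(y_prob)
                        + (1.0 - y_true) * np.log(1.0 - y_prob))
    return per_observation.mean()

# Hypothetical labels and two sets of hypothetical predictions
labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # high probability on the true class
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # high probability on the wrong class

loss_right = cross_entropy(labels, mostly_right)
loss_wrong = cross_entropy(labels, mostly_wrong)
print("loss when mostly right:", loss_right)
print("loss when mostly wrong:", loss_wrong)
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is what makes it a useful training signal.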

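Later we could write a complete logistic regression from scratch, but for now TensorFlow handles the details for us: differentiation, gradient descent, and so on. We only need to build the right neuron.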

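Activation function

Remember: we want the neuron to output the probability that an observation belongs to class 0 or class 1. We therefore need an activation function whose output lies between 0 and 1; otherwise we could not interpret it as a probability. For logistic regression we use the sigmoid function, σ(z) = 1 / (1 + e^(-z)), as the activation function.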
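As a quick illustration of why the sigmoid fits this role, the following pure-Python sketch evaluates it at a few points and applies the 0.5 threshold. The helper names sigmoid and predict_class are hypothetical, and sending a probability of exactly 0.5 to class 0 is an assumption made here, since the text leaves that case open.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(probability, threshold=0.5):
    """Class 1 if the probability exceeds the threshold, otherwise class 0."""
    return 1 if probability > threshold else 0

for z in (-4.0, 0.0, 4.0):
    p = sigmoid(z)
    print("z =", z, "sigmoid(z) =", round(p, 4), "class =", predict_class(p))
```

Large negative z gives a probability near 0, large positive z a probability near 1, and z = 0 sits exactly on the decision boundary.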

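Dataset

To build an interesting model, we use a modified version of the MNIST dataset. You can find more information about it at http://yann.lecun.com/exdb/mnist/.

MNIST is a large dataset of handwritten digits that we can use to train our model. It contains 70,000 images. "The original black and white (bilevel) images from NIST were size normalized to fit in a 20×20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28×28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28×28 field" (source: http://yann.lecun.com/exdb/mnist/).

Our features are the grayscale values of the individual pixels, so we have 28 × 28 = 784 features, each taking values from 0 to 255. The dataset contains the ten digits from 0 to 9. With the following code you can prepare the data for use in the next section. As usual, we first import the necessary libraries.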

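#List3-46

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# Note: fetch_mldata was removed in scikit-learn 0.22. fetch_openml("mnist_784") is the
# replacement, but its sample order and label dtype differ, so indices may not match.
from sklearn.datasets import fetch_mldata

Then we load the data.

mnist = fetch_mldata('MNIST original')
X, y = mnist["data"], mnist["target"]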

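Now X contains the input images and y the target labels (remember that in machine learning the value we want to predict is called the target). Typing X.shape gives the shape of X: (70000, 784). Note that X has 70,000 rows (each row is one image) and 784 columns (each column is a feature, in this case a pixel's grayscale value). Let us check how many times each digit appears in the dataset.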

#List3-47

for i in range(10):
    print("digit", i, "appears", np.count_nonzero(y == i), "times")

The result is as follows:

digit 0 appears 6903 times
digit 1 appears 7877 times
digit 2 appears 6990 times
digit 3 appears 7141 times
digit 4 appears 6824 times
digit 5 appears 6313 times
digit 6 appears 6876 times
digit 7 appears 7293 times
digit 8 appears 6825 times
digit 9 appears 6958 times

It is useful to define a function to visualize a digit.

#List3-48

def plot_digit(some_digit):
    # Reshape the flat 784-value vector back into a 28x28 image
    some_digit_image = some_digit.reshape(28, 28)
    plt.imshow(some_digit_image, cmap=matplotlib.cm.binary, interpolation="nearest")
    plt.axis("off")
    plt.show()

For example, we can plot a randomly chosen digit (see Figure 3-46).

plot_digit(X[36003])

Figure 3-46. The digit at index 36,003 in the dataset; it is easily recognized as a 5

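The model we want to implement here is a simple logistic regression for binary classification, so the dataset must be reduced to two classes, in this case two digits. We choose 1 and 2, and extract from the dataset only the images that show a 1 or a 2. Our neuron will try to decide whether a given image belongs to class 0 (digit 1) or class 1 (digit 2).

# Keep only the observations whose label is 1 or 2
X_train = X[np.any([y == 1, y == 2], axis=0)]
y_train = y[np.any([y == 1, y == 2], axis=0)]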

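Next, the input observations must be normalized. (Note that with a sigmoid activation function you do not want the input values to be too large, because there are 784 of them.)

X_train_normalised = X_train / 255.0

We divide by 255 because the features are pixel grayscale values, which range from 0 to 255. Later we will discuss why the input features should be normalized; for now, trust me that this step is necessary. We want each column to be one input observation and each row to represent one feature, so we have to reshape the tensors:

X_train_tr = X_train_normalised.transpose()
y_train_tr = y_train.reshape(1, y_train.shape[0])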

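We can define a variable n_dim that holds the number of features:

n_dim = X_train_tr.shape[0]

Now comes an important point. The labels in the dataset are 1 or 2 (they simply tell you which digit the image shows). However, our loss function is built on the assumption that the class labels are 0 and 1, so we have to shift the values in the y_train_tr array.

Note: when working on binary classification, remember to check the values of the labels. Using the wrong labels (anything other than 0 and 1) can cost you a lot of time spent working out why the model does not work.

y_train_shifted = y_train_tr - 1

Now all images showing a 1 have label 0, and all images showing a 2 have label 1. Finally, we give the variables suitable names.

Xtrain = X_train_tr
ytrain = y_train_shifted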
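The preprocessing steps above (select two digits, normalize, transpose, shift the labels) can be tried on a tiny stand-in array before running them on MNIST. Everything in this sketch is hypothetical: X_demo and y_demo are random placeholders with 3 "pixels" per image, chosen only so the shapes are easy to read.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 6 "images" with 3 pixels each
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(6, 3)).astype(float)
y_demo = np.array([1.0, 2.0, 0.0, 2.0, 1.0, 3.0])

# Keep only the observations labelled 1 or 2
mask = np.any([y_demo == 1, y_demo == 2], axis=0)
X_sel = X_demo[mask] / 255.0            # scale pixel values into [0, 1]
y_sel = y_demo[mask]

# One column per observation, one row per feature
X_cols = X_sel.transpose()
# Row vector of labels, shifted from {1, 2} to {0, 1}
y_row = y_sel.reshape(1, y_sel.shape[0]) - 1

# The label check recommended in the note above
assert set(np.unique(y_row)) <= {0.0, 1.0}, "labels must be 0 or 1"

print("X shape:", X_cols.shape)         # (features, observations)
print("y shape:", y_row.shape)
print("labels:", np.unique(y_row))
```

Running the same assertion on ytrain before training is a cheap way to catch the label mistake the note warns about.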

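Figure 3-47 shows some of the digits we are working with.

Figure 3-47. Six digits chosen at random from the dataset. The corresponding labels are in parentheses (remember that the labels are now 0 or 1).

TensorFlow implementation

The TensorFlow implementation is not difficult and is almost identical to the one for linear regression. First we define the placeholders and variables.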

import tensorflow as tf  # TensorFlow 1.x API (graph mode with placeholders and sessions)

tf.reset_default_graph()

X = tf.placeholder(tf.float32, [n_dim, None])  # one column per observation
Y = tf.placeholder(tf.float32, [1, None])      # labels, 0 or 1
learning_rate = tf.placeholder(tf.float32, shape=())

W = tf.Variable(tf.zeros([1, n_dim]))
b = tf.Variable(tf.zeros(1))

init = tf.global_variables_initializer()

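Note that this code is the same as for the linear regression above. However, we must define a different loss function and a different neuron output (the sigmoid function).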

y_ = tf.sigmoid(tf.matmul(W,X)+b)

cost = - tf.reduce_mean(Y * tf.log(y_)+(1-Y) * tf.log(1-y_))

training_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
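To make the shapes in the graph definition concrete, here is a minimal NumPy sketch of the same forward pass at initialization. The names X_batch, Y_batch, W0 and b0 and the random inputs are illustrative assumptions, not part of the original code.

```python
import numpy as np

n_features, n_obs = 784, 5

# Hypothetical normalised pixel values in [0, 1) and hypothetical labels
rng = np.random.default_rng(0)
X_batch = rng.random((n_features, n_obs))
Y_batch = np.array([[0.0, 1.0, 1.0, 0.0, 1.0]])

# Same initial values as the TensorFlow variables: all zeros
W0 = np.zeros((1, n_features))
b0 = np.zeros(1)

z = W0 @ X_batch + b0                   # shape (1, n_obs); b0 is broadcast
y_hat = 1.0 / (1.0 + np.exp(-z))        # sigmoid output, one value per observation
cost0 = -np.mean(Y_batch * np.log(y_hat) + (1 - Y_batch) * np.log(1 - y_hat))

print("z shape:", z.shape)
print("initial predictions:", y_hat)
print("initial cost:", cost0)
```

With zero weights every prediction is 0.5, so the cost before any training step is log 2, roughly 0.693. The cost values logged at epoch 0 further down are slightly lower because they are measured after the first gradient descent step.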

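We use the sigmoid function as the neuron's output, via tf.sigmoid(). The code that runs the model is also the same as for linear regression; only the function name changes.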

def run_logistic_model(learning_r, training_epochs, train_obs, train_labels, debug=False):
    sess = tf.Session()
    sess.run(init)

    cost_history = np.empty(shape=[0], dtype=float)

    for epoch in range(training_epochs + 1):
        # One full-batch gradient descent step
        sess.run(training_step, feed_dict={X: train_obs, Y: train_labels, learning_rate: learning_r})
        # Cost after the step, stored so it can be plotted later
        cost_ = sess.run(cost, feed_dict={X: train_obs, Y: train_labels, learning_rate: learning_r})
        cost_history = np.append(cost_history, cost_)

        if (epoch % 500 == 0) and debug:
            print("Reached epoch", epoch, "cost J =", str.format('{0:.6f}', cost_))

    return sess, cost_history

Let us run the model and look at the results. We start with a learning rate of 0.01.

sess, cost_history = run_logistic_model(learning_r=0.01,
                                        training_epochs=5000,
                                        train_obs=Xtrain,
                                        train_labels=ytrain,
                                        debug=True)

The output of the code is as follows (stopped after 3000 epochs):

Reached epoch 0 cost J = 0.678598
Reached epoch 500 cost J = 0.108655
Reached epoch 1000 cost J = 0.078912
Reached epoch 1500 cost J = 0.066786
Reached epoch 2000 cost J = 0.059914
Reached epoch 2500 cost J = 0.055372
Reached epoch 3000 cost J = nan

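What happened? At some point our loss function suddenly takes the value nan (not a number). Evidently the model stops working properly at some stage. If the learning rate is too large, or the weights are initialized badly, the values $\hat{y}^{(i)} = P(y^{(i)} = 1 \mid x^{(i)})$ can get very close to 0 or 1 (the sigmoid function takes values close to 0 or 1 for very large negative or positive z). Remember that the loss function contains the two terms tf.log(y_) and tf.log(1-y_). The log function is not defined at zero, so if y_ is 0 or 1 the code tries to evaluate tf.log(0); in floating point this gives -inf, and multiplying it by a zero factor gives nan. For example, we can run the model with a learning rate of 2.0: after just one epoch the loss function is already nan. The reason is easy to understand if you print b before and after the first training step. Simply modify your model code, using the following version: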

def run_logistic_model(learning_r, training_epochs, train_obs, train_labels, debug=False):
    sess = tf.Session()
    sess.run(init)

    cost_history = np.empty(shape=[0], dtype=float)

    for epoch in range(training_epochs + 1):
        print('epoch: ', epoch)
        # Value of b before the training step
        print(sess.run(b, feed_dict={X: train_obs, Y: train_labels, learning_rate: learning_r}))
        sess.run(training_step, feed_dict={X: train_obs, Y: train_labels, learning_rate: learning_r})
        # Value of b after the training step
        print(sess.run(b, feed_dict={X: train_obs, Y: train_labels, learning_rate: learning_r}))

        cost_ = sess.run(cost, feed_dict={X: train_obs, Y: train_labels, learning_rate: learning_r})
        cost_history = np.append(cost_history, cost_)

        if (epoch % 500 == 0) and debug:
            print("Reached epoch", epoch, "cost J =", str.format('{0:.6f}', cost_))

    return sess, cost_history

You get the following result (stopping the training after one epoch):

epoch: 0
0.
-0.05966223
Reached epoch 0 cost J = nan
epoch: 1
-0.05966223
nan

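Do you see how b goes from 0 to -0.05966223 and then to nan? Consequently z = w^T X + b becomes nan, then ŷ = σ(z) becomes nan as well, and finally the loss function, being a function of ŷ, is nan. This happens because the learning rate is too large. The solution? Try different (smaller) learning rates.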
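The way a saturated sigmoid turns the cross-entropy into nan can be reproduced outside TensorFlow. The NumPy sketch below works in float32, like the graph above; the helper name sigmoid32, the value z = 100 and the clipping constant eps are assumptions chosen for illustration. The clipping shown at the end is a common workaround and is not the approach taken in this article, which lowers the learning rate instead.

```python
import numpy as np

def sigmoid32(z):
    """Sigmoid evaluated in float32 precision."""
    z = np.float32(z)
    return np.float32(1.0) / (np.float32(1.0) + np.exp(-z))

y_true = np.float32(1.0)
y_hat = sigmoid32(100.0)     # rounds to exactly 1.0 in float32

# Naive cross-entropy: the (1 - y) * log(1 - y_hat) term becomes 0 * (-inf) = nan
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))

# Workaround (assumption, not from the article): keep y_hat away from 0 and 1
eps = np.float32(1e-7)
y_safe = np.clip(y_hat, eps, 1 - eps)
clipped = -(y_true * np.log(y_safe) + (1 - y_true) * np.log(1 - y_safe))

print("y_hat:", y_hat)
print("naive loss:", naive)
print("clipped loss:", clipped)
```

Note that the prediction here is actually correct (label 1, probability 1), yet the naive formula still returns nan, which is why a single saturated observation is enough to spoil the mean cost. TensorFlow also provides tf.nn.sigmoid_cross_entropy_with_logits, which computes the same loss from z directly in a numerically stable way.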

Let us try a smaller value to get a result that stays stable beyond 2500 epochs. We run the model:

sess, cost_history = run_logistic_model(learning_r=0.005,
                                        training_epochs=5000,
                                        train_obs=Xtrain,
                                        train_labels=ytrain,
                                        debug=True)

The output is as follows:

Reached epoch 0 cost J = 0.685799
Reached epoch 500 cost J = 0.154386
Reached epoch 1000 cost J = 0.108590
Reached epoch 1500 cost J = 0.089566
Reached epoch 2000 cost J = 0.078767
Reached epoch 2500 cost J = 0.071669
Reached epoch 3000 cost J = 0.066580
Reached epoch 3500 cost J = 0.062715
Reached epoch 4000 cost J = 0.059656
Reached epoch 4500 cost J = 0.057158
Reached epoch 5000 cost J = 0.055069

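There is no more nan in the output. You can see a plot of the loss function in Figure 3-48. To evaluate our model, we must choose an optimizing metric (as discussed earlier). For a binary classification problem, the usual metric is the accuracy (denoted a), which measures how closely the predictions agree with the true values. Mathematically, it is calculated as

$$a = \frac{\text{number of correctly classified observations}}{\text{total number of observations}}$$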

To obtain the accuracy, we run the following code. (Remember that we assign observation i to class 0 if P(y(i) = 1 | x(i)) < 0.5, and to class 1 if P(y(i) = 1 | x(i)) > 0.5.)

# True wherever the thresholded prediction agrees with the label
correct_prediction1 = tf.equal(tf.greater(y_, 0.5), tf.equal(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction1, tf.float32))
print(sess.run(accuracy, feed_dict={X: Xtrain, Y: ytrain, learning_rate: 0.05}))

With this model we reach an accuracy of 98.6%. That is not bad for a network with a single neuron.
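The accuracy computation can be checked by hand on a few values. The NumPy sketch below mirrors the thresholding at 0.5 used above; the function name accuracy_score_manual and the example labels and probabilities are hypothetical.

```python
import numpy as np

def accuracy_score_manual(y_true, y_prob, threshold=0.5):
    """Fraction of observations whose thresholded prediction matches the label."""
    predicted_is_one = np.asarray(y_prob) > threshold
    actual_is_one = np.asarray(y_true) == 1
    return np.mean(predicted_is_one == actual_is_one)

# Hypothetical labels and predicted probabilities, shaped (1, observations)
labels = np.array([[0, 1, 1, 0, 1]])
probs = np.array([[0.1, 0.8, 0.4, 0.3, 0.9]])

acc = accuracy_score_manual(labels, probs)
print("accuracy:", acc)
```

Four of the five hypothetical predictions land on the right side of the threshold (the third one does not), so the result is 0.8.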

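Figure 3-48. Loss function versus epochs, for a learning rate of 0.005

You can run the previous model for more epochs (with a learning rate of 0.005). You will find that at around 7000 epochs nan appears again. The solution is to reduce the learning rate as the number of epochs increases. A simple approach is to halve the learning rate every 500 epochs, which gets rid of the nans. A similar approach is discussed in detail later.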
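The halving schedule mentioned above can be written as a small helper. This is a sketch under assumptions: the function name halved_learning_rate and its parameters are hypothetical and do not appear in the original code.

```python
def halved_learning_rate(initial_rate, epoch, halve_every=500):
    """Learning rate after halving it once every `halve_every` epochs."""
    return initial_rate * 0.5 ** (epoch // halve_every)

# Learning rate at a few epochs, starting from 0.005
for epoch in (0, 499, 500, 1000, 5000):
    print("epoch", epoch, "learning rate", halved_learning_rate(0.005, epoch))
```

Because learning_rate is a placeholder in the graph above, one possible way to use this is to compute the value inside the training loop and pass it in feed_dict at each epoch instead of the fixed learning_r.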