You can find the complete code below on GitHub (a naive example).
Today's goal is to implement a network with a single Dense hidden layer and use it to build a binary classifier.
Example data
Let's start with a classification problem. We sample from two 2-D Gaussian distributions with the same covariance but different means, generating two clusters of 200 points each.
import numpy as np
import matplotlib.pyplot as plt
from Dense import Dense
Asamples = np.random.multivariate_normal([6, 6], [[1, 0], [0, 1]], 200)
Bsamples = np.random.multivariate_normal([1, 1], [[1, 0], [0, 1]], 200)
plt.figure()
plt.plot(Asamples[:, 0], Asamples[:, 1], 'r.')
plt.plot(Bsamples[:, 0], Bsamples[:, 1], 'b.')
plt.show()
As shown in the figure below, one cluster sits in the lower left and the other in the upper right. Matching the labels assigned below, we define the upper-right cluster (Asamples) as class 1 and the lower-left cluster (Bsamples) as class 0.
Now we have the features x_{train} and labels y_{train} needed for the classification problem.
x_train = np.vstack((Asamples, Bsamples))
y_train = np.append(np.ones(200), np.zeros(200))
x_{train} has shape 400 \times 2 (one sample per row)
y_{train} is a vector of 400 labels
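A quick sanity check on the shapes (assuming the snippet above has already been run):
print(x_train.shape)  # (400, 2): one 2-D sample per row
print(y_train.shape)  # (400,): one label per sample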
Obviously, we could classify the points in the figure above by eye. But the goal here is to implement a simple neural network that classifies them for us.
Basic components of a neural network
Let's think about what we need to implement so that a neural network can do the classification.
First, we must be able to train it on training data. Second, we must be able to use the trained network to classify test data.
So we need to implement:
def train(self, x_train, y_train, learning_rate=0.01):
""" Train neural network"""
pass
def evaluate(self, x_test, y_test):
""" Evaluate test data"""
pass
Next, the network needs a suitable data structure. For now we take the straightforward (if somewhat clumsy) route of a class:
class Dense:
def __init__(self):
""" Do something """
pass
def train(self, x_train, y_train, learning_rate=0.01):
""" Train neural network"""
pass
def evaluate(self, x_test, y_test):
""" Evaluate test data"""
pass
Everything now looks a lot clearer. What remains are the concrete technical details: we need to define the network's structure and how it is trained.
For this specific problem, a 2-dimensional input is mapped to a 1-dimensional output. We design the network shown in the figure below: one hidden layer of four units with activation h (ReLU in the code), followed by a sigmoid output.
H^1 = h(W^1 x + b^1)
Z = W^2 H^1 + b^2
g = \frac{1}{1+e^{-Z}}
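Written out with explicit shapes (matching the four hidden units and the single output used in the code below):
W^1 \in \mathbb{R}^{4 \times 2},\quad b^1 \in \mathbb{R}^{4},\quad W^2 \in \mathbb{R}^{1 \times 4},\quad b^2 \in \mathbb{R},\quad h(\cdot)=\max(0,\cdot)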
We define the following parameters and initialize them randomly.
# offset in first layer
self._b11 = np.random.rand(1)
self._b12 = np.random.rand(1)
self._b13 = np.random.rand(1)
self._b14 = np.random.rand(1)
# x1 to 4 hidden nodes
self._w111 = np.random.rand(1)
self._w112 = np.random.rand(1)
self._w113 = np.random.rand(1)
self._w114 = np.random.rand(1)
# x2 to 4 hidden nodes
self._w121 = np.random.rand(1)
self._w122 = np.random.rand(1)
self._w123 = np.random.rand(1)
self._w124 = np.random.rand(1)
# 4 hidden nodes to output node
self._w211 = np.random.rand(1)
self._w221 = np.random.rand(1)
self._w231 = np.random.rand(1)
self._w241 = np.random.rand(1)
# offset in second layer
self._b2 = np.random.rand(1)
# placeholders for values cached during the forward pass
self._logits = np.random.rand(1)
self._h1 = np.random.rand(1)
self._h2 = np.random.rand(1)
self._h3 = np.random.rand(1)
self._h4 = np.random.rand(1)
self._o1 = np.random.rand(1)
OK, we have the network's structure; next comes the training setup. We still need:
- Loss function: measures the gap between the network's predictions and the targets
- ForwardPropagation: computes the network's activations and output from the input
- BackPropagation: updates the parameter values from output back to input using the chain rule of derivatives
- Learning_rate: controls the step size of each parameter update
This is a binary classification problem, so I choose binary cross-entropy as the loss function.
L(\mathbf{y}, \mathbf{g})=-\sum_{m=1}^{M} y_{m} \ln g_{m}
For a single sample with label y and prediction \hat{y}, this can be written as L=-\left[y \log (\hat{y})+(1-y) \log (1-\hat{y})\right]
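The training loop further down calls a binary_cross_entropy helper that isn't shown in this post; a minimal sketch (the clipping away from 0 and 1, to keep np.log finite, is my own addition) could look like this:
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Mean binary cross-entropy over all samples.
    # Clip predictions so that np.log never receives exactly 0.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))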
def _ForwardPropagation(self, x):
self._x1 = x[0]
self._x2 = x[1]
self._h1 = max(0, self._w111 * x[0] + self._w121 * x[1] + self._b11)
self._h2 = max(0, self._w112 * x[0] + self._w122 * x[1] + self._b12)
self._h3 = max(0, self._w113 * x[0] + self._w123 * x[1] + self._b13)
self._h4 = max(0, self._w114 * x[0] + self._w124 * x[1] + self._b14)
self._logits = self._b2 + self._w211 * self._h1 + self._w221 * self._h2 + self._w231 * self._h3 + self._w241 * self._h4
self._pred = sigmoid(self._logits)
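The last line above uses a sigmoid helper that also isn't shown here; a minimal version (my own sketch) is:
def sigmoid(z):
    # Logistic function: maps a real-valued logit to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))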
One thing to be careful about: while implementing this, I found that explicitly computing \frac{\partial{Loss}}{\partial{\hat{y}}} = -\frac{y}{\hat{y}}+\frac{1-y}{1-\hat{y}} can produce a zero denominator (\hat{y} = \frac{1}{1+e^{-z}} saturates at 0 or 1), and the output becomes NaN. That is unacceptable. After looking around, I found that the common practice is to compute \frac{\partial{Loss}}{\partial{z}} directly: since f^{\prime}(z)=f(z)(1-f(z)), we have \frac{\partial{Loss}}{\partial{z}} = \hat{y} - y, which avoids the zero denominator entirely. Following the chain rule, I wrote the backpropagation code below (a short derivation of this step follows).
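Spelling out that chain-rule step once, for completeness:
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}}\cdot\frac{\partial \hat{y}}{\partial z} = \left(-\frac{y}{\hat{y}}+\frac{1-y}{1-\hat{y}}\right)\hat{y}(1-\hat{y}) = \hat{y} - y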
def _BackPropagation(self, y_true, learning_rate):
d_L_d_logits = self._pred - y_true
d_L_d_W211 = d_L_d_logits * self._h1
d_L_d_W221 = d_L_d_logits * self._h2
d_L_d_W231 = d_L_d_logits * self._h3
d_L_d_W241 = d_L_d_logits * self._h4
d_L_d_b2 = d_L_d_logits
d_L_d_h1 = d_L_d_logits * self._w211
d_L_d_h2 = d_L_d_logits * self._w221
d_L_d_h3 = d_L_d_logits * self._w231
d_L_d_h4 = d_L_d_logits * self._w241
# ReLU derivative: 1 where the hidden unit is active, 0 otherwise
d_h1_d_z1 = 1 if self._h1 > 0 else 0
d_h1_d_w111 = d_h1_d_z1 * self._x1
d_h1_d_w121 = d_h1_d_z1 * self._x2
d_h1_d_b11 = d_h1_d_z1
d_h2_d_z2 = 1 if self._h2 > 0 else 0
d_h2_d_w112 = d_h2_d_z2 * self._x1
d_h2_d_w122 = d_h2_d_z2 * self._x2
d_h2_d_b12 = d_h2_d_z2
d_h3_d_z3 = 1 if self._h3 > 0 else 0
d_h3_d_w113 = d_h3_d_z3 * self._x1
d_h3_d_w123 = d_h3_d_z3 * self._x2
d_h3_d_b13 = d_h3_d_z3
d_h4_d_z4 = 1 if self._h4 > 0 else 0
d_h4_d_w114 = d_h4_d_z4 * self._x1
d_h4_d_w124 = d_h4_d_z4 * self._x2
d_h4_d_b14 = d_h4_d_z4
# Update parameters
self._w111 -= learning_rate * d_L_d_h1 * d_h1_d_w111
self._w121 -= learning_rate * d_L_d_h1 * d_h1_d_w121
self._w112 -= learning_rate * d_L_d_h2 * d_h2_d_w112
self._w122 -= learning_rate * d_L_d_h2 * d_h2_d_w122
self._w113 -= learning_rate * d_L_d_h3 * d_h3_d_w113
self._w123 -= learning_rate * d_L_d_h3 * d_h3_d_w123
self._w114 -= learning_rate * d_L_d_h4 * d_h4_d_w114
self._w124 -= learning_rate * d_L_d_h4 * d_h4_d_w124
self._b11 -= learning_rate * d_L_d_h1 * d_h1_d_b11
self._b12 -= learning_rate * d_L_d_h2 * d_h2_d_b12
self._b13 -= learning_rate * d_L_d_h3 * d_h3_d_b13
self._b14 -= learning_rate * d_L_d_h4 * d_h4_d_b14
self._w211 -= learning_rate * d_L_d_W211
self._w221 -= learning_rate * d_L_d_W221
self._w231 -= learning_rate * d_L_d_W231
self._w241 -= learning_rate * d_L_d_W241
self._b2 -= learning_rate * d_L_d_b2
The backpropagation code above looks quite hard-coded. I'll optimize it step by step later on.
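Before optimizing, note that hand-derived gradients like these can be sanity-checked against a finite-difference estimate. Below is a rough sketch of such a check for a single weight (w211); the helper name and wiring are my own, and it reuses the binary_cross_entropy sketch from above:
def grad_check_w211(net, x, y, eps=1e-5):
    # Compare the analytic gradient dL/dw211 = (pred - y) * h1, as used in
    # _BackPropagation above, against a central finite difference.
    net._ForwardPropagation(x)
    analytic = (net._pred - y) * net._h1
    w_orig = net._w211.copy()
    net._w211 = w_orig + eps
    net._ForwardPropagation(x)
    loss_plus = binary_cross_entropy(np.array([y]), net._pred)
    net._w211 = w_orig - eps
    net._ForwardPropagation(x)
    loss_minus = binary_cross_entropy(np.array([y]), net._pred)
    net._w211 = w_orig  # restore the original weight
    numeric = (loss_plus - loss_minus) / (2 * eps)
    print(analytic, numeric)  # the two values should agree closely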
At this point we can implement the train function. The dataset is small, so I keep the learning strategy simple: parameters are updated after every single training sample in a fixed pass over the data (effectively plain per-sample gradient descent); I'm not bothering with mini-batches or shuffling for now 😋. We set the learning rate to 0.01 and train for 100 epochs. In each epoch we run forward propagation and back propagation on every sample in x_train, and every 10 epochs we compute the loss function value over the whole training set.
def train(self, x_train, y_train, learning_rate=0.01, epochs=100):
for epoch in range(epochs):
for i, x in enumerate(x_train):
self._ForwardPropagation(x)
self._BackPropagation(y_train[i], learning_rate)
if epoch % 10 == 0:
preds = np.zeros(y_train.shape[0])
for i in range(y_train.shape[0]):
self._ForwardPropagation(x_train[i])
preds[i] = self._pred
loss = binary_cross_entropy(y_train, preds)
print("[Epoch %2d] loss : %.3f" % (epoch, loss))
The evaluate function is even simpler. I use accuracy as the evaluation metric.
def evaluate(self, x_test, y_test):
preds = np.zeros(y_test.shape[0])
for i in range(y_test.shape[0]):
self._ForwardPropagation(x_test[i])
preds[i] = self._pred
num = 0
for i in range(x_test.shape[0]):
y = 1 if preds[i] >= 0.5 else 0
if y_test[i] == y:
num += 1
accuracy = num/x_test.shape[0]
print("[Train accuracy: %.3f]" % accuracy)
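For reference, a minimal driver tying everything together might look like this (hypothetical wiring on my part; the full script is in the GitHub repo):
net = Dense()
net.train(x_train, y_train, learning_rate=0.01, epochs=100)
net.evaluate(x_train, y_train)  # evaluate on the training data, as in the results below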
Testing
On the x_train, y_train dataset, I get the following results:
[Epoch 0] loss : 4.786
[Epoch 10] loss : 0.061
[Epoch 20] loss : 0.011
[Epoch 30] loss : 0.006
[Epoch 40] loss : 0.004
[Epoch 50] loss : 0.004
[Epoch 60] loss : 0.003
[Epoch 70] loss : 0.003
[Epoch 80] loss : 0.002
[Epoch 90] loss : 0.002
[Epoch 100] loss : 0.002
[Train accuracy: 1.000]
That looks pretty good: after 100 epochs the network has learned to separate the two Gaussian clusters.