Natural Computation Methods for Machine Learning Note 03

February 7, 2020


Pattern recognition and the Perceptron

In this lecture, I learnt the basics of pattern recognition, the perceptron, and how to train (adjust) the perceptron.

Pattern recognition

Here are some basic terms we should know.

Pattern recognition = feature extraction + classification

Feature extraction = find "good" features for classification => feature vector X (this step is very sensitive to assumptions)

Classification = find a discriminant that separates the classes (there are infinitely many solutions)

Example: nearest neighbour classifiers
Classify the unknown sample (vector) X as the majority class among its k nearest neighbours, i.e. the k training samples closest to X.
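A minimal Python sketch of the k-nearest-neighbour rule, assuming Euclidean distance and a simple majority vote. The function name `knn_classify` and the toy data are illustrative assumptions, not taken from the course material.

```python
import math
from collections import Counter


def knn_classify(x, samples, labels, k=3):
    """Assign x the majority label among its k nearest training samples.

    Distance is Euclidean (l_2). On a tied vote, Counter.most_common returns
    the tied label that appears first in distance order.
    """
    # Indices of the training samples sorted by distance to x; keep the k closest.
    nearest = sorted(range(len(samples)), key=lambda i: math.dist(x, samples[i]))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated 2-D clusters.
samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((0.5, 0.5), samples, labels))  # A
print(knn_classify((5.5, 5.2), samples, labels))  # B
```

Note that k-NN has no training phase: all the work happens at classification time, when X is compared against every stored sample.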

How do we measure distance to decide which samples are 'nearest'?

Distance measures

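Define the distance between two vectors

a = (a_1, a_2, \cdots, a_n) \text{ and } b = (b_1, b_2, \cdots, b_n)
l_p norm:
l_p(\bar{x}) = \left( \sum_{i=1}^n |x_i|^p \right)^{\frac{1}{p}}
The l_p distance between a and b is the norm of their difference:
d_p(a, b) = l_p(a - b) = \left( \sum_{i=1}^n |a_i - b_i|^p \right)^{\frac{1}{p}}

In particular, l_2 gives the Euclidean distance and l_1 the city-block (Manhattan) distance.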
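The l_p distance translates directly into a few lines of Python. This is a sketch for illustration; the function name `lp_distance` is an assumption, not from the course.

```python
def lp_distance(a, b, p=2):
    """l_p distance between equal-length vectors: (sum_i |a_i - b_i|^p)^(1/p)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)


a, b = (0.0, 0.0), (3.0, 4.0)
print(lp_distance(a, b, p=2))  # 5.0 (Euclidean)
print(lp_distance(a, b, p=1))  # 7.0 (city block / Manhattan)
```

The same pair of points is 5 apart in l_2 but 7 apart in l_1, so the choice of p can change which samples count as "nearest".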
When the perceptron is used as a classifier, its output is

y = \begin{cases} 1 & \text{if } S > 0 \\ 0 & \text{if } S \leq 0 \end{cases}

where

S = \sum_{i=1}^n w_ix_i - \theta = \sum_{i=0}^n w_ix_i, \quad \text{with } w_0 = \theta,\ x_0 = -1

Here \theta is the bias/threshold, and \sum_{i=0}^n w_ix_i is the augmented vector notation (the threshold is absorbed as an extra weight on a constant input). Setting S = 0 defines a hyperplane in the n-dimensional input space.
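The threshold unit can be sketched in Python as below, assuming the augmented convention x_0 = -1, w_0 = \theta (the alternative x_0 = 1, w_0 = -\theta gives the same S). Both function names are illustrative assumptions; the sketch only shows that the two forms of S compute the same output.

```python
def perceptron_output(w, x, theta):
    """Threshold unit: 1 if S = sum_i w_i x_i - theta > 0, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if s > 0 else 0


def perceptron_output_augmented(w_aug, x):
    """Same unit in augmented form: w_aug = (theta, w_1, ..., w_n), x_0 = -1."""
    x_aug = (-1.0, *x)  # prepend the constant input x_0 = -1
    s = sum(wi * xi for wi, xi in zip(w_aug, x_aug))
    return 1 if s > 0 else 0


# Example weights w = (1, 1), theta = 1.5 (chosen for illustration: logical AND).
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output((1, 1), x, 1.5), perceptron_output_augmented((1.5, 1, 1), x))
```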

Let us consider a hyperplane example (2D):

S = \sum_{i=1}^2 w_ix_i - \theta = w_1x_1 + w_2x_2 - \theta

The discriminant is found by setting S = 0: w_1x_1 + w_2x_2 - \theta = 0 \Rightarrow x_2 = \frac{\theta - w_1x_1}{w_2} = -\frac{w_1}{w_2}x_1 + \frac{\theta}{w_2} = kx_1 + m (for w_2 \neq 0). This is a line with slope k = -w_1/w_2 and intercept m = \theta/w_2.

[Three figures omitted here.]
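Conclusion: the weights define the position and slope of the line (in the general case, a hyperplane). The threshold \theta shifts the hyperplane without changing its orientation: in 2D it changes the intercept m = \theta/w_2 but not the slope k = -w_1/w_2.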
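A small Python sketch that turns (w_1, w_2, \theta) into the slope k and intercept m of the 2D discriminant x_2 = kx_1 + m. The function name `decision_line` and the example weights are illustrative assumptions.

```python
def decision_line(w1, w2, theta):
    """Return (k, m) such that w1*x1 + w2*x2 - theta = 0 is the line x2 = k*x1 + m.

    Requires w2 != 0; otherwise the line is vertical (x1 = theta / w1).
    """
    if w2 == 0:
        raise ValueError("w2 = 0 gives a vertical line x1 = theta / w1")
    k = -w1 / w2     # slope depends only on the weights
    m = theta / w2   # intercept moves with the threshold
    return k, m


print(decision_line(1.0, 1.0, 1.5))  # (-1.0, 1.5)
print(decision_line(1.0, 1.0, 0.5))  # (-1.0, 0.5): same slope, line shifted
print(decision_line(2.0, 1.0, 1.5))  # (-2.0, 1.5): new slope
```

Changing only \theta keeps k fixed and moves m, which matches the conclusion that the threshold shifts the line parallel to itself.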

Training (adjusting the line automatically)

We have a number of pairs (x, d) of feature vectors (x) and desired responses (d).

For each such pair, compute the perceptron output y:

If y = d, do nothing.

If y = 0 and d = 1, reinforce the connections (to increase the weighted sum).

If y = 1 and d = 0, weaken the connections (to decrease the weighted sum).

(Reinforce/weaken = add/subtract the corresponding input value to/from each weight.)

But when to stop?

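Multiply the weight change by a gain factor (learning rate, step length) \eta, where 0 < \eta \leq 1:

\Delta w_i = \eta \delta x_i, \quad \text{where } \delta = d - y

Since \delta = +1 when y = 0, d = 1, \delta = -1 when y = 1, d = 0, and \delta = 0 when y = d, this single formula covers all three cases above.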
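A minimal Python sketch of the perceptron learning rule \Delta w_i = \eta (d - y) x_i, assuming the augmented convention x_0 = -1, w_0 = \theta, zero initial weights, and the stopping criterion "a full pass over the data with no errors". The names `train_perceptron`, `eta`, and `max_epochs` are illustrative assumptions.

```python
def train_perceptron(data, eta=0.5, max_epochs=100):
    """Perceptron rule in augmented form (x_0 = -1, w_0 = theta).

    data: list of (x, d) pairs, x a tuple of inputs, d in {0, 1}.
    Returns (weights, epochs_used, converged).
    """
    n = len(data[0][0])
    w = [0.0] * (n + 1)  # w[0] plays the role of theta
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, d in data:
            x_aug = (-1.0, *x)
            s = sum(wi * xi for wi, xi in zip(w, x_aug))
            y = 1 if s > 0 else 0
            delta = d - y  # +1: reinforce, -1: weaken, 0: do nothing
            if delta != 0:
                errors += 1
                w = [wi + eta * delta * xi for wi, xi in zip(w, x_aug)]
        if errors == 0:  # every sample classified correctly: stop
            return w, epoch, True
    return w, max_epochs, False


# Logical AND is linearly separable, so training should stop on its own.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, epochs, converged = train_perceptron(and_data)
print(w, epochs, converged)  # [1.0, 1.0, 0.5] 6 True
```

Here the learned weights are \theta = 1.0, w_1 = 1.0, w_2 = 0.5; any other separating line would be an equally valid answer.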


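The algorithm converges in a finite number of steps to a discriminant that separates the training samples, provided such a linear discriminant exists (it does not always exist). The result is not necessarily optimal; it is one of the infinitely many separating solutions.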

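Linear separability: a single perceptron can only learn classes that are linearly separable, i.e. separable by a hyperplane. XOR is the classic counterexample: no single line separates {(0,1), (1,0)} from {(0,0), (1,1)}.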
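To see the failure in practice, here is a self-contained Python sketch that runs the same perceptron rule on XOR. It assumes the augmented convention x_0 = -1, w_0 = \theta and a cap of 100 epochs (an arbitrary illustrative choice). Because no line separates XOR, no epoch can be error-free, so the loop always hits the cap.

```python
def train_perceptron(data, eta=0.5, max_epochs=100):
    """Perceptron rule (augmented: x_0 = -1, w_0 = theta); returns (w, converged)."""
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, d in data:
            x_aug = (-1.0, *x)
            y = 1 if sum(wi * xi for wi, xi in zip(w, x_aug)) > 0 else 0
            if y != d:
                errors += 1
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x_aug)]
        if errors == 0:
            return w, True
    return w, False


xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w, converged = train_perceptron(xor_data)
print(converged)  # False: some XOR sample is always misclassified
```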

In the following figure, there are multiple perceptrons.
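Since the figure is not available, here is one well-known way multiple perceptrons can solve XOR, sketched in Python. The weights are hand-picked for illustration and are not taken from the figure: hidden unit h1 acts as OR, h2 as NAND, and the output unit computes AND(h1, h2), which equals XOR.

```python
def step_unit(w, theta, x):
    """A single perceptron: 1 if sum_i w_i x_i - theta > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta > 0 else 0


def xor_net(x1, x2):
    """Two layers of perceptrons with hand-picked (hypothetical) weights."""
    h1 = step_unit((1, 1), 0.5, (x1, x2))     # OR
    h2 = step_unit((-1, -1), -1.5, (x1, x2))  # NAND
    return step_unit((1, 1), 1.5, (h1, h2))   # AND of the hidden units


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # output is 1 only for (0, 1) and (1, 0)
```

Each hidden unit draws its own line in the input space; the output unit combines the two half-planes, which a single perceptron cannot do.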


Dong Wang

Master student of computer science at Uppsala University in Sweden. My primary research interests are deep learning, computer vision, federated learning and internet-of-things.