Natural Computation Methods for Machine Learning Note 03

2020年2月7日 1611点热度 0人点赞 0条评论

Natural Computation Methods for Machine Learning Note 03

Pattern recognition and the Perceptron

This course, I learnt basic pattern recognition , perceptron and an overview of how to train/adjust the perceptron.

Pattern recognition

Here are some basic terms we should know.

pattern recognition = feature extraction + classification

Feature extraction = find "good" feature to classify => feature vector, X(This is very sensitive to assumptions )

Classification = Find a discriminant that separates the classes(There is an infinite number of solutions)

Example: Nearest neighbour classifiers
Classify the unknown sample (vector) X to k nearest classes.

How do we measure the distance for 'nearest' classes?

Distance measures

Define distance between two vectors

a = (a_1,a_2,\cdots, a_n) \ and \ b=(b_1,b_2,\cdots,b_n)
l_p \ norm
l_p(\bar{x}) = \left( \sum_{i=1} x_i^p \right)^\frac{1}{p}

Specially, l_2 = Euclidean distance. l_1 = city block (Manhattan) distance
When the perceptron is a classifier, we have

1& \text{if } s>0\
0& \text{if } s \leq 0

S =\sum_{i=1}^n w_ix_i-\theta = \sum_{i=0}^n w_ix_i \text{where} \begin{cases}

\theta is the bias/ threshold, \sum_{i=0}^n w_ix_i \text{where} \begin{cases}
is the augmented vector notation. This defines a hyperplane n-dimension in input space.

Let us consider a hyperplane example (2D):

S =\sum_{i=1}^n w_ix_i-\theta =w_1x_1+w_2x_2-\theta

The discriminant found by setting S=0, we have w_1x_1+w_2w_2-\theta = 0 \Rightarrow x_2 = \frac{\theta-w_ix_i}{w_2} = -\frac{w_1}{w_2}x_2+\frac{\theta}{w_2}=kx+m. This is a line.

\TODO there should be 3 figures.
Conclusion: The weights define the position and slope of the line (in the general case, a hyper plane). Threshold(\theta) moves the hyperplane.

Training(adjusting the line automatically)

We have a number of pairs (x, d) of feature vectors (x) an desired responses (d).

For each such pair and perceptron output y.

If y=d, do nothing

If y=0, d=1, Reinforce the connections (to increase the weighted sum).

If y=1, d=0, Weaken the connections (to decrease the weighted sum).

(Reinforce/inhibit = add/subtract the corresponding input value.)

But when to stop?

multiply the weight change by a gain factor/learning rate/step length \eta, where 0\leq \eta \leq1 \Rightarrow \Delta w_i=g\delta x_i where \delta = d-y


The algorithm converges to an optimal discriminant in a finite number of steps, if such a discriminant exists.(not always exist).

Linear separability, e.g. XOR

In the following figure, there are multiply perceptrons.


Dong Wang

A final year master's student in computer science at Uppsala University in Sweden. I am interested in deep learning, computer vision, and optimization. I am actively looking for Ph.D. position.