Natural Computation Methods for Machine Learning Note 03
Pattern recognition and the Perceptron
In this lecture, I learned about basic pattern recognition, the perceptron, and an overview of how to train/adjust the perceptron.
Pattern recognition
Here are some basic terms we should know.
pattern recognition = feature extraction + classification
Feature extraction = finding "good" features for classification => a feature vector X (this step is very sensitive to assumptions)
Classification = finding a discriminant that separates the classes (there is an infinite number of solutions)
Example: Nearest neighbour classifiers
Classify the unknown sample (vector) X according to its k nearest neighbours (e.g. by majority vote among them).
How do we measure the distance to decide which neighbours are 'nearest'?
Distance measures
Define the distance between two vectors
a = (a_1, a_2, \cdots, a_n) \text{ and } b = (b_1, b_2, \cdots, b_n)
l_p norm:
l_p(\bar{x}) = \left( \sum_{i=1}^{n} |x_i|^p \right)^{\frac{1}{p}}
The distance between a and b is then l_p(a - b). In particular, l_2 gives the Euclidean distance and l_1 the city block (Manhattan) distance.
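As a small sketch of these distance measures (the function name and example vectors below are my own, not from the lecture):

```python
import numpy as np

def lp_distance(a, b, p=2):
    """l_p distance between vectors a and b: (sum_i |a_i - b_i|^p)^(1/p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

print(lp_distance(a, b, p=1))  # city block (Manhattan): |1-4| + |2-0| + |3-3| = 5
print(lp_distance(a, b, p=2))  # Euclidean: sqrt(9 + 4 + 0) ≈ 3.606
```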
When the perceptron is used as a classifier, we have
f = f_h(S) = \begin{cases}
1 & \text{if } S > 0 \\
0 & \text{if } S \leq 0
\end{cases}
S = \sum_{i=1}^n w_i x_i - \theta = \sum_{i=0}^n w_i x_i \quad \text{where} \quad
\begin{cases}
x_0 = -1 \\
w_0 = \theta
\end{cases}
\theta is the bias/threshold. Writing the sum as \sum_{i=0}^n w_i x_i with x_0 = -1 and w_0 = \theta is the augmented vector notation. Setting S = 0 defines a hyperplane in the n-dimensional input space.
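Here is a minimal sketch of this perceptron output in augmented notation (the example weights and input are made-up values, not from the lecture):

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Threshold unit: returns 1 if the weighted sum minus theta is > 0, else 0."""
    # Augmented vector notation: prepend x_0 = -1 and w_0 = theta,
    # so that S = sum_{i=0}^{n} w_i x_i = sum_{i=1}^{n} w_i x_i - theta.
    x_aug = np.concatenate(([-1.0], x))
    w_aug = np.concatenate(([theta], w))
    S = np.dot(w_aug, x_aug)
    return 1 if S > 0 else 0

print(perceptron_output(np.array([0.5, 0.8]), w=np.array([1.0, 1.0]), theta=1.0))  # S = 0.3 > 0 -> 1
```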
Let us consider a hyperplane example (2D):
S = \sum_{i=1}^{2} w_i x_i - \theta = w_1 x_1 + w_2 x_2 - \theta
The discriminant is found by setting S=0: w_1 x_1 + w_2 x_2 - \theta = 0 \Rightarrow x_2 = \frac{\theta - w_1 x_1}{w_2} = -\frac{w_1}{w_2} x_1 + \frac{\theta}{w_2} = k x_1 + m. This is a line.
TODO: there should be three figures here.
Conclusion: the weights define the position and slope of the line (in the general case, a hyperplane); the threshold \theta moves the hyperplane.
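As a quick check of this relation (the weight and threshold values below are illustrative assumptions), the slope k and intercept m of the 2D decision line follow directly from w_1, w_2 and \theta:

```python
# Decision boundary S = 0 for a 2D perceptron: x2 = k*x1 + m
# with k = -w1/w2 and m = theta/w2 (assuming w2 != 0).
w1, w2, theta = 1.0, 2.0, 1.0   # illustrative values

k = -w1 / w2      # slope: -0.5
m = theta / w2    # intercept: 0.5
print(f"x2 = {k} * x1 + {m}")
```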
Training (adjusting the line automatically)
We have a number of pairs (x, d) of feature vectors x and desired responses d.
For each such pair, compute the perceptron output y:
If y=d, do nothing
If y=0, d=1, Reinforce the connections (to increase the weighted sum).
If y=1, d=0, Weaken the connections (to decrease the weighted sum).
(Reinforce/weaken = add/subtract the corresponding input value.)
But when to stop?
Multiply the weight change by a gain factor/learning rate/step length \eta, where 0 < \eta \leq 1 \Rightarrow \Delta w_i = \eta \delta x_i, where \delta = d - y
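A minimal training-loop sketch of this rule in augmented notation (the helper name train_perceptron and the AND example are my own assumptions, not from the lecture):

```python
import numpy as np

def train_perceptron(X, D, eta=0.1, epochs=100):
    """Perceptron learning rule: w_i += eta * (d - y) * x_i, using augmented vectors."""
    X_aug = np.hstack([-np.ones((len(X), 1)), X])   # x_0 = -1 for the threshold
    w = np.zeros(X_aug.shape[1])                    # w_0 plays the role of theta
    for _ in range(epochs):
        errors = 0
        for x, d in zip(X_aug, D):
            y = 1 if np.dot(w, x) > 0 else 0
            w += eta * (d - y) * x                  # delta = d - y
            errors += int(y != d)
        if errors == 0:                             # stop once every sample is classified correctly
            break
    return w

# Illustrative example: learn logical AND (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 0, 0, 1])
print(train_perceptron(X, D))
```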
Perceptron convergence
The algorithm converges to a discriminant that separates the classes in a finite number of steps, if such a discriminant exists (it does not always exist).
Limitations
Linear separability is required; XOR, for example, is not linearly separable, so a single perceptron cannot solve it.
In the following figure, there are multiple perceptrons combined into a network (see also the sketch below).
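As an illustration of how multiple perceptrons overcome the XOR limitation, here is a minimal hand-wired sketch (the weights and thresholds are one well-known construction, not taken from the lecture):

```python
def step(s):
    return 1 if s > 0 else 0

def xor_network(x1, x2):
    """Two layers of perceptrons computing XOR: (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)       # hidden unit 1: OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)       # hidden unit 2: AND
    return step(1.0 * h_or - 1.0 * h_and - 0.5)   # output: OR but not AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_network(a, b))   # prints 0, 1, 1, 0
```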