Natural Computation Methods for Machine Learning Note 03
Pattern recognition and the Perceptron
In this lecture, I learned about basic pattern recognition, the perceptron, and an overview of how to train/adjust the perceptron.
Pattern recognition
Here are some basic terms we should know.
pattern recognition = feature extraction + classification
Feature extraction = finding "good" features for classification => a feature vector X (this step is very sensitive to assumptions)
Classification = finding a discriminant that separates the classes (there is an infinite number of solutions)
Example: Nearest neighbour classifiers
Classify the unknown sample (vector) X according to its k nearest neighbours (e.g. by majority vote among them).
How do we measure the distance to decide which neighbours are 'nearest'?
Distance measures
Define the distance between two vectors
a = (a_1, a_2, \cdots, a_n) \text{ and } b = (b_1, b_2, \cdots, b_n)
l_p norm:
l_p(\bar{x}) = \left( \sum_{i=1}^{n} |x_i|^p \right)^{\frac{1}{p}}
The distance between a and b is then l_p(a - b). In particular, l_2 gives the Euclidean distance and l_1 the city block (Manhattan) distance.
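As a small sketch of these distance measures (the function name and example vectors below are my own, not from the lecture):

```python
import numpy as np

def lp_distance(a, b, p=2):
    """l_p distance between vectors a and b: (sum_i |a_i - b_i|^p)^(1/p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

print(lp_distance(a, b, p=1))  # city block (Manhattan): |1-4| + |2-0| + |3-3| = 5
print(lp_distance(a, b, p=2))  # Euclidean: sqrt(9 + 4 + 0) ≈ 3.606
```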
When the perceptron is used as a classifier, we have
f = f_h(S) = \begin{cases}
1 & \text{if } S > 0 \\
0 & \text{if } S \leq 0
\end{cases}
S = \sum_{i=1}^n w_i x_i - \theta = \sum_{i=0}^n w_i x_i \quad \text{where} \quad
\begin{cases}
x_0 = -1 \\
w_0 = \theta
\end{cases}
\theta is the bias/threshold. Writing the sum as \sum_{i=0}^n w_i x_i with x_0 = -1 and w_0 = \theta is the augmented vector notation. Setting S = 0 defines a hyperplane in the n-dimensional input space.
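Here is a minimal sketch of this perceptron output in augmented notation (the example weights and input are made-up values, not from the lecture):

```python
import numpy as np

def perceptron_output(x, w, theta):
    """Threshold unit: returns 1 if the weighted sum minus theta is > 0, else 0."""
    # Augmented vector notation: prepend x_0 = -1 and w_0 = theta,
    # so that S = sum_{i=0}^{n} w_i x_i = sum_{i=1}^{n} w_i x_i - theta.
    x_aug = np.concatenate(([-1.0], x))
    w_aug = np.concatenate(([theta], w))
    S = np.dot(w_aug, x_aug)
    return 1 if S > 0 else 0

print(perceptron_output(np.array([0.5, 0.8]), w=np.array([1.0, 1.0]), theta=1.0))  # S = 0.3 > 0 -> 1
```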
Let us consider a hyperplane example (2D):
S = \sum_{i=1}^{2} w_i x_i - \theta = w_1 x_1 + w_2 x_2 - \theta
The discriminant is found by setting S=0: w_1 x_1 + w_2 x_2 - \theta = 0 \Rightarrow x_2 = \frac{\theta - w_1 x_1}{w_2} = -\frac{w_1}{w_2} x_1 + \frac{\theta}{w_2} = k x_1 + m. This is a line.
TODO: there should be three figures here.
Conclusion: the weights define the position and slope of the line (in the general case, a hyperplane); the threshold \theta moves the hyperplane.
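As a quick check of this relation (the weight and threshold values below are illustrative assumptions), the slope k and intercept m of the 2D decision line follow directly from w_1, w_2 and \theta:

```python
# Decision boundary S = 0 for a 2D perceptron: x2 = k*x1 + m
# with k = -w1/w2 and m = theta/w2 (assuming w2 != 0).
w1, w2, theta = 1.0, 2.0, 1.0   # illustrative values

k = -w1 / w2      # slope: -0.5
m = theta / w2    # intercept: 0.5
print(f"x2 = {k} * x1 + {m}")
```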
Training (adjusting the line automatically)
We have a number of pairs (x, d) of feature vectors x and desired responses d.
For each such pair, compute the perceptron output y:
If y=d, do nothing
If y=0, d=1, Reinforce the connections (to increase the weighted sum).
If y=1, d=0, Weaken the connections (to decrease the weighted sum).
(Reinforce/weaken = add/subtract the corresponding input value.)
But when to stop?
Multiply the weight change by a gain factor/learning rate/step length \eta, where 0 < \eta \leq 1 \Rightarrow \Delta w_i = \eta \delta x_i, where \delta = d - y
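A minimal training-loop sketch of this rule in augmented notation (the helper name train_perceptron and the AND example are my own assumptions, not from the lecture):

```python
import numpy as np

def train_perceptron(X, D, eta=0.1, epochs=100):
    """Perceptron learning rule: w_i += eta * (d - y) * x_i, using augmented vectors."""
    X_aug = np.hstack([-np.ones((len(X), 1)), X])   # x_0 = -1 for the threshold
    w = np.zeros(X_aug.shape[1])                    # w_0 plays the role of theta
    for _ in range(epochs):
        errors = 0
        for x, d in zip(X_aug, D):
            y = 1 if np.dot(w, x) > 0 else 0
            w += eta * (d - y) * x                  # delta = d - y
            errors += int(y != d)
        if errors == 0:                             # stop once every sample is classified correctly
            break
    return w

# Illustrative example: learn logical AND (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([0, 0, 0, 1])
print(train_perceptron(X, D))
```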
Perceptron convergence
The algorithm converges to a discriminant that separates the classes in a finite number of steps, if such a discriminant exists (it does not always exist).
Limitations
Linear separability is required; XOR, for example, is not linearly separable, so a single perceptron cannot solve it.
In the following figure, there are multiple perceptrons combined into a network (see also the sketch below).
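As an illustration of how multiple perceptrons overcome the XOR limitation, here is a minimal hand-wired sketch (the weights and thresholds are one well-known construction, not taken from the lecture):

```python
def step(s):
    return 1 if s > 0 else 0

def xor_network(x1, x2):
    """Two layers of perceptrons computing XOR: (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)       # hidden unit 1: OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)       # hidden unit 2: AND
    return step(1.0 * h_or - 1.0 * h_and - 0.5)   # output: OR but not AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_network(a, b))   # prints 0, 1, 1, 0
```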