Natural Computation Methods for Machine Learning Note 06

February 12, 2020



In this lecture, we learn more about extensions and prior knowledge.

Automatic dimensioning

We want to minimize the number of hidden nodes (=> less overfitting).

The approaches:

  1. Start with a large network and prune it.
  2. Start with a small network and grow it.

Here is one example of each.

Weight decay (pruning)

Let each weight decay towards zero during training:

w^{new}=(1-\varepsilon) w^{old}

Remove the 0-weights and retrain.

This is not only a pruning technique; keeping the weights small is useful anyway: it helps for numerical reasons, restricts the network, and reduces the risk of overtraining.
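A minimal NumPy sketch of the decay-then-prune idea (the epsilon and pruning threshold values are illustrative, not from the notes):

```python
import numpy as np

def decay_weights(W, epsilon=1e-4):
    """One decay step: w_new = (1 - epsilon) * w_old, applied every epoch."""
    return (1.0 - epsilon) * W

def prune(W, threshold=1e-3):
    """After training, zero out the weights that have decayed to ~0, then retrain."""
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return W

# In a training loop the decay is interleaved with the ordinary gradient update, e.g.
#   W = decay_weights(W - lr * dE_dW)
# Weights that the error gradient does not keep alive drift to zero and get pruned.
```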

The Upstart algorithm (growing)

A self-dimensioning method for classification.

Idea: Create child nodes, trained separately to recognize when the parent node
makes mistakes.

An output node in a classifier network can make two types of mistakes:

E^+: The node’s value is 1, when it should be 0.

E^-: The node’s value is 0, when it should be 1.

For each output, create two children, x^+ and x^-, trained to recognize the cases
where the parent makes mistakes of type E^+ and E^-, respectively. This is also a
classification problem (but a smaller one).

x^+ is connected to the parent node with a large negative weight.
x^- is connected to the parent node with a large positive weight.

If the children cannot solve this task, let them create their own children, etc.
Result: A finite “tree” of neurons. Severe risk of overtraining.
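A rough, runnable sketch of the recursion (a simplified illustration, not Frean's exact formulation; the perceptron trainer, the LARGE connection weight and the depth cap are my own choices):

```python
import numpy as np

LARGE = 10.0   # magnitude of the corrective connection weights (illustrative)

class Node:
    """A linear threshold unit whose children correct its mistakes."""
    def __init__(self, n_inputs):
        self.w = np.zeros(n_inputs)
        self.b = 0.0
        self.children = []                 # list of (child, connection weight)

    def predict(self, X):
        z = X @ self.w + self.b
        for child, cw in self.children:    # children push the output up or down
            z = z + cw * child.predict(X)
        return (z > 0).astype(int)

    def train(self, X, y, epochs=100, lr=0.1):
        for _ in range(epochs):            # plain perceptron learning
            for xi, ti in zip(X, y):
                out = int(xi @ self.w + self.b > 0)
                self.w += lr * (ti - out) * xi
                self.b += lr * (ti - out)

def upstart(X, y, depth=0, max_depth=5):
    node = Node(X.shape[1])
    node.train(X, y)
    pred = node.predict(X)
    e_plus = (pred == 1) & (y == 0)        # E+: fires when it should not
    e_minus = (pred == 0) & (y == 1)       # E-: silent when it should fire
    if depth < max_depth:
        if e_plus.any():                   # child inhibits the parent on E+ cases
            node.children.append((upstart(X, e_plus.astype(int), depth + 1), -LARGE))
        if e_minus.any():                  # child excites the parent on E- cases
            node.children.append((upstart(X, e_minus.astype(int), depth + 1), +LARGE))
    return node

# Example: XOR is not linearly separable, so the root needs children.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
root = upstart(X, y)
print(root.predict(X))                     # ideally reproduces y
```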

Tip: Fahlman's cascade-correlation algorithm uses a similar idea, but is applicable to function approximation in general.

Second order methods

Use second order information (how the slope changes over time).

Quickprop

Requires epoch (batch) learning. Assumptions:

1. The error surface (the landscape) can be approximated locally by a parabola.

2. The change in the slope \frac{\partial E}{\partial w_{ji}} from the previous step is only due to the change of w_{ji}.

Then, the current and previous slopes, together with the latest weight change \Delta w_{ji},
can be used to define a parabola. Jump directly to its minimum.

Very sensitive to the choice of control parameters, but can be extremely fast.
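A minimal NumPy sketch of this parabolic jump (my own illustration; the fallback to plain gradient descent and the growth-factor clip are simplifications, and the default parameter values are assumptions):

```python
import numpy as np

def quickprop_step(grad, prev_grad, prev_dw, lr=0.01, mu=1.75):
    """One Quickprop weight change, element-wise, from epoch-wise gradients.

    The parabolic jump  dw = S(t) / (S(t-1) - S(t)) * dw(t-1)  lands on the
    minimum of the locally fitted parabola. The full algorithm handles more
    cases; this is only the core idea.
    """
    denom = prev_grad - grad
    safe = np.where(np.abs(denom) > 1e-12, denom, 1.0)
    dw = grad / safe * prev_dw                             # jump to the parabola minimum
    dw = np.where(np.abs(denom) > 1e-12, dw, -lr * grad)   # fallback: gradient descent
    limit = mu * np.abs(prev_dw) + 1e-12                   # maximum growth factor
    return np.clip(dw, -limit, limit)                      # keep the step from exploding

# Toy usage on E(w) = w^4: two gradient evaluations, one Quickprop jump.
w, dw = 2.0, -0.1
g_prev = 4 * w**3
w = w + dw
g = 4 * w**3
print(w + quickprop_step(np.array(g), np.array(g_prev), np.array(dw)))
```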

No free lunch theorem


Averaged over all possible learning problems, no learning algorithm is better
than any other. They all perform the same. This includes random search.

  • There is always a catch. All modifications that seem to make things better
    must have a drawback!
  • If your algorithm is worse than another on a subset of problems, you also
    know that there must exist another subset for which your algorithm is
    better.

Preprocessing

The choice of input and output representations is the real problem; this is where the
user actually solves it (example: the two-spirals task, shown below).

[Figure: the two-spirals data set]
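For illustration, a simple generator for data of this kind (my own version, not the exact benchmark parameters):

```python
import numpy as np

def two_spirals(n=97, noise=0.0, seed=0):
    """Two interleaved spirals, one per class: a classic hard benchmark."""
    rng = np.random.default_rng(seed)
    t = np.linspace(np.pi / 2, 3 * np.pi, n)       # angle along each spiral
    r = t / (3 * np.pi)                            # radius grows with the angle
    c0 = np.c_[r * np.cos(t), r * np.sin(t)]       # class 0
    c1 = -c0                                       # class 1: rotated 180 degrees
    X = np.vstack([c0, c1]) + noise * rng.normal(size=(2 * n, 2))
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = two_spirals()
print(X.shape, y.shape)   # (194, 2) (194,)
```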

  • Distribute your representations!
    Example: In classification, use as many outputs as there are classes. Train using
    target vectors where all elements are 0 except for the element corresponding to the
    current class (which is 1), i.e. one-hot encoding (see the sketch after this list).
    The network will approach a Bayesian classifier (output i will approximate P(C_i|x)).
  • Any prior knowledge on statistical distributions, symmetries, etc., should
    be exploited in the pre-processing stage. Don't let the network learn something
    you already know. Normalization/scaling – see Engelbrecht. Remember, though,
    that this may affect performance (it introduces a bias).
  • Normalization example: Normalizing the inputs makes the network more
    sensitive to fluctuations in some inputs than in others. That bias may be the
    very reason for normalizing (give small-range values a chance), but it is easy to
    overcompensate.
  • Scaling example: Taking logarithms of the target values makes the network more
    sensitive (better) for low values than for high values.
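A small NumPy sketch of the two representation tricks above, one-hot target vectors and input standardization (function names are my own):

```python
import numpy as np

def one_hot(labels, n_classes):
    """Distributed output representation: one output per class, 1 for the true class."""
    T = np.zeros((len(labels), n_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T

def standardize(X):
    """Scale each input column to zero mean, unit variance.
    Note: this changes the relative sensitivity to different inputs (a bias)."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

labels = np.array([0, 2, 1])
print(one_hot(labels, 3))          # [[1,0,0], [0,0,1], [0,1,0]]
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
print(standardize(X))              # both columns now comparable in range
```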

Exploiting prior knowledge

Examples of prior knowledge types:

  1. Initial guess

    a. Choosing initial weights (or how to randomize them)

  2. Known decomposition of the problem into subproblems

    a. Preprocessing
    b. Network structure
    c. Multitask learning

  3. Constraints
    a. Modifying the objective function (and deriving a new learning rule)

  4. Regions
    a. Preprocessing
    b. Multitask learning

Multitask learning (extra output learning)

We may have additional data which we think could help the network.

Example: XOR is much easier (it becomes linearly separable) if we add an extra
input ab. But the extra information is then also required after training.

Solution: Add ab – the hint function – as an extra output instead! Network
trained to implement both functions (outputs) at once.

[Figure: Multitask learning for XOR (figure copyright Olle Gällmo)]

This restricts the freedom of the hidden layer, i.e. the number of models the network
can form of the original target function.

Effect: Faster training. Less overtraining. Less variance in both training time and
results. Required number of nodes closer to theory. Hint only needed during
training.

Example: XOR and function approximation experiments.
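A minimal NumPy sketch of the XOR-plus-hint setup (my own toy backprop implementation; the layer sizes, learning rate and epoch count are illustrative): the hint ab is trained as a second output sharing the hidden layer and is simply discarded after training.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs; two targets per pattern: [xor(a, b), a AND b] (the hint).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0, 0], [1, 0], [1, 0], [0, 1]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer shared by both outputs; the hint shapes its representation.
W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)   # 2 inputs -> 3 hidden
W2 = rng.normal(scale=0.5, size=(3, 2)); b2 = np.zeros(2)   # 3 hidden -> 2 outputs

lr = 0.5
for _ in range(10000):
    H = sigmoid(X @ W1 + b1)              # forward pass
    Y = sigmoid(H @ W2 + b2)
    dY = (Y - T) * Y * (1 - Y)            # squared error + sigmoid derivative
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)   # backprop updates
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

# At test time only output 0 (XOR) is used; the hint output is discarded.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)[:, 0]))  # expect ~[0, 1, 1, 0]
```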

Tags: machine learning, Multitask Learning, Natural Computation, No free lunch
Last updated: March 4, 2020

