Natural Computation Methods for Machine Learning Note 05

February 9, 2020

Let's continue talking about overtraining.

Training set size

The number of training samples should be much larger than the number of weights; as a rough guideline, on the order of \(N^2\), where N is the number of weights.

What to do if we have too little data?

  • minimize the number of nodes and layers (weights)
  • noise injection: add noise to the input
  • k-fold cross validation: split the data into k parts (of equal size)

        for each sub-set i:

              train on all other sub-sets

              test on sub-set i

  • generalization measure: the average error over the k tests

When k = n, this is leave-one-out cross-validation. A minimal sketch of the procedure follows.
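A minimal sketch of k-fold cross-validation as described above; the `train` and `evaluate` functions are hypothetical placeholders for fitting and scoring a network:

```python
import numpy as np

def k_fold_cv(X, y, k, train, evaluate):
    """k-fold cross-validation: split the data into k equal parts,
    train on all other parts, test on the held-out part, and
    average the k test errors as a generalization measure."""
    n = len(X)
    indices = np.random.permutation(n)      # shuffle before splitting
    folds = np.array_split(indices, k)      # k (roughly) equal parts
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train(X[train_idx], y[train_idx])                  # train on the other k-1 folds
        errors.append(evaluate(model, X[test_idx], y[test_idx]))   # test on fold i
    return np.mean(errors)                  # average error over the k tests

# With k = n (one sample per fold) this becomes leave-one-out cross-validation.
```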

Early stopping

  • training set (largest, usually the rest of the data), used to update the weights
  • validation set (smallest), used to decide when to stop training
  • test set (smaller), used to test the generalization ability

Requires lots of data. This is one of many regularization techniques.
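A minimal sketch of early stopping, assuming hypothetical `train_one_epoch` and `validation_error` helpers and a model with `get_weights`/`set_weights` accessors:

```python
def train_with_early_stopping(model, train_set, val_set,
                              train_one_epoch, validation_error,
                              patience=10, max_epochs=1000):
    """Stop training when the validation error has not improved
    for `patience` consecutive epochs; keep the best weights seen."""
    best_error = float("inf")
    best_weights = model.get_weights()       # assumed accessor
    epochs_since_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_set)              # update weights on the training set
        error = validation_error(model, val_set)       # monitor error on the validation set
        if error < best_error:
            best_error = error
            best_weights = model.get_weights()
            epochs_since_improvement = 0
        else:
            epochs_since_improvement += 1
            if epochs_since_improvement >= patience:   # validation error stopped improving
                break
    model.set_weights(best_weights)                    # restore the best model
    return model
```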

Network size

With sufficiently many hidden layers and nodes, the MLP can approximate any function to any degree of accuracy.

How many layers? One hidden layer is sufficient in theory! In practice, for some problems the required number of nodes in this layer can be very large. Two hidden layers are therefore sometimes used, which drastically reduces the required number of nodes in each layer. To get a feeling for how many nodes we need in the hidden layer, we must get a feeling for what the hidden layer does.

How many nodes?

  • In classification, the hidden nodes form discriminants. With sigmoidal nodes, the lines (hyperplanes) of a classifier become fuzzy and the corners rounded. It is possible to approximate a circle with three hidden nodes.
  • In regression/function approximation, the hidden nodes correspond to monotonic regions in the function.

TODO: there should be a picture here.

 

The ability to represent a function is not a guarantee that this function can be found by training! The required number of hidden nodes is greater in practice than in theory!

ANNs for classification should have sigmoidal outputs, while ANNs for function approximation should have linear outputs (the hidden layer is still nonlinear).
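A minimal PyTorch-style sketch of this convention; the layer sizes (10 inputs, 20 hidden nodes, 1 output) are arbitrary:

```python
import torch.nn as nn

# Classification: sigmoidal output squashes the result into (0, 1).
classifier = nn.Sequential(
    nn.Linear(10, 20),   # input -> hidden
    nn.Sigmoid(),        # nonlinear hidden layer
    nn.Linear(20, 1),    # hidden -> output
    nn.Sigmoid(),        # sigmoidal output for classification
)

# Function approximation: linear output, hidden layer still nonlinear.
regressor = nn.Sequential(
    nn.Linear(10, 20),
    nn.Sigmoid(),
    nn.Linear(20, 1),    # linear output, no activation
)
```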

Optimizing for speed

\(\eta \; \begin{cases} \text{step length} \\ \text{step size} \\ \text{learning rate} \end{cases}\)

Idea: Increase training speed by on-line adjustment of the step length, either globally or individually for each weight.

Examples:

  1. Backprop with a momentum term (a minimal sketch follows below).
  2. Start with a large \(\eta\) and reduce it over time.
  3. Consider the gradient history. Has it changed a lot lately? If so, decrease \(\eta\); otherwise, increase it.
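A minimal sketch of example 1, plain gradient descent with a momentum term; `eta` (the step length) and `alpha` (the momentum coefficient) are illustrative values:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, eta=0.01, alpha=0.9):
    """One weight update with a momentum term: part of the previous
    update is carried over, which smooths the trajectory and speeds
    up movement along directions where the gradient is consistent."""
    velocity = alpha * velocity - eta * grad   # accumulate past updates
    w = w + velocity                           # apply the update
    return w, velocity
```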

Resilient backpropagation (RPROP)

Requires epoch learning (the weights are updated once per epoch, from the gradient accumulated over the whole training set).

Adaptive \(\eta\), local for each weight.

Idea: \(\frac{\partial E}{\partial w_{ji}}\) decides the direction only (i.e., we only consider its sign).

The step length is instead decided by a new parameter \(\Delta_{ji}\) (individual for each weight, replacing \(\eta\)):

\(\Delta w_{ji}=-\Delta_{ji} \operatorname{sign}\left(\frac{\partial E}{\partial w_{ji}}\right)\)

\(\Delta_{ji}\) is updated (within a specified interval) so that:

  • If \(E'\) keeps its sign, the step length \(\Delta_{ji}\) is increased by a factor \(\eta^+\).
  • If \(E'\) changes sign, \(\Delta_{ji}\) is reduced by a factor \(\eta^-\) (and the weight change is discarded).

Effect: Accelerate down slopes. Decelerate when we (would have) overshot a minimum.
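A minimal numpy sketch of this update rule; the values \(\eta^+=1.2\), \(\eta^-=0.5\) and the bounds on \(\Delta_{ji}\) are commonly cited defaults, and discarding the weight change on a sign flip is simplified here to zeroing that weight's step for the current epoch:

```python
import numpy as np

def rprop_update(w, grad, prev_grad, delta,
                 eta_plus=1.2, eta_minus=0.5,
                 delta_min=1e-6, delta_max=50.0):
    """One RPROP epoch update: the gradient decides the direction only,
    while the step length delta (one per weight) adapts from sign changes."""
    sign_change = grad * prev_grad                       # >0: same sign, <0: sign flipped
    # Accelerate where the gradient keeps its sign.
    delta = np.where(sign_change > 0,
                     np.minimum(delta * eta_plus, delta_max), delta)
    # Decelerate where the gradient changed sign (we would have overshot a minimum).
    delta = np.where(sign_change < 0,
                     np.maximum(delta * eta_minus, delta_min), delta)
    # Simplification: drop this epoch's step for weights whose gradient flipped sign.
    effective_grad = np.where(sign_change < 0, 0.0, grad)
    w = w - delta * np.sign(effective_grad)              # step by delta, direction from sign only
    return w, delta, effective_grad                      # effective_grad is next epoch's prev_grad
```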
