Natural Computation Methods for Machine Learning Note 13
Properties of EC
- EC is parallel search (many balls in the landscape).
- Want to prevent all individuals from converging to the same
solution (premature convergence – a common problem) - The task of mutation is to prevent this – to maintain diversity in the
population. To inject new material in the population.
- Want to prevent all individuals from converging to the same
- EC is unlikely to get stuck in local optima.
- due to the parallel search, and since it does not follow gradients
- mutation helps
- EC does not require the objective function (nor any other function) to be differentiable.
- The objective function (fitness) is used only in selection of individuals, not to decide how to change them.
- RL is perhaps more clever in its search (more directed), but EC compensates by its more global view.
EC (in particular GA) can be a very effective way to search for the best combination of parameters to control/define something.
An ANN is a parameterized system ...
EC and neural networks
EC (in particular GA) can be a very effective way to search for the best combination of parameters to control/define something.
An ANN is a parameterized system ==> Neuro evolution
- GA can be used to evolve, rather than train, artificial neural networks
- (crossover easily becomes destructive, though, explain why)
- Can evolve structure as well, not only weight values
-
Google this: NeuroEvolution of Augmented Topologies (NEAT)
- Could combine the two forms of learning – evolve initial network configurations and training parameters, then train the networks using neural learning algorithms (≈ Lamarckian evolution).
- EC does not have to follow natural laws – could have more than two parents, Lamarckian evolution (where learned experience is hereditary), etc.
EC Issues
- Generational or steady state? (last lecture taught the generational view)
- Population size, constant or variable (if so, how?)
- Introns, useful or not?
- The importance of mutation
- Is crossover constructive or just a form of macro mutation?
-
Crossover moves parts of genotypes – finding a good encoding under this operator can be difficult.
- Is a substring moved from A to B still meaningful in B?
- Evaluation (the fitness function) should be simple and efficient
- This is where EC spends most computation time
- Necessary to evaluate the whole population every generation?
Genetic programming
GP = EC where genotype = parse tree (of a programming language)
Example: The expression 5 + 3x
GP operates directly on (subsets of) computer programs => semi automatic programming.
Crossover and mutation in GP
Crossover: Swap randomly selected subtrees in the two parents
Many forms of mutation
- Function node: replace function with new of same arity
- Terminal node: replace terminal
- Swap: swap arguments of a function
- Grow: replace node by randomly generated subtree
- Gaussian: jog constants
- Trunc: replace node by terminal
Target languages
- Any language can be used, but Lisp is particularly suitable (explain why
- Assembler/machine code (AIM-GP by Nordin et al) (project idea)
- assembler instruction = integers
- store in an unsigned integer array
- cast to function pointer and call it
- Redcode (Corewars)
Most methods are very specialized for its particular target language
Decision trees (common)
Here, GP has been used to evolve computer network protocols
Other applications: See www.genetic-programming.org (not updated since 2007
though). geneticprogramming.com gives a better overview (but few applications)
Advanced topics
Automatically Defined Functions (ADF), how to deal with types, if-statements etc.
Grammatical evolution (GE) (Ryan & O'Neill)
GP methods often specific to their target language.
GE = language independent genetic programming
Takes a BNF grammar as input ==> can generate programs in any language!
Genetic operators, as in GA – the genotype is just a sequence of numbers
Notes
The meaning of codons (integers) is extremely context dependent.
Crossover destructive (variants that address this problem exist)
Possible application: Grammatical Swarm – Train the GE with a particle swarm instead of GA (has no crossover).
文章评论