The Parallel Genetic Algorithm

This section describes the distributed memory version of a parallel genetic algorithm for the training of neural networks, as implemented on a transputer network of 16 T800s.

Distribution of the Workload

The genetic algorithm consists of two main parts: the evaluation of the fitness, and the fitness-based selection and generation of a child population. The fitness evaluations of the individuals are totally independent and can thus be done in parallel, with each processor holding its share of the global population.
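
To make the communication-free fitness phase concrete, the following Python sketch shows the purely local loop; the error function is only a placeholder for decoding a chromosome into network weights and summing the output error over all training patterns, and is not the original transputer code.

\begin{verbatim}
# Hypothetical sketch of the communication-free fitness phase.
def error(chromosome, patterns):
    # Placeholder: decode the chromosome into network weights and
    # return the summed output error over all training patterns.
    return sum(g * g for g in chromosome)   # dummy stand-in

def evaluate_local(local_pop, patterns):
    # Each processor evaluates only its own share of the population.
    return [error(ind, patterns) for ind in local_pop]
\end{verbatim}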

The selections and the generation of a child population by genetic operators are also independent of each other, but three problems stand in the way of exploiting this parallelism on a distributed memory system:

  1. The selection must be based on the fitness data of the whole population.
  2. The crossover operator works on individuals selected from the whole population.
  3. The selection is not necessarily distributed equally among the local populations on each processor and can thus unbalance the workload.

It is therefore necessary to collect the individuals together with their associated errors from all processors on one master processor, which performs these steps and then redistributes the child population in equal portions.3
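
This collect/redistribute step can be sketched as follows, with mpi4py standing in for the transputer links. Truncation selection and one-point crossover are illustrative choices only (the text above specifies just a fitness based selection and generic genetic operators), and mutation is omitted for brevity; chromosomes are assumed to be equal-length Python lists.

\begin{verbatim}
from mpi4py import MPI
import random

def global_generation(comm, local_pop, errors):
    rank, size = comm.Get_rank(), comm.Get_size()
    # Collect all individuals and their errors on the master.
    gathered = comm.gather(list(zip(local_pop, errors)), root=0)
    chunks = None
    if rank == 0:
        pop = [pair for part in gathered for pair in part]
        pop.sort(key=lambda ie: ie[1])          # lower error = fitter
        parents = [ind for ind, _ in pop[:len(pop) // 2]]
        children = []
        while len(children) < len(pop):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))   # one-point crossover
            children.append(a[:cut] + b[cut:])
        # Redistribute the child population in equal portions.
        share = len(children) // size
        chunks = [children[i*share:(i+1)*share] for i in range(size)]
    return comm.scatter(chunks, root=0)
\end{verbatim}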


The Computation/Communication Ratio

For an $n$-$m$-$o$ network with a training set of $t$ patterns and an encoding in chromosome strings of length $l$, the computation/communication ratio $r$ is given by


\begin{displaymath}r={T^{\it comp} \over T^{\it comm}}=
{O(t((n+1)m+(m+1)o)) \over O(l)} =
{O(t(n+o)m) \over O((n+o)m)} = O(t) \end{displaymath}
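
The cancellation in the second step rests on the assumption of a direct encoding with one gene per weight and bias, so that the chromosome length already matches the cost of one forward pass per pattern:

\begin{displaymath}
l = (n+1)m + (m+1)o = O(nm+mo) = O((n+o)m)
\end{displaymath}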

The ratio between the (parallel) error calculation and the (serial) selection and genetic operations is also of order $O(t)$. Since $t$ is normally sufficiently large, this should allow an efficient parallel implementation. Fig. 2 shows the speedup graph for the parallel implementation of the encoder/decoder problem (Section 6.2.1).

Figure 2: Speedup for the parallel ENC/DEC-problem (image speedup12.ps not included)

To improve the computation/communication ratio further, the global selection can be performed not for every new generation, but only after a certain number of local selections. This possibility is provided as an option in the parallel implementation. However, due to its negative impact on global convergence, this method brings hardly any speedup.
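
A minimal sketch of this option follows; the helper names and the interval $k=5$ are illustrative, not values from the implementation.

\begin{verbatim}
# Run the expensive global (master based) selection only every k-th
# generation and a purely local selection in between. local_step and
# global_step stand for routines like the ones sketched above.
def evolve(local_pop, generations, local_step, global_step, k=5):
    for g in range(1, generations + 1):
        if g % k == 0:
            local_pop = global_step(local_pop)  # gather/select/scatter
        else:
            local_pop = local_step(local_pop)   # no communication
    return local_pop
\end{verbatim}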

Finding a Topology

The genetic algorithm involves only communication between master and slaves, which should be reflected in an appropriate topology of the transputer network. If each processor can be directly linked to $n$ other processors, the optimal topology is an $(n-1)$-ary tree with the master as root. Since the T800s used have four hardware links, a ternary tree was chosen. To avoid unnecessary idle times, the master also works on its own local population. (In the parallel implementation for this project, even the same source code is used for master and slaves.)
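
The wiring of such a ternary tree is easy to compute; the following sketch (function names are illustrative) gives the parent and children of each node, so that every node needs at most four links: one up and three down.

\begin{verbatim}
def children(i, size):
    # Children of node i in a ternary tree rooted at the master (node 0).
    return [c for c in (3*i + 1, 3*i + 2, 3*i + 3) if c < size]

def parent(i):
    return None if i == 0 else (i - 1) // 3

# For the 16-processor network: node 0 links to 1, 2, 3;
# node 1 to 4, 5, 6; ...; node 4 to 13, 14, 15.
for i in range(16):
    print(i, parent(i), children(i, 16))
\end{verbatim}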



Footnotes

... portions.3
With very low crossover rates, very long chromosome strings and a simple, fast fitness function, a more sophisticated distribution scheme with "communication on demand" might bring some additional speedup.
