A shared memory system consists of a number of processors with small local memories (caches), which are connected to common (shared) memory modules via a bus or a system of crossbar switches. All communication between processors is handled via the shared memory. Synchronisation is needed to avoid simultaneous write access and to separate interdependent program parts.
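A minimal sketch of this synchronisation requirement, using Python's multiprocessing module; the shared counter and the worker function are illustrative assumptions, not part of the original text:

    import multiprocessing as mp

    def worker(total, lock, increments):
        # Each process increments the shared value; the lock serialises
        # write access so no two processes update it simultaneously.
        for _ in range(increments):
            with lock:
                total.value += 1

    if __name__ == "__main__":
        lock = mp.Lock()
        total = mp.Value("i", 0)  # an integer placed in shared memory
        procs = [mp.Process(target=worker, args=(total, lock, 1000))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(total.value)  # 4000; without the lock, updates can be lost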
A distributed memory system consists of a larger number of independent processors with sufficiently large local main memories, which are interconnected via hardware links, a bus or some other interconnection network. All communication must be explicitly programmed: program data and initial parameters must be distributed, and the results must be regathered.
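A minimal sketch of this explicit distribute/compute/regather pattern, again with Python's multiprocessing module and pipes standing in for the interconnect; the sum-of-squares task is an illustrative assumption:

    import multiprocessing as mp

    def node(conn):
        # Each node owns only its local data; all communication is explicit.
        chunk = conn.recv()                    # receive the distributed data
        conn.send(sum(x * x for x in chunk))   # send back a partial result
        conn.close()

    if __name__ == "__main__":
        data = list(range(1000))
        n_nodes = 4
        chunks = [data[i::n_nodes] for i in range(n_nodes)]
        pipes, procs = [], []
        for chunk in chunks:
            parent_end, child_end = mp.Pipe()
            p = mp.Process(target=node, args=(child_end,))
            p.start()
            parent_end.send(chunk)             # distribute program data
            pipes.append(parent_end)
            procs.append(p)
        total = sum(conn.recv() for conn in pipes)  # regather the results
        for p in procs:
            p.join()
        print(total)  # sum of squares of 0..999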
Since communication is significantly slower than in shared memory systems, the computation/communication ratio of a problem must be sufficiently high to allow an efficient parallel implementation. For instance, in a row-distributed matrix-vector product each processor receives the full n-element vector, performs roughly n²/p multiply-adds and sends back n/p results, so the ratio grows with the problem size n.
Specialised hardware such as neural chips reflects the structure and the interdependencies of a particular algorithm in its design and thereby minimises synchronisation and communication overhead.
Such chips are therefore able to exploit parallelism at a very low level, e.g. at the update of a single neurone, where the computation/communication ratio is of the order of 1.
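A sketch of why the ratio is so low at this level: each of the n input values must be communicated and contributes only a single multiply-add, so computation and communication balance regardless of n. The function and the example weights below are illustrative assumptions:

    def update_neurone(weights, inputs):
        # n communicated input values, n multiply-adds: ratio ~ 1.
        activation = 0.0
        for w, x in zip(weights, inputs):
            activation += w * x
        return activation

    print(update_neurone([0.5, -0.2, 0.1], [1.0, 2.0, 3.0]))  # ~0.4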