The cover drawings are by Karin
Svozil.
This book was typeset using the TeX/LaTeX typesetting system, in
particular PostScript-based LaTeX (PS-LaTeX) by Mario Wolczko.
The fonts used in this book include Times Roman, Helvetica, Courier, ITC Avant
Garde Gothic Book, Symbol, the AMS fonts and D. E. Knuth's Punk. The output from
TeX was converted into the PostScript page description
language with dvips by Tomas Rokicki. In this process, graphics were directly
added from Mathematica in Encapsulated PostScript form.
ITC Avant Garde Gothic is a registered trademark of
International Typeface Corporation.
Mathematica is a registered
trademark of Wolfram Research, Inc.
MS-DOS is a registered trademark of
Microsoft Corporation.
PostScript is a registered trademark of Adobe Systems
Incorporated.
Times Roman and Helvetica are registered trademarks of Linotype AG.
TeX is a trademark of the American Mathematical Society.
All other product names are trademarks of their producers.
Recent findings in the computer sciences, discrete mathematics, formal logics and metamathematics have opened up a via regia for the investigation of undecidability and randomness in physics. A translation of these formal and abstract concepts yields a fresh look into diverse features of physical modelling such as quantum complementarity and the measurement problem, but also stipulates questions related to the necessity of the assumption of continua.
Any physical system may be perceived as a computational process. One could even speculate that physical systems exactly correspond to, and indeed are, computations of a very specific kind; with a particular computer model in mind. From this point of view it is absolutely reasonable to investigate physical systems with concepts and methods developed by the computer sciences.
Conversely, any computer may be perceived as a physical system; not only in the immediate sense of the physical properties of its hardware. Computers are a medium to virtual realities. The foreseeable importance of such virtual realities stimulates the investigation of an ``inner description,'' a ``virtual physics,'' if you like, of these universes of computation. Indeed, one may consider our own universe as just one particular realisation of an enormous number of virtual realities, most of them awaiting discovery.
Besides these issues, the intuitive terms ``rational'' (human thought), ``conceivable'' and so on, have been made precise by the concepts of mechanic computation and recursive enumeration. The reader may find these developments sufficiently exciting to go on and study this new field.
The first part of this book introduces the fundamental concepts. Informally stated, recursive function theory is concerned with the question of whether an entity is computable in a very precisely defined way. Algorithmic information theory deals with the quantitative description of computation, in particular with the shortest program length. Coding and suitable algebraic representation of physical statements are the prerequisites for their algorithmic treatment.
One motive of this book is the recognition that what is often referred to as ``randomness'' in physics might actually be a signature of undecidability for systems whose evolution is computable on a step-by-step basis. Therefore the second part of the book is devoted to the investigation of undecidability.
To give a flavour of the type of questions envisaged: Consider an arbitrary algorithmic system which is computable on a step-by-step basis. Then it is in general impossible to specify a second algorithmic procedure, including itself, which, by experimental input-output analysis, is capable of finding the deterministic law of the first system. But even if such a law is specified beforehand, it is in general impossible to predict the system behaviour in the ``distant future.'' In other words: no ``speedup'' or ``computational shortcut'' is available. These statements are consequences of two classical theorems in recursion theory, the recursive unsolvability of the rule inference problem and of the halting problem. In this approach, classical paradoxes can be formally translated into no-go theorems concerning intrinsic physical perception.
Certain self-referential statements like ``I am lying'' are paradoxical and resemble the absurd attempt of Freiherr von Münchhausen to rescue himself from a swamp by dragging himself out by his own hair. Such paradoxes can only be consistently avoided by accepting restrictions on the expressive power and on the comprehension of the associated methods and systems - with undecidability and incompleteness as consequences.
Complementarity is a feature which can be modelled by experiments on certain finite automata. This is due to the fact that measurement of one observable of the automaton destroys the possibility of measuring another observable of the same automaton, and vice versa. Certain self-referential measurements amount to a similar attempt: on the one hand they pretend to render the ``true'' value of an observable, while on the other hand they have to interact with the object to be measured and thereby inevitably change its state.
It is important to distinguish between the ``intrinsic'' view of an observer, who is entangled with and who is an inseparable part of the system, and the ``extrinsic'' perspective of an observer who is not entangled with the system via self-reference. Indeed, the recognition of the importance of intrinsic perception, of a ``view from within,'' might be considered as a key observation towards a better understanding of undecidability and complementarity.
The third, last part of the book is dedicated to a formal definition of randomness and entropy measures based on algorithmic information theory.
Help and discussions with Matthias Baaz, Norbert Brunner, Cristian Calude, Gregory Chaitin, Anatol Dvurecenskij, Günther Krenn, Michiel van Lambalgen, Otto Rössler, Martin Schaller, Christoph Strnadl, Johann Summhammer, Walter Thirring and Anton Zeilinger are gratefully acknowledged. Nevertheless, all misinterpretations and mistakes are mine. Thanks go also to Jennifer Gan from World Scientific, who managed to guide me through the production of this book both kindly and smoothly.
ℵ₀ | cardinality of a countably infinite set, e.g., ℕ or ℚ
ℵ₁ | cardinality of the continuum, e.g., ℝ or ℂ
C^n | n times continuously differentiable
S | lattice of closed subspaces of a Hilbert space ℋ
D | automaton propositional calculus
L ⊢ φ | φ can be derived in L
∅ | empty set, empty input sequence
False, F, 0 | absurd proposition
H(s) | algorithmic information of object s
H(s,t) | joint algorithmic information of objects s, t
HD | computational complexity
h(N;M) | information theoretic uncertainty, missing information
h | average algorithmic entropy per symbol
h̄ | metric entropy
ℋ | Hilbert space
I(N;M) | Shannon information gain after N-M experiments
(graphic symbol) | generic interface
(graphic symbol) | interface by symbolic exchange
∞ | infinity
|x| | length of string or chain x, cardinal number of set x
||x|| | length of chain x, norm of x
log₂* x | iterated logarithm
λ | Lyapunov exponent
MO₂, MOₙ | ``Chinese lantern'' lattice
ℕ | set of integers
#(s) | code of symbol s
O(f) | of the order of f
ω | ordinal number infinity
2^ω | set of infinite binary sequences
Ω | halting ``probability''
P | lattice of all projections of a Hilbert space ℋ
P(s) | algorithmic ``probability'' to produce object s
P(s,t) | joint algorithmic ``probability'' to produce objects s, t
ℚ | set of rationals
ℝ | set of reals
S(n) | busy beaver function
x̃, D̃ | extrinsic entity, extrinsic automaton logic
True, T, 1 | trivial proposition
U(p, s)=t, U(p)=t | universal computer, program p, input s, output t
v(s) | partition of automaton states from input s
V | set of partitions of automaton states
∨ | logical or
∧ | logical and
¬ | logical not
``iff'' stands for ``if and only if;'' i.e., for a necessary and sufficient condition.
`So I wasn't dreaming, after all,' she said to herself,
`unless-unless we're all part of the same dream. Only I do hope it is
my dream, and not the Red King's! I don't like belonging to another
person's dream,' she went on in a rather complaining tone: `I have a great mind
to go and wake him, and see what happens!'
This chapter should be understood as a very brief review of algorithmics and
the theory of recursive functions. For an elementary introduction, see, for
instance, David Harel's book Algorithmics []. Comprehensive treatments of
the subject are Hartley Rogers' Theory of Recursive Functions and Effective
Computability [] and P. Odifreddi's Classical Recursion Theory [].
Other introductions are M. Davis, Computability & Unsolvability [],
M. L. Minsky, Computation: Finite and Infinite Machines [] and part C of
the Handbook of Mathematical Logic [].
The present concept of formal computation has been developed from the
experience of practical ``manual'' calculation, at least up to some limited
resource level. How could the intuitive concept of ``manual'' or ``mechanic
computation'' be formalised? (In the computer sciences, what is called
``mechanic'' is often referred to as ``deterministic.'' In the physical
context, such a terminology may give rise to confusion with the term
``deterministic system,'' which will be characterised by some ``mechanically''
computable evolution function but not necessarily ``mechanically'' computable
parameter values.) Take, for instance, an intuitive, pragmatic understanding of
``mechanic computation,'' put forward by Alan M. Turing [] in the 30's: ``whatever can (in principle) be calculated on a
sheet of paper by the usual rules is computable.'' The question whether or
not this - or any other (possibly more sophisticated) - understanding of
``mechanic'' computation is adequate, has (at least) two aspects: (i)
syntax, investigated by formal logic, mathematics and the computer sciences and
(ii) physics.
One important result of syntactic arguments is the concept of ``universal
computation,'' as envisioned by K. Gödel, J. Herbrand, St. C. Kleene, A.
Church and, most influentially, by A. M. Turing. Universal computation is
usually developed by introducing a very precise (elementary) model of
computation, e.g., the Turing machine, a Cellular Automaton et cetera. It
is then shown that, provided it is ``sufficiently complex,'' this model
comprises all ``reasonable'' instances of computation. I.e., adding additional
components to such a machine does not change its computational capacity
qualitatively. (Yet it may change its efficiency in terms of time, storage
requirements, program length et cetera.) Furthermore, all such
``universal'' models of computation turn out to be equivalent in the sense that
there is a one-to-one translation of one into the other; i.e., every calculation
that can be performed by any one of them can be done by all of the others as
well. It is often assumed that universal computation remains unaffected by the
specific physical realisation of the computing device. I.e., it does not really
matter (in principle) what physical base you use for computing - doped silicon,
light, nerve cells, billiard balls - as long as your devices work
``mechanically'' and are ``sufficiently complex'' to support universal
computation.
Nevertheless, the syntax of universal computation is based on
primary intuitive concepts of ``reasonable'' instances of ``mechanic''
computation, which refer to physical insight; i.e., which refer to the
types of processes which can be performed in the physical world. In this sense,
the level of physical comprehension sets the limits to whatever is acceptable as
valid method of computation. If, for instance, we could employ an ``oracle''
(the term ``oracle'' will be specified later), our wider notion of ``mechanic''
computation would include oracle computation. Likewise, a computationally more
restricted universe would intrinsically imply a more restricted concept of
``mechanic'' computation. For an early and brilliant discussion of this aspect
the reader is referred to A. M. Turing's original work []. As D. Deutsch puts it
([], p. 101),
Consider, for example, electronic computing, which has transformed our
comprehension of computation dramatically, shifting it from ``number-crunching''
to symbolic calculation, visualisation, automation, artificial intelligence and
the creation of ``virtual realities.'' Yet, as long as electronic circuits act
as ``mechanic'' devices, electronics will not change our concept of universal
``mechanic'' computing. Thus electronic computing is not an example for the type
of physical process which would make necessary a revision of concepts of
computation.
The question remains whether such revisions will become necessary as our
abilities to stimulate and control physical processes advance. It remains to be
seen whether new capacities will become feasible by the use of the continuum in
classical physics (see below) or by quantum effects [,,,,,,,,] or more remote
possibilities. It can for instance be conjectured that quantum computation will
not enlarge the scope of computation because the enhanced physical capacities of
quantum mechanics will inevitably be washed away by noise; i.e., by the
randomness of quantum events and by the amplification of noise when information
in the quantum domain gets interpreted classically. This resembles the
``peaceful coexistence'' between quantum theory and relativity (cf. section , p.
).
We do not know whether our own self is limited by the ``mechanic''
computational capacities of the part of the world we presently cope with
scientifically. Kurt Gödel (Collected Works II [], p. 305) seems to have believed that the mind transcends mechanical procedures. See also P. Odifreddi
[], p. 113, and the remark by H. Weyl, quoted on page pageref.
Thus, if the term ``mechanic'' is a label referring to rules which completely
determine some ``reasonable'' actions of computation, then whatever is
considered as ``reasonable'' action of computation has to be specified also by
physics.
The following notion of ``algorithm'' or ``effective computation'' is
motivated by those types of tasks which can actually be performed in the
physical world. Therefore, its features are bounded (from above), excluding
computations in the limit of infinite time, storage space, program length et
cetera.
[Algorithm, effective computability] An effective computation, or
effective procedure C, is some ``mechanic'' procedure which can be
applied to any of a certain class of symbolic inputs s and which will eventually
yield, for some inputs, a symbolic output t. The process is written in a notation similar to functional notation, C(s) = t.
Remarks:
(i) An algorithm need not be defined for all input values. This corresponds to a partial function, whose domain is not all of ℕ (for details, see 1.2, p. pageref). In such a case, the procedure might go on forever, never halting.
(ii) Although the input and output sequences as well as the size of
the ``scratch paper'' (internal memory) have to be finite, no
restriction is imposed on their actual sizes. The same is true for the length of
the description of the algorithm as well as its execution time. This renders the
definition hard to operationalise, since, for instance, relatively ``small''
programs may take a ``large'' time to execute and may produce ``huge'' output
(for details, see , p. ).
(iii) The above definition of algorithm and effective
computability is informal. The question if there exist formal notions
corresponding to this heuristic approach is nontrivial. In particular, the term
``mechanic'' has to be specified formally and in a sufficiently comprehensive way. The
Church-Turing thesis (see below) gives a positive answer which, due to the
principal incompatibility between the informal, intuitive (``algorithm'') and
the formal (``recursive function'') approach, remains conjectural. To specify
this connection, a brief characterisation of the term ``recursive function''
will be given in the next section.
(iv) Analogue computation is not effective in the
above sense, since its feature of continuity does not correspond to any
meaningful ``mechanical'' process.
(v) The input can be eliminated by substituting a constant for each
input parameter, thereby converting, say, a FORTRAN statement such as
READ (*,*) A into
A=…. In that way, input code can be
converted to program code, and vice versa. See also type-free
coding in chapter 4, p. pageref.
Since the present understanding of ``mechanic'' computation has been
developed from the computations which can be actually realised, it is
tautological that, indeed, there exist physical systems which are universal
computers (at least up to some finite computational resources). The circularity
of this argument is evident. A constructive ``proof'' of this conjecture is the
assembly of a ``universal'' computer (here the quotes refer to the finiteness of
the computer's resources) such as a personal computer workstation (insert your
favourite brand here:) ``…'', Charles Babbage's
Analytical Engine [] et cetera. These computers are ``universal''
because it is possible to implement a program which simulates any other
(not necessarily universal) computer such as a Turing machine (with finite
tape). Their universality is only limited by the finiteness of their resources.
The notion of effective computation will now be formalised by
defining a specific model of computation which is (hopefully) comprehensive
enough to correspond properly to the heuristic concept of algorithm. [Partial function, total function] The notion of partial and total function is defined as follows: A partial function f(x)=z, with x, z ∈ ℕ, is a single-valued function; i.e., the functional value z is determined uniquely. The domain of f is domain(f) = { x | ∃z: z = f(x) } ⊂ ℕ. The term partial expresses the fact that the domain of f may not be all of ℕ. If domain(f) = ℕ, f is called a total function. f is defined (convergent) at x if x ∈ domain(f), expressed by f(x)↓; otherwise f is undefined (divergent) at x, expressed by f(x)↑.
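As an illustration, consider the following small Python sketch (an addition for illustration, not part of the formal development): a partial function which converges exactly on the even numbers and diverges, by looping forever, on the odd ones:

    def f(x):
        # partial function: domain(f) = the even natural numbers
        while x % 2 == 1:   # diverges (never halts) for odd x
            pass
        return x // 2       # converges for even x, with value x/2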
It would have been more elegant to use the λ-notation for characterising C (for details, see H. Rogers [], p. 6), but as this notation is difficult to interpret for the unfamiliar reader, the standard functional notation was used.
Primitive recursion, as introduced in 1.2.1, does not, however, cover the full realm of effective computability. (The primitive recursive definition of the Fibonacci sequence is given as an example in section 1.2.1.) It can be shown that there exist
algorithmic functions of the natural numbers which grow faster than any
primitive recursive function. This has been, for instance, discussed in depth in
Rózsa Péter's book Rekursive Funktionen [], p. 86, by C. Calude, S. Marcus
and I. Tevy [] and by H. Rogers [], p. 8. The first example of a recursive
function which is not primitive recursive was obtained independently by W.
Ackermann [] and the Romanian of Swiss origin G. Sudan []. It is interesting
that these examples are not just mathematical curiosities (they are very useful
in compiler testing, for instance), but as has been pointed out by C. Calude [],
most recursive functions (see definition 1.2, p. pageref) are
not primitive recursive, just as most reals are not recursive. A
particular example will be given next.
Counterexample:
One strategy for obtaining counterexamples of evidently recursive functions
which are not primitive recursive is to try to construct functions
which grow ``very fast.'' Consider, for instance, the function f(x) = A(x,x,x), where the Ackermann generalised exponential A is defined by successive iteration (cf. equations (1.1)-(1.5) in section 1.2.1). Another strategy is diagonalization (see below); it is actually a very similar strategy. Consider functions f_1(n) < f_2(n) < f_3(n) < … and g(n) = f_n(n) + 1. g grows faster than all f's. For, if g(n) corresponded to some f_i(n), then for n = i the contradiction f_n(n) = g(n) = f_n(n) + 1 (wrong) would follow.
In summary, the notion of primitive recursive functions turns out to be
too weak to correspond to the algorithmic notion of effective
computability.
Assume now that h were primitive recursive. If this were the case, h would turn up in the enumeration of the primitive recursive functions somewhere, say at the yth place. Hence h = g_y, and the contradiction of equation (1.8) would follow. One possible way to avoid this contradiction (and at the same time maintain functional enumeration) is to assume that the functions g_i are partial functions. Thereby, if in the above construction g_y(y) need not be defined, the contradiction ``g_y(y) = g_y(y) + 1'' need not show up.
An equation is deducible from P if it is obtained in the course of
some computation from P. Imagine a standard list of all
equations deducible from P. Then given any P and any numerical input, the
principal output is defined to be the first numerical output
in the standard list generated from the set of instructions P. The resulting
relation between input and principal output defines a partial recursive
function fP, which can be associated with the original set of
instructions P. In this way one obtains the class of partial recursive
functions.
Remarks:
(i) The class of primitive recursive functions can be embedded into
the class of recursive functions. The extension of recursive functions over
primitive recursive functions consists in the addition of the operation of
seeking indefinitely through some standard list for the suitable equation.
(ii) One may suspect that the course of computation is not uniquely
determined by the input and by the recursive relations P. Moreover, two
different outputs may be obtained from the same input by different computations.
These difficulties are circumvented by defining the value of the function to be
the first occurrence in an effectively computable standard list.
(iii) The recursive functions can also be represented type-free, by combinatory logic and, more specifically, by the lambda calculus. In this representation, no difference is made between a ``function'' and its ``argument'' and ``value(s).''
(iv) There ``exist'' non recursive functions which grow too fast to be effectively computable. One example is the busy beaver function S_U(n) which, roughly speaking, is defined to be the greatest number producible on a computer U by programs of program length less than or equal to n. (For a precise definition, see , p. .) S_U(n) is a non recursive function; nevertheless it can also be used for upper bounds on the runtime of such programs; see , p. . That this function grows ``very fast'' indeed can be seen from a program evaluating the Ackermann generalised exponential A(n,n,n); one has S_U(n) ≫ A(n,n,n) for sufficiently large n. A computer implementation of A requires programs of ``small'' length, yet, for large n, A(n,n,n) is considerably ``large.'' This applies to computer programs which are permitted to have sufficient size to implement the Ackermann generalised exponential and other ``fast growing'' functions. (On the contrary, a restriction to ``very small size'' program codes of only a few bits length would not allow the implementation of A(n,n,n) or similar fast growing algorithms.) Indeed, if the busy beaver function were growing ``slowly,'' one would be able to solve all halting problems of finite-size programs (p. ).
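To see why a ``slowly growing'' busy beaver function would entail the decidability of the halting problem, consider the following Python sketch. It is purely hypothetical: bound stands for an assumed computable function dominating the runtimes of all halting programs of length less than or equal to n, and run_for_steps for a step-limited simulator; neither exists as an actual construction.

    def halts(program, bound, run_for_steps):
        # HYPOTHETICAL: if a computable bound(n) dominated the runtime of
        # every halting program of n bits, the halting problem would become
        # decidable: run the program for bound(n) steps and observe.
        n = len(program)                         # program length in bits
        return run_for_steps(program, bound(n))  # True iff halted in time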
The set of primitive recursive functions is one-to-one enumerable. In the
same way, it is possible to enumerate all recursive functions, or, equivalently,
the sets of instructions P. This statement will not be proved here; informally
it may be seen as a direct consequence of an effective enumeration of all sets
of instructions P. I.e., every integer x can be associated with a partial recursive function ψ_x (corresponding to a set of instructions P_x) which is at the (x+1)st place of this enumeration. [Gödel number of recursive function] We shall use the method of Gödel numbers to derive the cardinality of the class of recursive functions. The following statements hold:
Informally stated, the non recursive enumerability of the total recursive functions originates in the impossibility of recursively determining which function is a partial function and which one is a total function. This is a
consequence of the recursive unsolvability of the halting problem (cf. section ,
p. ).
Proof:
Since all constant functions (with values in ℕ) are recursive and |ℕ| = ℵ₀, there are at least ℵ₀ constant functions. The Gödel numbering shows that there are at most ℵ₀ partial recursive functions. By a similar argument as in the section dealing with diagonalization (p. pageref), the assumption of an enumeration of all total recursive functions yields a contradiction.
A probably more intuitive, algorithmic, proof is the enumeration of all Turing machines. (For a definition of a Turing machine, see 1.3.1, p. pageref.) Such an explicit enumeration of Turing machines is, for instance, outlined in M. Davis' [], chapter 4.
Remarks:
(i) Unlike the primitive recursive functions (which are
total functions), the class of recursive functions contains
partial functions. Therefore, the argument against the enumeration of
all algorithmically definable functions using diagonalization (p. pageref) does not apply here.
Moreover, as will be seen from the recursive unsolvability of the halting problem (see chapter , p. ), it is in general undecidable whether a recursive function converges or diverges at some particular argument.
(ii) The inverse of the Gödel function # may be partial or total. In the above definition, #⁻¹ is total. For an example of a construction of the Gödel number #(t) corresponding to a term t, see definition 4.3, p. pageref. This definition can be used to enumerate all recursive functions with a slight change in notation; i.e., whenever we have to code ``+'' or some other algebraic operation allowed in a recursion, we use the code for ``~'' or some other symbol which we don't use in this context. In this example, #⁻¹ is a partial function.
A somewhat related subject is that of mathematical proof methods which do not have any ``constructive'' or algorithmic flavour, for instance the principle of the excluded middle. One example is a proof of the following theorem: ``There exist irrational numbers x, y ∈ ℝ − ℚ with x^y ∈ ℚ.'' Proof: take x = y = √2. Case 1: √2^√2 ∈ ℚ, and we are done. Case 2: √2^√2 ∉ ℚ; then (√2^√2)^√2 = 2 ∈ ℚ, with the irrational x = √2^√2 and y = √2. The question which one of the two cases is correct, i.e., which number is rational, remains unsolved in the context of the proof.
Informally, recursive reals may be described as the real numbers whose expansions as a decimal are calculable ``bit by bit'' by finite means, i.e., by effective computations (cf. []). This sloppy definition may give rise to confusion; in particular, the term ``bit by bit'' needs an explanation: One might, for instance, conclude correctly that all rationals, all rational powers of rationals (e.g., √2), Euler's number e, the number π et cetera are computable. But then, one might be tempted to wrongly infer that the halting probability Ω (p. ) is computable too, since G. Chaitin has published an algorithm for computing Ω in the limit of infinite time []. Yet, one necessary condition for a real to be computable is that it converges effectively (i.e., that there exists a computable radius of convergence). This is what is meant by the phrase ``bit by bit.'' This criterion (effectively computable radius of convergence) is not satisfied by any computation for Ω. For a detailed definition of recursive reals, see definition 1.6, p. pageref, as well as M. B. Pour-El and J. I. Richards, Computability in Analysis and Physics [], and the forthcoming book Computability - A Mathematical Sketchbook by D. S. Bridges [], among others.
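For illustration, effective convergence can be made concrete by a small Python sketch (ours, not from the original text) which computes the recursive real √2 digit by digit; the width of the bisection interval provides the computable radius of convergence:

    from fractions import Fraction

    def sqrt2_digits(n):
        # bisection with an explicit, computable error bound: after the loop,
        # hi - lo <= 10**-(n+1), so the integer part and the first n decimals
        # of sqrt(2) are settled
        lo, hi = Fraction(1), Fraction(2)
        while hi - lo > Fraction(1, 10**(n + 1)):
            mid = (lo + hi) / 2
            if mid * mid <= 2:
                lo = mid
            else:
                hi = mid
        return str(lo.numerator * 10**n // lo.denominator)

    assert sqrt2_digits(5) == "141421"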
The following statements hold:
Proof:
ad (i) Since all constant functions (with values in ℕ) are recursive and |ℕ| = ℵ₀, there are at least ℵ₀ recursive reals. By theorem 1.2.4, there are at most denumerably many (i.e., ℵ₀) partial recursive functions (one may think of an enumeration of all Turing machines).
ad (ii) Proof by diagonalization: assume that there exists an effectively computable enumeration of the decimal reals in the interval [0,1] of the form (1.9). In contradistinction, there exists an effectively computable enumeration of the rational numbers i/j, for instance by taking the counter diagonals of the array of fractions i/j (see section 1.2.5), starting from the top left.
ad (iii) By theorem 1.2.5(i), the set of
recursive reals is countable. The measure of a countable set of points on the
real line is zero. Therefore, the measure of the set of all uncomputable reals
is one. I.e., in the measure theoretic sense, ``almost all'' reals are
uncomputable. This can be demonstrated by the following argument: Let M = {r_i} be an infinite point set (i.e., M is a set of points r_i) which is denumerable and which is a subset of a dense set. Then, for instance, every r_i ∈ M can be enclosed in the interval I(i,δ) of equation (1.10).
Other denumerable point sets of reals are the rationals ℚ and the algebraic reals. (Algebraic reals x satisfy some equation a₀xⁿ + a₁xⁿ⁻¹ + … + aₙ = 0, where a_i ∈ ℕ and not all a_i's vanish.) Consequently, their measures vanish. The complement sets of irrationals ℝ − ℚ and transcendentals (non algebraic reals) are thus of measure one [].
Despite such vague and probabilistic ideas as to ``pull out'' an arbitrary element of the ``continuum urn'' (you might have to assume the axiom of choice for doing that), which is then non recursive with probability one, it is hard to come up with a concrete elementary example of a provably non recursive real. One such example is the algorithmic halting probability Ω, which has been introduced by G. Chaitin [,] and which is even provably random; it is defined by equation (), p. (see also , p. ). Indeed, the algorithmic probability P(S), defined by equation (), p. , associated with an arbitrary recursively enumerable set S is provably non recursive.
Another ``example'' is a (binary) real r = 0.r₁r₂r₃… generated by infinitely many coin tosses of a fair coin, identifying r_i = 0 or 1 with the outcome ``head'' or ``tail'' of the ith coin toss, respectively. r is non recursive with probability one and recursive with vanishing probability.
Since the 1930's, other formal characterisations of functions corresponding
to computational processes have been proposed (see again Rogers' book [], p. 18
for details). Probably the most prominent definition of recursive functions has
been given by Alan Turing, who explicitly constructed a machine model (the
Turing machine) to formalise the notion of computation. So far, all these
definitions have turned out to be equivalent (i.e., there exist ``translations''
from one functional characterisation to another). We may therefore suspect that
the class of the recursive functions defined above is inclusive enough to be a
suitable characterisation of ``mechanical'' procedures; at least syntactically.
Likewise, a wide variety of functions, each agreed to be intuitively
algorithmic, have been studied. Each one of these functions turned out to be a
partial recursive function. The conjectural evidence of a connection between the
informal, heuristic notion of effective computability and the precise
notion of partial recursive function culminates in the Church-Turing
thesis (sometimes referred to as ``Church thesis''): [Church-Turing thesis]
(Vice versa, by our understanding of ``mechanic'' computation, every
partial recursive function corresponds to an algorithm. In other words: a
(partial) recursive function is a mathematical entity which can be defined by an
algorithm.
There is an equivalence between effectively computable algorithms
and partial recursive functions.) The terms effectively
computable, mechanically computable, computable and
recursively enumerable will be used synonymously.
Remarks:
(i) The technical (methodological) importance of this equivalence is
that proofs (sometimes referred to as ``proofs by Church's thesis'')
can be carried through by convenient informal methods associated with
algorithmics. At the same time, they have a concrete mathematical meaning,
since, by the Church-Turing thesis, they refer to the class of partial recursive
functions. Thereby, the formalised models of computation (such as the Turing
machine) may remain ``on the shelf.'' Indeed, most proofs in this and later
chapters will be ``proofs by Church's thesis.''
(ii) The nontrivial claim of the Church-Turing thesis is that
every conceivable effectively computable process corresponds to a
partial recursive function. The input corresponds to the function argument and
the output corresponds to the function value. The converse statement relating
recursive functions to effective computations is trivial, given our
understanding of ``mechanic'' effective computation. In summary, the implication ``recursive ⇒ effectively computable'' is trivial, whereas ``effectively computable ⇒ recursive'' is the nontrivial claim (cf. section 1.2.6). (iii) As has already been emphasized, the Church-Turing thesis
includes a physical as well as a syntactic claim. In particular, it specifies
which types of computations are physically realisable. As physical statements
may change with time, so may our concept of effective computation.
First we define the notion of diophantine equation: [Diophantine equation] A predicate P(a₁,…,aₙ) is called recursively enumerable if there is an algorithm which, given the non-negative integers a₁,…,aₙ, will eventually, e.g., by generating a list of all n-tuples satisfying P, discover that these numbers have the property P. P is recursive if, in addition to that, an algorithm exists which will eventually discover that these numbers do not have the property P.
The predicate P(a₁,…,aₙ) is called exponential / polynomial diophantine if P holds if and only if there exist non-negative integers x₁,…,x_m such that the equation given in section 1.2.7 is satisfied. The following result is mentioned without proof; for a proof, see J. P. Jones and Y. V. Matijasevic [], which is reviewed by G. Chaitin []. A predicate is polynomial / exponential diophantine if and only if it is recursively enumerable. By the Church-Turing thesis, the following equivalence holds: ``diophantine equation'' ≡ ``effectively computable process.''
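The ``generate and test'' character of recursive enumerability can be made concrete by a small Python sketch (an illustration of ours): it searches for non-negative integer witnesses x₁,…,x_m of a given predicate and halts exactly if such a witness exists; if none exists, the search runs forever.

    from itertools import count, product

    def semi_decide(P, a, m):
        # enumerate all m-tuples of non-negative integers, shell by shell
        # (increasing maximal entry); halt as soon as a witness is found.
        # If P(a, x) holds for no x, the loop runs forever (r.e., not recursive).
        for bound in count(0):
            for x in product(range(bound + 1), repeat=m):
                if max(x, default=0) == bound and P(a, x):
                    return x  # witness found: P(a) holds

    # example: ``a is a sum of two squares'' is a polynomial diophantine predicate
    assert semi_decide(lambda a, x: x[0]**2 + x[1]**2 == a, 25, 2) == (3, 4)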
A set A ⊂ ℕ is called recursive if both A and its complement ℕ − A are recursively enumerable. We only mention here one important result of recursion theory. There exists a set A ⊂ ℕ which is recursively enumerable but not recursive. An example is the set of theorems of a ``sufficiently complex'' formal system of arithmetic, or, equivalently, the outputs of ``sufficiently complex'' infinite computations. See also the chapter on page .
Furthermore, [Recursive inseparability] There exists a recursively inseparable pair of sets; i.e., A, B ⊂ ℕ such that
In section 1.5, p. pageref, the ``mechanic'' (in
the computer sciences termed ``deterministic'') devices are followed by a
specific, conceptually important non deterministic device called
oracle. Oracles are capable of solving problems, such as the halting
problem (see , p. ), which are recursively unsolvable. A ``model'' for an oracle
problem solver will be given which, due to ``Zeno squeezing'' of its intrinsic
time scale, may perform computations in the infinite time limit.
As has already been mentioned before, universal computation is invariant
under changes of ``sufficiently complex'' machine models. In J. von Neumann's
words (see Theory of Self-Reproducing Automata, ed. by A. W. Burks [], p.
50):
Consider a set of instructions P of the type introduced for the definition of recursive functions. Assume further some particular listing of the sets of instructions. Let P_x be the set of instructions associated with the Gödel number x = #(P_x) (for details, see 1.2.4, p. pageref). Let φ_x stand for the function associated with P_x.
The following consideration yields an algorithmic definition of a universal algorithm and, by the Church-Turing thesis, of a universal recursive function u(x,y): given any two numbers x, y ∈ ℕ, find P_x, for instance by going through the enumeration of all sets of instructions until the (x+1)st place; then apply P_x to input y to calculate φ_x(y). If φ_x(y) yields an output, take this output value for u(x,y). Having done so, we have obtained an effectively computable algorithm (associated with u) which yields φ_x(y) on inputs x, y. By this definition of u, u(x,y) effectively imitates any φ_x(y). By the Church-Turing thesis, there exists a z ∈ ℕ such that the algorithm u corresponds to the partial recursive function φ_z(x,y). This can be stated as follows: [Existence of universal function]
from ``Through the Looking
Glass'' by Lewis Carroll []
Chapter 1
Algorithmics and recursive function theory

1.1 Algorithm and effective computability
``The reason why we find it possible to construct, say,
electronic calculators, and indeed why we can perform mental arithmetic,
cannot be found in mathematics or logic. The reason is that the laws of
physics `happen to' permit the existence of physical models for the operations
of arithmetic such as addition, subtraction and multiplication. If they
did not, these familiar operations would be non-computable functions. We might
still know of them and invoke them in mathematical proofs (which
would presumably be called `non constructive') but we could not perform
them.''
For another discussion of this topic, see R. Rosen []
and M. Davis' book [], p. 11, where the following question is asked:
``… how can we ever exclude the possibility of our being presented, some day (perhaps by some extraterrestrial visitors), with a (perhaps extremely complex) device or ``oracle'' that ``computes'' a non computable function?''
Indeed, the concept
of Turing degree [] yields an extension of universal computation to
oracle computation. In a way, the syntactic aspect of computation could thus be
``tuned'' to wider concepts of (physically realisable) computation if that turns
out to be necessary. As has been pointed out before, such a change cannot be
decided by syntactic reasoning; it has to be motivated by physical arguments.
Thus, at least at the level of investigation of ``reasonable'' instances of
computation, the theory of computation is part of physics.
An algorithm is an abstract recipe of finite size, prescribing a process which might be carried out ``mechanically'' by some computing agent, such as a human or a computer, and which terminates after some finite time.
1.2 Recursive functions
1.2.1 Primitive recursive function
A first attempt towards a formal
characterisation of effectively computable functions is motivated by the
construction of the natural numbers \N by the iterated process of ``adding 1''
[]. By generalising this approach, one may consider a class of functions
obtained by recursive definitions, i.e., an arbitrary function is
itself a function of ``simpler'' functions. Thereby, the notion of ``simpler''
function is specified. Usually the ``simplest'' functions, among others, are
defined to be the constant functions, the successor functions, et cetera.
The corresponding class of primitive recursive functions can be defined as follows: [Primitive recursive function]
The class of primitive recursive functions is the smallest class C of functions f, g, … from ℕ to ℕ such that
(i) C contains the constant functions, the successor function and the projection (identity) functions; and
(ii) C is closed under composition and under the scheme of primitive recursion; i.e., if g, h ∈ C, then so is the function f defined by f(0,y) = g(y) and f(x+1,y) = h(x, f(x,y), y).
Example: The following primitive recursive definition yields the Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, … :

f(0) = 1
f(1) = 1
f(x+2) = f(x+1) + f(x) .

The corresponding algebraic expression is given by

f(x) = [ (1+√5)^(x+1) − (1−√5)^(x+1) ] / ( 2^(x+1) √5 ) .
Another recursive representation of the Ackermann function is (see [], p. 8):

A(0,x,y) = y + x   (1.1)
A(1,x,y) = y · x   (1.2)
A(2,x,y) = y^x   (1.3)
A(3,x,y) = y^y^···^y  (a tower of x y's)   (1.4)
⋮
A(z+1,x+1,y) = A(z, A(z+1,x,y), y) .   (1.5)
A(x,x,x)
is defined by using double nesting. Functions which grow even faster can be
obtained by higher order nesting. It can be shown that for sufficiently large x
Î \N, every primitive recursive function of x is less
than f(x) []. Nevertheless, the above definition of A is within the scope of a
reasonable definition of algorithm.
A(0,0,y) = y
A(0,x+1,y) = A(0,x,y) + 1
A(1,0,y) = 0
A(x+2,0,y) = 1
A(z+1,x+1,y) = A(z, A(z+1,x,y), y) .
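A direct transcription of this iteration into Python (a sketch of ours) illustrates both its algorithmic character and its growth; already A(x,x,x) is computationally hopeless for x ≥ 4:

    def A(z, x, y):
        # Ackermann generalised exponential, following the recursion above
        if z == 0:
            return y + x                     # A(0,x,y) = y + x
        if x == 0:
            return 0 if z == 1 else 1        # A(1,0,y) = 0; A(z,0,y) = 1 for z >= 2
        return A(z - 1, A(z, x - 1, y), y)   # A(z+1,x+1,y) = A(z, A(z+1,x,y), y)

    assert A(1, 3, 5) == 15    # multiplication: 3 * 5
    assert A(2, 3, 5) == 125   # exponentiation: 5 ** 3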
1.2.2 Diagonalization
Another example of an
algorithmically definable function which is not primitive recursive can be
constructed by a method called diagonalization (see also Cantor's
diagonalization method, chapter , p. ). In principle, it is possible to list and
label all primitive recursive functions by a finite algorithm. In doing so, one
could, for instance, derive all strings corresponding to primitive recursive
functions of length 1, 2, 3, and so on, and then label these functions by a
natural number. In this way, an effectively computable one-to-one function
between the natural numbers \N and the class C of primitive recursive functions can be defined.
Let g_x be the xth function in this enumeration. Now define a diagonalization function h by

h(x) = g_x(x) + 1 .   (1.6)

Notice that for defining h at the argument x we have used the xth function g_x in the functional enumeration. (This ``trick'' is the reason why the method is called ``diagonalization.'') Notice further that, since the addition of one is an effectively computable task and since g_x(x) can be obtained by an effective computation, h must be effectively computable. If h were primitive recursive, it would have to occur somewhere in the enumeration, say at the yth place:

h = g_y .   (1.7)

But then, taking h at y and combining equations (1.6) and (1.7), we would arrive at the contradiction

g_y(y) = h(y) = g_y(y) + 1   (wrong) .   (1.8)

Hence, h cannot be primitive recursive - diagonalization leads to the conclusion that the class of primitive recursive functions does not include all algorithmically definable functions.
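The argument can be mimicked concretely: given any programmed enumeration g of total functions, the diagonal function h differs from every enumerated function at its own index. A toy Python sketch (the particular enumeration is an arbitrary example of ours):

    def h(g, x):
        # diagonalization over an enumeration g: x -> (the xth total function)
        return g(x)(x) + 1

    # toy enumeration: the xth function adds x to its argument
    g = lambda x: (lambda n, x=x: n + x)

    # h differs from g(y) at the argument y, for every y
    assert all(h(g, y) != g(y)(y) for y in range(100))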
1.2.3 Partial recursive function
The following notion of recursive function was proposed by St. C. Kleene [] (see also Rogers' book [], p. 16). Another, equivalent, definition has been proposed by A. Turing [] on the basis of Turing machines (p. pageref). Turing himself motivated this definition by arguing that the machine is a one-dimensional analogue of a sheet of paper on which calculations could be performed by the usual rules. Other equivalent definitions have been put forward by A. Church in the context of the lambda calculus and by K. Gödel (see also M. Davis [], p. 10).
[Partial recursive function]
Consider a set of instructions P,
consisting of recursive relations of a general kind. A computation is a
finite sequence of equations, beginning with P, where each of the following
equations is obtained from preceding equations by one of the three procedures:
1.2.4 Enumeration of partial recursive functions
P_x is the set of instructions associated with the integer x in the fixed enumeration of all sets of instructions. x = #(P_x) is called index or Gödel number of P_x. # is the associated recursive Gödel function.
1.2.5 Existence of uncomputable reals
One major goal of this book is the presentation of mathematical and
physical entities which are non recursive. Such entities may be non recursive
functions, numbers, sequences and problems which are unsolvable by recursive
functions.
Consider the real number formed by the diagonal elements 0.r₁₁r₂₂r₃₃…. Now change each of these digits, avoiding zero and nine. (This is necessary because reals with different digit sequences are identified if one of them ends with an infinite sequence of nines and the other with zeros, for example 0.0999… = 0.1000….) The result is a real r′ = 0.r₁′r₂′r₃′… with rₙ′ ≠ rₙₙ, which thus differs from each of the original numbers in at least one (i.e., the ``diagonal'') position. Therefore there exists at least one real which is not contained in the original enumeration. Despite the assumption of the recursive enumerability of the recursive reals, all the operations involved in the argument are straightforwardly computable. Therefore, if this assumption were correct, r′ should be computable as well, and r′ should show up somewhere in the original enumeration. The fact that r′ is not contained therein, and the resulting contradiction, allows only one consistent consequence: an effectively computable enumeration of the computable reals does not exist; any algorithmic attempt to enumerate the recursive reals is incomplete.
r₁ = 0.r₁₁r₁₂r₁₃r₁₄…
r₂ = 0.r₂₁r₂₂r₂₃r₂₄…
r₃ = 0.r₃₁r₃₂r₃₃r₃₄…
r₄ = 0.r₄₁r₄₂r₄₃r₄₄…
⋮   (1.9)
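A Python sketch of the diagonal construction (an illustration of ours): given any programmed digit enumeration r(i, j), the diagonal real r′ is computed digit by digit, differing from the ith real at the ith digit and avoiding zero and nine as described above.

    def diagonal_digit(r, n):
        # nth digit of r': differ from r(n, n), avoiding the digits 0 and 9
        d = r(n, n)
        return 1 if d != 1 else 2

    # toy enumeration: the ith "real" has all digits equal to i mod 10
    r = lambda i, j: i % 10
    rprime = [diagonal_digit(r, n) for n in range(1, 11)]
    assert all(rprime[n - 1] != r(n, n) for n in range(1, 11))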
The rationals i/j can be enumerated by taking the counter diagonals of the array

1/1   1/2   1/3   1/4   ⋯
2/1   2/2   2/3   2/4   ⋯
3/1   3/2   3/3   3/4   ⋯
4/1   4/2   4/3   4/4   ⋯
 ⋮     ⋮     ⋮     ⋮    ⋱

starting from the top left.
I(i,δ) = [ r_i − 2^(−i−1)δ , r_i + 2^(−i−1)δ ] ,   (1.10)

where δ may be arbitrarily small (we choose δ small enough that all intervals are disjoint). Since M is denumerable, the measures μ of these intervals can be summed up, yielding

∑_i μ(I(i,δ)) = δ ∑_{i=1}^∞ 2^(−i) = δ .   (1.11)

From δ → 0 follows μ(M) = 0.

1.2.6 Church-Turing thesis
Any algorithm corresponds to a partial
recursive function. In other words: whatever is felt to be effectively
computable can be brought within the scope of the class of partial recursive
functions.
recursive ⇒ effectively computable   (trivial) ,
effectively computable ⇒ recursive   (nontrivial) .

1.2.7 Computation = polynomial equation
This section reviews a result by M. Davis, H.
Putnam and J. Robinson [], which has been strengthened by J. P. Jones and Y. V.
Matijasevic []. This result, given the Church-Turing thesis, can be informally
stated as ``any computation can be encoded as polynomial.'' More
precisely, any computation can be encoded in an exponential diophantine
equation.
A(n) polynomial (exponential) diophantine equation L(x₁,…,xₙ) = R(x₁,…,xₙ) is built up from non-negative integer variables x₁,…,xₙ and from non-negative integer constants by using the operations of addition A+B and multiplication A·B (and exponentiation A^B). An example of an exponential diophantine equation is aⁿ + bⁿ = cⁿ with a, b, c, n ∈ ℕ.

L(a₁,…,aₙ,x₁,…,x_m) = R(a₁,…,aₙ,x₁,…,x_m) .
1.2.8 Recursively enumerable ≠ recursive
The predicates recursively enumerable and recursive will be defined for sets: A set A ⊂ ℕ is called recursively enumerable if A = ∅ or A is the range of a partial recursive function.
1.3 Universal computers
In this section certain classes of automata
will be specified which have become important for historic and theoretic
reasons. If not stated otherwise, the terms ``computing agent,''
``computer'' and ``automaton'' are synonyms. Universal
computers will be introduced as the class of automata on which it is
possible to implement recursive functions. Turing machines and
Cellular Automata are examples of this class. Indeed, these computer
models (e.g., the Turing machine model) can be used for a definition of
recursive functions. They provide one of the alternative definitions of the
partial recursive functions (c.f. section 1.2.3, p. pageref).
The importance of Turing's research is just this: that if you
construct an automaton [[A]] right, then any additional requirements about the
automaton can be handled by sufficiently elaborated instructions. This is only
true if A is sufficiently complicated, if it has reached a certain minimum
level of complexity. In other words, a simpler thing will never perform
certain operations, no matter what instructions you give it; but there is a
very definite finite point where an automaton of this complexity can, when
given suitable instructions, do anything that can be done by automata at all.
Assume arbitrary recursive functions φ_x. Then there exists an index z and an associated universal partial function φ_z(x,y) = u(x,y) such that for all x and y

φ_z(x,y) = u(x,y) = { φ_x(y)     if φ_x(y)↓ (convergent),
                    { divergent  if φ_x(y)↑ (divergent).
One immediate consequence of this theorem is that there exists a critical degree of ``computational strength'' of the P's beyond which all further complexity can be absorbed into increased program size (i.e., algorithmic information), increased use of memory and increased computation time (i.e., computational complexity).
[Universal Computer]
Any physical system or any device on which a universal function u = φ_z can be implemented is called universal computer, or universal machine. A universal computer can compute all computable functions.
The notation U(p, s) = t will be used for a universal computer U with program p, input (string) s and output (string) t. ∅ denotes the empty input or output (string). Furthermore, U(p, ∅) = U(p). Concatenation of programs and input/output strings is allowed. E.g., given two input strings s₁ = s₁₁s₁₂s₁₃…s₁ᵢ and s₂ = s₂₁s₂₂s₂₃…s₂ⱼ, a term denoted by ``s₁,s₂'' or ``s₁s₂'' is treated by the automaton as the string s₁₁s₁₂s₁₃…s₁ᵢs₂₁s₂₂s₂₃…s₂ⱼ.
There exist several examples of universal computers, all being fictitious because no finite machine can be universal. The most prominent device is the Turing machine [,,], consisting of a finite memory unit and an infinite tape (which is tessellated into a sequence of squares) on which information is read or written. As has been noted before, A. Turing motivated the Turing machine by the one-dimensional version of a sheet of paper on which arbitrary calculations can be performed according to ``usual rules.''
[Turing machine] Assume discrete time cycles, labelled by 0,1,2,…. A Turing machine is an agency or automaton with the following features:
At each moment of time, the machine situation is defined by (i) a particular machine state a_i, by (ii) a particular position of the tape in the machine (i.e., which square is scanned), and (iii) by a particular printing of the whole tape.
In the active state, the Turing machine performs an act between the time t and t+1, consisting of three possible parts: (i) the writing of a tape symbol s_i′ as the state of the square scanned at time t; subsequently accompanied by (ii) a tape shift, so that at the time t+1 the machine is either one square left of, or in the same, or one square right of the position scanned at t, denoted by L, C, R, respectively, and (iii) the choice of a new internal state a_j′ for the next time cycle t+1. Any machine act can therefore be described by the triple s_i′ M a_j′, with the move M ∈ {L, C, R}.
The pair s_i a_j, or ij, is called the machine configuration. The machine configuration determines the next act. If the configuration calls for a left move L but the scanned square is already the leftmost square of the tape, the machine goes into the passive state a_0. There are (l+1)k active configurations. A particular Turing machine is therefore specified by a machine table, showing for each active machine configuration which act has to be performed. (A scheme of a Turing machine is drawn in Fig. 1.1.)
Example:
The following example is taken from St. C. Kleene's review article []. Consider the Turing machine specified by the machine table 1.1, with 11 internal states and two square symbols, 0 for blank and 1 for s_1, respectively.
a_i \ scanned symbol | 0 | 1
0 | HALT | HALT |
1 | 0C0 | 1R2 |
2 | 0R3 | 1R9 |
3 | 1L4 | 1R3 |
4 | 0L5 | 1L4 |
5 | 0L5 | 1L6 |
6 | 0R2 | 1R7 |
7 | 0R8 | 0R7 |
8 | 0R8 | 1R3 |
9 | 1R9 | 1L10 |
10 | 0C0 | 0R11 |
11 | 1C0 | 1R11 |
At time t=0, the Turing machine is in internal state 1 and scans the 3rd position of the tape, whose squares are in the states 0110… (``…'' denotes blank states here). The time evolution of this machine is enumerated in table 1.2.
time \ tape position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | … (entries: tape symbol, with the internal state of the scanned square as subscript)
0 | 0 | 1 | 1_1 | 0 | 0 | 0 | 0 | …
1 | 0 | 1 | 1 | 0_2 | 0 | 0 | 0 | …
2 | 0 | 1 | 1 | 0 | 0_3 | 0 | 0 | …
3 | 0 | 1 | 1 | 0_4 | 1 | 0 | 0 | …
4 | 0 | 1 | 1_5 | 0 | 1 | 0 | 0 | …
5 | 0 | 1_6 | 1 | 0 | 1 | 0 | 0 | …
6 | 0 | 1 | 1_7 | 0 | 1 | 0 | 0 | …
7 | 0 | 1 | 0 | 0_7 | 1 | 0 | 0 | …
8 | 0 | 1 | 0 | 0 | 1_8 | 0 | 0 | …
9 | 0 | 1 | 0 | 0 | 1 | 0_3 | 0 | …
10 | 0 | 1 | 0 | 0 | 1_4 | 1 | 0 | …
11 | 0 | 1 | 0 | 0_4 | 1 | 1 | 0 | …
12 | 0 | 1 | 0_5 | 0 | 1 | 1 | 0 | …
13 | 0 | 1_5 | 0 | 0 | 1 | 1 | 0 | …
14 | 0_6 | 1 | 0 | 0 | 1 | 1 | 0 | …
15 | 0 | 1_2 | 0 | 0 | 1 | 1 | 0 | …
16 | 0 | 1 | 0_9 | 0 | 1 | 1 | 0 | …
17 | 0 | 1 | 1 | 0_9 | 1 | 1 | 0 | …
18 | 0 | 1 | 1 | 1 | 1_9 | 1 | 0 | …
19 | 0 | 1 | 1 | 1_10 | 1 | 1 | 0 | …
20 | 0 | 1 | 1 | 0 | 1_11 | 1 | 0 | …
21 | 0 | 1 | 1 | 0 | 1 | 1_11 | 0 | …
22 | 0 | 1 | 1 | 0 | 1 | 1 | 0_11 | …
23 | 0 | 1 | 1 | 0 | 1 | 1 | 1_0 | …
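The run of table 1.2 can be reproduced mechanically; in the following Python sketch (an addition for illustration) the machine table 1.1 is encoded as a dictionary mapping the machine configuration (internal state, scanned symbol) to the act (symbol written, move, next state):

    # machine table 1.1: (state, symbol) -> (write, move, next state)
    TABLE = {
        (1,'0'):('0','C',0),  (1,'1'):('1','R',2),
        (2,'0'):('0','R',3),  (2,'1'):('1','R',9),
        (3,'0'):('1','L',4),  (3,'1'):('1','R',3),
        (4,'0'):('0','L',5),  (4,'1'):('1','L',4),
        (5,'0'):('0','L',5),  (5,'1'):('1','L',6),
        (6,'0'):('0','R',2),  (6,'1'):('1','R',7),
        (7,'0'):('0','R',8),  (7,'1'):('0','R',7),
        (8,'0'):('0','R',8),  (8,'1'):('1','R',3),
        (9,'0'):('1','R',9),  (9,'1'):('1','L',10),
        (10,'0'):('0','C',0), (10,'1'):('0','R',11),
        (11,'0'):('1','C',0), (11,'1'):('1','R',11),
    }

    def run(tape, pos, state=1):
        tape = list(tape)
        while state != 0:                    # state 0 is the passive state
            write, move, state = TABLE[(state, tape[pos])]
            tape[pos] = write
            pos += {'L': -1, 'C': 0, 'R': 1}[move]
        return ''.join(tape)

    # initial situation of table 1.2: state 1, scanning the 3rd square of 0110...
    assert run('0110000', pos=2) == '0110111'  # the final tape of table 1.2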
[Cellular Automaton]
Assume discrete time cycles, labelled by 0,1,2,…. A Cellular Automaton (CA) is an infinite collection of locally connected finite automata (cells) on the tessellated ℤ^D, D ∈ ℕ. Let a_{i,n}(t) stand for the ith internal state of the cell at lattice site n = (n₁,n₂,…,n_D) at time t. Let {n} characterise the D-dimensional neighbourhood of n, including n itself, and let a_{i,{n}}(t) stand for the internal states of the automaton cells around n, including the origin n. The internal state a_{i,n}(t+1) of the automaton at n at time t+1 is given by a computable function f of the state and its neighbourhood states at time t, i.e.,

a_{i,n}(t+1) = f( a_{i,{n}}(t) ) .
Remarks:
(i) Cellular Automata feature parallel processing, as opposed to the sequential Turing machine concept. Instead of the infinite tape, they operate with an infinite number of finite automata.
(ii) There exist several neighbourhood declarations, the most prominent ones being the von Neumann neighbourhood and the Moore neighbourhood, with 1+2D neighbours (including the centre) for D > 0 and 3^D neighbours (including the centre) for D > 1, respectively. For D=2, they are drawn in Fig. 1.2. Another neighbourhood declaration is the Margolus neighbourhood, which is obtained by partitioning the cell array into finite, disjoint and uniformly arranged pieces (blocks), by block rules updating the block as a whole, and by a change of the partition with every time step (for details, see T. Toffoli and N. Margolus, Cellular Automata Machines [], chapter 12).
(iii) Universality of a specific CA in the sense of the Church-Turing thesis can be proved by embedding a Turing machine into a suitable Cellular Automaton structure [,,]. One particular ``fancy'' example, E. Fredkin's Billiard Ball Model [], mimics universal computation via the ``elastic scattering'' of CA ``billiard balls''.
(iv) CA's perform optimally for physical configurations which can be decomposed into locally connected parallel entities. For more information and for recent developments, see T. Toffoli and N. Margolus, Cellular Automata Machines [], St. Wolfram's Theory and Applications of Cellular Automata [] as well as references [,,]. For technical realisations, see, among others, references [,,].
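For illustration, here is a minimal Python sketch (ours, under the stated simplifications) of a one-dimensional CA step: all cells are updated synchronously by one computable function f of their neighbourhood states; a finite array with periodic boundary stands in for the infinite cell space.

    def ca_step(cells, f):
        # synchronous update of a 1D cellular automaton; the neighbourhood of
        # cell n is (n-1, n, n+1); periodic boundary mimics the infinite array
        N = len(cells)
        return [f(cells[(n - 1) % N], cells[n], cells[(n + 1) % N])
                for n in range(N)]

    # example rule: a cell becomes 1 iff exactly one cell of its
    # neighbourhood (including itself) is 1
    f = lambda l, c, r: 1 if l + c + r == 1 else 0
    generation = [0, 0, 0, 1, 0, 0, 0]
    for t in range(3):
        generation = ca_step(generation, f)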
The statement that commercially available general-purpose computers are in some sense ``universal computers'' is quite obvious []. Otherwise they would not be very helpful for general algorithmic tasks, such as commercial applications, evaluation of particular functions (e.g., solutions of differential equations) and so on. It is, for instance, possible to simulate an arbitrary Turing machine with finite tape on them.
As available general-purpose computers are finite automata, i.e., as they are limited by the finiteness of memory size, tape length, runtime and virtually everything one can think of, they cannot compute all effectively computable algorithms. In terms of the Church-Turing thesis, the corresponding functional class is a subset of the recursive functions.
It is evident that tasks which are uncomputable on universal computers (such as the halting problem, see p. ) cannot be solved on available general-purpose machines either. In this respect, they share some features with the fictitious but truly universal abstractions, in particular with respect to undecidability.
Finite automata are computational devices which are finite in all of their features. They have a finite number of internal states, a finite number of input and output symbols, finite tape et cetera. Conceived as computational universes, they create environments which are more restricted than universal computers or, even more so, than oracles.
Every automaton is a universe of its own, with a specific ``flavour,'' if you like. Programmers may create ``cyberspaces'' (a synonym for automaton universes) of their imagination which are, to a certain extent, not limited by the exterior physical laws which they and their hardware obey. Seen as isolated universes, some of these animations might have nothing in common with our physical world. Yet others may serve as excellent playgrounds for the physicist.
Why should physicists investigate finite automata? There is a modest and a radical answer to this question. The modest finite automata thesis is this: There exist algorithmic features of automaton universes such as complementarity (see chapter , p. ) or undecidability (see chapter , p. ) which translate into physics and which are difficult to analyse by non algorithmic means.
The radical finite automata thesis is this: the physical universe actually is a finite automaton. This viewpoint has probably been most consistently put forward by E. Fredkin [] in the context of Cellular Automata. (Although a universal Cellular Automaton is no finite machine - it has infinite extension - its cell space is discrete, leaving room for only a finite, although ``very large,'' number of cells per unit volume element of whatever is considered as the fundamental scale.) The radical finite automaton thesis contains two claims, each of which should in principle be testable, at least to some extent: (i) the ``laws of nature'' are mechanistic, i.e., computable in the usual Church-Turing sense (although intrinsically there might not exist any effective procedure to find these laws, and although forecasts corresponding to halting problems may in general not be computable); and (ii) under certain ``mild'' assumptions, the computational capacities of physical systems are finite. See also chapter 3, p. pageref, for a discussion of related topics.
In the following, an automaton shall be characterised algebraically. For a more detailed treatment see, among others, E. Moore's original article [], as well as the books by J. H. Conway [], J. E. Hopcroft & J. D. Ullman [], J. R. Büchi [] and W. Brauer []. Readers less interested in formal definitions may skip these sections. For them, it suffices to keep in mind that an (i,k,n)-automaton has i internal states, k input and n output symbols; it is characterised by its transition and output functions d and o. Two states are called distinguishable if there exists at least one experiment which distinguishes between them, i.e., which yields non identical output. Two automata are isomorphic if there exists a one-to-one translation between them and if corresponding output symbols are identical.
[k-algebra, (i,k,n)-automaton] A k-algebra A=(A,e,I,d) is a system consisting of a set A of internal states, an initial state e ∈ A, a set I of input symbols with cardinality k, and a transition function d: A×I → A.

An (i,k,n)-automaton A=(A,e,I,O,d,o) is a k-algebra A=(A,e,I,d) with i internal states (|A| = i), together with a set O of output symbols with cardinality n and an output function o: A → O.
This type of automaton is often called a Moore automaton, since its output function o depends only on the internal state of the automaton. Another, more general definition is that of the Mealy automaton, which is defined similarly to the Moore automaton, except that the output function o: A×I → O also depends on the particular input (instead of merely on the internal automaton state). Whereas with a Moore automaton one gets ``the first output free,'' any output from a Mealy automaton is obtained only after the investment of one input symbol. If one discards the first output symbol of a Moore automaton, the class of Moore automata and the class of Mealy automata are equivalent; i.e., any Moore automaton can be translated into an isomorphic Mealy automaton and vice versa.
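To make the algebraic characterisation more concrete, here is a minimal sketch in Python; all state and symbol names are hypothetical choices for this example. It implements a (2,2,2)-automaton of the Moore type via its transition table d and output table o.

  # A hypothetical (2,2,2)-Moore automaton: two internal states,
  # two input symbols 'a', 'b', and two output symbols '0', '1'.
  d = {('s0', 'a'): 's1', ('s0', 'b'): 's0',
       ('s1', 'a'): 's0', ('s1', 'b'): 's1'}   # transition function
  o = {'s0': '0', 's1': '1'}                   # output function

  def run(state, inputs):
      outputs = [o[state]]                     # Moore: ``the first output free''
      for symbol in inputs:
          state = d[(state, symbol)]
          outputs.append(o[state])
      return ''.join(outputs)

  print(run('s0', 'abba'))                     # prints '01110'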
Any finite k-algebra can be represented by a transition table, with the present internal states enumerated in the first row, the input symbols enumerated in the first column and the future internal states in the remaining matrix. A finite (i,k,n)-automaton is additionally characterised by the output function. This function may be represented by an output table, listing all internal states (and the input symbols for Mealy-type automata) and their corresponding output symbol(s). A transition graph is obtained by drawing circles for every internal state and by drawing directed arrows to indicate the transition function on some input. For examples, see chapter , p. .
Transitions are indicated in the following way: let v ∈ A; then the transition from v with input i is vd_i. This notation translates into the standard notation as follows. Let U denote a universal computer and let p_A stand for the program which simulates an (i,k,n)-automaton on U, such that initially the (i,k,n)-automaton is in the internal state e. Then U(p_A, s) = t, with the input sequence s generated by concatenation of input symbols s_1 ... s_n, and with the output sequence t generated by concatenation of output symbols [t_0] t_1 ... t_n ([t_0] is optional for Moore-type automata).
The ``halting state'' can be simulated by the introduction of a special state, say h. If the automaton is in this state, no input will cause it to leave that state.
[Distinguishable internal states] Let w = s_1 s_2 ... s_k be an input sequence (of length k); then let

   vd_w = vd_{s_1} d_{s_2} ... d_{s_k}

denote the state reached from v after input of w. An arbitrary number of states {a_1, a_2, ..., a_i} are called distinguishable if and only if there exists at least one experiment performed on A whose outcome depends on which one of these states the automaton was in at the beginning of the experiment.

An internal state a_i of an automaton A is called indistinguishable from a state a_j of A if and only if every experiment performed on A starting in the state a_i produces the same output as it would when starting in state a_j. I.e., two states a_i and a_j are indistinguishable if and only if

   o(a_i d_w) = o(a_j d_w)   for all input sequences w.
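The definition can be operationalised by brute force; the following Python sketch (reusing the hypothetical tables d and o from the sketch above) tests all experiments up to a given length. By a classical result of E. Moore, experiments of length i - 1 suffice for an automaton with i internal states.

  # Sketch: a_i and a_j are distinguishable iff some input sequence w
  # yields o(a_i d_w) != o(a_j d_w).
  from itertools import product

  def distinguishable(d, o, a_i, a_j, alphabet, max_len):
      for length in range(max_len + 1):
          for w in product(alphabet, repeat=length):
              x, y = a_i, a_j
              for symbol in w:
                  x, y = d[(x, symbol)], d[(y, symbol)]
              if o[x] != o[y]:
                  return True                  # the experiment w separates them
      return False

  print(distinguishable(d, o, 's0', 's1', 'ab', 1))   # True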
[Oracle (Turing [])] An oracle is some agent, or ``black box,'' which, upon being consulted, supplies the true (correct) answers about mathematical or algorithmic or physical entities. The ``magical insight'' characterises the non deterministic feature of oracles. The corresponding problem may or may not be decidable by any effective computation, the latter case being the more interesting one. An example of such a problem is the halting problem (for details see , p. ).
A very similar setup has been introduced in Hermann Weyl's book Philosophy of Mathematics and Natural Science [], which has been discussed by A. Grünbaum in Philosophical Problems of Space and Time [], p. 630. H. Weyl raised the question whether it is kinematically feasible for a machine to carry out an infinite sequence of operations in finite time. H. Weyl writes ([], p. 42):
... Yet, if the segment of length 1 really consists of infinitely many sub-segments of length 1/2, 1/4, 1/8, ..., as of `chopped-off' wholes, then it is incompatible with the character of the infinite as the `incompletable' that Achilles should have been able to traverse them all. If one admits this possibility, then there is no reason why a machine should not be capable of completing an infinite sequence of distinct acts of decision within a finite amount of time; say, by supplying the first result after 1/2 minute, the second after another 1/4 minute, the third 1/8 minute later than the second, etc. In this way it would be possible, provided the receptive power of the brain would function similarly, to achieve a traversal of all natural numbers and thereby a sure yes-or-no decision regarding any existential question about natural numbers! ...

The considerations below concentrate on algorithmic aspects. For paradoxical usages of Zeno squeezed oracles, see section , p. .
The argument is similar to a paradox which is often referred to as ``Achilles and the Tortoise,'' or ``Achilles and Hector,'' ascribed to Zeno of Elea (5th century B.C.). For a detailed treatment, see H. D. P. Lee's Zeno of Elea [], as well as G. S. Kirk's and J. E. Raven's The Presocratic Philosophers [], p. 292. It describes a race between Hector and Achilles as follows: for simplicity, assume that Hector is one unit of distance ahead of Achilles, and that Achilles runs twice as fast as Hector. Will Achilles ever catch up with and overtake Hector? Obviously not, if one argues as follows: Achilles runs to Hector's position; however, in the time that it took Achilles to run one unit of distance, Hector has advanced 1/2 unit of distance. Achilles runs to Hector's new position; however, in the time that it took Achilles to run 1/2 unit of distance, Hector has advanced 1/4 unit of distance. Achilles runs to Hector's new position; however, in the time that it took Achilles to run 1/4 unit of distance, Hector has advanced 1/8 unit of distance; ... ad infinitum. According to Aristotle in Physics Z9, Zeno seems to have argued that, by assuming infinite divisibility of space and time, one arrives at the conclusion that Achilles never catches up with Hector, which, given the everyday experience that a faster body overtakes a slower one, is an absurd conclusion. Hence, if Zeno is interpreted correctly, he seems to have argued that infinite divisibility of space and time contradicts experience.
Zeno's argument has been formalised and interpreted in terms of undecidability by E. G. K. López-Escobar [] (see also E. W. Beth, The Foundations of Metamathematics [], p. 492). In a sense, Zeno may have anticipated Cantor's method of diagonalization: E. G. K. López-Escobar constructs an expression for the points successively reached by Achilles; then he constructs a new point, the ``meeting point'' of Achilles and Hector, for which this expression does not hold and which is nevertheless reached by Achilles. This contradicts the assumption that the expression describes the relative motion of Achilles and Hector completely. A sketch of the formal argument goes as follows: Let POS(r_A(t), r_H(t)) stand for the binary predicate representing the simultaneous positions of Achilles, r_A(t), and Hector, r_H(t), at time t. Time will be measured discretely, i.e., t = 0, 1, 2, ... . As introduced here, the time parameter t is not Achilles' and Hector's ``proper time,'' but rather a ``squeezed'' time parameter. Intuitively speaking, the larger t, the smaller the amount of ``proper time'' corresponding to one unit of t-time (from t to t+1). In the limit t → ∞, one unit of t-time corresponds to a ``proper time'' of measure zero. (For more details, see the oracle problem solver described below.)
Assume again that Hector is one unit of distance ahead of Achilles and that Achilles runs twice as fast as Hector. From the ``axiom''

   POS(r_A(t), r_H(t)),   r_A(t) = 2 - 2^(1-t),   r_H(t) = 2 - 2^(-t),   t = 0, 1, 2, ...,      (1.13)

one derives that, for all finite times t,

   r_A(t) < r_H(t) < 2,      (1.14)

whereas the ``meeting point''

   POS(2, 2)      (1.15)

is actually reached by Achilles and Hector, although it is not of the form (1.13).
Let us come back to the original goal: the construction of a ``Zeno squeezed oracle,'' or, in A. Grünbaum's terminology, of an ``infinity machine.'' It can be conceived as follows: Consider two time scales t and τ.

(i) The proper time t measures the physical system time by clocks, in a way similar to the usual operationalisations; whereas

(ii) a discrete cycle time τ ∈ \N characterises a sort of ``intrinsic'' time scale for a process running on an otherwise universal machine.

For some unspecified reason we assume that this machine would allow us to ``squeeze'' its intrinsic time τ with respect to the proper time t by a geometric progression. Let k < 1; then any time cycle of τ, if measured in terms of t, is squeezed by a factor of k with respect to the foregoing time cycle (see Fig. 1.3), i.e.,

   t(τ+1) - t(τ) = k [t(τ) - t(τ-1)] = k^τ [t(1) - t(0)],

so that the entire infinity of cycle times τ = 0, 1, 2, ... is completed within the finite proper time lim_{τ→∞} t(τ) = t(0) + [t(1) - t(0)]/(1-k).
With such a device, countably many intractable and uncomputable problems, such as the halting problem, would become solvable. One could, for instance, ``effectively compute'' the algorithmic halting probability Ω in the limit from below [cf. ref. [] and (), p. ] by identifying the time of computation with τ. In the limit τ → ∞, for k < 1 one obtains lim_{τ→∞} w_τ = Ω at the finite proper time t < ∞. With such a Zeno squeezed oracle, many other problems would be within the reach of a ``constructive solution.'' Take, for instance, Fermat's conjecture, which could be ``(dis)proved'' by the following algorithm (in FORTRAN-style pseudocode):

      DO LABEL1 a = 1, 1, ∞
      DO LABEL1 b = 1, 1, ∞
      DO LABEL1 c = 1, 1, ∞
      DO LABEL1 n = 3, 1, ∞
         IF (a**n + b**n = c**n) THEN
            PRINT (*,*) a, b, c, n
            PRINT (*,*) `FERMAT WAS WRONG!!!'
            STOP 1
         ELSE
            GOTO LABEL1
LABEL1:  CONTINUE
      PRINT (*,*) `FERMAT WAS RIGHT!!!'
      STOP 2
      END
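The geometric squeeze itself is easily illustrated numerically; in the following Python sketch the value k = 1/2 is an arbitrary choice.

  # Cycle tau consumes proper time k**tau (in units of the first cycle);
  # the total proper time of infinitely many cycles remains finite.
  k = 0.5                               # arbitrary squeeze factor k < 1
  total = 0.0
  for tau in range(60):                 # 60 cycles already exhaust double precision
      total += k ** tau
  print(total, 1 / (1 - k))             # both print 2.0: infinitely many
                                        # cycles fit into finite proper time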
In the spirit of Zeno, the non existence of oracle problem solvers could be viewed as a further indication of the non existence of infinite divisibility. Indeed, it is not evident why such an oracle problem solver can be excluded in classical physics, e.g., in continuum mechanics. [This is related to infinite (measurement) precision in classical physics.] It is not completely unreasonable to put forward the provocative statement that, at least in principle, oracles of the above type are allowed in classical physics, and that classically the Church-Turing thesis 1.2.6, p. pageref, could be falsified by the actual construction of an oracle problem solver. (One might speculate that a simple process corresponding to diagonalization might cause the inconsistency of such a classical universe. The choice seems to be unlimited computational capacity versus consistency; cf. section , p. ).
A real number can be represented as a Cauchy sequence of rationals. Following Pour-El & Richards, a translation into recursion theory requires (i) a definition of a ``computable sequence of rationals,'' and (ii) a definition of ``effective convergence.''

[Computable sequence of rationals] A sequence {r_n} of rationals is computable if there exist three recursive functions a(n), b(n), c(n) from \N to \N such that b(n) ≠ 0 for all n and

   r_n = (-1)^{c(n)} a(n)/b(n) .

[Effective convergence] A sequence {r_n} of rationals converges effectively to a real number x if there exists a recursive function e: \N → \N such that, for all N,

   n ≥ e(N) implies |r_n - x| ≤ 2^{-N} .

A real number x is computable if there exists a computable sequence of rationals which converges effectively to x.
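As an illustration of both definitions, consider the following Python sketch; the target number e and the crude modulus N + 4 are choices made for this example, not taken from the text.

  # The rationals r_n = sum_{j=0..n} 1/j! converge effectively to Euler's
  # number e: the tail after n terms is below 2/(n+1)! <= 2**-n for n >= 3,
  # so e(N) = N + 4 is a crude but valid modulus of convergence.
  from fractions import Fraction
  from math import factorial

  def r(n):
      return sum(Fraction(1, factorial(j)) for j in range(n + 1))

  def modulus(N):                       # n >= modulus(N) gives error <= 2**-N
      return N + 4

  for N in (5, 10, 20):
      n = modulus(N)
      gap = r(n + 40) - r(n)            # proxy for the remaining error
      print(N, float(r(n)), gap < Fraction(1, 2**N))   # gap test prints True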
Remarks:
(i) A complex number is computable if both its real and imaginary parts are computable; a vector is computable if each of its components is computable; a sequence {x_k} of real numbers is computable if all reals x_k are computable.
(ii) Let {x_k} and {y_k} be computable sequences of real numbers. Then the following sequences are computable: x_k ± y_k, x_k y_k, x_k / y_k (y_k ≠ 0 for all k), min(x_k, y_k), max(x_k, y_k), e^{x_k}, sin x_k, log x_k (x_k > 0 for all k), ... . These and similar results can be proven by Taylor series or other expansions which converge effectively.
(iii) Differentiation and integration can be properly reformulated in terms of recursion theory (for details see [], p. 33).
[Non recursive enumerability of the waiting time] Let a: \N → A ⊂ \N be a one-to-one recursive function generating a recursively enumerable but non recursive set A. Let w(n) denote the waiting time, defined by

   w(n) = max { k | a(k) ≤ n } ;

i.e., w(n) is the number of enumeration steps after which all elements of A not exceeding n have appeared. Then w(n) is not bounded by any recursive function: otherwise, membership of m in A could be decided by checking whether m occurs among a(0), a(1), ..., a(f(m)) for some recursive bound f, rendering A recursive.
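The reasoning behind the statement can also be sketched in Python; here enum and f are hypothetical stand-ins for the enumeration a and the putative recursive bound, respectively.

  # If the waiting time were bounded by a recursive f, membership in A
  # would be decidable: m is in A iff m occurs among a(0), ..., a(f(m)).
  # No such recursive f exists when A is non recursive.
  def decides_membership(enum, f, m):
      return m in (enum(k) for k in range(f(m) + 1))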
``Pedestrian versions'' of this statement and other important findings whose proofs are rather involved are listed below:
(i) There exist non computable reals corresponding to convergent sequences of computable rationals with non effective convergence. (Cf. G. Chaitin's Ω []; see also , p. .)
(ii) The maximum (minimum) value of a computable function is computable, but the point(s) where the maximum occurs need not be (see, for instance, [], p. 42, and []). A necessary condition for this to occur is that there are infinitely many maximum (minimum) points. If a computable function takes on a local maximum (minimum) at an isolated point, then this point is computable.
(iii) Differential equations may have uncomputable solutions from computable initial values, but these solutions must be ``weak solutions'' in the sense of distributions. For the wave equation in 3+1 dimensions, this means that weak solutions are not even in C^1, the class of differentiable functions whose derivatives are continuous ([], p. 73). Differential equations with computable initial values may also have uncomputable non unique solutions; see [] and G. Kreisel [].
(iv) Bounded linear operators in Banach space preserve computability, and unbounded operators do not ([], chapter 3).
(v) Under mild side conditions, a self-adjoint operator has computable eigenvalues, although the sequence of eigenvalues need not be computable ([], chapter 4).
The question still remains whether ``pathologies'' such as non recursive solutions of evolution equations with recursive initial values have a physical meaning. The author believes that they indicate the necessity for further restrictions on the class of solutions by physical assumptions, such as uniqueness, finiteness and continuity. This could be done on operational, practical, aesthetic and, in a broader sense, metaphysical grounds. Yet if the non recursive solutions are interpreted as physically meaningful, these solutions would correspond to physical events generated by a computational agent ``much more'' powerful than any universal computational device.
The (recorded) formalisation of theory began with Euclid and evolved to the concept of a formal (axiom) system, which will be introduced next. The goal is that, after specifying certain theoretical symbols, all relevant theoretical entities have to be expressed by axioms and rules of deduction. Then it should be possible to ``mechanically'' derive theorems by string processing or by other purely syntactic means. (In its extreme consequence, the program of formalisation can be understood as the ``elimination of the necessity of meaning and intuition.'') [Formal system] A formal system L is a system of symbols together with rules for employing them. The individual symbols are elements of an alphabet. Formulas are sequences of symbols. There shall be defined a class of formulas called well-formed formulas, and a class of well-formed formulas called axioms (there may be a finite or infinite number of axioms). Further, there shall be specified a list of rules, called rules of inference. If such a rule is called R, it defines the relation of immediate consequence by R between a set of well-formed formulas M1,¼,Mk called premises, and a formula N called conclusion or theorem.
Examples of formal systems are the Peano axioms, Zermelo-Fraenkel set theory (with or without the axiom of choice), certain geometries et cetera.
The essence of inference, at least for a formalised theory, is string processing []. String processing can be performed by any (universal) computer. Conversely, any effectively computable process can be identified with some syntactic activity which is associated with ``inference.'' (Nothing needs to be said about the semantic aspect of such a formalism, such as the ``meaning'' of some string or of some process.)
Therefore, from a purely syntactic point of view, every effectively computable process U(p,s) can be identified with a formal system and vice versa. Indeed, as stated by K. Gödel in a Postscriptum dated June 3rd, 1964 ([], pp. 369-370):
... due to A. M. Turing's work [[reference []]], a precise and unquestionably adequate definition of the general concept of formal system can now be given, the existence of undecidable arithmetical propositions and the non-demonstrability of the consistency of a system in the same system can now be proved rigorously for every consistent formal system containing a certain amount of finitary number theory. Turing's work gives an analysis of the concept of ``mechanical procedure'' (alias ``algorithm'' or ``computation procedure'' or ``finite combinatorial procedure''). This concept is shown to be equivalent with that of a ``Turing machine.'' A formal system can simply be defined to be any mechanical procedure for producing formulas, called provable formulas.
Let thus the ``input'' s of the computation correspond to the ``axioms,'' the ``program'' p correspond to the ``rules of inference,'' and the ``output'' correspond to the ``provable theorems,'' respectively (see table 2.2, p. pageref, for a translation of terminology). It should be stressed again that, from the point of view of abstract coding and information theory, input and program are interchangeable, since, given a program p and a specific input s, it is always possible to write another program p′ such that U(p′, ∅) = U(p, s). One may object that computations may halt, while one may derive theorems from the axioms of formal systems forever. This critique can be met by considering only computations which never halt. If a computation halts, it can be identified with a similar computation which differs from the original one only in that, instead of entering the halting mode, it goes into an infinite loop with no output (i.e., by the substitution HALT → LABEL1: GOTO LABEL2; LABEL2: GOTO LABEL1).
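In a programming language, the construction of p′ amounts to fixing the input by partial application; a minimal Python sketch, with ``programs'' modelled as functions (the example function is of course a hypothetical stand-in for a program of U):

  # Program and input are interchangeable: given p and a fixed input s,
  # p_prime takes no input and satisfies U(p', empty) = U(p, s).
  def make_p_prime(p, s):
      def p_prime():
          return p(s)                   # the input s is hard-wired into p'
      return p_prime

  double = lambda s: 2 * s              # a stand-in ``program''
  p_prime = make_p_prime(double, 21)
  print(p_prime())                      # 42, the same as double(21)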
Another equivalence scheme between algorithms and formal systems has been introduced by G. Chaitin []. It again starts with the observation that one essential feature of formal systems is the existence of objective ``derivation'' rules, representable by some effective computation U′(p,t) and (by the Church-Turing thesis) an associated recursive function which, applied to the axioms p, yields all theorems derivable from proofs of length less than or equal to t characters. In this scheme, the terms ``computer'' and ``rules of inference,'' as well as ``program'' and ``axioms,'' as well as ``output up to cycle time t'' and ``theorems with proof size ≤ t,'' are synonyms. For fixed cycle times t, Chaitin's scheme and the above scheme can be brought into a one-to-one correspondence by recalling that for a universal computer U there exists a program q which simulates U′ on U, such that U′(p,t) = U(q,(p,t)) for all p and t.
With recursion theory being a relatively young discipline, the notion of ``physical determinism'' in classical physics leaves unspecified the exact recursion theoretic status of the physical entities. In this chapter such a specification is attempted, re-interpreting the classical meaning recursion-theoretically. It is not unreasonable to assume that a recursive evolution is ``at the heart'' of the classical notion of ``determinism.'' Usually, these evolution functions are defined on continua, such as \R^n. In particular, the initial values and the solutions are defined in \R^n. Yet, by theorem 1.2.5, p. pageref, ``almost all'' elements of the continuum are uncomputable. In particular, the assumption of an exact (i.e., effectively computable) description of the initial value(s) becomes questionable.
``Strong determinism'' or what henceforth will be called ``mechanism'' could alternatively be understood as a synonym for ``total causality,'' which might be translated into the language of recursion theory by ``total computability,'' or ``computability in all aspects.'' (For a philosophical treatment of this and related topics, see for instance Ph. Frank, Das Kausalgesetz und seine Grenzen [].) This should not be confused with the above requirement of a recursive evolution. It is a nontrivial substitution, since it requires all theoretical entities to be effectively computable.
For example, classical continuum physics is ``deterministic'' but not ``mechanistic'' in the above sense, since its evolution equations are computable, whereas its domain is the continuum. Here one could resort to the notion of ``arbitrary but finite accuracy parameter values,'' or ``finitely computable parameter values.'' Since finite numbers are recursively enumerable, this would restore effective computability.
Therefore, in what follows (if not denoted otherwise) the term
``deterministic'' refers merely to an effectively computable evolution
function, whereas a system which is totally computable in all of its
aspects is called ``mechanistic.'' In a way, this is a translation of Laplace's
demon into the language of recursion theory. [Deterministic, mechanistic
theory]
A deterministic
theory has an evolution function which is effectively computable / recursive.
A mechanistic theory is effectively computable / recursive in total. I.e., all theoretical entities are effectively computable / recursive. In particular, all initial values, laws and solutions of a mechanistic theory are recursively enumerable.
The terminology ``mechanistic'' is due to G. Kreisel []. According to G. Kreisel, a theory is mechanistic
if every sequence of natural numbers or every real number which is well defined (observable) according to the theory [[is]] recursive or, more generally, recursive in the data.

The definition given above does not directly refer to predicates such as ``well defined'' or ``observable.'' Also, the question remains whether it is possible to obtain physically meaningful (i.e., not excludable by present physical theory) non recursive solutions from recursive initial values and recursive evolution equations. As has been shown by G. Kreisel [] and M. B. Pour-El and J. I. Richards [,], in principle, such solutions exist. (See also remark (ii) below and remarks (i)-(v) on page pageref.) In order to avoid them, it might therefore be necessary to impose further restrictions on physical solutions. Such restrictions should ultimately be motivated by physical reasoning. (One could also decide to take such solutions seriously, with whatever outcome being the consequence.)
Further Remarks:
(i) Physical determinism in the defined sense does not imply that the initial value and/or the solution (the final state) can be represented by an effective computation.
(ii) As has been pointed out before, computability of the equation of motion and of the initial value does not guarantee computability of the solution, at least not if the solution is non unique [], if it is obtained by unbounded linear operators ([], chapter 3), or if it is a weak solution ([], p. 73). For mechanistic theories, uncomputable solutions from computable initial values and computable equations of motion are excluded by definition.

(iii) Since, from the point of view of coding and information theory, the distinction between the ``evolution'' and the ``initial value'' (i.e., between some algorithm and its input) is rather arbitrary, so is the above distinction between ``determinism'' and ``mechanism.'' A more radical but clearer distinction would be between non recursive and recursive (``mechanistic'') theories.

(iv) Uncomputability does not imply randomness (see chapter , p. ).
Table 2.1 summarises the computational aspects of physical theories.
                       computable               uncomputable
                       evolution                evolution

  computable           mechanistic              indeterministic
  initial value        physical theory          physical theory
                       (computable solution)

  uncomputable         deterministic            indeterministic
  initial value        physical theory          physical theory
By identifying the evolution functions of mechanistic physics with the partial recursive functions, one obtains a correspondence which is represented in table 2.2 (the table includes an extension to formal systems). The term correspondence between structures in physics, algorithmics and mathematics should be understood as a one-to-one translation between such structures, which become identical if only their elements are renamed. Note that, technically speaking, a mechanistic physical system should be perceived as a never-ending computational process. Such an algorithm which does not halt could alternatively be viewed as a continuing derivation of theorems of a formal system.
  physics                          algorithmics   mathematics                  formal logic

  mechanistic system               algorithm      partial recursive function   formal axiomatic system
  initial state(s)/observable(s)   input          argument(s)                  axiom(s)
  evolution                        computation    evaluation                   derivation
  (final) state(s)/observable(s)   output         value                        theorem
Table 2.3 gives a brief overview of the algorithmic features of physical theories.
  physical theory              mathematical entity   algorithmic status    machine model

  classical mechanics:
    space, time, forces, ...   continuum             deterministic but     oracle
                                                     non mechanistic
  electrodynamics:
    space, time, fields, ...   continuum             deterministic but     oracle
                                                     non mechanistic
  quantum mechanics:
    wave function              continuum             deterministic but     oracle
    space, time, fields, ...   continuum             non mechanistic
    randomness of                                    indeterministic       oracle
    single events
    phase space                discretum

  discrete physics             discretum             computable            finite and
  (see chapter 3)                                                          infinite automaton
All previously introduced models of computation have been discrete in the sense that the processes involved are discontinuous with respect to space and time, storage capacity, program length and so on. Effective computations can be decomposed into distinct parts; e.g., execution steps. Any algorithmic model of a physical system inherits this feature. Discreteness is opposed to the assumption of continuity by classical physics.
In what follows, several features of discrete versus continuous models of motion and evolution will be discussed, partly by taking up the arguments of Zeno of Elea (5th century B.C.) on motion. For further considerations and references, see H. D. P. Lee's Zeno of Elea [] and A. Grünbaum's Modern Science and Zeno's Paradoxes [] and Philosophical Problems of Space and Time, second, enlarged edition, p. 159; ibid., p. 808 [].
The arguments will be phrased in terms of Cellular Automaton models. This can be done without loss of generality as long as the context is universal computation, since any universal computer can simulate any other universal computer, the only difference being the complexities required for one system to simulate the other. The same holds for discretization of space-time versus discretization of other variables, e.g., action-angle variables.
In what is now often referred to as the paradox of ``dichotomy,'' Zeno argues that (in Simplicius' interpretation, quoted from [], p. 45), if space is infinitely divisible, and if

``... there is motion, it is possible in a finite time to traverse an infinite number of positions, making an infinite number of contacts one by one; but this is impossible, and therefore there is no motion.''

[Infinite divisibility of a set does not imply that its members are not recursive. For example, the set of the rational numbers \Q and the set of the real numbers \R are both infinitely divisible (dense), yet every rational is computable and ``almost all'' reals are uncomputable.] A similar argument is put forward in the paradox of ``Achilles and the Tortoise,'' which has been described in detail in chapter 1, p. pageref.
A ``resolution'' of this kind of paradox can be obtained by assuming an infinite divisibility not only of space but also of time. I.e., one finite (proper) time interval can be divided into infinitely many time steps in such a way that a body in motion traverses an infinite number of positions, making an infinite number of contacts, one position per time step. A proper formalisation of this resolution can be expressed in terms of analysis. For further discussions along different lines see A. Grünbaum [,].
Yet, as this resolution (and even constructive analysis []) operates by dividing space-time into infinitely many subdivisions, finite processes of computation are insufficient models of the situation. Recall Zeno's argument, p. pageref. If one actually wants to simulate, for instance, successive positions of ``Achilles and the Tortoise,'' one may start by giving the following table:

  position of the Tortoise:   1   3/2   7/4   15/8   31/16   ...   2 - 1/2^n
  position of Achilles:       0   1     3/2   7/4    15/8    ...   2 - 1/2^(n-1)
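The table can be generated exactly by rational arithmetic; a Python sketch, using the closed forms 2 - 2^{-n} and 2 - 2^{1-n} read off from the table above:

  # Exact positions: tortoise(n) = 2 - 1/2**n, achilles(n) = 2 - 2/2**n;
  # Achilles trails by exactly 1/2**n, which vanishes only in the limit.
  from fractions import Fraction

  def tortoise(n):
      return 2 - Fraction(1, 2**n)

  def achilles(n):
      return 2 - Fraction(2, 2**n)

  for n in range(6):
      print(n, achilles(n), tortoise(n), tortoise(n) - achilles(n))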
Stated differently, by assuming infinite divisibility of space and time, an infinity of space-time points is approached by Achilles' and the Tortoise's bodies on a step-by-step basis, in finite proper time and space. This would require a capacity of the system to compute ``the limit'' (of infinitely small space and time scales).
It is not completely unreasonable to speculate that, in some way or another, this capacity of a physical system to compute these limits could be utilised, e.g., for the construction of an oracle which might be able to solve the halting problem. (For details, see p. pageref.) This would require a dramatic revision of our concept of ``effective'' or ``mechanic'' computation in the way discussed in chapter 1. Oracle computation by continuous systems might be seen as one aspect of the fact that they do not correspond to any process of effective computation.
If one excludes limit computations of the above kind, one has to explain why. It is to be expected that an explicit statement of exclusion will effectively render a discrete mathematical model of motion, as it is discussed below. This may be seen as a translation of Zeno's paradoxes of ``dichotomy'' and ``Achilles and the Tortoise'' into a more contemporary form; using terminology from continuum physics and recursion theory.
In the paradox of the ``arrow,'' according to Aristotle [Phys. Z 9], Zeno argues as follows ([], p. 53):

... For if, he says, everything is either at rest or in motion, but nothing is in motion when it occupies a space equal to itself, and what is in flight is always at any given instant occupying a space equal to itself, then the flying arrow is motionless. ... This conclusion follows from the assumption that time is composed of instants.

This paradox may be derived from practical experience: during an arbitrary but finite observation period [t_0, t_1], a flying arrow gets ``smeared out,'' as its ``extension'' LENGTH is measured by the position of the arrow tail r_0 at t_0 and the position of the arrow head r_1 at t_1. For finite time intervals of length Δt = t_1 - t_0, the measured extension is always greater than the rest length LENGTH(0) of the arrow; i.e., for Δt > 0,

   LENGTH(Δt) = r_1 - r_0 > LENGTH(0) ,      (3.1)

whereas only in the limit of vanishing observation time does the measured extension reduce to the rest length,

   lim_{Δt → 0} LENGTH(Δt) = LENGTH(0) .      (3.2)
One concrete version of the assumption that time is composed of instants or atomic ``nows'' is the requirement that time is discrete. Assume for the moment that space is discrete as well.
Motion in a discrete space-time can be realised by Cellular Automata. For a vision of Cellular Automaton models for space-time, see for instance, the articles Digital Mechanics by E. Fredkin [], Cellular Vacuum by M. Minsky [] and the book Cellular Automata Machines by T. Toffoli and N. Margolus []. In what follows, one-dimensional Cellular Automaton models of arrow motion are discussed.
The simplest arrow motion can be realised by a right shift; i.e., given a cell with its two surrounding neighbours, the transition rule {l,X,X} → l (X stands for any state) yields the state l of the left neighbour as the next value of the center cell. Let the initial state be ``... --->____________ ... .'' Then the time evolution is given by (see p.  for a Mathematica program simulating this evolution):
... --->_________________________________________________________ ...
... _--->________________________________________________________ ...
... __--->_______________________________________________________ ...
... ___--->______________________________________________________ ...
... ____--->_____________________________________________________ ...
... _____--->____________________________________________________ ...
... ______--->___________________________________________________ ...
... _______--->__________________________________________________ ...
... ________--->_________________________________________________ ...
... _________--->________________________________________________ ...
... __________--->_______________________________________________ ...
... ___________--->______________________________________________ ...
... ____________--->_____________________________________________ ...
... _____________--->____________________________________________ ...
... ______________--->___________________________________________ ...
... _______________--->__________________________________________ ...
... ________________--->_________________________________________ ...
... _________________--->________________________________________ ...
... __________________--->_______________________________________ ...
  .
  .
  .
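A right-shift rule of this kind amounts to a rotation of the cell array; the following Python sketch (assuming cyclic boundary conditions for simplicity) reproduces the listing above.

  # Right-shift rule {l, X, X} -> l on a cyclic array: every cell takes
  # over the state of its left neighbour, so the arrow moves one cell
  # per cycle time.
  def step(cells):
      return cells[-1:] + cells[:-1]    # rotate right by one cell

  config = list("--->" + "_" * 57)
  for t in range(19):
      print("... " + "".join(config) + " ...")
      config = step(config)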
The velocity of the arrow is 1 cell per cycle time, independent of the length of the arrow. A less trivial example can be found in M. Minsky []. There, the transition rule is given by {-,-,>} → >, {X,-,>} → X, {X,>,>} → *, {*,X,X} → >, {*,>,X} → *, {X,*,X} → - (X stands for any state). An arrow in this cellular array propagates with a velocity which depends on its length. E.g., for an arrow of length 3, 4, 5 and 6 cells, the velocity is 1 cell per 5, 8, 9 and 11 cycles, respectively. Explicitly,
... ->__________________________________________________________ ...
... ->>__________________________________________________________ ...
... _>>__________________________________________________________ ...
... _*>__________________________________________________________ ...
... _-*__________________________________________________________ ...
... _->_________________________________________________________ ...
... _->>_________________________________________________________ ...
... __>>_________________________________________________________ ...
... __*>_________________________________________________________ ...
... __-*_________________________________________________________ ...
... __->________________________________________________________ ...
... __->>________________________________________________________ ...
... ___>>________________________________________________________ ...
... ___*>________________________________________________________ ...
... ___-*________________________________________________________ ...
... ___->_______________________________________________________ ...
... ___->>_______________________________________________________ ...
... ____>>_______________________________________________________ ...
... ____*>_______________________________________________________ ...
  .
  .
  . (period 5)

... --->________________________________________________________ ...
... --->>________________________________________________________ ...
... ->>>________________________________________________________ ...
... ->>>>________________________________________________________ ...
... _>>>>________________________________________________________ ...
... _*>>>________________________________________________________ ...
... _-*>>________________________________________________________ ...
... _-*>________________________________________________________ ...
... _---*________________________________________________________ ...
... _--->_______________________________________________________ ...
... _--->>_______________________________________________________ ...
... _->>>_______________________________________________________ ...
... _->>>>_______________________________________________________ ...
... __>>>>_______________________________________________________ ...
... __*>>>_______________________________________________________ ...
... __-*>>_______________________________________________________ ...
... __-*>_______________________________________________________ ...
... __---*_______________________________________________________ ...
... __--->______________________________________________________ ...
  .
  .
  . (period 9)

... ---->_______________________________________________________ ...
... --->>_______________________________________________________ ...
... --->>>_______________________________________________________ ...
... ->>>>_______________________________________________________ ...
... ->>>>>_______________________________________________________ ...
... _>>>>>_______________________________________________________ ...
... _*>>>>_______________________________________________________ ...
... _-*>>>_______________________________________________________ ...
... _-*>>_______________________________________________________ ...
... _---*>_______________________________________________________ ...
... _---*_______________________________________________________ ...
... _---->______________________________________________________ ...
... _--->>______________________________________________________ ...
... _--->>>______________________________________________________ ...
... _->>>>______________________________________________________ ...
... _->>>>>______________________________________________________ ...
... __>>>>>______________________________________________________ ...
... __*>>>>______________________________________________________ ...
... __-*>>>______________________________________________________ ...
  .
  .
  . (period 11)
One could speculate that, if the motion of a body in the physical vacuum can be described similarly, then there is a remote possibility to increase this velocity by proper algorithmic manipulations; e.g., by re-programming the vacuum motion or by re-shaping the body.
There is an obvious contradiction with Zeno's argument: in the listed Cellular Automaton configurations the arrow in flight occupies a cell region equal to its size, and yet it moves. This may be explained from different perspectives.
One may argue that the shape of the arrow (i.e., the initial state) and the transition rules contain information about the evolution; such that any ``snapshot'' of the position of the arrow should be accompanied by another quantity called ``momentum,'' which completes a description of the arrow's motion. In this way, phase space is introduced.
Another explanation would be that the shape of the arrow (i.e., the initial state) and the transition rules contain ``hidden'' information about the evolution which reveals itself after successive time periods. This argument somewhat resembles the construction of a reversible second-order system (computation) proposed by E. Fredkin ([], p. 141), in which the next configuration is a function of the present one combined reversibly with the past one,

   x(t+1) = f(x(t)) - x(t-1)   (mod m) ,

where m is the number of cell states (exclusive-or in the binary case), so that two successive configurations determine the past as well as the future.
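A minimal sketch of such a second-order technique in Python; the particular local rule f is an arbitrary choice, since reversibility holds for any f, which is the point of the construction.

  # Second-order rule x(t+1) = f(x(t)) XOR x(t-1): whatever the local
  # rule f, the past is recoverable, since x(t-1) = f(x(t)) XOR x(t+1).
  def neighbours(cfg):
      return zip(cfg[-1:] + cfg[:-1], cfg, cfg[1:] + cfg[:1])

  def step(past, present, f):
      return [f(l, c, r) ^ p
              for p, (l, c, r) in zip(past, neighbours(present))]

  rule = lambda l, c, r: (l + c + r) % 2    # arbitrary choice of f
  history = [[0] * 8, [0, 0, 0, 1, 0, 0, 0, 0]]
  for _ in range(4):
      history.append(step(history[-2], history[-1], rule))
  # running backwards recovers the past configuration
  assert step(history[-1], history[-2], rule) == history[-3]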
In the paradox of the ``stadium,'' Zeno argues that an attempt to define space and time distances by the motion of rigid bodies results in a contradiction. Assume again that space and time are uniformly partitioned into cells and instants (time cycles). Consider identical bodies marked by A, B and C, arranged in three rows of four: the A's are at rest with respect to the cell space, the B's move one cell per time cycle to the right, and the C's move one cell per time cycle to the left. At some instant, the three rows are aligned:

      A A A A
      B B B B
      C C C C

One time cycle later, every B has advanced one cell to the right and every C one cell to the left:

      A A A A
        B B B B
    C C C C

Relative to the A's, every B has moved by one cell within one time cycle; relative to the C's, however, it has moved by two cells. In particular, the B's have passed the C's without ever having been aligned with them: the crossing would have to take place at ``half an instant,'' in contradiction to the assumed indivisibility of cells and time cycles.
This argument is somewhat similar to A. Einstein's critique of the classical notion of simultaneity put forward in relativity theory, resulting in a transformation of space and time scales []. In discrete physics, any velocity is characterised by rationals, the ``natural'' extrinsic unit being one cell per cycle time. Without loss of generality it is assumed that the maximal velocity at which a body can move is one cell per cycle time, denoted by c. In analogy to relativity theory, this velocity is used to define synchronised events. Flows of this kind will be called rays.
The synchronisation convention for two clocks at two points A and B can be adopted from relativity theory ([], p. 892): [Einstein synchronisation []] Assume two clocks at two arbitrary points A and B which are ``of similar kind.'' At some arbitrary A-time t_A a ray goes from A to B. At B it is instantly (without delay) reflected at B-time t_B and reaches A again at A-time t_A′. The clocks at A and B are synchronised if

   t_B - t_A = t_A′ - t_B .

The two-way ray velocity is given by

   c = 2 AB / (t_A′ - t_A) ,

where AB denotes the distance between A and B.
It is identical for all frames, irrespective of whether they are moving with respect to the rest frame of the cellular space or not.
Example:
Synchronisation by ray exchange can be simulated on a Cellular Automaton using the following rules for inertial arrow motion (X stands for any state): {>,_,X} → >, {X,_,<} → <, {_,_,_} → _, {X,_,>} → _, {<,_,X} → _, {_,>,_} → _, {_,<,_} → _, and for reflection (rigid mirror I): {X,>,I} → <, {I,<,X} → >, {X,I,X} → I, {_,_,I} → _, {I,_,_} → _, {I,>,_} → _, {_,<,I} → _. (The evolution can, for instance, be realised by the Mathematica program on p. .) With two rigid mirrors at A and B spaced 7 cells apart, the evolution is given by
                         A       B
... _____________________I>______I______________________________ ...
... _____________________I_>_____I______________________________ ...
... _____________________I__>____I______________________________ ...
... _____________________I___>___I______________________________ ...
... _____________________I____>__I______________________________ ...
... _____________________I_____>_I______________________________ ...
... _____________________I______>I______________________________ ...
... _____________________I______<I______________________________ ...
... _____________________I_____<_I______________________________ ...
... _____________________I____<__I______________________________ ...
... _____________________I___<___I______________________________ ...
... _____________________I__<____I______________________________ ...
... _____________________I_<_____I______________________________ ...
... _____________________I<______I______________________________ ...
... _____________________I>______I______________________________ ...
... _____________________I_>_____I______________________________ ...
... _____________________I__>____I______________________________ ...
... _____________________I___>___I______________________________ ...
... _____________________I____>__I______________________________ ...
... _____________________I_____>_I______________________________ ...
... _____________________I______>I______________________________ ...
... _____________________I______<I______________________________ ...
... _____________________I_____<_I______________________________ ...
... _____________________I____<__I______________________________ ...
... _____________________I___<___I______________________________ ...
... _____________________I__<____I______________________________ ...
  .
  .
  .
Synchronisation can be defined in all frames, irrespective of whether they are moving with respect to the cell space or not. However, it cannot be expected that two frames which are in motion relative to each other have identical synchronisations. Indeed, the assumption of identical synchronisations yields a contradiction. Consider the following Gedankenexperiment, which is again due to Einstein ([], p. 895): Let the points A and B be the endpoints of a rigid rod. Assume that the rod is moving with constant velocity v with respect to the cellular space. Assume four observers; two observers travelling with the rod at points A and B, and two observers in the rest frame of the cellular space. The latter observers synchronise their clocks and measure the length AB of the moving rod in the rest frame of the cellular space (at equal rest frame times). Assume further that if two events are synchronised in the rest frame of the cellular space, then they are synchronised in the rest frame of the moving rod (which will turn out to be wrong). An attempt to verify the synchronisation of the clocks in the moving frame fails, since if the ray is first emitted in the direction of motion and reflected afterwards, then

   t_B - t_A = AB/(c - v)  ≠  t_A′ - t_B = AB/(c + v) .      (3.5)
Of course, one could define a global time synchronisation, for instance by counting the number of cycle times of the array. This view, however, is an extrinsic (see below) view, which yields a preferred frame of reference, the frame at rest with respect to the cellular space. Such a definition is an alternative to the above definition of synchronisation given by Einstein. Einstein synchronisation is suitable for an intrinsic (see below) definition. It does not operate with a preferred frame of reference; the ``ideological preference'' is the relativity principle (symmetry). The corresponding time will sometimes also be called intrinsic time. Intrinsic time is discrete. But, just as in relativity theory, for moving frames the elementary unit of intrinsic cycle time becomes dilated. By a similar argument, the elementary unit of length becomes dilated.
How far can one go in reconstructing relativity theory? Not very far if one does not take into account dispersion (e.g., the energy-momentum relation) and other dynamical features, because even for relativistic kinematics an alternative synchronisation could be defined by signals of arbitrary velocity, or by defining a preferred frame of reference such as the one at rest with respect to the cosmic background blackbody radiation. The physical content of Einstein synchronisation is the absence of any criterion by which one inertial frame might be preferred over another. This is not the case for a cellular array, where, at least extrinsically, a preferred frame is the frame at rest with respect to the cellular array. Whether this preference holds for intrinsic perception is questionable.
Stated differently, one criterion for Einstein synchronisation is the invariance of the physical laws, such as electromagnetism, in arbitrary inertial frames. This is one reason why, for instance, synchronisation by sound is no good if one attempts to describe physical motion governed by electromagnetism. A detailed discussion of related topics is given in [,,], where it is argued that, depending on dispersion relations, creatures in a ``dispersive medium'' would develop a theory of coordinate transformation very similar to relativity theory.
Physical entities such as experimental measurement results, time series et cetera, as well as physical theories, can be treated symbolically on an equal footing: as an information source of symbols. (The notion of the term ``symbol'' remains intuitive, since any definition of ``symbol'' has to be given in terms of symbols.)
The first sections are devoted to general source coding schemes, which have been developed in the context of the foundations of quantum mechanics [,] and for the coding of time series, which has been introduced as symbolic dynamics []. We then consider the Gödel numbering of axiomatised physical theories. For a more detailed account, see, for instance, R. W. Hamming's book Coding and Information Theory [], R. J. McEliece, The Theory of Information and Coding [], or R. Ash, Information Theory [].
[System] A system is anything on which experiments can be performed.
Examples:
(i) Any finite deterministic automaton is a system. For instance, your PC / workstation (insert your favourite brand here: ``...'') is a system. Another example is the Moore automaton defined on p. .
(ii) Any mechanical or quantum device is a system.
(iii) The great beyond is no system, unless one succeeds in performing experiments on it. This statement need not be entirely ridiculous, because what is called the great beyond depends on scientific knowledge and beliefs, which transform throughout history. For instance, what is presently called ``electromagnetic phenomena'' was mostly obscure and beyond anybody's experimental limits only 300 years ago.
[Experiment, manual, outcome, event] A physical system is characterised by experiments performed on it. The collection of all experiments \M is called manual or propositional calculus.

The i'th experiment will be denoted by e_i. Any experiment can be decomposed into elementary TRUE-FALSE-experiments, called propositions and denoted by e_ij.

The members of an experiment are called outcomes. Associated with every proposition e_ij are the two elementary outcomes TRUE and FALSE.

An event p_i ⊂ \M is a subset of an experiment e_i, associated with particular outcome(s). In terms of elementary experiments, p_ij corresponds to the j'th elementary experiment e_ij of e_i. Every event is also called an element of physical reality.
Remarks:
(i) Fig. 4.1 shows the hierarchy ``manual → experiment → outcome → event'' of perception defined by 4.1.

(ii) The terminology manual suggests a collection or catalogue of events. It defines an empirical universe and represents the ``admissible'' elements of physical reality; in this sense, the manual is the physical primitive.
(iii) It is easiest to imagine an empirical universe consisting of propositions, i.e., statements which are either TRUE or FALSE (see the following example).
(iv) Throughout this book a notion of element of physical reality is used which refers only to outcomes of actual measurements. This concept is more restricted than the EPR-terminology []:
``If, without in any way disturbing a system, we can predict [[!]] with certainty (i.e., with probability equal to unity) the value of a physical quantity, then there exists an element of physical reality corresponding to this quantity.''

The most important difference is that Einstein does not deny the existence of elements of physical reality for physical entities which are only indirectly reconstructed from measurements of other entities and which have not actually been measured. E.g., assume two entangled electrons in a singlet state. Then, according to Einstein, due to conservation of angular momentum, measurement of the spin of one electron fixes the spin of the other one in that particular direction. Hence, according to Einstein, an element of physical reality could be ascribed to the spin of the other electron from the entangled pair in that particular direction, although no actual experiment is performed to measure its spin directly. This view of ``results from unperformed experiments'' is not adopted here.
Example:
Consider an automaton consisting of i internal states labelled by numbers 1 ≤ k ≤ i. Assume that all internal states can, informally speaking, be ``experimentally distinguished'' (for a definition, see chapter 1.4.2, p. pageref). Then statements of the form ``the automaton is in state k,'' 1 ≤ k ≤ i, are propositions. The manual is the union of these propositions. The possible experimental outcomes are TRUE and FALSE, respectively. After performing an actual experiment, an event is defined by one of these values. Where defined, compound propositions can be constructed with the logical ``or,'' ``and'' and ``not'' operations.
[Coding] Let p_i = ∪_{j=1}^M p_ij be an event, decomposed into elementary events p_ij. For elementary events, the source alphabet consists of just two symbols s_1 and s_2, corresponding to TRUE and FALSE, respectively. The code #(p_ij) of p_ij is thus defined by

   #(p_ij) = s_1 if the outcome of e_ij is TRUE, and #(p_ij) = s_2 if it is FALSE.
Remarks:
(i) If one identifies s_1 = 0 and s_2 = 1, the event p_i can be coded as a binary rational number as follows:

   #(p_i) = 0.#(p_i1) #(p_i2) ... #(p_iM) = Σ_{j=1}^M #(p_ij) 2^{-j} .
(ii) Note that, instead of taking some (natural) numbers as symbols, one could have taken any other entity which could be identified as symbol, such as the letters of the English or Greek alphabet, or apples & oranges et cetera.
(iii) So far we have not dealt with the translation of these source code symbols into an alphabet which is algorithmically recognisable by some computable process. This is discussed in the section on encoding, p. . For information sources with a source alphabet s_1, s_2, ..., s_q of q symbols, the generalisation to radix ≥ q notation is straightforward. If the code alphabet consists of fewer symbols than the source alphabet, then more sophisticated encoding techniques are necessary. These will be discussed below.
(iv) For technical reasons it will often be assumed that, with any number p occurring in the system, ``elementary functions'' (recursive functions) g(p) thereof are codable within the system as well. This is equivalent to postulating the existence of a universal computer in the system.
(v) If the context is unambiguous, the code signs ``#( )'' can be dropped. Any event p_i is then written as p_i = 0.p_i1 p_i2 p_i3 ... p_in.
(vi) For fractal source coding, see section , p. .
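For illustration, a Python sketch of the binary coding of remark (i); the particular outcome list is an arbitrary example.

  # Remark (i): #(p_i) = 0.b_1 b_2 ... b_M as a binary rational,
  # with b_j = 0 for TRUE and b_j = 1 for FALSE.
  from fractions import Fraction

  def code(outcomes):                   # e.g. [True, False, True]
      bits = [0 if t else 1 for t in outcomes]
      return sum(Fraction(b, 2**(j + 1)) for j, b in enumerate(bits))

  print(code([True, False, True]))      # 0.010... = 1/4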
[Partition] Let X be a set, e.g., the state space of a physical system. A partition ξ = {E_1, ..., E_q} is a set of subsets of X such that the E_i are mutually disjoint (E_i ∩ E_j = ∅ for i ≠ j) and exhaust X (∪_{i=1}^q E_i = X).
[Sequence of pointer readings] To every E_i ∈ ξ, associate a pointer reading s^i. A sequence of n pointer readings, representing experimental outcomes, is denoted by

   Y = s_1 s_2 ... s_n ,   s_j ∈ {s^1, ..., s^q} .
Notice that superscripts are used to denote the i'th symbol s^i, whereas subscripts are used to denote the place of the symbol in the sequence of pointer readings Y. The s_j's in Y are characters of an arbitrary alphabet S with q symbols. (This motivates the view of Y as a word, which is the starting point of linguistic analysis, in particular with respect to the Chomsky hierarchy of languages [].)

The terminology of symbolic dynamics is related to the general coding scheme: identify a time sequence Y obtained from a restricted alphabet S = {0,1} with a particular event, say p_i, by setting s_j = p_ij.
Syntactically, any physical theory is representable by symbols. (Let us, for the moment, disregard the semantic content of a theory, i.e., its ``meaning'' et cetera.) These symbols can be ``read'' by some ``observer'' or agent by performing a series of experiments. Examples are the sensory perceptions associated with the reading of a book on theoretical physics, or the reading of a computer tape containing an effective procedure for solving an evolution equation et cetera; more generally, the experimental observation of some representation of a theory. E.g., the reading of the term ``E = mc^2'' can be seen as an optical scanning experiment which yields the (successive) events ``E,'' ``='', ``m,'' ``c'' and ``2.''
These symbols may be interpreted as input for some effective computation producing ``predictions.'' In what follows, we assume that physical theories are mechanistic, i.e., that they can be represented by an algorithm.1 For the time being, no attention is thus paid to theories which cannot be brought into such a form, as well as to the question, ``how can such theories be created?''
The enumeration of programs discussed in section 1.2.4, p. pageref, can be used to
generate a unique code of such a theory. An alternative would be the use of
instantaneous (prefix) codes for entities of formal systems; see section , p. .
Still another (probably not very practical) coding scheme is Gödel's original
construction of Gödel numbers, for which Gödel uses the uniqueness of prime
factors of whole numbers. The following definition deals with a very particular
language (alphabet), which is appropriate for the task of deriving the
incompleteness theorems []; a generalisation to more general languages
(alphabets) is straightforward. [Gödel numbers]
Assume an alphabet consisting of the symbols (, ), the comma, ~, →, ∀, variables x_k, constants a_k, functions f_k^n and relations A_k^n. It is possible to map terms containing these symbols injectively onto the odd, positive integers by the Gödel number function #: #(() = 3; #()) = 5; #(,) = 7; #(~) = 9; #(→) = 11; #(∀) = 13; #(x_k) = 7 + 8k; #(a_k) = 9 + 8k; #(f_k^n) = 11 + 8·(2^n·3^k); #(A_k^n) = 13 + 8·(2^n·3^k).
Every well-formed formula F = s_1 ... s_k consisting of the symbols of the alphabet, functions and relations can be injectively mapped via

   #(F) = Π_{j=1}^k p_j^{#(s_j)} ,

where p_j denotes the j'th prime number.
Every deductive proof, consisting of a sequence of well-formed formulas F_1 ... F_k, can be uniquely mapped via

   #(F_1 ... F_k) = Π_{j=1}^k p_j^{#(F_j)} .
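The prime-power construction is easily sketched in Python; the symbol codes are the ones defined above, and the example formula is an arbitrary choice.

  # Prime-power Goedel numbering: #(F) = prod_j p_j ** #(s_j) for a
  # formula F = s_1 ... s_k, with p_j the j'th prime.
  SYMBOL = {'(': 3, ')': 5, ',': 7, '~': 9, '->': 11, 'forall': 13}

  def nth_prime(j):                     # 1-indexed: 2, 3, 5, 7, ...
      primes = []
      n = 1
      while len(primes) < j:
          n += 1
          if all(n % p for p in primes):
              primes.append(n)
      return primes[-1]

  def x(k):                             # code of the variable x_k
      return 7 + 8 * k

  def goedel(formula):                  # formula: a list of symbol codes
      g = 1
      for j, code in enumerate(formula, start=1):
          g *= nth_prime(j) ** code
      return g

  print(goedel([SYMBOL['('], x(1), SYMBOL[')']]))   # 2**3 * 3**15 * 5**5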
In a binary Gödel numbering of the type just introduced, the axioms of a theory can be coded in bit strings (for instance by rewriting the values # (F) in binary notation). All such bit strings can be merged in a single finite bit string. This string is the object of investigation if one is interested in the algorithmic information content of a theory (see and , p. , ). The rules of inference can then be envisioned as a computable algorithm for enumerating the bit strings corresponding to provable theorems from the axiom bit string (for details, see table 2.2, p. pageref).
[Relation, equivalence relation]
Assume two
sets M, N. Every subset R of the Cartesian product M×N is a (binary) relation; one writes fRg if f ∈ M and g ∈ N stand in the relation R. There are as many relations as there are subsets of M×N.
Let M = N. An equivalence relation R satisfies the following properties:
reflexivity: fRf for all f ∈ M;
symmetry: fRg implies gRf;
transitivity: fRg and gRh imply fRh.
[Equivalence class, quotient] The subset fR = {g | fRg} is called the equivalence class of f modulo R.
The set M/R = {fR | f ∈ M}, consisting of all equivalence classes modulo R, is called the quotient of M by R. R yields a partitioning of M.
Every equivalence relation R corresponds to a function φ such that φ(f) = fR for all f ∈ M. Conversely, to every function φ: M → N corresponds an equivalence relation ``≡φ'' on M, defined by

    f ≡φ g  if and only if  φ(f) = φ(g).

The elements of M/≡φ, the quotient of M by ≡φ, correspond one-to-one to elements of N. This amounts to renaming the elements of N by elements of the quotient M/≡φ. One can thus define a map

    φ̂: M/≡φ → N  with  φ̂(f_{≡φ}) = φ(f).
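A minimal sketch (the function and variable names are ours) of this correspondence between functions and equivalence relations:

    # phi induces the equivalence f ~ g iff phi(f) == phi(g); the classes
    # of the quotient M/~ correspond one-to-one to the attained values of phi.
    def quotient(M, phi):
        classes = {}
        for f in M:
            classes.setdefault(phi(f), set()).add(f)
        return classes  # value of phi  ->  its equivalence class

    print(quotient(range(9), lambda f: f % 3))
    # {0: {0, 3, 6}, 1: {1, 4, 7}, 2: {2, 5, 8}} -- one class per value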
In general, an isomorphism between two algebraic structures (admitting certain ``similar'' operations) is a one-to-one element-to-element correspondence which preserves all combinations. The following definition specialises this concept to (binary) relations. [Isomorphism, automorphism] Let M1 be an algebraic structure with a (binary) relation R1 and let M2 be another algebraic structure with a (binary) relation R2. An isomorphism ≅ is defined by a one-to-one map (``translation'') I from M1 onto M2 which preserves the relations; i.e., M1 ≅ M2 if there is a one-to-one map I satisfying

    fR1g  if and only if  I(f)R2I(g)  for all f, g ∈ M1.

If M1 = M2, then ≅ is called an automorphism.
Remarks:
(i) Informally, the concept of an isomorphism is that two algebraic structures ``look much the same'' and become identical if only their entities are renamed.
(ii) An isomorphism defines an equivalence relation.
[Partial ordering, poset] A partially ordered set (poset) is a system M in which a binary order relation ``⊒'' (inverse ``⊑'') is defined which satisfies reflexivity (f ⊒ f), antisymmetry (f ⊒ g and g ⊒ f imply f = g) and transitivity (f ⊒ g and g ⊒ h imply f ⊒ h).
The ``immediate superiority'' of f with respect to g will be defined next. By ``f covers g'' it is meant that f ⊒ g and that f ⊒ x ⊒ g is not satisfied by any x ∈ M, x ≠ f, x ≠ g.
Any finite partially ordered set M can be conveniently represented graphically by a Hasse diagram in the following way. [Hasse diagram] Let M be a partially ordered set. The Hasse diagram of M is a directed graph obtained by drawing small filled circles representing the elements of M, so that f is drawn higher than g whenever f ⊒ g. A segment is then drawn from f to g whenever f covers g.
Remarks:
(i) As the direction is always from the bottom to the top, Hasse diagrams are drawn undirected.
(ii) Any finite partially ordered set is defined up to isomorphism by its Hasse diagram. I.e., two isomorphic partially ordered sets must have a one-to-one relation between their highest & lowest elements, between elements just above lowest elements, and so on; corresponding elements must be covered equally.
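As an illustration, the covering relation - and hence the edge set of the Hasse diagram - can be computed mechanically from the order relation; the following brute-force routine (ours) does this for small posets:

    # f covers g iff f > g and no x of the poset lies strictly between them.
    def hasse_edges(M, leq):
        return [(f, g) for f in M for g in M
                if f != g and leq(g, f)
                and not any(x not in (f, g) and leq(g, x) and leq(x, f)
                            for x in M)]

    # Divisors of 12 under divisibility; (12, 6) is an edge, (12, 3) is not,
    # since 12 > 6 > 3:
    print(hasse_edges([1, 2, 3, 4, 6, 12], lambda a, b: b % a == 0))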
[Linearly ordered set] If for all elements f, g of a partially ordered set M either the relation ``f ⊒ g'' or the relation ``g ⊒ f'' is satisfied, M is called a linearly ordered set.
[Chain] A chain in a partially ordered set M is a subset N ⊂ M which is a linearly ordered set.
[Length] Let N be a chain of a partially ordered set M. The length |N| of N is the cardinal number (i.e., the number of elements) of N. The length |M| of the partially ordered set M is the supremum over the lengths of all chains in M, minus 1. M has finite length if |M| < ∞.
[Atom, coatom] Assume a partially ordered set with a least element 0 and a greatest element 1. An element a ≠ 0 is called an atom if it covers 0; an element c ≠ 1 is called a coatom if 1 covers c. A partially ordered set is atomic if every element f ≠ 0 lies above some atom, i.e., f ⊒ a for some atom a.
Examples:
(i) Fig. 5.1(a) shows the Hasse diagram of a linearly ordered set. Fig. 5.1(b) shows the Hasse diagram of a set which is not linearly ordered.
(ii) Figs. 5.1(a) & (b) show the Hasse diagrams of atomic sets. Fig. 5.1(c) shows the Hasse diagram of a non-atomic set.
Next the lattice concept will be introduced.
Lattice theory is the theory of partially ordered sets with the property that any two elements have a common upper and lower bound. It provides a ``generic'' framework for the investigation of important algebraic structures occurring, for instance, in Hilbert space theory and logic. [Lattice, version I] A partially ordered system L = (L, ⊑) with order relation ``⊑'' (inverse ``⊒'') is a lattice if and only if any pair f, g of its elements has a greatest lower bound inf(f,g) and a least upper bound sup(f,g), i.e.,
inf(f,g) ⊑ f;
inf(f,g) ⊑ g;
h ⊑ f and h ⊑ g imply h ⊑ inf(f,g);
sup(f,g) ⊒ f;
sup(f,g) ⊒ g;
h ⊒ f and h ⊒ g imply h ⊒ sup(f,g).
Instead of definition 5.3, the following axioms characterise a lattice alternatively ([], page 18). [Lattice, version II] A lattice is an algebraic structure L = (L, ⊓, ⊔) with two operations ``⊓'' (``meet'') and ``⊔'' (``join'') satisfying, for all f, g, h ∈ L,
idempotence: f ⊓ f = f, f ⊔ f = f;
commutativity: f ⊓ g = g ⊓ f, f ⊔ g = g ⊔ f;
associativity: f ⊓ (g ⊓ h) = (f ⊓ g) ⊓ h, f ⊔ (g ⊔ h) = (f ⊔ g) ⊔ h;
absorption: f ⊓ (f ⊔ g) = f, f ⊔ (f ⊓ g) = f.
Remarks:
(i) Both versions are equivalent: the order relation is recovered from the operations via f ⊑ g ⇔ f ⊓ g = f ⇔ f ⊔ g = g; conversely, f ⊓ g = inf(f,g) and f ⊔ g = sup(f,g).
(ii) A lattice is finite if the number of elements is finite.
(iii) If a lattice possesses a lower bound 0 and an upper bound 1, they satisfy, for all f,
0 ⊓ f = 0, 0 ⊔ f = f, 1 ⊓ f = f, 1 ⊔ f = 1.
(iv) Two lattices L1 and L2 are isomorphic if there exists a one-to-one map I: L1 → L2 of the lattice L1 onto the lattice L2 such that the (binary) operations ⊓ and ⊔ are preserved; i.e., I(f ⊓_{L1} g) = I(f) ⊓_{L2} I(g) and I(f ⊔_{L1} g) = I(f) ⊔_{L2} I(g) for all f, g ∈ L1 (see also definition 5.1).
[Orthogonal complement, orthocomplement] f′ is an orthogonal complement, or orthocomplement, of f if
(f′)′ = f;
f ⊑ g implies g′ ⊑ f′;
f ⊓ f′ = 0 and f ⊔ f′ = 1.
[Orthocomplemented lattice] A lattice is called orthocomplemented if, for all f ∈ L, there exists an orthocomplement f′ ∈ L; i.e., an orthocomplemented lattice also contains the complements of its elements.
[Subalgebra] A subalgebra of an orthocomplemented lattice L is a subset which is closed under the operations ′, ⊔, ⊓ and which contains 0 and 1. Usually a distinction is made between a subalgebra and a sublattice of a lattice. The latter is required to be closed under the operations ⊔, ⊓ but not necessarily under the orthocomplement ′.
[Finite lattice] A lattice L is called finite if its cardinal number |L| is finite; i.e., if it contains only a finite number of elements.
[Exchange axiom] A lattice L satisfies the exchange axiom if, for all a, b ∈ L: if a covers a ⊓ b, then a ⊔ b covers b.
[Complete lattice] A lattice L is complete if, for every subset L′ ⊂ L, there exist a ``meet'' ⊓L′ and a ``join'' ⊔L′ in L.
Remark:
By induction it can be shown that any finite lattice is complete.
The set operations of union ``∪'' and intersection ``∩'' satisfy the distributive laws. I.e., let A, B and C be three subsets of a set, and let ``⊓ = ∩'' and ``⊔ = ∪''; then the two distributive laws A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) are always satisfied, one implying the other. General lattice structures do not necessarily satisfy the distributive laws (for example, see Fig. on p. ).
[Distributive lattice] A lattice is called distributive if, for all f, g, h ∈ L,

    f ⊓ (g ⊔ h) = (f ⊓ g) ⊔ (f ⊓ h)  and  f ⊔ (g ⊓ h) = (f ⊔ g) ⊓ (f ⊔ h).
Remarks:
(i) Every linearly ordered set is a distributive lattice.
(ii) In a distributive lattice the orthogonal complement of an element is uniquely defined.
(iii) A criterion for whether or not a lattice satisfies the distributive laws: the distributive inequalities

    f ⊓ (g ⊔ h) ⊒ (f ⊓ g) ⊔ (f ⊓ h),
    f ⊔ (g ⊓ h) ⊑ (f ⊔ g) ⊓ (f ⊔ h)

hold in any lattice; the lattice is distributive if and only if they are satisfied with equality.
[Boolean lattice] A distributive orthocomplemented lattice is called a Boolean lattice (or Boolean algebra).
Example:
The set of subsets of a set is a Boolean lattice with the identifications summarised in table 5.1.
lattice operation | set of subsets of a set |
order relation ⊑ | subset relation ⊂ |
``meet'' ⊓ | intersection ∩ |
``join'' ⊔ | union ∪ |
``complement'' ′ | set complement |
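For a small set, both distributive laws can be verified exhaustively; the following check (our code, using the identifications of table 5.1) runs over all triples of subsets of {1, 2, 3}:

    from itertools import chain, combinations, product

    def powerset(M):
        return [frozenset(c) for c in chain.from_iterable(
                combinations(M, r) for r in range(len(M) + 1))]

    subsets = powerset({1, 2, 3})
    # meet = intersection, join = union, as in table 5.1:
    assert all(A & (B | C) == (A & B) | (A & C) and
               A | (B & C) == (A | B) & (A | C)
               for A, B, C in product(subsets, repeat=3))
    print("both distributive laws hold in the lattice of subsets of {1,2,3}")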
Remark:
If f ⊓ g = 0, then one may write f ⊥ g; in words, ``f is orthogonal to g.''
[Modular lattice] A lattice is called modular if, for all f, g, h ∈ L,

    f ⊑ g  implies  f ⊔ (h ⊓ g) = (f ⊔ h) ⊓ g.    (5.15)

The following theorem is stated without proof (see, for instance, G. Birkhoff, ref. [], p. 66): Any non-modular lattice contains the lattice of Fig. 5.2 as a subalgebra.
[Orthomodular lattice] An orthocomplemented lattice is called orthomodular if, for all f, g ∈ L,

    f ⊑ g  implies  g = f ⊔ (f′ ⊓ g).    (5.16)

The following implications are valid:

    distributivity ⇒ modularity ⇒ orthomodularity.    (5.17)
[Commutator]
Two elements a and b of an orthomodular lattice L commute, denoted by aCb, iff a = (a ⊓ b) ⊔ (a ⊓ b′), or a = (a ⊔ b) ⊓ (a ⊔ b′), or a ⊓ (a′ ⊔ b) = a ⊓ b. Let
    C(a,b) = (a ⊓ b) ⊔ (a ⊓ b′) ⊔ (a′ ⊓ b) ⊔ (a′ ⊓ b′)

denote the commutator of a and b; aCb holds iff C(a,b) = 1.
[Centre] The centre L_C of an orthomodular lattice L is the set of all elements commuting with all elements of L.
[Irreducibility] An orthomodular lattice is irreducible if L_C = {0, 1}.
Examples:
(i) The centre of every Boolean lattice is the original Boolean lattice; i.e., if A is a Boolean lattice, A_C = A.
(ii) Every Hilbert lattice is irreducible; i.e., 𝒮_C = {0, 1}. For details, see, for instance, G. Kalmbach, Orthomodular Lattices [], chapters 1 & 4, or Measures and Hilbert Lattices [], chapter 1.
[Prime ideal] A prime ideal of an orthomodular poset L is an ideal P, P ≠ L, such that a ⊥ b implies a ∈ P or b ∈ P.
[Prime] An orthomodular poset L is called prime if, for all a, b ∈ L with a ≠ b, there exists a prime ideal P of L such that a ∈ P, b ∉ P or a ∉ P, b ∈ P.
Remarks:
(i) Let P be a prime ideal. Then x ∈ P or x′ ∈ P.
(ii) Let P be a prime ideal. aCb and a ⊓ b ∈ P imply a ∈ P or b ∈ P.
[State] A two-valued state on an orthomodular poset L is a mapping s: L → {0,1} such that s(1) = 1 and s(a ⊔ b) = s(a) + s(b) whenever a ⊥ b.
Remark:
Let L be an orthomodular poset. Then the mapping φ: S(L) → P(L), φ(s) = {x ∈ L | s(x) = 0}, from the set S(L) of two-valued states to the set P(L) of prime ideals of L, is bijective [].
Informally speaking, the term ``maximal'' refers to the greatest possible number of atoms. The construction of orthomodular lattices from a union of Boolean algebras is the ``inverse'' problem to the task of finding the block decomposition (i.e., finding the maximal Boolean subalgebras) of a given orthomodular lattice.
[Pasting, {0,1}-pasting]
Let {L_i} be a collection of orthomodular (Boolean) lattices such that, for all L_i ≠ L_j, the following conditions are satisfied:
In particular, if L_i ∩ L_j = {0,1} for i ≠ j, then L = ∪_i L_i is called the {0,1}-pasting of the L_i's, also known as the horizontal sum.
Every orthomodular lattice is a pasting of its blocks. For a detailed discussion, see G. Kalmbach, Orthomodular Lattices [], chapter 4, in particular remark 12, p. 50, as well as M. Navara and V. Rogalewicz [].
Every Hilbert lattice (for a definition, see 5.4.6, p. pageref) is an irreducible pasting of (not necessarily disjoint) blocks. While the intersection of all blocks of a Hilbert lattice contains only the two elements 0 and 1, the intersection of two arbitrary blocks of a Hilbert lattice may contain several atoms which are common to these blocks. The pasting of its blocks forming the structure of an arbitrary Hilbert lattice is schematically drawn in Fig. 5.4.
The inverse question of whether any pasting of Boolean algebras results in an orthomodular or Hilbert lattice has been investigated by R. J. Greechie and M. Dichtl, among others. In what follows we shall introduce notations and techniques which can be used to construct orthomodular lattices from Boolean algebras. No attempt is made here to extensively review these efforts. We shall deal only with the most simple cases, i.e., with almost disjoint systems of blocks. More general pasting techniques are reviewed in G. Kalmbach, Orthomodular Lattices [], chapter 4.
[Almost disjoint system of Boolean subalgebras]
Let B be a system of Boolean algebras. B is almost disjoint if, for any pair A, B ∈ B, at least one of the following conditions is satisfied:
(i) A ∩ B = {0, 1};
(ii) A ∩ B = {0, a, a′, 1} for a single atom a common to A and B.
[Loop of order n]
A finite sequence B_0, …, B_{n-1} of a system of blocks B is a loop of order n (n ≥ 3) if (equality of indices is understood modulo n):
(i) B_i ∩ B_{i+1} ≠ {0, 1} for all i (consecutive blocks share a common atom);
(ii) B_i ∩ B_j = {0, 1} for all j ≠ i, i ± 1.
[Loop lemma (R. J. Greechie)]
Let B = {B_i} be an almost disjoint system of Boolean algebras. L = ∪_{B_i ∈ B} B_i is an orthomodular partially ordered set iff B does not contain a loop of order 3. L = ∪_{B_i ∈ B} B_i is an orthomodular lattice iff B does not contain a loop of order 3 or 4.
[Greechie lattice]
An orthomodular
lattice is a Greechie lattice if the following conditions are
satisfied:
A Greechie diagram consists of points ``°'' representing the atoms. Atoms joined by a line belong to the same block; two lines cross in an atom common to both blocks. For more general results on block pasting, see chapter 4 in G. Kalmbach's monograph Orthomodular Lattices [].
Examples & remarks:
(i) Greechie diagram and Hasse diagram of 2²:
[Figure omitted]
(ii) Greechie diagram and Hasse diagram of 2³:
[Figure omitted]
(iii) The following lattice, characterised by its Greechie and Hasse diagrams, is obtained by the pasting of two 2³ with one common atom.
[Figure omitted]
(iv) This Greechie lattice is an example of an orthomodular lattice which is not modular. It is a pasting of 2² and 2³ and contains the lattice drawn in Fig. 5.2, p. pageref.
[Figure omitted]
(v) Greechie diagram and Hasse diagram of an almost disjoint system of blocks of 2³ with a loop of order 3. According to the loop lemma, the resulting ``pasted'' structure is not an orthomodular lattice (this can also be seen by direct inspection).
[Figure omitted]
(vi) Two-dimensional case: The {0,1}-pasting (horizontal sum) of ℵ1-many copies of 2² (ℵ1 is the cardinality of the continuum) yields the lattice C(H2) of subspaces of the two-dimensional Hilbert space H2 = C², where the dimension of H2 (i.e., the maximal number of linearly independent vectors) is two.
Partition logics are introduced here to identify the experimental logics of generic (finite) automata, in particular of automata of the Mealy type.
Example:
Let M = {1,2,3,4,5,6} and P = {P1, P2} with P1 = {{1,4,5}, {2}, {3,6}} and P2 = {{1,2,4}, {5}, {3,6}}. The Greechie and Hasse diagrams of this logic are shown in Fig. , p. . For many more examples, see chapter .
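The pasting underlying this example can be made explicit by a small computation (our code; each partition spans one Boolean algebra, and the pasted propositions are those expressible in both experiments):

    from itertools import combinations

    def propositions(partition):
        """All unions of blocks of one partition: one Boolean algebra."""
        props = set()
        for r in range(len(partition) + 1):
            for blocks in combinations(partition, r):
                props.add(frozenset().union(*blocks))
        return props

    P1 = [{1, 4, 5}, {2}, {3, 6}]
    P2 = [{1, 2, 4}, {5}, {3, 6}]
    common = propositions(P1) & propositions(P2)  # identified by the pasting
    print(sorted(map(sorted, common)))
    # [[], [1, 2, 3, 4, 5, 6], [1, 2, 4, 5], [3, 6]]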
Lattices defined by 5.5(d) and 5.5(e) are not
distributive, since for 5.5(d),
|
|
p1 | p2 | (p1 → p2) | (p1 ∧ p2) | (p1 ∨ p2) | ¬p1 | p1 = p2 |
TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE |
TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE |
FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE |
FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | TRUE |
The identification of relations in lattice theory with relations in the propositional calculus is represented in table 5.3.
lattice operation | propositional calculus |
order relation ⊑ | implication → |
``meet'' ⊓ | conjunction ``and'' ∧ |
``join'' ⊔ | disjunction ``or'' ∨ |
``complement'' ′ | negation ``not'' ¬ |
Remarks:
(i) The implication relation p1 → p2 can be composed from the other relations ∧, ∨, ¬ (and vice versa) by (¬p1 ∨ p2), or by

    (p1 → p2) = (p1 = (p1 ∧ p2)) ∨ (p2 = (p1 ∨ p2)),

as the following truth table shows:
p1 | p2 | p1 ∧ p2 | p1 = (p1 ∧ p2) | p1 ∨ p2 | p2 = (p1 ∨ p2) | (p1 = (p1 ∧ p2)) ∨ (p2 = (p1 ∨ p2)) |
T | T | T | T | T | T | T |
T | F | F | F | T | F | F |
F | T | F | T | T | T | T |
F | F | F | T | F | T | T |
(ii) p1 = p2 can be composed as (p1 ∧ p2) ∨ (¬p1 ∧ ¬p2).
(iii) The lattice defined by the propositional calculus is distributive; i.e., the following pairs of formulas (separated by ``='') are equivalent:

    p1 ∧ (p2 ∨ p3) = (p1 ∧ p2) ∨ (p1 ∧ p3),
    p1 ∨ (p2 ∧ p3) = (p1 ∨ p2) ∧ (p1 ∨ p3).
(iv) The lattice is orthocomplemented; i.e.,

    ¬(¬p1) = p1,  p1 ∧ ¬p1 = FALSE,  p1 ∨ ¬p1 = TRUE,
    and p1 → p2 implies ¬p2 → ¬p1.
(v) By (iii) & (iv), the classical propositional calculus is Boolean.
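Since the classical propositional calculus over a fixed set of variables is finite, remarks (i) and (ii) can be checked by brute force over all truth assignments; a sketch (ours):

    from itertools import product

    for p1, p2 in product([True, False], repeat=2):
        implies = (not p1) or p2                      # remark (i), first form
        assert implies == ((p1 == (p1 and p2)) or (p2 == (p1 or p2)))
        assert (p1 == p2) == ((p1 and p2) or ((not p1) and (not p2)))  # (ii)
    print("all four truth assignments agree")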
[Field] A set of scalars K, or (K, +, ·), is a field if (K, +) is an Abelian group (with neutral element 0), (K∖{0}, ·) is an Abelian group (with neutral element 1), and the distributive law α·(β + γ) = α·β + α·γ holds.
Examples: The sets Q, R, C of rational, real and complex numbers with the ordinary sum and product operations ``+'' and ``·'' are fields.
[Linear space] Let M be a set of objects such as vectors, functions, series et cetera. A set M is a linear space over a field K if M is an Abelian group with respect to addition and a scalar multiplication K × M → M is defined such that α(f + g) = αf + αg, (α + β)f = αf + βf, (αβ)f = α(βf) and 1f = f for all f, g ∈ M and α, β ∈ K.
Examples:
(i) vector spaces M = Rⁿ with K = R or C;
(ii) M = l², K = C, the space of all infinite sequences

    x = (x1, x2, x3, …), x_i ∈ C, with ∑_i |x_i|² < ∞;

(iii) the space of continuous complex-valued (real-valued) functions M = C(a,b) over an open or closed interval (a,b) or [a,b] with K = C (K = R);
[Metric, norm, inner product]
A metric, denoted by d(·,·), is a binary function which associates a distance with two elements of a linear vector space and which satisfies the following properties: d(f,g) ≥ 0, with equality iff f = g; d(f,g) = d(g,f); and d(f,h) ≤ d(f,g) + d(g,h). A norm ||·|| satisfies ||f|| ≥ 0, with equality iff f = 0; ||αf|| = |α| ||f||; and ||f + g|| ≤ ||f|| + ||g||. An inner product ⟨·|·⟩ satisfies ⟨f|g⟩ = ⟨g|f⟩*; linearity in the second argument; and ⟨f|f⟩ ≥ 0, with equality iff f = 0.
Remarks:
(i) If M has an inner product ⟨·|·⟩, then with the identifications

    ||f|| = ⟨f|f⟩^{1/2},  d(f,g) = ||f − g||,

M also possesses a norm and a metric.
(ii) The Schwarz inequality

    |⟨f|g⟩| ≤ ||f|| · ||g||    (5.31)

holds.
[Separability, completeness]
A linear space M is separable if there exists a sequence {f_n | n ∈ N, f_n ∈ M} such that, for any f ∈ M and any ε > 0, there exists at least one element f_i of this sequence with

    ||f − f_i|| < ε.

M is complete if every Cauchy sequence {f_n} in M, i.e., every sequence with

    ||f_n − f_m|| → 0 as n, m → ∞,

converges to an element f ∈ M: lim_{n→∞} ||f − f_n|| = 0.
[Hilbert space, Banach space]
A Hilbert space H is a linear space, equipped with an inner product, which is separable & complete.
A Banach space is a linear space, equipped with a norm, which is separable & complete.
Example:
l² over K = C [see linear space example (ii)] with ⟨f|g⟩ = ∑_i x_i* y_i.
[Subspace, orthogonal subspace]
A subspace § ⊂ H of a Hilbert space is a subset of H which is closed under scalar multiplication and addition, i.e., f, g ∈ §, α ∈ K ⇒ αf ∈ §, f + g ∈ §, and which is separable and complete.
The orthogonal subspace §⊥ of § is the set of all elements of the Hilbert space H which are orthogonal to all elements of §, i.e.,

    §⊥ = {f ∈ H | ⟨f|g⟩ = 0 for all g ∈ §}.
Remarks:
(i) (§⊥)⊥ = §⊥⊥ = §;
(ii) every orthogonal subspace is a subspace;
(iii) A Hilbert space can be represented as a direct sum of orthogonal subspaces.
[Linear functional] A map F: H → K is a linear functional on H if F(αf + βg) = αF(f) + βF(g) for all f, g ∈ H and α, β ∈ K.
[Dual Hilbert space] There exists a one-to-one map between the elements f of a Hilbert space H and the elements F_f of the set H_f of bounded linear functionals on H, such that

    F_f(g) = ⟨f|g⟩ for all g ∈ H.
Remarks:
(i) \"={ f,g,h, ¼} and \"f = { Ff,Fg,Fh, ¼} are isomorphic; instead of h º Fh, one could write hF º F;
(ii) (\"f )f = \".
[Isomorphism of Hilbert spaces] All separable Hilbert spaces of equal dimension with the same field K are isomorphic.
[Compatibility] Two subspaces §1 and §2 of a Hilbert space are called compatible, denoted by §1 ↔ §2, if

    §1 = (§1 ⊓ §2) ⊔ (§1 ⊓ §2⊥).
Remarks:
(i) §1 ↔ §2 ⇔ §1 ⊔ (§2 ⊓ §1⊥) = §1 ⊔ §2 = §2 ⊔ (§1 ⊓ §2⊥).
(ii) The relation ↔ is symmetric, i.e., §1 ↔ §2 ⇔ §2 ↔ §1.
[Hilbert lattice]
A Hilbert lattice is the lattice of all closed subspaces of a Hilbert space H; it is denoted by 𝒮. The ``meet'' ⊓ is identified with the set-theoretic intersection of subspaces, the ``join'' ⊔ is identified with the closure of the linear span, and the ``complement'' of a subspace is identified with its orthogonal subspace. The identification of relations and operations in lattice theory with relations and operations in Hilbert space is represented in table .
lattice operation | Hilbert space operation |
order relation ⊑ | subspace relation ⊂ |
``meet'' ⊓ | intersection of subspaces ∩ |
``join'' ⊔ | closure of subspace spanned by subspaces ⊕ |
``orthocomplement'' ′ | orthogonal subspace ⊥ |
Remarks:
(i) 𝒮 is an orthocomplemented lattice.
(ii) In general, 𝒮 is not distributive. Let, for instance, §′, §, §⊥ be subspaces of a Hilbert space H with §′ ≠ §, §′ ≠ §⊥; then (see Fig. , drawn from J. M. Jauch [], p. 27)

    §′ ⊓ (§ ⊔ §⊥) = §′ ≠ (§′ ⊓ §) ⊔ (§′ ⊓ §⊥) = 0.
(iii) A finite dimensional Hilbert lattice is modular.
(iv) Since Hilbert lattices are orthomodular lattices, they can be constructed by the pasting of blocks (blocks are maximal Boolean subalgebras); the blocks need not be (almost) disjoint. This fact will be used for the construction of automata yielding arbitrary finite subalgebras of Hilbert lattices as propositional calculi. See chapter , p. for details.
(v) In Hilbert lattices, the orthoarguesian law is satisfied.
For a definition and details, see R. J. Greechie [], G. Kalmbach [] and R.
Giuntini [], p. 138. The orthoarguesian law is not satisfied
by general orthomodular lattices. I.e.,
|
(5.34) |
An infinite-dimensional 𝒮 is a complete, atomic, irreducible, orthomodular lattice satisfying the exchange axiom (i.e., if a covers a ⊓ b, then a ⊔ b covers b). In some notations these criteria are the defining features of Hilbert lattices (e.g., G. Kalmbach, Measures and Hilbert Lattices [], p. 11). Yet they are also satisfied by lattices of Keller spaces [,,]; Keller spaces differ from Hilbert spaces in important respects.
A complete axiomatisation for lattices of separable complex Hilbert spaces has been given by W. J. Wilbur []; see also the review by R. Piziak []: [(W. J. Wilbur [])] A lattice \L is isomorphic to a complex Hilbert lattice iff it satisfies the following conditions (i)-(vii).
However, as already conceded by W. J. Wilbur, the above characterisation, in particular axiom (ii) and countable completeness, is not purely algebraic. One may ask if it is possible to develop an axiomatisation of Hilbert lattices in purely algebraic terms. An answer to this question is unknown. For related discussions, see, for instance, G. Takeuti [] and G. Kalmbach [,]. It might be conjectured that there is no recursive enumeration of the axioms of Hilbert lattices [].
[Projection] A projection is an operator E defined on a Hilbert space H which is self-adjoint and idempotent, i.e.,

    E = E†  and  E² = E.

There is an isomorphism between the set of projections, denoted by 𝒫, and the set 𝒮 of all closed subspaces of H: given a projection E, the corresponding subspace is § = E(H); given a closed subspace §, any f ∈ H can be decomposed uniquely as a sum f = g + h, where g ∈ § and h ∈ §⊥; the projection corresponding to § is then the operator E determined by Ef = g. This one-to-one correspondence allows a translation of the lattice structure of the subspaces of Hilbert space discussed before into the algebra of projections. In particular, §1 ↔ §2 ⇔ E1E2 = E2E1, where E_i is the projection onto §_i, i = 1, 2. (This is a reason why quantum logic is also said to correspond to a non-commutative probability theory.)
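A small numerical illustration (numpy; the construction and the example vectors are ours) of self-adjointness, idempotence, and the correspondence ``compatible iff the projections commute'':

    import numpy as np

    def projection(*basis):
        """Orthogonal projection E = A (A^T A)^(-1) A^T onto span(basis)."""
        A = np.column_stack(basis).astype(float)
        return A @ np.linalg.inv(A.T @ A) @ A.T

    E1 = projection([1, 0, 0])                 # x-axis
    E2 = projection([1, 0, 0], [0, 1, 0])      # x-y plane, contains the x-axis
    E3 = projection([1, 1, 0])                 # diagonal of the x-y plane

    for E in (E1, E2, E3):                     # self-adjoint and idempotent
        assert np.allclose(E, E.T) and np.allclose(E, E @ E)

    print(np.allclose(E1 @ E2, E2 @ E1))       # True:  compatible subspaces
    print(np.allclose(E1 @ E3, E3 @ E1))       # False: incompatible subspaces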
The identification of relations and operations in lattice theory with the relations and operations in the lattice of projections is represented in table ([], p. 37).
lattice operation | lattice of projections |
order relation E1 ⊑ E2 | E1E2 = E1 |
``meet'' ⊓ | lim_{n→∞} (E1E2)ⁿ |
``join'' ⊔ | E1 + E2 − E1E2 |
``orthocomplement'' ′ | 1 − E (projection onto the orthogonal subspace) |
In what follows the mathematical formalism of quantum mechanics is very
briefly reviewed. No attempt is being made to give a complete set of axioms. See
also, for instance, J. von Neumann [], A. Messiah [], L. E. Ballentine []. Its
primitive concepts are those of state and observable. An
observable is represented by a self-adjoint operator O on a Hilbert
space \". In the spectral representation, O can be written as
|
(5.35) |
The projections E_n correspond to the physical properties of a quantum system. In J. von Neumann's words ([], p. 249; in our notation, E is a proposition):
Apart from the physical quantities ℛ, there exists another category of concepts that are important objects of physics - namely the properties of the states of the system S. Some such properties are: that a certain quantity ℛ takes the value λ - or that the value of ℛ is positive - … To each property E we can assign a quantity which we define as follows: each measurement which distinguishes between the presence or absence of E is considered as a measurement of this quantity, such that its value is 1 if E is verified, and zero in the opposite case. This quantity which corresponds to E will also be denoted by E.
Such quantities take only the values of 0 and 1, and conversely, each quantity ℛ which is capable of these two values only, corresponds to a property E which is evidently this: ``the value of ℛ is ≠ 0.'' The quantities E that correspond to the properties are therefore characterized by this behavior.
That E takes on only the values 0, 1 can also be formulated as follows: Substituting E into the polynomial F(λ) = λ − λ² makes it vanish identically. If E has the operator E, then F(E) has the operator F(E) = E − E², i.e., the condition is that E − E² = 0 or E = E². In other words: the operator E of E is a projection.
The projections E therefore correspond to the properties E (through the agency of the corresponding quantities E which we just defined). If we introduce, along with the projections E, the closed linear manifold M belonging to them (E = P_M), then the closed linear manifolds correspond equally to the properties of E.
In the case of nondegenerate eigenvalues r_n and eigenvectors |n⟩, O can be written as

    O = ∑_n r_n |n⟩⟨n| .    (5.36)
|
(5.37) |
|
(5.38) |
|
(5.39) |
|
(5.40) |
G. Birkhoff and J. von Neumann suggested [] that, roughly speaking, the ``logic of quantum events'' - or, by another wording, quantum logic or the quantum propositional calculus - should be obtainable from the formal representation of physical properties. They conjectured that the Hilbert space formalism of quantum mechanics [] is an appropriate theory of quantum events. Since, in this formalism, projection operators correspond to the physical properties of a quantum system, quantum logic is modelled so as to be isomorphic to the lattice of projections 𝒫 of the Hilbert space H, which in turn is isomorphic to the lattice 𝒮 of subspaces of the Hilbert space. I.e., by assuming the physical validity of the quantum Hilbert space formalism, the corresponding isomorphic logical structure is investigated. Since, in this approach, quantum theory comes first and the logical structure of the phenomena is derived by analysing the theory, this could be considered a ``top-down'' method. In the case of the automaton propositional calculus (see below, chapter , p. ) one proceeds ``bottom-up,'' i.e., by analysing the structure of elementary processes first and conjecturing a corresponding linear space structure afterwards.
The order relation p1 ⊑ p2 is identified with ``whenever p1 is true it follows that p2 is true, too.'' It is also written as ``p1 → p2.''
The proposition p1 ∧ p2 will be identical to the ``and'' operation of the ordinary (classical) propositional calculus; i.e.,

    p1 ⊓ p2 = TRUE  if and only if  p1 = TRUE and p2 = TRUE .    (5.41)
In J. M. Jauch's interpretation, p1 ⊓ p2 is realised by an infinite sequence of alternating pairs of filters for the propositions p1 and p2, respectively. The proposition p1 ⊓ p2 is TRUE if the system passes this filter, and it is not true otherwise.
The ``or'' proposition p1 ∨ p2 satisfies only the following relation:

    p1 = TRUE or p2 = TRUE  implies  p1 ⊔ p2 = TRUE ,    (5.42)

whereas the converse is in general false.
Because of (5.42), the lattice structure does not imply the distributive laws. Indeed, since 𝒮 is not distributive, the quantum propositional calculus has to be non-distributive as well. G. Birkhoff and J. von Neumann [] suggested the weaker modular identity

    p1 ⊑ p2  implies  p1 ⊔ (p3 ⊓ p2) = (p1 ⊔ p3) ⊓ p2 .    (5.43)
For infinite-dimensional Hilbert lattices the modular identity fails; it has to be weakened to the orthomodular law

    p1 ⊑ p2  implies  p2 = p1 ⊔ (p1′ ⊓ p2) .    (5.44)
generic lattice | order relation ⊑ | ``meet'' ⊓ | ``join'' ⊔ | ``complement'' ′ |
lattice of subsets of a set | subset relation ⊂ | intersection ∩ | union ∪ | complement |
propositional calculus | implication → | conjunction ``and'' ∧ | disjunction ``or'' ∨ | negation ``not'' ¬ |
Hilbert lattice | subspace relation ⊂ | intersection of subspaces ∩ | closure of linear span ⊕ | orthogonal subspace ⊥ |
lattice of projection operators | E1E2 = E1 | lim_{n→∞} (E1E2)ⁿ | E1 + E2 − E1E2 (if E1, E2 commute) | orthogonal subspace |
Epistemologically, the intrinsic/extrinsic concept, or, by another naming [,], the endophysics/exophysics concept, is related to the question of how a mathematical or a logical or an algorithmic universe is perceived from within/from the outside. The physical universe (in O. E. Rössler's dictum, the ``Cartesian prison''), by definition, can be perceived from within only.
Extrinsic or exophysical perception can be conceived as a hierarchical process, in which the system under observation and the experimenter form a two-level hierarchy. The system is laid out and the experimenter peeps at every relevant feature of it without changing it. The restricted entanglement between the system and the experimenter can be represented by a one-way information flow from the system to the experimenter; the system is not affected by the experimenter's actions. (Logicians might prefer the term meta over exo.)
Intrinsic or endophysical perception can be conceived as a non-hierarchical effort. The experimenter is part of the universe under observation. Experiments use devices and procedures which are realisable by internal resources, i.e., from within the universe. The total integration of the experimenter in the observed system can be represented by a two-way information flow, where ``measurement apparatus'' and ``observed entity'' are interchangeable and any distinction between them is merely a matter of intent and convention. Endophysics is limited by the self-referential character of any measurement. An intrinsic measurement can often be related to the paradoxical attempt to obtain the ``true'' value of an observable while - through interaction - it causes ``disturbances'' of the entity to be measured, thereby changing its state. Among other questions one may ask, ``what kind of experiments are intrinsically operational and what type of theories will be intrinsically reasonable?''
Imagine, for example, some artificial intelligence living in a (hermetic) cyberspace. This agent might develop a ``natural science'' by performing experiments and developing theories. It is tempting to speculate that a figure in a novel, imagined by the poet and the reader, is also such an agent.
Since in a virtual reality only syntactic structures are relevant, one might wonder whether concerns of this agent about its ``hardware basis,'' e.g., whether it is ``made of'' billiard balls, electric circuits, mechanical relays or nerve cells, are mystic or even possible (cf. H. Putnam's brain-in-a-tank analysis []). I do not think this is necessarily so, in particular if the agent could influence some features of this hardware basis. One example is hardware damage caused by certain computer viruses by ``heating up'' computer components such as storage or processors. I would like to call this type of ``back-reaction'' of a virtual reality on its computing agent ``virtual backflow interception.'' Intrinsically and phenomenologically, the virtual backflow could manifest itself by some violation of a ``superselection rule;'' i.e., by some virtual phenomenon which violates the fundamental laws of a virtual reality, such as symmetry & conservation principles.
No attempt is made here to (re-)write a comprehensive history of related concepts; but a few hallmarks are mentioned without claim of completeness. Historically, Archimedes conceived ``points outside the world, from which one could move the earth.'' Archimedes' use of ``points outside the world'' was in a mechanical rather than in a metatheoretical context: he claimed to be able to move any given weight by any given force, however small. The 18th-century physicist B. J. Boskovich [] realised that it is not possible to measure motions or transformations if the whole world, including all measurement apparata and observers therein, becomes equally affected by these motions or transformations (cf. O. E. Rössler [], p. 143). Fiction writers informally elaborated consequences of intrinsic perception. E. A. Abbott's Flatland describes the life of two- and one-dimensional creatures and their confrontation with higher dimensional phenomena. The Freiherr von Münchhausen rescued himself from a swamp by dragging himself out by his own hair. Among contemporary science fiction authors, D. F. Galouye's Simulacron Three and St. Lem's Non Serviam study some aspects of artificial intelligence in what could be called ``cyberspaces.'' Media artists such as Peter Weibel [] create virtual realities and are particularly concerned with the interface between ``reality'' and ``virtual reality,'' both practically and philosophically. On the forefront of interface designs are cochlear implants, which restore some degree of hearing in clinical patients with severe hearing impairment []. Finally, by outperforming television & computer games, commercial virtual reality products might become very big business. From these examples it can be seen that concepts related to intrinsic perception may become fruitful for physics, the computer sciences, business and the arts as well.
Already in 1950 (19 years after the publication of Gödel's incompleteness theorems), K. Popper questioned the completeness of self-referential perception of ``mechanic'' computing devices []. Popper used techniques similar to Zeno's paradox (which he called the ``paradox of Tristram Shandy'') and ``Gödelian sentences'' to argue for a kind of ``intrinsic indeterminism.''
In a pioneering study on the theory of (finite) automata, E. F. Moore presented Gedanken-experiments on sequential machines []. There, E. F. Moore investigated automata featuring, at least to some extent, similarities to the quantum mechanical uncertainty principle. In the book Regular Algebra and Finite Machines [], J. H. Conway developed these ideas further from a formal point of view without relating them to physical applications. Probably the best review of experiments on Moore-type automata can be found in W. Brauer's book Automatentheorie [] (in German).
In the context of system science, T. Toffoli [] and E. Fredkin [,] conceive of an observer, together with the interaction mechanism, embedded in the same medium as the observed object. They also discuss observers which are external to the medium.
D. Finkelstein [,] has considered Moore's findings from a more physical point of view, introducing an ``experimental logic of automata'' and the term ``computational complementarity.'' An illuminating account of endophysics topics can be found in O. E. Rössler's article on Endophysics [], as well as in his book Endophysics (in German) []; O. E. Rössler is a major driving force in this area. H. Primas has considered ``endophysical'' and ``exophysical'' entities [,], which, very roughly speaking, correspond to O. E. Rössler's and the author's terminology under the exchange ``exo (extrinsic) ↔ endo (intrinsic).'' H. Primas' approach is ``top-down,'' i.e., theoretical models form a (``Platonic'') exo-world, in which the observed system is perceived by an endo-description (cf. D. Finkelstein [,]). These concepts should not be confused with the present analysis, which is ``bottom-up'' and procedural; i.e., which concentrates on the specification of the measurement act, in particular ``from within'' a system. A forthcoming collection of articles on related topics is edited by H. Atmanspacher and G. Dalenoort [].
The terms ``intrinsic'' and ``extrinsic'' appear in the author's studies on intrinsic time scales in arbitrary dispersive media [,,], very much (unknowingly) in the spirit of B. J. Boskovich. There, the intrinsic-extrinsic concept has been re-invented (probably for the 100th time and, I solemnly swear, independently). It is argued that, depending on the dispersion relations, creatures in a ``dispersive medium'' would develop a theory of coordinate transformations very similar to relativity theory. Another proposal by the author was to consider a new type of ``dimensional regularisation'' by assuming that the space-time support of (quantum mechanical) fields is a fractal []. In this approach one considers a fractal space-time of Hausdorff dimension D = 4 − ε, with ε ≪ 1, which is embedded in a space of higher dimension, e.g., Rⁿ with n ≥ 4. Intrinsically, the (fractal) space-time is perceived ``almost'' as the usual four-dimensional space.
Besides such considerations, J. A. Wheeler [], among others, has emphasised the role of observer-participancy. In the context of what is considered by the Einstein-Podolsky-Rosen argument [] as ``incompleteness'' of quantum theory, A. Peres and W. H. Zurek [,] and J. Rothstein [] have attempted to relate quantum complementarity to Gödel-type incompleteness.
In what follows, the intrinsic-extrinsic concept will be made precise in an algorithmic context, thereby closely following E. F. Moore []. The main reason for the algorithmic approach is that algorithmic universes (or, equivalently, formal systems) are the royal road to the study of undecidability. The intrinsic-extrinsic concept will be applied to investigate computational complementarity (chapter , p. ) and intrinsic indeterminism (chapter , p. ); both again in the algorithmic context. Other tasks, such as the setting of space-time coordinates [,,], may require other specifications, in particular of the interface.
(i) only the input and output terminals of the automaton are accessible. The experimenter is allowed to perform experiments via these interfaces in the form of stimulating the automaton with input sequences and receiving output sequences from the automaton. The experimenter is not permitted to ``open up'' the automaton, but
(ii) the transition and output table (diagram) of the automaton (in its reduced form) is known to the experimenter (or, if you prefer, is given to the experimenter by some ``oracle'').
The most important problem, among others, is the distinguishing problem: it is known that an automaton is in one of a particular class of internal states: find that state.
In the first kind of experimental situation, only a single copy of the automaton is accessible to the experimenter. The second type of experiment operates with an arbitrary number of automaton copies. Both cases will be discussed in detail below.
If the input is some predetermined sequence, one may call the experiment a preset experiment. If, on the other hand, (part of) the input sequence depends on (part of) the output sequence, i.e., if the input is adapted to the reaction of the automaton, one may call the experiment an adaptive experiment. We shall be mostly concerned with preset experiments, yet adaptive experiments can be used to solve certain problems with automaton propositional calculi; see chapter , p. .
Research along these lines has been pursued by S. Ginsburg [], A. Gill [], J. H. Conway [] and W. Brauer [].
In the first kind of Gedankenexperiment, only one single automaton copy is presented to the experimenter. The problem is to determine the initial state of the automaton, provided its transition and output functions are known (distinguishing problem). In a typical experiment, the automaton is ``fed'' with a sequence of input symbols and responds with a sequence of output symbols. An input-output analysis then reveals information about the automaton's original state.
Assume for the moment that such an experiment induces a state transition of the automaton. I.e., after the experiment, the automaton is not in the original initial state. In this process a loss of potential information about the automaton's initial state may occur. In other words: certain measurements, while measuring some particular feature of the automaton, may make impossible the measurement of other features of the automaton. This irreversible change of the automaton state is one aspect of the ``observer-participancy'' in the single-automaton configuration. (This is not the case for the multi-automaton situation discussed below, since the availability of an arbitrary number of automata ensures the possibility of an arbitrary number of measuring processes.)
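The loss of potential information can be made concrete by a toy Mealy automaton (the transition and output tables below are ours, chosen only for illustration): the input ``0'' distinguishes state 1 from state 2 by the output, but afterwards both runs sit in the same state, so no further experiment on this single copy can answer any other question about the initial state.

    delta = {(1, '0'): 3, (2, '0'): 3, (3, '0'): 3,        # transition table
             (1, '1'): 2, (2, '1'): 1, (3, '1'): 3}
    lam   = {(1, '0'): 'a', (2, '0'): 'b', (3, '0'): 'b',  # output table
             (1, '1'): 'a', (2, '1'): 'a', (3, '1'): 'b'}

    def run(state, inputs):
        """Feed an input word to the automaton; return outputs, final state."""
        out = []
        for symbol in inputs:
            out.append(lam[(state, symbol)])
            state = delta[(state, symbol)]
        return ''.join(out), state

    print(run(1, '0'))   # ('a', 3)
    print(run(2, '0'))   # ('b', 3) -- outputs differ, but both end in state 3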
In developing the intrinsic concept further, the automaton and the experimenter are ``placed'' into a single ``meta''-automaton. One might think of the experimenter as a human being or an automaton. Thereby, any theoretical modelling pursued by the experimenter, such as a(n) (algorithmic) description of the automaton et cetera, is placed inside the experimenter, or at least at the same (hierarchical) level as the experimenter. If the experimenter reacts mechanically, the setup can readily be constructed by simulating both the original finite deterministic ``black box'' automaton as well as the experimenter and their interplay by a universal automaton. One can imagine such a situation as one subprogram checking another subprogram, also including itself. For an illustration, see Fig. .
In certain cases it is necessary to iterate this picture in the following way. Suppose, for instance, the experimenter attempts a complete intrinsic (algorithmic) description. Then the experimenter has to give a complete description of his own intrinsic situation. In order to be able to model the own intrinsic viewpoint, the experimenter has to introduce a system which is a replica of its own universe. This amounts to substituting the ``meta''-automaton for the automaton in Fig. 6.1. Yet, in order to be able to model the intrinsic viewpoint of a new experimenter in this new universe, this new experimenter has to introduce another system which is a replica of its own universe, …, resulting in an iteration ad infinitum. In analogy to the process of considering subsystems relative to our system, one may assume supersystems and superobservers, for which our system and our observers are just models and metaphors of perception. By that reasoning one arrives at a hierarchy of intrinsic/extrinsic descriptions; the only relevant entity being the relative position therein. One may conjecture that an observer in a hypothetical universe corresponding to the ``fixed point'' or ``invariant set'' of this process has complete self-comprehension; see Fig. .
Of course, in general this observer cannot be a finite observer: a complete description would only emerge in the limit of infinite iterations (cf. K. Popper's ``paradox of Tristram Shandy'' and chapter p. ). Finite observers cannot obtain complete self-comprehension. In psychology, the above setup is referred to as the observing ego. In experiments of this kind - e.g., imagine a vase on a table; now imagine you imagining a vase on a table; now imagine you imagining you imagining a vase on a table; now imagine you imagining you imagining you imagining a vase on a table; now imagine you imagining you imagining you imagining you imagining a vase on a table - humans may concentrate on 3-5 levels of iteration.
The second kind of experiment operates with an arbitrary number of automaton copies. One automaton is a copy of another if both automata are isomorphic and if both are in the same initial state. With this configuration the experimenter is in the happy position of being able to apply as many input sequences to the original automaton as necessary. Again, any theoretical modelling pursued by the experimenter is placed within the experimenter, or at least at the same hierarchical level. In the extrinsic case, theoretical (algorithmic) descriptions are ``outside'' of the automaton copies. In a sense, the observer is not bound to ``observer-participancy,'' because it is always possible to ``discard the used automaton copies'' and take a ``fresh'' automaton copy for further experiments. For an illustration, see Fig. .
In the foregoing section, important features of the extrinsic-intrinsic concept have been isolated in the context of finite automata. A generalisation to arbitrary physical systems is straightforward. The features will be summarised by the following definition. (Anything on which experiments can be performed will be called system. In particular, finite automata are systems.) [Extrinsic, intrinsic] An intrinsic quantity is associated with an experiment
In what follows, the term experimenter is a synonym for experimenter-theoretician.
Remarks:
(i) One may ask whether, intuitively, the extrinsic point of view might be more appropriately represented by, stated pointedly, the application of a ``can-opener'' for the ``black box'' to see ``what's really in it.'' Yet, while the physical realisation might be of some engineering importance, the primary concern is the phenomenology (i.e., the experimental performance of the system) and not how it is constructed. In this sense, the technological base of the automaton is irrelevant. For the same reason, i.e., because this is irrelevant to phenomenology, it is not important whether the automaton is in its minimal form.
(ii) The requirement that in the extrinsic case an arbitrary
number of system copies is available is equivalent to the statement that no
interaction takes place between the experimenter and the system. (The
reverse information flow from the observed system to the experimenter is
necessary.) This results in a one-way information flow in the extrinsic case:
    system ⟶ experimenter.
(iii) The definition applies to physical systems as well as to logic and (finite) automata. Automaton worlds provide an ideal ``playground'' for the study of certain algorithmic features related to undecidability, such as ``computational complementarity'' and ``intrinsic indeterminism.''
(iv) In an extreme case, the input has no effect on the object; the object is ``just telling its story.''
(v) The extrinsic-intrinsic problem is the interrelation between extrinsic and intrinsic entities.
It is important to realise that an experimenter-theoretician or, by another naming, an observer, not only consists of primary senses to express and receive information via the interface (see below), but is also the location of theoretical modelling. Such theoretical models may, for instance, be contained in and represented by sequences of symbols in books (on theoretical physics, theoretical chemistry et cetera), or stored in computers. Therefore, one should not perceive the observer as a homogeneous entity, such as a human being devoid of any (cultural) context, but as a structure of primary senses and a theoretical complex.
An interface is also denoted by the terms filter or cut, the latter deriving its name from the Cartesian cut (the mind/body interface, if it exists []) and the Heisenberg cut, referring to complementarity. As suggested by O. E. Rössler [,], the generic interface is denoted by a swinging double line. Throughout this book the interface is modelled by the exchange of symbols, denoted by ``⇄''; i.e., the interface is reduced to its syntax.
In what follows, the Einstein-Podolsky-Rosen concept of completeness of physical theories will be adapted []:
Whatever the meaning assigned to the term complete, the following requirement for a complete theory seems to be a necessary one: … every element of physical reality must have a counterpart in the physical theory.
This can be translated into the new terminology. [Completeness] A theory is complete if there is a (computable/recursive) one-to-one correspondence to the manual of a system.
Whereas in this book the term completeness is used in close analogy to Einstein's original approach, a different, more restricted concept is used for an element of physical reality; cf. section 4.1, p. pageref.
One may ask, ``does there exist an intrinsically defined theory about its own system which is in one-to-one correspondence with the system?'' Or, stated pointedly, ``could we ever have a complete theory of the world?'' These questions will be dealt with in chapter , p. .
Informally speaking, complexity is some kind of measure of the computational resources necessary to perform a calculation. These resources have been grouped into two categories:
(i) Static complexity, subdivided into algorithmic information, which is a measure of the smallest program length to perform a given task, and loop-depth complexity; and
(ii) Dynamic or computational complexity, which can be subdivided into time complexity or depth and storage capacity. Table schematically shows the various complexities discussed below.
complexity | static | algorithmic information (program size) |
 | | loop depth |
 | dynamic | computational complexity (execution time) |
 | | (storage size) |
Several attempts have been made in the literature to propose complexities which grasp the intuitive notion of ``organisation.'' These measures shall not be discussed here. Ch. Bennett's notion of ``logical depth'' [] will be reviewed in the context of computational complexity (chapter , p. ). A notion of ``self-generating'' complexity was proposed by P. Grassberger [].
The physical importance of algorithmic information will be discussed briefly next. One criterion of selection among alternative theories is the (minimal) length of their representation. This amounts to one form of ``Occam's razor,'' a principle according to which the best explanation of an event is the one that is ``simplest.'' In the case of algorithmic information theory, the informal term ``simplest'' is specified by ``shortest.'' I.e., some physical theory qualifies over others with an equal amount of ``predictive power'' if it can be expressed in shorter terms. (This criterion, judged by history, does in general not apply to the pursuit of the natural sciences [,,,,]. One may also ask why the ``simplest theory'' should in principle be the ``true theory,'' if, in this context, both ``simplicity'' and ``truth'' can be given any meaning whatsoever.)
Another application of algorithmic information is this: The ``natural laws'' governing the evolution of a mechanistic system are representable by effectively computable processes or recursive functions. Algorithmic information is a measure of the length of such a description. - Yet, there exist (mathematical) objects (such as the Mandelbrot set [,,]) which have a ``very short'' description (of a few hundred bits of FORTRAN code) and intuitively ``look complex.''
There is still another application of algorithmic information: A notion of randomness based on algorithmic information (see chapter , p. ) is equivalent to statistical-based notions of randomness. A chaotic system could be defined by the requirement that its description cannot be ``compressed'' into any program code (corresponding to a ``natural law'' or an algorithm) of ``shorter length'' than the original description. (An alternative notion of randomness, based on ``computational irreducibility,'' or dynamical ``runtime'' complexity is introduced in section , p. .)
|
Recall that, according to the Church-Turing thesis, recursive ``natural laws'' are equivalent to algorithms. Such algorithms representing ``natural laws'' have to be formulated in a specific language or code which is recognisable by some computational device. Stated differently, the algorithm contains information (the ``law'') for some automaton, which deciphers this information and outputs a data stream. This data stream should be identical with the source data stream or the output of the corresponding physical system (Fig. ).
In other words, we are concerned with a suitable representation of some experimental ``message.'' We are not concerned with the question of a ``meaning'' of the ``message'' or an ``underlying law,'' but we shall investigate the technical question of how to encode source symbols in a ``reasonable'' way, such that the encoding is unique, compact, easy to recognise and transmit, and so on.
For further reference, see R. W. Hamming's book Coding and Information Theory [], as well as R. J. McEliece, The Theory of Information and Coding [] and R. Ash, Information Theory [], among others. A detailed consideration of code word length is also given by S. K. Leung-Yan-Cheong and T. M. Cover [], J. Rissanen [], and P. Elias []. A detailed treatment of algorithmic information theory can also be found in C. Calude's forthcoming book Information and Randomness - An Algorithmic Perspective [].
Let us call a string of symbols x a prefix of
another string of symbols y if y=xz for some string of symbols z. In the
following, the terms instantaneous, prefix, prefix-free and
self-delimiting will be used synonymously. A motivation for
instantaneous codes comes from the concatenations of source symbols.
Consider the following example. Assume, for the moment, q=4 and r=2, and two
encoding strategies #1 and #2, represented by
|
|
Another undesirable feature of the encoding is that the original source
information can be translated only after the whole message has been
sent. This is, for instance, the case for #2. Therefore, another
criterion for a code is its instant decodability. I.e., the symbols
should be decodable even before the whole code has been transmitted.
This can be achieved by the requirement that no code word is a prefix of
another code word. Take, for instance,
    s1 = 0, s2 = 10, s3 = 110, s4 = 111;

with such a code, every code word is recognised as soon as its last symbol has been received.
Instantaneous codes need no ``end-markers'' to indicate when a message ends. The requirement that no code word is the prefix of another code word yields a construction of such codes by a ``coding tree.'' Starting from a ``root node,'' every node of the tree branches into r ``leaves.'' Every node which is occupied by a code word is a terminal node; i.e., in order to prevent this code word from being the prefix of another code word, the tree is pruned at that node. This is illustrated in Fig. , which contains two instantaneous encoding schemes. Fig. (a) represents one possible instantaneous coding scheme for q = 8 and r = 2, whereas (b) lists one possible instantaneous coding scheme for q = 4 and r = 2.
(For arbitrary q and r, a generalisation is straightforward.) The statement ``the price for instant decodability is the length of the code'' will be quantified next.
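Before doing so, the following sketch (helper names and the concrete code words are ours) makes the prefix condition and instantaneous decoding explicit for a q = 4, r = 2 code:

    def is_prefix_free(codewords):
        return not any(a != b and b.startswith(a)
                       for a in codewords for b in codewords)

    def decode(message, code):
        """Instantaneous decoding: emit a symbol as soon as a codeword matches."""
        inverse, buffer, out = {w: s for s, w in code.items()}, '', []
        for bit in message:
            buffer += bit
            if buffer in inverse:            # unambiguous, since no codeword
                out.append(inverse[buffer])  # is a prefix of another
                buffer = ''
        return out

    code = {'s1': '0', 's2': '10', 's3': '110', 's4': '111'}
    assert is_prefix_free(code.values())
    print(decode('0110111', code))   # ['s1', 's3', 's4'], decoded on the fly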
[Kraft inequality []] An instantaneous code, over an r-symbol code alphabet, of a q-symbol source alphabet {s1, s2, …, s_q} with encoded word lengths l1 ≤ l2 ≤ … ≤ l_q satisfies the Kraft inequality

    ∑_{i=1}^{q} r^{−l_i} ≤ 1 .    (7.1)
Proof:
For simplicity assume r=2. We shall prove the Kraft inequality by induction. Consider first the binary tree of length one, represented in Fig. (a).
We can encode with this tree a source code containing one or two symbols, yielding a Kraft sum 1/2 ≤ 1 or 1/2 + 1/2 ≤ 1, respectively.
Next assume that the Kraft inequality is true for all trees of length n. Now consider, in Fig. 7.3(b), a tree of maximum length n+1. The first ``root'' node (the tree is drawn upside down) leads to at most a pair of subtrees (of length at most n), for which we have the inequalities K′ ≤ 1 and K″ ≤ 1, where K′ and K″ are the values of the respective Kraft sums. Each length l_i in a subtree is increased by one when the subtree is joined to the main tree as specified. Therefore, for binary encoding, i.e., for r = 2, an extra factor 1/2 appears, and one obtains

    K = (1/2) K′ + (1/2) K″ ≤ 1.
Example:
Binary programs p of unbounded length which have instantaneous codes and run on universal computers U correspond to r = 2 and q = ∞. In this case the Kraft inequality becomes (``|p|'' stands for the length of p)

    ∑_p 2^{−|p|} ≤ 1 .    (7.2)
Remark:
Notice that if every terminal node is identified with one code word s_i, then the Kraft sum (7.1) is exactly one. Only for inefficient coding, i.e., if one or more terminal nodes are not used, does the strict inequality apply. The codes of Fig. 7.2 are both efficient, and the Kraft inequality (7.1) holds with equality. The optimal encoding strategy (in the sense of ``shortest encoding of messages'') also depends on the relative frequencies of occurrence of the source symbols. It is evident that ``more frequent source symbols'' should be identified with ``shorter'' code words, thereby enabling (on the average and compared to other encoding schemes) a shorter length of the message as a whole. For a recent account of the algorithmic version of the Kraft inequality, see C. Calude and E. Kurta [].
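A numerical check (our formulation) of the Kraft sum for r = 2: the code {0, 10, 110, 111} uses every terminal node and saturates the bound, whereas dropping one code word leaves the inequality strict.

    def kraft_sum(lengths, r=2):
        return sum(r ** -l for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))   # 1.0   -- efficient: equality in (7.1)
    print(kraft_sum([1, 2, 3]))      # 0.875 -- inefficient: strict inequality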
Intuitively similar formulations of (preliminary concepts of) algorithmic information (content) (also known as program size complexity, Kolmogorov complexity, algorithmic complexity, Kolmogorov-Chaitin randomness et cetera) have appeared (on the basis of absence of proof to the contrary, independently) in the writings of G. J. Chaitin [,,], R. J. Solomonoff [] and A. N. Kolmogorov []. The following brief exposé of algorithmic information theory follows Chaitin's approach [,,] and adapts his conventions: O(1) is read ``of the order of 1'' and denotes a function whose absolute value is bounded by an unspecified positive constant. I.e., φ(x) = O(1) means |φ(x)| < A, where A is a constant independent of the argument x of φ. ``|s|'' denotes the length of an object s coded in binary notation. If not stated otherwise, only universal computers, denoted by U, are considered. For a more detailed treatment, the reader is referred to G. Chaitin's books Algorithmic Information Theory [] and Information, Randomness and Incompleteness [], the latter containing the collection of his works, including some (easy-to-read) introductory articles. One of G. Chaitin's intentions is to extend Gödel's incompleteness theorems [] to algorithmic information theory. Informally speaking, in this view the algorithmic information of an entity is a sort of quantification or measure of the ``mathematical truth'' captured by that entity. G. Chaitin also points to the fact that there exists ``randomness in arithmetic'' (i.e., arithmetic statements which are just as ``random'' as the outcome of the flipping of a fair coin). This approach has sparked a great number of reactions, mostly positive [,,], some sceptical [,]. A further (independent) account can be found in a review article by A. K. Zvonkin and L. A. Levin []. For more references, see also the review article by M. Li and P. M. B. Vitányi [] and M. van Lambalgen's dissertation, published in []. The forthcoming book by C. Calude [] will surely be a major source of reference. [In a note in 1972, K. Gödel introduced a notion of program size complexity in Some remarks on the undecidability results, reprinted in the Collected Works, Volume II [], p. 305 (see ``Another version of the first undecidability theorem''), which was based on the (not necessarily shortest) number of symbols of the system of axioms necessary to solve a problem. In the context of attempts to formulate and evaluate a unified theory, Einstein introduced a measure of the strength of field equations, based upon the number of free parameters (see [], p. 138).]
Informally speaking, the basic idea is the characterisation of a mathematical object by the length of the shortest program which outputs a code of that object. This measure is denoted by H′ in honour of L. Boltzmann's function proportional to entropy []. The prime `` ′ '' indicates unspecified program code. In what follows, H′ is measured in bits. The original approach is ambiguous: consider for instance a binary sequence x(n) of length n. At first glance it seems that the information content H′ of x(n) could not exceed the length of that sequence [plus O(1) from additional program statements like PRINT x(n)]; i.e., H′(x(n)) ≤ n + O(1).
So far, no specific encoding of the program generating the sequence x(n) has been specified. Therefore, H′ will be defined ambiguously. (This is where the encoding of programs discussed in the previous sections comes in.) Assume, for instance, that specific symbols (such as the blank symbol `` _ ''), which we shall call end-markers, are allowed to end-mark an enumeration. By very efficiently utilising these end-markers, a program may scan through all digits of x(n), determine its length in real-time execution, print out x(n) as well as n, and finally halt. Notice that the length n of the sequence may represent an additional valuable piece of information, which may be worth H′(n) bits. As a consequence of tricky programming it might be possible to squeeze H′(n) more irreducible information into x(n) than is contained in the n bits of x(n), resulting in a total information of

H′(x(n)) ≤ n + H′(n) + O(1).
The above argument can be iterated. (Let ``|x|'' of an object encoded as binary string stand for the length of that string.) Indeed, by allowing end-markers, the length |x| of a string x could be subject to more information than just ||x||, since, by the same argument as before, it could contain a total information of

H′(|x|) ≤ ||x|| + H′(||x||) + O(1).

Therefore, by recursion,

H′(x) ≤ |x| + H′(|x|) + O(1) ≤ |x| + ||x|| + H′(||x||) + O(1) ≤ ⋯ ≤ |x| + ||x|| + |||x||| + ⋯ + O(1).
Consequently, the allowance of end-markers results in the undesirable feature that there exist sequences of length n - and thus of algorithmic information content of the order of n - with ``overall'' information of the order of n + log₂(n) + log₂(log₂(n)) + ⋯ bits. Another, more technical, problem concerns the subadditivity of algorithmic information [i.e., the relation H′(x,y) ≤ H′(x) + H′(y) + O(1), which fails for this definition].
One could speculate that it might be possible to squeeze even more than n + H′(n) bits of information into a sequence of length n by counting the computation time t on some specified computer. This is impossible [], because the run time t of a program p that halts can be computed from p by an algorithm which simulates p and in addition counts the execution time t. Such an algorithm would take at most O(1) bits more than p. Thus H′(p,t) = H′(p) + O(1).
G. Chaitin [] (see also L. A. Levin []) has proposed a modification of the original definition which eliminates both deficiencies and restores the subadditivity property of algorithmic information. This modification is based upon the restriction to instantaneous (prefix) program codes. Programs should be self-delimiting; i.e., they should, for instance, not contain any end-markers. One way to achieve this is the use of symbols which can be instantaneously decoded; this results in the prefix coding techniques discussed in section 7.1.1, p. pageref. Another, probably more intuitive, way to achieve instant decodability is the requirement that the program must somehow be informed about the length n of a sequence x(n) beforehand. This requirement necessitates an additional program statement indicating n without end-markers, yielding an increase in program length of up to log₂*(n) bits. With this definition, a string of size n can have an algorithmic information content of n + H(n) + O(1). Consequently, the length - and thus the algorithmic information - of a binary program producing a binary string is not bounded by the length of that string, but rather by (length) + log₂(length) + log₂(log₂(length)) + ⋯ + O(1).
In general and on the average, instantaneous decodability will result in longer program codes than codes of programs which are not uniquely or not instantaneously decodable. This is the reason why the Kraft sum (1), which can be interpreted here as the exponentially weighted sum over the lengths of all allowed programs, converges. (A program coded by symbols which are instantaneously decodable is instantaneously decodable itself.)
Another motivation for the requirement of instantaneous program codes comes from the perception of a computer as decoding equipment [,]: its programs correspond to the encoded message and its output corresponds to the decoded message. A ``reasonable'' requirement is that the coded messages (i.e., the programs) are ``instantaneously readable'' and therefore should not depend on other encoded messages or on the rest of the transmitted message. This translates into the requirement that no program is a prefix of another (or, in other words: no extension of a valid program is a valid program). In the foregoing section, such a code has been called an instantaneous or prefix code.
The static complexity, more specifically the program size complexity or algorithmic complexity or algorithmic information, will be denoted by H and can be defined as follows. [Algorithmic information [,,]]
Assume instantaneous (prefix) encoding. The canonical program associated with an object s representable as string is denoted by s* and defined by

s* = min_{U(p)=s} p,   (7.5)

i.e., s* is the first program in a (quasi-lexicographic) enumeration of all programs p with U(p) = s; in particular, it is one of the shortest such programs.
Let ``|x|'' of an object encoded as (binary) string stand for the length of that string. The static or algorithmic complexity H(s) of an object s representable as string is defined as the length of the shortest program p which runs on a computer U and generates the output s:

H(s) = min_{U(p)=s} |p| = |s*|.   (7.6)
The joint algorithmic information H(s,t) of two objects s and t representable as strings is the length of the smallest-size binary program to calculate the concatenation of s and t simultaneously.
The relative or conditional algorithmic information H(s|t) of s given t is the length of the smallest-size binary program to calculate s from a smallest-size program for t:

H(s|t) = min_{U(p,t*)=s} |p|.   (7.7)
One may also ask, ``what is the probability that a valid program producing a specific object s, or any object at all, will be obtained by the flipping of a fair coin?'' - Of course, the sequence obtained by this random process has to be ``arbitrarily long'' to assure that one has not just obtained the first bits of a valid program producing s. The following definitions are motivated by these questions. For an early definition of probability measures on the set of output sequences of automata, see the article by K. de Leeuw, E. F. Moore, C. E. Shannon and N. Shapiro in Automata Studies [].
[Algorithmic probability, halting probability, version I]
Let s be an object encodable as binary string and let S = {si} be a set of such objects si. Then the algorithmic probability P and the halting probability Ω are defined by

P(s) = ∑_{U(p)=s} 2^{-|p|},   (7.8)
P(S) = ∑_{si ∈ S} P(si),   (7.9)
Ω = ∑_{U(p) halts} 2^{-|p|} = ∑_s P(s).   (7.10)
Remarks:
(i) P is, strictly speaking, no probability:
Due to the Kraft inequality (1), i.e., for inefficient coding, and due to the fact that not all programs converge, Ω = ∑_s P(s) ≤ 1 need not be exactly 1.
Let x(n) denote a binary sequence of length n and let x(n)0 and x(n)1 denote the sequences of length n+1 which are obtained from x(n) by appending the symbol 0 and 1, respectively. Let for the moment the symbols ``x(n) ⊏ U(p)'' denote that x(n) is an initial segment of U(p). If one slightly modifies the definition of P such that

P(x(n)) = ∑_{x(n) ⊏ U(p)} 2^{-|p|},

then in general only P(x(n)) ≥ P(x(n)0) + P(x(n)1) holds, whereas a probability measure would require equality.
But even if one assumes the original definition of P, two independent, random objects s and t yield

P(s,t) = P(s) P(t) O(1),

i.e., multiplicativity holds only up to a bounded factor.
With these provisos, P will be called ``probability'' nevertheless, because P(s) is a reasonable measure of the frequency with which some prefix-free program produces the object s on the standard universal computer U; cf. the toy illustration following these remarks.
(ii) Ω is the halting probability (with null free data), i.e., the probability that an arbitrary program (with no input) halts. Ω is random and can be obtained from below - in the limit of infinite computing time - by a computable algorithm [,]. For details, see , p. .
(iii) The set of all true propositions of the form ``H(s) ≤ n < ∞'' or ``P(s) > 2^{-n}'' is recursively enumerable, because it is possible to ``empirically'' find H(s) by running all programs of size less than or equal to n and by observing whether they output s. However, one would have to wait ``very long,'' i.e., longer than any recursive function of n - though for finite length strings |s| < ∞ not eternally - to recognise that; see , p. , for details.
(iv) PC¢(s) and WC¢ can be defined for any (not necessarily universal) computer U¢ by substituting U¢ for U in (8) and (10). We shall use the isomorphy between deterministic physical systems and effectively computable processes to relate the algorithmically defined probabilities (8) to physical entropy in chapter , p. .
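To make definitions (7.8)-(7.10) concrete, here is a deliberately finite toy model (an assumption for illustration only: a genuine universal computer has infinitely many programs, and then P and Ω are merely approximable from below, not computable):

# A hypothetical finite ``computer'' given as a table of prefix-free
# programs; programs mapped to None are taken to diverge.  Only the
# finiteness of this table makes P and Omega exactly computable here.

toy_U = {
    "0":   "a",     # program "0" outputs "a"
    "10":  "b",
    "110": "a",
    "111": None,    # this program does not halt
}

def P(s):
    # equation (7.8): sum of 2^(-|p|) over all programs p with U(p) = s
    return sum(2 ** -len(p) for p, out in toy_U.items() if out == s)

Omega = sum(2 ** -len(p) for p, out in toy_U.items() if out is not None)

print(P("a"))   # 0.5 + 0.125 = 0.625
print(Omega)    # 0.875 <= 1, consistent with the Kraft inequality (1)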
There are infinitely many programs contributing to the sum in (8), but the
greater the size of any such program is, the more it gets (exponentially)
suppressed. Therefore, the dominating term in the sum for P(s) stems from the
canonical program s*. Indeed, it can be
shown [,] [cf. (), p. ] that there are ``few'' minimal programs contributing
substantially to the sums (8),
(9) and (10). Thus the probabilities to
produce a specific object s as well as the halting probability can also be
defined by taking into account only the canonical (i.e., shortest-size)
programs. I.e., if in (8) the sum over
all programs is reduced to a single contribution from the canonical program
s* for which |s*| = H(s), then one can define P*(s) and Ω* as follows: [Algorithmic probability, halting probability, version II]
Let s be an object encodable as binary number and let S = {si} be a set of such objects si. Then the algorithmic probability P* and the halting probability Ω* are defined by

P*(s) = 2^{-|s*|} = 2^{-H(s)},   (7.11)
P*(S) = ∑_{si ∈ S} P*(si),   (7.12)
Ω* = ∑_s P*(s) = ∑_s 2^{-H(s)}.   (7.13)
Had we not restricted the allowed program codes to instantaneous ones, ``many more'' program codes could have contributed to the sums (8), (9) and (10). As a result, these sums might not have been bounded from above by 1 and might even have diverged. Due to instantaneous (prefix) encoding, the Kraft inequality (1) applies to the set of all halting programs,

∑_{U(p)↓} 2^{-|p|} ≤ 1,   (7.14)

so that in particular P(s) ≤ 1 and Ω ≤ 1. Furthermore, algorithmic information is machine independent up to an additive constant: for suitable universal computers U and U′ and all objects s representable as strings, H(s) = H_{U′}(s) + O(1) (theorem 7.2.4).
A proof of theorem 7.2.4 is straightforward, since any effectively computable function can be coded into a finite program of length c_ψ = O(1).
More precisely, the machine independence of algorithmic information can be obtained if one assumes two suitable universal machines U and U′ such that there exists a translation program ψ: p′ → p of constant length O(c_ψ) with U(ψ(p′)) = U′(p′) for all p′, and vice versa. This restriction of the class of universal machine models is necessary, since for arbitrary universal machines U, U′ there does not exist a translation program ψ: p′ → p of constant length O(c_ψ) with U(ψ(p′)) = U′(p′) for all p′. Indeed, G. Chaitin requires this condition in the definition of universal machines.
With this restriction, by theorem 7.2.4,

H(s) = H_{U′}(s) + O(1).
If both U and U′ are universal computers, then H ≤ H_{U′} + c1 and H_{U′} ≤ H + c2, where c1, c2 are unspecified positive constants corresponding, roughly speaking, to the lengths of the binary translation programs, which are usually of the order of 1000 bits. Thus, for all objects x, the absolute value of the difference between H and H_{U′} satisfies

|H(x) - H_{U′}(x)| ≤ max{c1, c2} = O(1).   (7.20)
The following relations hold: [(Chaitin [,])]
Let s and t be two objects representable as binary strings. Then, in particular,

H(s,t) ≤ H(s) + H(t) + O(1),   (7.26)
H(s,t) ≤ H(s) + H(t|s) + O(1),   (7.27)
max_{|s|=n} H(s) = n + H(n) + O(1),   (7.30)
H(s) = -log₂ P*(s),   (7.31)
H(s) = -log₂ P(s) + O(1),   (7.32)
H(s,t) = -log₂ P(s,t) + O(1).   (7.33)
Remarks:
(i) The subadditivity (26) & (27) will be used for a proof of incompleteness theorems for lower bounds on H. For details, see , p. . Informally speaking, it means that, with respect to program size, it is quite effective for a computer to ``do one thing after the other.''
(ii) Theorem (30) refers to the maximal complexity of a finite bit string. At first glance it seems amazing that the complexity of a string can exceed its size. But, as has already been pointed out, one should bear in mind that a prefix-free program code has to be longer than a non-prefix-free program code.
Informally speaking, an enumeration of an output string has to contain information specifying the length of the output string. To demonstrate this fact, consider a print program for an algorithmically incompressible sequence x(n) of the form ``PRINT THE FOLLOWING SEQUENCE OF LENGTH n: x(n).'' The print program contains the strings ``PRINT THE FOLLOWING SEQUENCE OF LENGTH'' and `` : '' which contribute O(1) to the program length; furthermore it has to contain an enumeration of n which needs no end-marker. Such an enumeration of n could be: ``READ THE NEXT |n| SYMBOLS: n'' and so on, until |⋯|n|⋯| is 1. This results in a nesting of read statements, corresponding to contributions ≤ log₂(⋯log₂(n)⋯). These terms have to be included in a calculation of the program length.
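One way to realise this nesting of read statements is a standard self-delimiting integer code; the sketch below uses the Elias omega code as a stand-in (an assumption for illustration, the text does not commit to this particular code), whose length indeed behaves like log₂(n) + log₂(log₂(n)) + ⋯ + O(1):

# The Elias omega code prefixes the binary digits of n with the
# recursively encoded lengths of all length blocks; the terminating "0"
# tells the decoder to stop, so no end-marker symbol is needed.

def elias_omega(n):
    code = "0"
    while n > 1:
        b = bin(n)[2:]          # binary digits of n
        code = b + code         # prepend the current block
        n = len(b) - 1          # next: encode the block length
    return code

def decode_omega(bits):
    n, i = 1, 0
    while bits[i] == "1":       # a leading 1 announces another block
        k = n
        n = int(bits[i:i + k + 1], 2)
        i += k + 1
    return n

for n in (1, 2, 16, 100):
    c = elias_omega(n)
    assert decode_omega(c) == n
    print(n, c, len(c))         # length ~ log2(n) + log2(log2(n)) + ...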
By recalling the argument using recursive instantaneous (prefix) encoding, one obtains the bound from above

H(x(n)) ≤ n + H(n) + O(1) ≤ n + log₂(n) + log₂(log₂(n)) + ⋯ + O(1).   (7.34)
(iii) Theorems (31), (32) and (33) express a very important identity between the algorithmic information content of an object s and the probability that it is algorithmically produced. They suggest a formal analogy of algorithmic information to Shannon information (), p. , and physical entropy. Theorem (31) is an immediate consequence of the definition of P*(s) = 2^{-|s*|} = 2^{-H(s)}, equation (11).
A proof of theorem (32), which has important physical applications (see chapter , page ), is less straightforward. Following G. Chaitin [,], it has to be shown that most of the probability is concentrated on the minimal size programs, or alternatively that there are few minimal programs. If there are many large programs, a much smaller program is constructed in the proof. The proof uses non-constructive elements, such as computation of the algorithmic probability in the limit from below, as well as the existence of universal computers and program codes with extraordinary properties. [The proof is not a trivial consequence of the usual definition of the (Shannon) information gain I(s) = -log p(s) if symbol s occurs. In this case, the probability p(s) is defined by the relative frequency of occurrence (and not the algorithmic probability) of the symbol s.]
Recall that lg(x) stands for the greatest integer less than the base-two logarithm of the real number x; for x ≠ 2^n, n ∈ ℕ, this is just the integer part of the base-two logarithm of x. I.e., if 2^n < x ≤ 2^{n+1}, then lg(x) = n. Thus, 2^{lg x} < x. Notice that, as has been pointed out before, the set of all true propositions of the form ``H(s) ≤ n'' or ``P(s) > 2^{-n}'' is recursively enumerable, because it is possible to ``empirically'' find H(s) by running all programs of size less than or equal to n and see whether they output s. On the basis of this process, postulate a universal computer D which simulates the original computer U and in addition enumerates all true theorems of the form ``P(s) > 2^{-n}'' without repetition. Further postulate that for D a single program of length n exists which outputs s if and only if the condition P(s) > 2^{1-n}, or n ≥ -lg P(s) + 1, is satisfied. (The extra factor 2 in P(s) > 2·2^{-n} is required for a proof of the existence of an instantaneous code, see below.)
Hence the number of programs p of length n such that D(p) = s is 1 if n ≥ -lg P(s) + 1 and 0 otherwise. The smallest program which outputs s is the one with n = -lg P(s) + 1, and thus

H(s) ≤ -lg P(s) + O(1);

conversely, the canonical program alone contributes P(s) ≥ 2^{-H(s)}, i.e., -lg P(s) ≤ H(s) + O(1), so that H(s) = -lg P(s) + O(1), which is theorem (32).
Since by theorem (32) P(s) = 2^{-H(s)} ∑_{U(p)=s} 2^{H(s)-|p|} = 2^{-H(s)} O(1), one concludes that there are few minimal programs, i.e.,

#{p : U(p) = s and |p| = H(s)} = O(1).   (7.35)
H^∞ and P^∞ will denote the algorithmic information content and the algorithmic probability of infinite computations. Rather little is known about the properties of H^∞ and P^∞. A result of R. M. Solovay [] states that, for arbitrary recursively enumerable infinite sets S,

(7.36)
Assume, for example, a program for enumerating the natural numbers in successive order. Such a program will not halt in finite time. The minimal length of this program will eventually become ``much smaller'' than the complexity of most of the individual numbers it outputs. The related ``finite'' version of this statement is the fact that there exist sets of objects S = {s₁, …, s_n}, n < ∞, whose algorithmic information content H(S) is arbitrarily small compared to the algorithmic information content of some unspecified single elements si ∈ S; i.e., for any bound m < ∞ there exist S and si ∈ S such that

H(si) - H(S) > m.   (7.37)
This chapter deals with the question of the time of computation, i.e., with the number of discrete steps in a computation of an object. (Another dynamical complexity measure is space or storage size, the number of distinct storage locations accessed by the computation.) Such an object can be a (binary) string, the solution to a mathematical problem et cetera. Let N be some number characterising the size of a problem. In chapter 4, p. pageref, techniques for representing an arbitrary object by a code string of symbols x are discussed. If an object (with code) x is the solution to the problem, the size parameter N need not necessarily coincide with the length |x(N)| of x. Examples of such objects are the generation of a sorted list of N items or finding the roots of a polynomial of degree N.
Consider again a universal computer and instantaneous (prefix) program codes.
[Computational complexity]
Assume a problem of the order of N and its solution x(N), if it exists. The dynamical or computational complexity H_D(x(N)) is the time (number of cycles, computing steps) it takes for the fastest program p running on a universal computer U to calculate x(N):

H_D(x(N)) = min{t : U(p) = x(N) after t computing steps}.
Examples:
(i) A trivial example is the enumeration of a string x(n) of length n on a sequential machine outputting one symbol per time cycle. A program containing a print statement of the form ``PRINT THE FOLLOWING SEQUENCE OF LENGTH n: x(n)'' consumes a minimal time HD(x(n))=n+O(1).
(ii) The telephone book search with N entries (more generally, the search-a-sorted-list problem) outputs the searched-for number x(N) in minimal time H_D = O(log N); see the sketch after this list.
(iii) The travelling salesman problem of finding the shortest route for a traveller who wishes to visit each of N cities. The solution is the sequence x(N) of the cities visited consecutively. It is not difficult to see that H_D(x(N)) ≤ O(N!), but the statement that the problem is intractable, i.e., that it cannot be solved in polynomial time, i.e., that the minimal time satisfies H_D(x(N)) > O(N^k) for all k < ∞, remains conjectural [,].
(iv) The task of finding the first n bits of the halting probability Ω (see section , p. ) is, for large n, uncomputable, and H_D(Ω(n)) = ∞ [,].
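A minimal sketch of example (ii), with a hypothetical zero-padded ``telephone book'' so that lexicographic and numeric order coincide:

# Binary search on a sorted telephone book of N entries: each probe
# halves the remaining interval, so at most floor(log2 N) + 1 probes
# are needed, i.e., H_D = O(log N).

def search_sorted(book, name):
    lo, hi = 0, len(book) - 1
    probes = 0
    while lo <= hi:
        probes += 1
        mid = (lo + hi) // 2
        if book[mid] == name:
            return mid, probes
        elif book[mid] < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

book = [f"name{i:06d}" for i in range(1000000)]   # already sorted
print(search_sorted(book, "name314159"))          # found within ~20 probes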
Ch. H. Bennett [] has proposed an alternative definition of computational
complexity. There, the computer program is fixed by identifying p with the
canonical, i.e., smallest-size, program [x(n)]* generating a (binary) sequence x(n) of length n. The
resulting measure is the ``logical depth.'' For a much more detailed
analysis, see Ch. H. Bennett []. [Logical depth (Bennett [])]
The logical depth D(x(n)) of a sequence x(n) = x₁⋯x_n is the time (number of cycles, computing steps) it takes for the canonical program [x(n)]* running on a universal computer U to calculate x(n):

D(x(n)) = t such that U([x(n)]*) = x(n) after t computing steps.
Remarks:
(i) By specifying a Turing machine model (or any model which outputs only one digit at a time), one obtains D(x(n)) ≥ |x(n)|.
(ii) In view of the possibility of a ``trade-off'' between computing speed (related to computational complexity) and program size (related to algorithmic information) [], the pairs (algorithmic information, logical depth) and (program size of the fastest program, computational complexity) are dual concepts.
(iii) In Ch. H. Bennett's terminology, a sequence x(n) is [H(x(n)), D(x(n))]-deep, or D(x(n)) cycles deep with H(x(n)) bits significance. Any string x(n) might be called ``shallow'' if it can be produced in a time D which grows not faster than some polynomial in n, i.e., D(x(n)) ≤ n^k, k < ∞. This amounts to saying that the problem of finding x(n) is in the complexity class of polynomial-time algorithms P; i.e., it is tractable (see definition , p. ). Otherwise it might be called ``deep.'' This terminology is somewhat arbitrary but justified by the fact that the class P is closed under variations of ``reasonable'' computer models, meaning that one computer can simulate another by a polynomial-time algorithm.
We shall use computational complexity to formalise the notion of randomness based on the heuristic approach of ``computational irreducibility'' in chapter , p. .
The uncomputability of computational complexity can be proved by contradiction, utilising diagonalization techniques: Assume that there exists an effectively computable algorithm TIME which, according to the Church-Turing thesis, would correspond to a recursive function H_D (wrong). TIME computes the running time of an arbitrary algorithm p which, implemented on machine U, produces the output x(N), and, if U(p) does not halt, renders infinity. By identifying x(N) with an arbitrary sequence x(n) of length n, TIME could be used to construct an effectively computable ``halting program'' HALT as follows:

HALT(p) = 1 if TIME(p) < ∞, and HALT(p) = 0 if TIME(p) = ∞,   (8.1)

contradicting the recursive unsolvability of the halting problem (cf. p. pageref).
In contrast to algorithmic information, which, up to O(1), remains unchanged under changes of universal computer models, computational complexity is a highly machine dependent concept. However, as will be argued below, ``reasonable'' universal machines simulate one another by polynomial-time algorithms. Therefore, a computation which has a polynomial time bound on one computer will also have a polynomial time bound on virtually any other, though perhaps by a polynomial of different degree. This means, for instance, that in Ch. H. Bennett's terminology a shallow string remains shallow and a deep string remains deep.
Let us, for the moment, consider Amdahl's law, which evaluates the time gain of suitably parallelisable algorithms on machines which process information in parallel: let f be the fraction of the algorithm which can be executed in parallel and N be the number of parallel units; then the speedup with respect to a single computing unit (N = 1) is [f/N + (1-f)]^{-1}. Denote by C_N a computer with N instantly accessible processing units. We may then consider the computational complexity H_D(N,f) as a function of the machine C_N and the degree of possible parallelisation. As a consequence, with an infinite number of instantly accessible processing units (N = ∞), the decrease in computational complexity with respect to a single processor unit is given by

H_D(∞,f) = (1-f) H_D(1,f).
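A few lines suffice to watch this saturation numerically (the parallel fraction f = 0.95 is an arbitrary example value):

# Amdahl's law: the speedup over a single unit is 1 / (f/N + (1 - f));
# for N -> infinity it saturates at 1 / (1 - f), i.e.,
# H_D(infinity, f) = (1 - f) * H_D(1, f).

def speedup(f, N):
    return 1.0 / (f / N + (1.0 - f))

for N in (1, 10, 100, 10**6):
    print(N, speedup(0.95, N))   # approaches 1 / (1 - 0.95) = 20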
In what follows we restrict our attention (if not stated otherwise) to single-unit sequential universal computers such as the Turing machine C_T.
The following overview is only a short briefing on computational complexity classification. More detailed accounts can for instance be found in the books by D. Harel [], Jan van Leeuwen [] (editor), M. R. Garey & D. S. Johnson [], L. Kronsjö [], C. Calude [] and J. E. Hopcroft & J. D. Ullman [], among many others.
Let again N be some number characterising the size of a computational problem, for instance the task of sorting a list of N items, or finding the roots of a polynomial of degree N. Then we may ask, ``what is the functional behaviour of the time complexity H_D(N), i.e., the minimal amount of computation time to solve a problem, as N increases?''
Computational problems which can be solved by polynomial-time algorithms are called tractable or feasible; if they are solvable but not in P, they are called intractable.
Examples:
Examples of problems in P are the telephone book search with N entries (more generally, the search-a-sorted-list problem) with H_D = O(log N), or the problem of sorting N entries with H_D = O(N log N). Problems with H_D = O(2^N), O(N^N) or O(N!) are not in P.
A further conjecture, relating execution time on parallel computers to space (storage size) on sequential computers, can be stated as follows. [Parallel computation thesis]
Whatever can be solved in polynomially bounded space (storage size) on ``reasonable'' sequential computers can be solved in polynomially bounded time on ``reasonable'' parallel computers, and vice versa.
Remarks:
(i) One may translate these conjectures into definitions if one defines computers to be ``reasonable'' only if they satisfy the statements.
(ii) Similar to the parallel computation thesis, a ``trade-off'' can be formulated between algorithmic information and computational complexity. For further details, the reader is referred to an article by G. Chaitin [].
type of problem | algorithmic status | computer model
unbounded | undecidable | oracle
exponential bound, H_D(N) ≤ O(k^N) | computable but intractable | universal computer (universal CA, Turing machine etc.)
polynomial bound, H_D(N) ≤ O(N^k) | tractable | universal computer (universal CA, Turing machine etc.)
finite, N < ∞ | | finite machine (insert your favourite brand here: ``…'')
The ``busy beaver function'' Σ (I would have rather called it the ``wild weasel function'') is the answer to the question, ``what is the largest number which can be calculated by programs whose algorithmic information content is less than or equal to a fixed, finite natural number?'' In other words: [Busy beaver function] The busy beaver function Σ(n) is defined as the largest natural number whose algorithmic information content is less than or equal to n; i.e.,

Σ(n) = max{k ∈ ℕ : H(k) ≤ n}.
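The shape of this definition - though emphatically not the growth rate that makes Σ interesting - can be mimicked by a toy computer; the one below (a non-universal stand-in chosen purely for illustration) simply reads every program bit string as a binary numeral:

# A finite caricature of the busy beaver definition: programs are bit
# strings, and the toy ``computer'' outputs the number a bit string
# denotes in binary.  The toy busy beaver is the largest output of any
# program of length <= n.  For a universal machine the analogous
# Sigma(n) grows faster than any computable function.

def toy_output(program):
    return int(program, 2)

def toy_busy_beaver(n):
    programs = (format(i, "b").zfill(k)
                for k in range(1, n + 1)
                for i in range(2 ** k))
    return max(toy_output(p) for p in programs)

print([toy_busy_beaver(n) for n in range(1, 8)])
# [1, 3, 7, 15, 31, 63, 127], i.e., 2^n - 1 for this toy machine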
number of states n | number of 1's printed
Σ_T(1) | 1
Σ_T(2) | 4
Σ_T(3) | 6
Σ_T(4) | 13
Σ_T(5) | ≥ 1915
Σ_T(7) | ≥ 22961
Σ_T(8) | ≥ 3·(7·3^92 - 1)/2
For large values of n, Σ_T(n) grows faster than any computable function of n; more precisely, let f be an arbitrary computable function; then there exists a positive integer k such that

Σ_T(n) > f(n) for all n > k.
The following results have not been proven for instantaneous codes. Nevertheless, it could be expected that they hold even after translation to instantaneous codes. As a consequence of its definition, Σ satisfies the following relations: [(G. Chaitin [])] If Σ is defined, then

x ≤ Σ(H(x)) for all x ∈ ℕ; in particular, H(x) ≤ n implies x ≤ Σ(n).   (8.5)
Although in general the computational complexity or logical depth associated with a (finite or infinite) object is uncomputable, it is possible to derive a nonrecursive (uncomputable) bound from above, associated with the algorithmic complexity of programs which halt: [Bound from above on computation time and complexity (G. Chaitin [])]
Either a program p halts in cycle time less than Σ(H(p) + O(1)), or it never halts. Therefore, if one defines

d(n) = max_{|x*| ≤ n} H_D(x) = max_{|x*| ≤ n} D(x),

then

d(n) ≤ Σ(n + O(1)).
Proof:
Here we shall only prove that there is an upper bound on the computational complexity and logical depth of programs of algorithmic information ≤ n, namely, for objects x with H(x) ≤ n, H_D(x), D(x) ≤ d(n) ≤ Σ(n + O(1)): consider two universal computers U and U′. Given an arbitrary program p with H(p) ≤ n, U(p) simulates U′(p) and additionally counts the cycle time t of U′ until p has halted. If this happens, U(p) outputs the execution time t of p and halts. Now if d(n) is defined, then there exists at least one program p_L of algorithmic information ≤ n which takes longest, such that U(p_L) = d(n). Since, independent of n, the simulation of U′ on U and the counting require O(1) additional bits of program size, the algorithmic information of p with respect to U is governed by the algorithmic information of p with respect to U′ up to an additive constant, i.e., H(d(n)) ≤ n + O(1). By theorem (5) we conclude that d(n) ≤ Σ(n + O(1)).
Remark:
Intuitively speaking, the production of Σ(n) is among the most time-consuming tasks for a program of complexity ≤ n - indeed, it takes Σ(n + O(1)) cycles to do so.
[Uncomputability of Σ] For large n, Σ(n) is not effectively computable / nonrecursive.
Proof by contradiction: Assume Σ(n) were computable. This would provide a recursive solution to the halting problem since, given an n-bit program p, one would only have to wait until time Σ(H(p) + O(1)), by which all programs of that complexity which halt at all have done so. But recursive solutions are not allowed by theorem , p. . The only consistent choice is the uncomputability of Σ.
Recurrence is defined by periodicity in output. [Recurrence] A computer U(p) producing a periodic output

U(p) = s s s ⋯ (ad infinitum), with |s| = T_r < ∞,

is said to have recurrence (period) T_r.
[Bound from above on recurrence] The greatest recurrence T_r(n) of a program of algorithmic information n is given by

T_r(n) = Σ(n + O(1)).
Proof:
First, T_r(n) ≤ Σ(n + O(1)) will be proved. Consider again two universal computers U and U′. Given an arbitrary program p with H(p) ≤ n, U(p) simulates U′(p) and additionally produces a constant output symbol (say, ``0'') until p has halted. If this happens, U(p) outputs a different output symbol (say, ``1'') and goes into an infinite loop. If T_r(n) is defined, then there exists at least one program p_L of algorithmic information ≤ n which takes longest, such that the recurrence of U(p_L) is T_r(n). Since, independent of n, the simulation of U′ on U, outputting either ``0'' or ``1,'' and the supplementary operations require O(1) additional bits of program size, the algorithmic information of p with respect to U is governed by the algorithmic information of p with respect to U′ up to an additive constant, i.e., H(T_r(n)) ≤ n + O(1). By theorem (5) one concludes that T_r(n) ≤ Σ(n + O(1)). The foregoing construction shows that there exist programs of length n + O(1) which yield output of periodicity Σ(n).
Alpha 60: I will calculate … so that failure … is impossible.
Lemmy: I'll fight until failure does become possible.
Alpha 60: Everything I plan will be accomplished.
Lemmy: That's not certain. I too have a secret.
Alpha 60: What is your secret? … Tell me … Mr Caution.
Lemmy: Something that never changes with the night or the day, as long as the past represents the future, towards which it will advance in a straight line, but which, at the end, has closed on itself into a circle.
…
Alpha 60: Several of my circuits are looking for the solution to your puzzle. I will find it.
Lemmy: If you find it … you will destroy yourself in the process … because you will have become my equal, my brother.
from ``Alphaville'' by Jean-Luc Godard []; English translation [], p. 149
Chapter 9
Classical results
9.1 True ≠ provable
Consider the classical [] liar paradox ``I am lying'' in the form ``this statement is false.'' Kurt Gödel achieved a mathematically meaningful theorem by translating it into a formal statement, which can be informally expressed as ``this statement is unprovable.'' Gödel himself was well aware of this analogy. Already in his seminal 1931 paper [], he stated,
``Die Analogie dieses Schlusses mit der Antinomie
Richard springt in die Augen; auch mit dem ``Lügner'' besteht eine
nahe Verwandtschaft [Fußnote 14: Es läßt sich überhaupt jede epistemologische
Antinomie zu einem derartigen Unentscheidbarkeitsbeweis verwenden.], ¼''
``The analogy of this argument with the
Richard antinomy leaps to the eye. It is closely related to the ``Liar'' too;
[footnote 14: Any epistemological antinomy could be used for a similar proof
of the existence of undecidable propositions] ¼''
Similarly, other metamathematical and logical paradoxes [] have been used systematically for the derivation of undecidability or incompleteness theorems. The method is proof by contradiction: first, a statement is assumed to be true; this statement yields absurd (paradoxical) consequences; the only consistent choice is its unprovability or nonexistence. Mostly, the absurd consequences are constructed by techniques similar to Cantor's diagonalization method; see below.
It is certainly not illegitimate to ask, ``what is the `essence' or the `meaning' of Gödel's incompleteness result,'' and, ``what is the `feature' responsible for it?'' There exist at least two features of the argument which are noteworthy: self-reference and the possibility to express but not to prove (!) certain facts about a formal theory intrinsically, i.e., within the theory itself. (This occurs only for theories which are ``strong enough'' to allow coding of metastatements within the theories themselves. Theories which are ``too weak,'' i.e., theories in which metastatements within the theories themselves cannot be coded, do not feature incompleteness of this kind, although they are incomplete in a more basic sense.) The next two sections will concentrate on these two issues.
Another, lively, example is a stone disc outside of Rome's Santa Maria in Cosmedin. There is a slot carved in the stone disc. Legend has it that anyone sticking a hand into this slot while uttering a wrong statement will not be able to get the hand out again. Rudy Rucker confessed to having thrust his hand into the slot while saying, ``I will not be able to pull my hand back out'' [] - whoever was responsible for the reaction of the slot must have been in real trouble! The crocodile version of this dilemma [] features a crocodile which, having stolen a child, says to the child's mother, ``only if you guess whether I am going to give you back your child or not, shall you get it back,'' with the mother replying, ``you will not give it back.'' Still another all-time favourite is Russell's paradox [] in the form of a male barber who is obliged to ``shave all people who do not shave themselves''; this order is inconsistent when applied to the barber himself. - Russell's paradox applies to G. Cantor's 1883 definition of a set, i.e., ``A set is a Many which allows itself to be thought of as a One'': the ``set of all sets which are not members of themselves'' is a paradoxical construction, which, among other paradoxes embodied in G. Cantor's original set theory, has motivated the axiomatisation of set theory at the price of restricting the mathematical universe.
Yet, besides all these difficulties, there is nothing wrong with self-reference per se: take for instance the claim ``this statement contains … words.'' If … = five, the sentence is true; for all other number-arguments it is false. (Nevertheless, paradoxical constructions are quite close: consider, for instance, Berry's paradox in the form ``the smallest positive integer that cannot be specified in less than a thousand words,'' which, if it existed, would just have been specified in fourteen words.)
Indeed, self-reference is an essential feature of any intrinsic perception. Therefore it is not unreasonable to assume that self-reference, if applied properly, yields well-defined statements. Troubles in the form of inconsistencies occur in particular circumstances, e.g., by attempting a complete self-interpretation or, technically, after some kind of ``diagonalization.'' As will be discussed later, these inconsistencies, interpreted properly, represent a via regia to undecidability.
``I think the theorem of mine which von Neumann refers to is not that on the existence of undecidable propositions or that on the lengths of proofs but rather the fact that a complete epistemological description of a language A cannot be given in the same language A, because the concept of truth of sentences of A cannot be defined in A. It is this theorem which is the true reason for the existence of undecidable propositions in the formal systems containing arithmetic. I did not, however, formulate it explicitly in my paper of 1931 but only in my Princeton lectures of 1934. The same theorem was proved by Tarski in his paper on the concept of truth [[cf. A. Tarski's earlier paper []]] published in 1933 in Act. Soc. Sci. Lit. Vars., translated on pp. 152-278 of Logic, Semantics and Metamathematics [[ [] ]].''
Absolute truth is a transfinite concept, which is not definable by any finite description. An informal proof by contradiction of this fact can be envisioned as follows: suppose that a finite description TM of a ``universal truth machine'' exists. (No difference is made here between TM and its code #(TM).) The truth machine is supposed to work in the following way: one inputs an arbitrary statement, asking whether it is correct. TM then outputs TRUE or FALSE, depending on whether the statement has been correct or incorrect, respectively. Now consider the input, ``the machine described by TM will not output that this statement is true.'' The resulting problem is of the same kind as the one encountered in the liar paradox: the truth machine cannot say TRUE or FALSE without running into a contradiction. Therefore, TM cannot decide all questions (it is unable to decide at least one question), contradicting the assumption that the truth machine decides all questions. However, somebody watching from the outside (i.e., someone who is not part of this truth machine) knows that the above statement is \widetilde{TRUE}, but this results in an extrinsic notion of truth which is stronger than the ``portion of truth'' available to the truth machine (to indicate this, the truth value is tilded). One could, of course, proceed by simply adding this statement to TM, producing a \widetilde{TM}. By the same argument, \widetilde{TM} would not be able to decide the input, ``the machine described by \widetilde{TM} will not output that this statement is true,'' which is \widetilde{\widetilde{TRUE}}, …, forcing a hierarchy of notions of truth ad infinitum.
The paradox of the liar and similar paradoxes can thus be resolved by accepting that these constructions operate with a bounded ``degree'' or ``strength'' of truth. In this sense, Gödel's incompleteness results mean a formal proof that the notion of truth is too comprehensive to be grasped by any finite mathematical model. No comprehensive concept of truth can be defined. In particular, no intrinsic consistency proof can be given.
[[In 1979]] I went up to [[John Archibald]] Wheeler and I asked him, ``Prof. Wheeler, do you think there's a connection between Gödel's incompleteness theorem and the Heisenberg uncertainty principle?'' Actually, I'd heard that he did, so I asked him, ``What connection do you think there is between Gödel's incompleteness theorem and Heisenberg's uncertainty principle?'' This is what Wheeler answered. He said, ``Well, one day I was at the Institute for Advanced Study, and I went to Gödel's office, and there was Gödel...'' I think Wheeler said that it was winter and Gödel had an electric heater and had his legs wrapped in a blanket.
Wheeler said, ``I went to Gödel, and I asked him, `Prof. Gödel, what connection do you see between your incompleteness theorem and Heisenberg's uncertainty principle?' '' I believe that Wheeler exaggerated a little bit now. He said, ``And Gödel got angry and threw me out of his office!'' Wheeler blamed Einstein for this. He said that Einstein had brain-washed Gödel against quantum mechanics and against Heisenberg's uncertainty principle!
Despite such anecdotal dissent, several attempts have been made to translate mathematical undecidability into a physical context [,,,,,,], among them two very notable early articles by Karl Popper [] from the late 1940s.
After a brief review of the ``classical'' mathematical results, we shall take a fresh look at this issue. Various forms of undecidability related to physics can be formalised by the method of diagonalization. (Diagonalization has already been used in previous chapters; it was first introduced by Georg Cantor for a proof of the nondenumerability of the reals.) The following physical applications will be discussed in their classical context first:
(i) The problem of forecast of a mechanistic, i.e., totally computable, system. Thereby it is assumed that absolute knowledge about the recursive laws governing that system has been given to us by some ``oracle.'' The general problem of forecast will be linked to what is called the ``recursive unsolvability of the halting problem.''
(ii) While the problem of forecast already appears in a very general extrinsic context, one can consider the same problem in an intrinsic setup; i.e., consider a theory about a system represented within that very system. K. Gödel, utilising the technique of Gödel numbering, achieved the undecidability theorems by expressing (meta-arithmetic) statements about arithmetic within arithmetic. In analogy, one could hope to express ``intrinsic indeterminism'' by representing theoretical statements about a system within that very system.
(iii) The rule inference problem can be expressed by the question, ``given a specified class of laws, usually the class of recursive / computable functions, which one of these laws governs a particular system?'' Thereby it is assumed that the system is treated as a black box; i.e., one is only allowed to perform input / output analysis. This kind of problem is related to the problem of identifying and learning a language.
(iv) The impossibility of stating exactly the algorithmic information content of an arbitrary sequence.
(v) The construction of ``toy universes,'' generated by finite automata and the investigation of their logical structure. One goal is the creation of nonlocal automata models which feature quantum-like behaviour, in particular ``computational complementarity.''
(vi) Intrinsically, the physical measurement and perception process exhibits (paradoxical) features resembling computational complementarity and diagonalization. An idealised measurement attempts the impossible: on the one hand it pretends to grasp the ``true'' value of an observable, while on the other hand it has to interact with the object to be measured and thereby inevitably changes its state. Integration of the measurement apparatus does not help, because then the observables inseparably refer to the state of the object and the measurement apparatus combined, thereby surrendering the original goal of measurement (i.e., the measurement of the object). These considerations apply to quantum as well as to classical physics, with the difference that quantum theory postulates a lower bound on the transfer of action given by Planck's constant ℏ = h/2π.
One may even embark on a much more radical (metaphysical) program, which can be stated pointedly as follows: ``All instances of ``randomness'' in physics will eventually turn out to be undecidable features of mechanistic systems. There is no randomness in physics but a constant confusion in terminology between randomness and undecidability. God does not play dice.''
Georg Cantor's diagonalization technique was first introduced around 1873 (in a correspondence with Dedekind) and has since become the via regia to the investigation of undecidability. Probably the most prominent application, already given by G. Cantor himself, is a proof that the set of real numbers is not denumerable, i.e., that there cannot exist any complete listing of the reals, one after the other.
Assume for the moment that the set of reals is denumerable. (This assumption will yield a contradiction.) That is, the enumeration is a one-to-one function f: ℕ → ℝ (wrong), i.e., to any k ∈ ℕ there exists some r_k ∈ ℝ and vice versa. No algorithmic restriction is imposed upon the enumeration, i.e., the enumeration may or may not be effectively computable. For instance, one may think of an enumeration obtained via the enumeration of computable algorithms, assuming that r_k is the output of the k'th algorithm. Let 0.d_{k1}d_{k2}⋯ be the successive digits in the decimal expansion of r_k. Consider now the diagonal of the array formed by the successive enumeration of the reals,

r₁ = 0.d₁₁ d₁₂ d₁₃ ⋯
r₂ = 0.d₂₁ d₂₂ d₂₃ ⋯
r₃ = 0.d₃₁ d₃₂ d₃₃ ⋯
⋮   (9.1)

and form a new real r′ = 0.d′₁d′₂d′₃⋯ by changing every digit on the diagonal, say d′_k = 2 if d_{kk} = 1 and d′_k = 1 otherwise (thereby also avoiding the ambiguity of dual representations such as 0.0999⋯ = 0.1000⋯). Then r′ differs from every r_k in the k'th digit, contradicting the assumption that the enumeration was complete.
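In miniature and for a finite array, the diagonal construction (9.1) reads as follows (the five-row listing is, of course, merely an illustrative stand-in for an infinite enumeration):

# Cantor's diagonal construction: change the k-th digit of the k-th
# number; the resulting digit sequence differs from every row.  Using
# only the digits 1 and 2 sidesteps the ambiguity of dual decimal
# representations such as 0.0999... = 0.1000...

def diagonal_real(digit_rows):
    # digit_rows[k] holds the digits of r_k after the decimal point
    return [2 if row[k] == 1 else 1 for k, row in enumerate(digit_rows)]

listing = [
    [1, 4, 1, 5, 9],
    [7, 1, 8, 2, 8],
    [3, 3, 3, 3, 3],
    [0, 0, 0, 0, 0],
    [9, 9, 9, 9, 9],
]
print(diagonal_real(listing))   # [2, 2, 1, 1, 1]: differs from r_k at digit k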
The ``halting problem'' is related to the question of how a mechanistic system will evolve, what an algorithm or an automaton will output, or which theorems are derivable in a formal system. In other words, it refers to the problem of forecast for a mechanistic system.
[Halting problem]
Let there be an algorithm A and an input x. The halting problem (Church version) is the decision problem associated with the question whether or not A(x) will produce a specific output y in finite time. Equivalently, one may ask whether A will terminate on x (Turing version). The case ``A terminates or converges on x'' is denoted by A(x)↓; the case ``A does not terminate on, or diverges on, x'' is denoted by A(x)↑. A. Church's version of the halting problem reduces to A. Turing's version if the termination condition is the production of the output y.
[Recursive unsolvability of the halting problem (A. Turing [])] There is no effectively computable algorithm / partial recursive function which decides the halting problem. The halting problem is unsolvable.
To obtain more intuition, let us see how J. von Neumann interpreted the recursive unsolvability of the halting problem (the following quotation is taken from J. von Neumann's Theory of Self-Reproducing Automata, ed. by A. W. Burks [], p. 51):
Turing proved that there is something for which you cannot construct an automaton; namely, you cannot construct an automaton which can predict in how many steps another automaton which can solve a certain problem will actually solve it. So, you can construct an [[universal]] automaton which can do anything any automaton can do, but you cannot construct an automaton which will predict the behaviour of any arbitrary automaton. In other words, you can build an organ which can do anything that can be done, but you cannot build an organ which tells you whether it can be done.
The following three proofs by contradiction of the unsolvability of the halting problem use Cantor's diagonalization technique. They use algorithmic arguments, which, by the Church-Turing thesis, are valid for the class of recursive functions as well. (One might prefer not to make use of such ``proofs by the Church-Turing thesis.'' Then it would be necessary to construct an explicit model, a concrete implementation, of these algorithms. - See, for example, A. Turing's article [].)
Proof 1:
Consider an arbitrary algorithm A(x) with input x; x is a string of symbols. Assume that there exists a ``halting algorithm'' HALT (whose existence will be disproved) which is able to decide whether A terminates on x or not. [I.e., HALT(A(x))↓: termination is a defining property of this fictitious ``halting algorithm.'']
Using HALT(A(x)) it is easy to construct another algorithm, which will be denoted by B, which takes as input any effective program A and which proceeds as follows: upon reading the program A as input, B makes a copy of it. This can be readily achieved, since the program A is presented to B in some encoded form #(A), i.e., as a string of symbols. The code #(A) is used as input string for A itself; i.e., B forms A(#(A)), henceforth denoted by A(A), and hands it over to its subroutine HALT. Then, B proceeds as follows:
case (i): if HALT(A(A)) decides that A(A) halts, then B does not halt. This can for instance be realised by an infinite DO-loop.
case (ii): if HALT(A(A)) decides that A(A) does not halt, then B halts.
We shall demonstrate that there is something wrong with B and, since a derivation of B from HALT is trivial (i.e., all manipulations of B besides the computation of HALT are obviously computable), that there must be something wrong with HALT: recall that A is arbitrary and has not been specified yet. What happens if we substitute B for A, i.e., if we input B into itself and carry the argument through?
case (i): Assume that B(B) halts, then HALT(B(B)) forces B(B) not to halt;
case (ii): assume that B(B) does not halt, then HALT(B(B)) steers B(B) into the halting mode. In both cases one arrives at the contradiction ``B (does not) halt[s] if B [does not] halt(s)''. This contradiction can only be consistently avoided by assuming the nonexistence of B and, since B is a trivial construction from HALT, the impossibility of a halting algorithm HALT.
This proof is important not only because of its relevance for the investigation of effectively computable processes but also due to its method, which is very similar to Cantor's diagonalization method described previously. The syntactic structure is essentially given by

B(B)↓ ⟺ HALT(B(B)) = ``diverges'' ⟺ B(B)↑,

i.e., the assumed existence of HALT allows the construction of a program B which halts on input B exactly if it does not halt on input B.
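The construction of B can be sketched as (pseudo-)code; the subroutine `halts` below is the fictitious HALT and is deliberately left unimplemented, since the argument shows it cannot exist:

# The diagonal program B from Proof 1.  `halts` is the hypothetical
# halting algorithm; it is declared but cannot be realised.

def halts(program_source, input_string):
    raise NotImplementedError("no such algorithm exists")

B_source = '''
def B(program_source):
    if halts(program_source, program_source):   # case (i): B(B) halts ...
        while True:                             # ... then loop forever
            pass
    return                                      # case (ii): halt
'''

# Feeding B its own source yields:  B(B) halts  <=>  B(B) does not halt.
# The only consistent conclusion is that `halts` cannot exist.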
Proof 2: A very similar argument goes like this. Assume a function f (whose existence we shall disprove) which is defined as follows: f(k) is either one more than the value of the k'th computable algorithm applied to the number k ∈ ℕ, or zero if this algorithm does not halt on input k. f itself cannot be computable, for if the (k = n)'th algorithm computed it, one would obtain f(n) = f(n) + 1, which is impossible. The only way that f can fail to be computable is that one cannot decide whether the n'th program ever halts on input n.
Proof 3: Consider algorithms for enumerating sets of natural numbers, and number these algorithms. Define a set of natural numbers associated with all algorithms which do not contain their own number in their output set. This set of natural numbers cannot be computable, for any algorithm which would attempt to compute it would end up with a dilemma known as Russell's paradox [] of ``a male barber who shaves all those and only those who do not shave themselves,'' who can neither shave himself nor avoid doing so (a more abstract version of Russell's paradox is obtained by considering ``the set of all sets which are not members of themselves''). There is only one way out of the dilemma: such a set cannot be computable, because it is generally impossible to decide whether or not a specific algorithm outputs a specific natural number. This is Church's variant of the halting problem.
Since there is a one-to-one translation between algorithmic entities and formal systems, one has thereby proven a form of Gödel incompleteness as well. A statement about the convergence of an algorithm can be interpreted as a mathematical statement which should, at least in principle, be decidable within a formal system. For, if it were always possible to prove a theorem stating whether or not a particular algorithm will halt, this would contradict the halting problem.
An immediate physical consequence of the recursive unsolvability of the halting problem is the following corollary: [Unpredictable physical observables of mechanistic physical systems]
There exist mechanistic physical systems (whose laws and parameters are known) whose behaviour is unpredictable (undecidable). Examples of physical systems are general-purpose computers, such as (insert your favourite brand here:) ``…,'' denoted by U. These machines are universal up to finite computational resources. Therefore, the recursive undecidability of the halting problem applies to them. (The finiteness and discreteness of their state spaces limits the argument to some extent. Nevertheless, for all practical reasons these limitations are irrelevant: every megabit of memory space is worth 2^{1000000} ≈ 10^{300000} possible states.) More explicitly, by implementing an arbitrary (legal) program p on U (assume an empty input list), a physical ``halting'' parameter σ_U(p) can be defined by

σ_U(p) = 1 if U(p)↓, and σ_U(p) = 0 if U(p)↑.   (9.2)

By the recursive unsolvability of the halting problem, σ_U(p) is in general uncomputable.
Another, very similar, undecidable physical entity has been introduced by K. Popper [], pp. 181-182. In two remarkable papers [], K. Popper gives two other arguments for the impossibility of self-prediction, based on what Popper calls ``Gödelian sentences'' and the ``Oedipus effect.'' In summary, he states,
``[[The arguments]] all agree that calculators are incapable of answering all questions which may be put to them about their own future, however completely they are informed; and the reason for this appears to lie in the fact that it is impossible for a calculator to possess completely up-to-date initial information about itself. (`Know thyself', it seems, is a contradictory ideal; we cannot fully know our limitations - at least not those to our knowledge.)''
Nevertheless, it is possible to give some estimate of the longest computation time associated with halting programs of complexity smaller than or equal to a fixed n []. Recall theorem (8.3), p. pageref, stating that either a program p halts in time less than or equal to the busy beaver function Σ(H(p) + O(1)), or else it never halts. Since, as has been argued before, computability of Σ(H(p) + O(1)) would again contradict the recursive unsolvability of the halting problem, theorem 9.3, the maximum time of computation for a program of length n exceeds any computable function of n. Indeed, if one were willing to proceed empirically, one would have to wait longer than any computable time to decide whether an arbitrary program halts or what its output is! The empirical method of solving the halting problem is thus of no use for practical purposes, even for programs of finite length.
It is tempting to exclude ``Zeno squeezed oracles'' (cf. section 1.5.2, p. pageref) or, in A. Grünbaum's terminology ([], p. 630), ``infinity machines'' by the requirement of consistency. Indeed, possession of such an oracle would make the TIME algorithm (cf. p. pageref) and thus the ``halting function''

HALT(A(x)) = 1 if A(x)↓, and HALT(A(x)) = 0 if A(x)↑

effectively computable.
But then it would be possible to go through the algorithmic argument (based on G. Cantor's diagonalization method) against the recursive solvability of the halting problem. A seemingly paradoxical result, resulting in an inconsistency, would follow: an algorithm B could be ``constructed'' which halts if it does not halt and which does not halt if it halts.
However, at second glance, such absurdities can be translated into somewhat more harmless statements. Consider again the above ``halting function'' HALT when applied to B. Let us assume that B(B) halts, i.e., HALT(B(B)) = 1, and let us follow the ``actual construction'' of B(B): B is constructed such that, with the knowledge that B(B) halts, B(B) would not halt; i.e., HALT(B(B)) = 0. But then, B is constructed such that, with this latter knowledge, B(B) would halt; i.e., HALT(B(B)) = 1. But then, B is constructed such that, with this knowledge, B(B) would not halt; i.e., HALT(B(B)) = 0 … ad infinitum. I.e., in this procedural sense, HALT(B(B)) does not converge to a ``true value;'' it just fluctuates between the values 0 and 1, yielding an infinite sequence 101010101010⋯. This closely resembles the situation in which one actually tries to evaluate the liar paradox ``this statement is false.'' Because if it is false, then it is in accordance with what it claims; therefore it is true. Yet, if it is true, then it is not in accordance with what it claims; therefore it is false. Yet if it is false, then it is in accordance with what it claims; therefore it is true … ad infinitum. One might say that in this procedural view, paradoxes give rise to very simple forms of evolution and change.
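The procedural evaluation just described takes only a few lines (a toy illustration, nothing more):

# Procedural evaluation of ``this statement is false'': repeatedly
# updating the assumed truth value never converges but oscillates.

value = True                       # initial assumption
trace = []
for _ in range(12):
    trace.append(int(value))
    value = not value              # the sentence asserts its own falsity
print("".join(map(str, trace)))    # 101010101010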
In what follows, a formal system will be called inconsistent if both a statement and its negation are provable therein. (A related version of consistency is the requirement that at least one statement is not provable therein.) Furthermore, a formal system will be called incomplete if for some statement neither it nor its negation is provable therein.
The historical context of K. Gödel's incompleteness theorems was the formalist program, put forward, among others, by D. Hilbert []. This program aimed at reconstructing mathematics in purely syntactic terms. In particular, one of its goals was to ``condense'' all mathematical truth into a formal system which has a finite description; any proof should be a finite and ``mechanic'' manipulation of lists of symbols (in particular consistency proofs). Stated differently, such a formal system should be implementable as a finite-size program on a computer, which should output all true theorems of mathematics. Anything that could be proven by non-constructive techniques should also be derivable by finitistic means of proof. (This statement is sometimes referred to as the ``conservation program'' [].) The formalist hope was to be able to replace ``truth'' by a finite notion of ``provability.'' Gödel showed in 1931 the impossibility of such a pursuit []. In particular he proved the following [Incompleteness Theorems (K. Gödel [])] Let L be a recursively enumerable and consistent formal system containing (Peano) arithmetic. Then (i) there exists a sentence of L which is neither provable nor refutable in L, and (ii) the consistency of L cannot be proven within L.
[Gödel's incompleteness theorems, pedestrian version]
No
``reasonable'' finite theory encompasses all of mathematics. It is impossible to
decide ``from within'' a theory whether it is free of contradictions.
The term ``consistent'' means that at least one statement is not derivable within L. In particular, L should be free of contradictions, such as the statement ``p and not p.'' (If such a statement were derivable, then any statement would be derivable.)
Proof:
The following proof closely follows C. Smorynski's account [] in the Handbook of Mathematical Logic [], p. 821. Let #(φ) denote the code of some statement φ. L ⊢ φ stands for ``φ can be derived in L.'' φx is a formula with free variable x. Function symbols which are defined in L correspond to logical connectives and quantifiers. E.g., for all formulae φ, ψ: L ⊢ NEG(#(φ)) = #(¬φ), L ⊢ IMP(#(φ), #(ψ)) = #(φ → ψ), and so on. For formulae φx and terms t, the substitution operator SUB is defined by

L ⊢ SUB(#(φx), #(t)) = #(φt).   (9.3)

The proof predicate PROOF expresses derivability in L,

PROOF(p, #(φ)) iff p is (the code of) a proof of φ in L,   (9.4)

and the provability predicate PROVE is defined by

PROVE(x) iff ∃p PROOF(p, x).   (9.5)

The encoding can be carried out in such a way that the following derivability conditions are satisfied:

L ⊢ φ implies L ⊢ PROVE(#(φ));   (9.6)
L ⊢ PROVE(#(φ)) → PROVE(#(PROVE(#(φ))));   (9.7)
L ⊢ PROVE(#(φ → ψ)) → [PROVE(#(φ)) → PROVE(#(ψ))].   (9.8)
[Diagonalization Lemma] Let φx in the language of L have one free variable as indicated. Then there is a sentence ψ such that

L ⊢ ψ ↔ φ(#(ψ)).   (9.9)
Proof of the diagonalization Lemma:
Given φx, let θx ↔ φ(SUB(x, x)) be the diagonalization of φ. Let m = #(θx) and let ψ = θm. Then

L ⊢ ψ ↔ φ(SUB(m, m)) = φ(SUB(#(θx), #(θx))) = φ(#(θm)) = φ(#(ψ)).
By substituting ¬PROVE(x) for φ, one obtains the [First incompleteness theorem] Let L ⊢ ψ ↔ ¬PROVE(#(ψ)). Then:
(i) L ⊬ ψ;
(ii) under an additional assumption, L ⊬ ¬ψ.
Proof of the first incompleteness theorem:
(i) Observe that, by (6), L ⊢ ψ implies L ⊢ PROVE(#(ψ)), which implies L ⊢ ¬ψ, contradicting the consistency of L.
(ii) The additional assumption is a strengthening of the converse to (6), namely

L ⊢ PROVE(#(φ)) implies L ⊢ φ.

Assume L ⊢ ¬ψ, i.e., L ⊢ PROVE(#(ψ)); then the additional assumption yields L ⊢ ψ, again contradicting the consistency of L.
[Second incompleteness theorem] Let CON be ¬PROVE(#(⊥)), where ⊥ is any convenient contradictory statement. Then

    L ⊬ CON.
Proof of second incompleteness theorem: Let L ⊢ ψ ↔ ¬PROVE(#(ψ)) again be as in theorem 9.5. It is shown that L ⊢ ψ ↔ CON; then L ⊢ CON would imply L ⊢ ψ, contradicting the first incompleteness theorem.
L ⊢ ψ → ¬PROVE(#(ψ)) implies L ⊢ ψ → ¬PROVE(#(⊥)): since L ⊢ ⊥ → ψ, condition (9.6) implies L ⊢ PROVE(#(⊥ → ψ)), which by (9.8) implies L ⊢ PROVE(#(⊥)) → PROVE(#(ψ)); contraposition yields the claim.
Conversely, by (9.7), L ⊢ PROVE(#(ψ)) → PROVE(#(PROVE(#(ψ)))), which by (9.6) and (9.8) implies L ⊢ PROVE(#(ψ)) → PROVE(#(¬ψ)), since ψ ↔ ¬PROVE(#(ψ)). Then L ⊢ PROVE(#(ψ)) → PROVE(#(ψ ∧ ¬ψ)) by (9.6) and (9.8), which implies L ⊢ PROVE(#(ψ)) → PROVE(#(⊥)). But then L ⊢ ¬PROVE(#(⊥)) → ¬PROVE(#(ψ)), which is L ⊢ CON → ψ by definition.
Remarks:
(i) As no attempt is made to extensively discuss Gödel's findings here, the interested reader is referred to the literature, for example to C. Smorynski's review article [], to Volume I of Gödel's Collected Works [], M. Davis [] and to G. Kreisel []. For more informal introductions, see R. Rucker [], E. Nagel & J. R. Newman, [,], D. R. Hofstadter [], R. Penrose [] and J. L. Casti [,,,,,], among others.
(ii) Gödel's incompleteness theorems are again proved by the method of diagonalization, which has been introduced previously (p. pageref, pageref). Diagonalization already manifests itself in the statement ``this statement is unprovable.''
(iii) Since, as has been pointed out before, there is an equivalence between the notions of formal system and effective computation (p. pageref), statements referring to the unsolvability of some computational task directly translate into statements referring to the incompleteness of formal systems. Another version of (the first) part (i) of Gödel's incompleteness theorem thus follows immediately as corollary of the recursive unsolvability of the halting problem. Assume it were always possible to prove the theorem associated with the problem of whether or not a particular algorithm will halt. This would contradict the recursive unsolvability of the halting problem.
(iv) As stated earlier (p. pageref), Gödel himself seems to have interpreted his findings as demonstrating the impossibility of giving a complete and finite formal description of the concept of truth of statements about a theory within that theory.
(v) Note that the set of code numbers of theorems of the formal system PT is a recursively enumerable set which is not recursive (i.e., ℕ − PT is not recursively enumerable). This is due to the recursive unsolvability of the halting problem. Although one can systematically derive and list PT by an effective computation, one can never be sure whether some well-formed formula which at some finite time is not in PT (as ``short'' as it may be) turns up at a later time.
(vi) A. Church, A. M. Turing, S. Feferman, G. Kreisel and others have pursued the question of whether it is possible to pass on to a complete formal system by a transfinite limit. One could exploit the constructive feature of Gödel's incompleteness result, i.e., the construction of a true though unprovable theorem. Let us start with an initial system, say L1, whose theorems concerning integers are correct. By Gödel's construction it is possible to effectively compute a true but unprovable theorem A1, and to include this theorem as an axiom into a new system L2 = L1 ∪ {A1}, and so on. The iteration Ln = L1 ∪ {A1, …, An−1} might be done mechanically by a process of finite description, associated with a formal system L̄ which thus includes all Ln. Nevertheless, since L̄ can be effectively generated, it is still incomplete, and one must proceed further into the transfinite to overcome this incompleteness. This is made possible by a constructive notation of the ordinals and by considering the limit of this process. In that way one could ``overcome'' Gödel incompleteness at the price of finding a constructive notation of ordinals. For a much more detailed discussion the reader is referred to S. Feferman's review article [].
Assume two (universal) computers U and U′. The second computer U′ is programmed with an arbitrary algorithm p, of which the first computer U has no knowledge. The task of U, which shall be called the inference problem, is to conjecture the ``law'' or algorithm p by analysing the output ``behaviour'' of U′(p). The following theorem states that this task cannot be performed by any effective computation / recursive function.
[Recursive unsolvability of the rule inference problem (E. M. Gold [])] Assume some particular (universal) machine U which is used as a ``guessing device.'' Then there exist total recursive functions which cannot be ``guessed'' or inferred by U.
Proof:
The following proof follows M. Li and P. M. B. Vitányi [] and uses diagonalization. Assume that there exists a ``perfect guesser'' U which can identify all total recursive functions (wrong). It is then possible to construct a function φ*: ℕ → {0,1} such that the guesses of U are wrong infinitely often, thereby contradicting the above assumption. Define φ*(0) = 0. One may construct φ* by simulating U. Suppose the values φ*(0), φ*(1), φ*(2), …, φ*(n−1) have already been constructed. Then, on input n, simulate U based on the previous series {0, φ*(0)}, {1, φ*(1)}, {2, φ*(2)}, …, {n−1, φ*(n−1)}, and define φ*(n) to be (1 plus the guess of U of φ*(n)) mod 2. In this way, U can never guess φ* correctly; in particular, U makes infinitely many mistakes rather than only a finite number.
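The diagonal construction can be phrased as a few lines of Mathematica. Here guesser is a hypothetical stand-in for the inference machine U, so this is a sketch of the diagonalization rather than an implementation of any particular guessing method:

    (* a minimal sketch, assuming a hypothetical black-box guesser[history]
       that returns U's prediction of the next function value from the
       input/output pairs seen so far *)
    phiStar[guesser_, nmax_] := Module[{vals = {}},
      Do[AppendTo[vals, Mod[guesser[vals] + 1, 2]], {nmax}];
      vals]

    (* e.g., against a guesser that always repeats the last seen value: *)
    phiStar[If[# === {}, 0, Last[#]] &, 8]   (* -> {1,0,1,0,1,0,1,0}; every guess is wrong *)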
Informally, the idea of the proof is to take any sufficiently powerful rule inference method (guessing method) and to define from it a (total) recursive function which is not identified by it. Pointedly stated, this amounts to constructing an algorithm which (passively!) ``fakes'' the ``guesser'' U by simulating some particular recursive function φ corresponding to the algorithm p until U pretends to guess this function correctly. In the diagonalization step, the algorithm then switches to a different function φ* ≠ φ, such that U's guesses are incorrect. A ``feedback'' is not essential to the argument, for U′ may switch algorithms independently of U. The generic example, ``The first output of U′ is one and each subsequent output of U′ is one greater than the guess of U of that output,'' is mentioned in the review by D. Angluin and C. H. Smith [].
One can also interpret this result in terms of the recursive unsolvability of the halting problem: there is no recursive bound on the time the guesser U has to wait in order to make sure that its guess is correct.
For the original proof, as well as a discussion and related topics, the reader is referred to appendix I, pages 470-473, of E. M. Gold [], as well as to the reviews by D. Angluin and C. H. Smith [] and M. Li and P. M. B. Vitányi [], and to an article by L. M. Adleman and M. Blum [].
An immediate consequence of the recursive unsolvability of the rule inference problem is the following corollary. [Recursive unsolvability of inference for mechanistic physical systems] There exist mechanistic physical systems whose laws and parameters cannot be determined recursively (by any effective computation).
We shall give only a rather informal idea of the proof [,], which is based on ``Berry's paradox.'' For the historical background, see, for instance, R. Rucker [], p. 100. One form of it is given by ``the smallest positive integer that cannot be specified in less than a hundred words,'' which, if it existed, would have just been specified in fourteen words. As a consequence, this number cannot exist. In a rephrasing closer to algorithmic information theory, this paradox can be expressed as ``the first positive integer x such that H(x) > 100 words.'' More generally, ``consider the least natural number that can't be defined in less than n bits.'' Although, by equation (34), p. pageref, this phrase is not longer than log₂*(n) + O(1) bits, it purports to specify a number whose definition needs a phrase which is at least n bits long. This yields a contradiction for sufficiently large numbers n, more specifically for all n > log₂*(n) + O(1). As a consequence, the above statement is consistent only for n ≤ log₂*(n) + O(1). A compact formal proof can be found in G. J. Chaitin's book Information-Theoretic Incompleteness [].
So far, the computational complexity, i.e., the minimal time consumption for calculating proofs of the theorems in (ii), has been ignored. It has been proven by G. Chaitin [] that the time of computation for propositions of the form ``H(x) = k'' with k < m = H(axioms) + O(1), and ``H(x) ≥ m = H(axioms) + O(1),'' is uncomputable. Informally stated, any derivation of such theorems amounts to their ``mechanised'' computation. Due to the recursive unsolvability of the halting problem or, more precisely, due to the maximal halting time, which is roughly proportional to S(m), there does not exist any bound from above on the halting time which is recursive in m.
The ability of computable processes to produce objects with higher algorithmic information content than themselves will be reviewed next. Indeed, a program which, for example, counts and outputs the integers in successive order will eventually output an integer of higher algorithmic information content than its own length.
In the spirit of the incompleteness theorem (9.7) for lower bounds on algorithmic information, one may be tempted to formulate the following erroneous statement, which might be a source of misconception:
``Consider a consistent theory, representable as a formal system. Within such a theory no theorem representable as a string x can be generated which has algorithmic information more than O(1) greater than the information content of the axioms of the theory; i.e., x is a theorem only if H(x) ≤ H(axioms) + O(1).'' Or, in pedestrian version, ``You cannot generate a 20-pound theorem with a 10-pound theory if pound is the unit of algorithmic information.''
As has been pointed out by G. Chaitin, this statement is false. Chaitin stated []:
``I've said that `You can't prove a 10 pound theorem from a 5 pound set of axioms.' Many people erroneously take this to mean that I am asserting the following theorem: If A is a set of axioms that proves theorem T, then H(T) < H(A) + O(1). I most certainly do not: this theorem is obviously false. Let me give two proofs. `Deep' proof: if the axioms A yield infinitely many theorems T, then some T are arbitrarily complex, because there are only finitely many simple strings. Proof via natural counter-example: Consider Peano arithmetic and all theorems of the form `n = n', where n is the numeral for a very complex natural number. A more technical statement is `You can't prove a 10 pound theorem from a 5 pound set of axioms AND KNOW THAT YOU'VE DONE IT.' (I.e., know that you've proven a theorem with a particular complexity that substantially exceeds that of the axioms.) Restated in this slightly more careful fashion, it is now obvious that my assertion is an immediate corollary of my basic theorem that one can prove `H(S) > n' only if n < H(A) + O(1).''
A simple counterexample is the theorem ``n = n,'' where n is some numeral of a ``very complex'' natural number. The algorithmic information of ``n = n'' can exceed the information content of the axioms of the theory, since there always exist n such that H(n) >> H(axioms) + O(1). Notice, however, that the above statement is true if one specifies a particular theorem, for instance by specifying the place t in the enumeration of provable theorems. Then such a derivation corresponds to the application of a recursive function (see theorem (7.2.4), p. pageref). The trouble occurs if one considers infinite, boundless computations, such as the derivation of ``all'' formulas or the enumeration of ``all'' natural numbers in ℕ; for a few more details, see section 7.3, p. pageref. Another way of expressing this fact is the following. [Creation of algorithmic information] There exist formal systems whose axioms contain finite algorithmic information, in which it is possible to generate theorems of arbitrarily high algorithmic information.
[Pedestrian version] Eventually you can generate a 20-pound theorem with a 10-pound theory if pound is the unit of algorithmic information.
More examples:
(i) Consider an algorithm which outputs the natural numbers, one after the other (see the sketch after these examples). Such an algorithm can be ``very short,'' since, by starting from 0, in the n-th iteration step it needs only to add 1 to the previous number and output it. Eventually it will output a natural number corresponding to a bit string with higher algorithmic information than the length of the program, which is O(1).
(ii) Consider Peano's axioms, representable as a finite string PA; PA has a finite algorithmic information content H(PA) = O(1). Nevertheless, since the Peano axioms allow counting, it is possible to derive and enumerate all natural numbers. By theorem 30, p. pageref, ``many'' binary numerals x of length k have algorithmic information H(x) = k + H(k) + O(1), which may exceed H(PA). Some of these numbers {n_L} ⊂ ℕ will even be Gödel numbers or representations of formal systems L whose algorithmic information exceeds the algorithmic information of the original Peano axioms, i.e., H(n_L) > H(PA). Notice, however, that there exist ``extremely'' large numbers with a ``very'' small algorithmic information content: take, for example, a tower of exponentiations 2^2^2^2^2^2^2^2 (cf. section 7.3, p. pageref).
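Example (i), written out, is just a counting loop: its program text has constant length, while its outputs are unbounded. (It runs forever, in keeping with the boundless computations discussed above.)

    (* example (i) as a program: an O(1)-length loop that eventually prints
       numerals of algorithmic information exceeding the program's own length *)
    n = 0;
    While[True, Print[n]; n++]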
An immediate consequence is the fact that the algorithmic information necessary to specify some set S may be smaller than the algorithmic information of some of its constituents, i.e., H(S) < max_{s_i ∈ S} H(s_i). An example of such a set is the set of natural numbers ℕ, for which H(ℕ) is finite, whereas max_{n ∈ ℕ} H(n) = ∞.
This can be combined with theorem 9.7 into the following scenario. [Pedestrian version] Eventually you can generate a 20-pound theorem from a 10-pound theory if pound is the unit of algorithmic information, but intrinsically you won't be able to prove that.
    Yx = x(Yx).

It follows from the definition that the terms Yx cannot be reduced to a form which is not further reducible; further reductions render a regress which is infinite if not stopped: Yx → y → xy → xxy → xxxy → xxxxy → … ad infinitum. In what follows we describe applications of the fixed point operator. Notice that the identification of the combinator x with an object is purely syntactical; the semantic aspect of ``meaning'' belongs to an interpretative metalevel.
Russell paradox: Let F(f) be a property of properties f defined by F(f) = ¬f(f), where ¬ is the symbol for negation. On substitution of F for f, one arrives at F(F) = ¬F(F). If one considers the statement ``F(F) = ¬F(F)'' as a proposition, one arrives at a contradiction. The resolution of this ``paradox'' is the exclusion of such propositions, restricting the class of allowed propositions.
When creating classical set theory, one of G. Cantor's main objectives was to cope with infinities in a suitable mathematical form (and not to cause schoolchildren headaches by drilling them in useless formalism). This program fails precisely because of a contradiction of the above type. For Cantor, ``a set is a Many which allows itself to be thought of as One'' [9.2]. Stated differently, ``a set is the representation of a thought.'' In this sense every ``thought'' corresponds to a set, and set theory encompasses all forms of thoughts, not necessarily of mathematical origin.
If we allow the most general (also nonrational) processes to stand for the term ``thought'' in the above definition, we are led to inconsistencies. Assume a ``set V of all sets that are not members of themselves,'' i.e., V = {x | x ∉ x}. Insertion of V for x yields a contradiction, since if V ∈ V, then by definition V ∉ V, and vice versa. As a consequence of this type of ``diagonalization'' argument, V cannot be a set, although it is ``thinkable.''
One solution is to restrict the term ``thought'' to ``thoughts'' which are effectively computable from specific ``axioms'' or ``basic premises''. One such ``system of thoughts'' is the Zermelo-Fraenkel system. The above contradiction is resolved by excluding ``pathological thoughts,'' thereby avoiding contradiction in this restricted universe.
Epimenides' paradox: if x = ¬True is identified with the statement that a statement is false, then the existence of a fixed point operator expresses Epimenides' paradox by Yx = y = xy = ¬True y.
Gödel incompleteness: if x = ¬PROVE is identified with the statement that a theorem is unprovable in some specified formal system, then the existence of a fixed point operator expresses Gödel's incompleteness theorem by Yx = y = xy = ¬PROVE y.
The above examples resemble K. Popper's ``paradox of Tristram Shandy'' (p. ), an absurd attempt at complete self-comprehension ``going wild,'' or Zeno's ``paradox of Achilles and the Tortoise'' (p. pageref).
In this chapter, the quantum mechanical concept of complementarity will be modelled by computational complementarity. One advantage of computational complementarity over quantum complementarity, if you like, is the use of elementary primitives from the theory of (finite) automata. There is no need to introduce the sort of ``voodoo'' magic which is sometimes encountered in discussions on the epistemology of quantum theory. To put automaton models for quantum systems in the proper perspective: strictly speaking, they correspond to nonlocal hidden variable models. The ``hidden'' physical entities are the ``true'' initial states of the automata.
Computational complementarity was first investigated by E. F. Moore in his article Gedanken-Experiments on Sequential Machines []. The name computational complementarity is due to D. Finkelstein [,], who also made the first attempt to construct logics from experimentally obtained propositions about automata; see also the more recent investigation by A. A. Grib and R. R. Zapatrin [,]. The following investigation has been carried out independently. Although the goals are very similar, the methods and techniques used here differ from the ones used by previous authors. Therefore, no attempt is made to discuss these previous results in detail.
Informally speaking, quantum complementarity states that there are complementary observables such that it is impossible to simultaneously and independently measure all of them with arbitrary accuracy. The experimenter has to choose which one of the complementary observables should actually be measured. To put it pointedly: the experimenter has to decide which one of the many possible observables shall be measured; measurement of this observable restricts or makes impossible the measurement of other, complementary, observables.
This is also the case for entangled subsystems which are spatially separated. (The terminology ``entanglement'' has been used by E. Schrödinger; the German original is ``Verschränkung'' [].) Events associated with one subsystem depend on events associated with the other subsystem. Both subsystems may be space-like separated, a property which is often referred to as ``nonseparability'' or ``nonlocality.'' For a stunning consequence of these quantum mechanical features, the reader is referred to the delayed-choice experiment envisioned by J. A. Wheeler [].
A. Einstein, B. Podolsky and N. Rosen (EPR) have attempted to point out what they called the incompleteness of quantum theory [], and have sparked a very lively debate ever since []: Assume two spatially separated subsystems which are entangled. That is, as the EPR argument goes, a precise measurement of some observable, say spin or polarisation, in one subsystem could be interpreted as an indirect precise measurement of a corresponding ``observable'' in the other subsystem as well. (This conjecture is usually supported by conservation or symmetry arguments.) EPR associate elements of physical reality with the event which has actually been measured, as well as with the indirectly (!) inferred event. (The notation used in this book differs from the one used by EPR; see 4.1, p. pageref.) Consider two such observables, which are complementary if they are measured in one and the same subsystem. The EPR argument asserts that if one of these observables is measured in one subsystem and the other observable is measured in the other subsystem, then one can infer both - albeit, from the quantum mechanical point of view, complementary - observables in one and the same subsystem with arbitrary accuracy. Since quantum mechanics declares that complementary observables cannot be measured simultaneously with arbitrary accuracy, this, the EPR argument concludes, demonstrates the incompleteness of quantum theory. Stated pointedly: the EPR argument claims that quantum mechanics cannot predict what the experimenter can ``measure (& infer).'' Finally, the authors conjectured that it should be possible to invent alternative theories which are ``more complete'' than quantum mechanics.
Such theories have indeed been proposed. They can be subsumed under the title hidden variable theories. I believe it is not unfair to state that most of the more serious models share two common features: they do not predict more observable phenomena than quantum mechanics and they are nonlocal. Yet, they provide a sort of ``classical arena'' hidden to the experimenter.
The Einstein-Podolsky-Rosen argument has been extended in various forms. J. S. Bell gave a statistical analysis of the correlation function of a system consisting of two entangled subsystems []. For a very clear introduction, see A. Peres [], A. Shimony [] and J. F. Clauser and A. Shimony [], among others. D. M. Greenberger, M. Horne and A. Zeilinger presented a stronger result by an analysis of the correlation function of a system consisting of three entangled subsystems []. While Bell's original argument is statistical, i.e., requires the observation of a great number of events, the Greenberger-Horne-Zeilinger setup can be demonstrated, at least in principle, by a single measurement. For a short review, see also N. D. Mermin's article []. One conclusion of all the evidence accumulated so far is that quantum systems indeed do not allow the independent co-existence of complementary observables. (In nonlocal hidden variable models, co-existing complementary observables are no longer independent.)
In a theorem by S. Kochen and E. P. Specker, the impossibility of an independent co-existence of complementary observables is interpreted algebraically []. These authors have demonstrated that in general it is impossible to ``enrich'' the quantum description by additional (hidden) variables such that in this ``more complete'' theory the logical operations and, or and not are defined as in the classical propositional calculus. Stated differently, no Hilbert lattice (of dimension > 2) can be embedded in a Boolean lattice such that the lattice operations of meet, join and orthocomplementation survive in their original context, i.e., as operations on independent elements. See also A. Peres [] and I. Pitowsky [], chapter 4.
We shall deal, informally speaking, with the intrinsic perception of a computer-generated universe. The investigation is based on the construction of primitive experimental statements or propositions. Then the structure of these propositions will be discussed, thereby defining algebraic relations and operations between the propositions. It will be shown that certain features known from experimental (quantum) logics, in particular complementarity, can be modelled by the experimental logics of finite automata.
The methods introduced here somewhat resemble those invented by J. von Neumann and G. Birkhoff [], C. H. Randall [], G. W. Mackey [], J. Jauch [], C. Piron [] and others in the context of quantum and empirical logic.
Logical structures which have been invented abstractly (such as Boolean logic) may not be suitable empirically. An ``empirically adequate'' logical formalism should be obtained [not (only) from ``reasonable'' first principles, but] by studying the experimentally decidable propositions and their experimentally determined interrelations. This epistemological approach might be compared to the program of non-Euclidean geometry, put forward by Gauss, Lobachevsky and Bolyai (among others) and operationalised by Einstein.
Lattice theory (see chapter 5, p. pageref) provides an effective formal representation of such logical structures. As no attempt is made to intensively discuss these methods here, the reader is referred to the literature, in particular to G. Birkhoff's Lattice Theory [], J. Jauch's Foundations of Quantum Mechanics [], G. Kalmbach's Orthomodular Lattices [], P. Pták and S. Pulmannová's Orthomodular Structures as Quantum Logics [], R. Giuntini's Quantum Logic and Hidden Variables [] and A. Dvurecenskij's Gleason's Theorem and Its Applications [], among others.
G. Birkhoff and J. von Neumann originally obtained an ``experimental logic'' by a ``top-down'' approach. It is based on von Neumann's Hilbert space formalism of quantum mechanics. Physical properties corresponding to experimental propositions are identified with projection operators on the Hilbert space, or equivalently with the associated subspaces. The lattice of closed subspaces of a Hilbert space, henceforth called Hilbert lattice, corresponds to a lattice of experimental propositions, called the calculus of propositions. At best, given suitable correspondence rules, quantum mechanical Hilbert lattices and propositional calculi should be one-to-one translatable. Subsequently, J. Jauch [] and C. Piron [], among others, proposed a ``bottom-up'' approach by considering the experimental structure of propositions as the primary entities. For a historical introduction, see M. Jammer's The Philosophy of Quantum Mechanics [].
A complete lattice theoretic characterisation of Hilbert lattices has been given by W. J. Wilbur []; see also R. Piziak []. Wilbur's axioms have also been briefly reviewed in chapter 5, p. pageref. A purely algebraic characterisation of infinite dimensional Hilbert lattices has not (yet?) been given. It is, for instance, known that infinite dimensional Hilbert lattices are atomic, irreducible, complete, and satisfy the orthomodular and the orthoarguesian laws and the exchange axiom []. Yet these features do not sufficiently define a Hilbert lattice: as has been proven by H. A. Keller [,], there are spaces which are not Hilbert spaces but have atomic, irreducible, complete lattices satisfying the orthomodular law and the exchange axiom. One may conjecture that additional axioms characterising Hilbert lattices algebraically may be necessary. Indeed, one could speculate that such a characterisation is not recursively enumerable []. In an actual proof one might try to embed arithmetic in Hilbert lattices.
In what follows, a method similar to quantum logics will be introduced for the intrinsic logics of automaton universes. Although specific classes of finite automata will be analysed, these considerations apply to universal computers as well. (Finite automata can be simulated on universal computers.) A ``bottom-up'' approach is pursued. First, the possible experimental propositions and their experimentally determined interrelations are studied. Then, these can be compared with and (for specific automaton universes even) modelled after the experimental logic of quantized systems. The first step requires the construction of an ``automaton propositional calculus.'' The latter step requires an algorithmic feature which was first discussed by E. F. Moore in his article Gedanken-Experiments on Sequential Machines [] and which in D. Finkelstein's words [] is called ``computational complementarity.'' Yet, for the same reasons as before, such a correspondence between an automaton universe and a quantized system may only be weakly defined.
An (i,k,n)-automaton A will be considered whose transition and output tables are known beforehand and which has i internal states, k input symbols and n output symbols. Recall the single-copy and multi-copy ``Gedankenexperiments.'' In a sense, an automaton is treated as a ``black box'' whose transition and output tables (i.e., informally speaking, its ``intrinsic machinery'') are given in advance but whose initial state is unknown. The automaton is ``fed'' with certain input sequences by the experimenter and responds with certain output sequences. We shall be interested in the distinguishing problem: ``identify an unknown initial state.''
The element 1 is given by the set of all states {1, 2, …, i}. (Recall that we are considering an (i,k,n)-automaton with i internal states, k input symbols and n output symbols.) This corresponds to a proposition which is always satisfied.
The element 0 is given by the empty set ∅ (or {}). This corresponds to a proposition which is always false (by definition the automaton has to be in some internal state).
The class of all propositions and their relations will be called automaton propositional calculus. In what follows, automaton propositional calculi are denoted by Δ or Δ̃ if they are defined intrinsically or extrinsically, respectively. (In this context, intrinsic and extrinsic stand for the one-automaton and multi-automaton configuration, respectively. For a definition, see chapter 6, p. pageref.) Each particular outcome which, if defined, has the value TRUE or FALSE shall be called an ``event.'' In this sense, an automaton propositional calculus, just as the quantum propositional calculus, is obtained experimentally. It consists of all potentially measurable elements of the automaton reality and their logical structure, with the implication as order relation.
Not all of the propositions and their corresponding subsets may be accessible by experiments; this is where the considerations become nontrivial. An experiment is said to ``identify'' a proposition of the form ``the automaton is in state a_j or in state a_m or in state a_l …'' if the proposition can be decided by performing the experiment (yielding either one of the results TRUE or FALSE). Lattice relations of the form ``p_1 → p_2'' as well as the lattice operations ``p_1 ∨ p_2,'' ``p_1 ∧ p_2'' and ``¬p'' have the usual meaning ``implies,'' ``or,'' ``and,'' ``not'' only if an experiment properly identifies the propositions resulting from these operations. That is, only if these operations are operational and can be defined experimentally are they in Δ.
A reminder: Let A be a set. A partition is a family {a_i} of nonempty subsets of A with the following properties: (i) a_i ∩ a_j = ∅ or a_i = a_j; (ii) A = ∪_i a_i. E.g., the partition {{1}, {2,3}} generates the set {1,2,3}.
The elementary propositions can be conveniently constructed by a partitioning of automaton states generated from the input/output analysis of the automaton as follows. [Partition of propositions, version I] Let V be the set of partitions (∅ denotes the empty input sequence)

    V = ∪_w {v(w)} = {v(∅), v(s_1), v(s_1 s_2), …},    (10.1)

where w ranges over all input sequences and the partitions v(w) are defined next.
[Partition of propositions, version II] Let w = s_1 s_2 … s_k be a sequence of input symbols, and let λ*(q, w) denote the sequence of output symbols obtained by performing the experiment w on the automaton prepared in the initial state q, i.e.,

    λ*(q, ∅) = λ(q),    (10.2)
    λ*(q, s_1 s_2 … s_k) = λ(q) λ*(δ(q, s_1), s_2 … s_k).    (10.3)

Two states q and q′ are equivalent with respect to the experiment w if they yield identical output sequences,

    q ≡_w q′ iff λ*(q, w) = λ*(q′, w).    (10.4)

The equivalence class

    [q]_w = {q′ | q′ ≡_w q}    (10.5)

collects all initial states which cannot be distinguished by the experiment w, and the corresponding state partition is

    v(w) = {[q]_w | q ∈ {1, …, i}}.    (10.6)
Remarks:
(i) {1,2,3} is interpreted as ``1 ∨ 2 ∨ 3.''
(ii) A Mathematica package for computing V = ∪_w {v(w)} is listed in appendix , p. ; a minimal sketch of the core routine follows below.
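The sketch below assumes hypothetical transition and output functions delta[q, s] and lambda[q] of a Moore-type automaton; it illustrates definitions (10.1)-(10.6) and is not a reproduction of the package itself:

    (* output sequence of the preset experiment w on initial state q *)
    outputs[q_, w_List] := lambda /@ FoldList[delta, q, w]
    (* state partition v(w): states grouped by equal output sequences *)
    v[w_List, states_List] := GatherBy[states, outputs[#, w] &]
    (* V: all distinct partitions generated by input words up to length maxlen *)
    V[states_List, inputs_List, maxlen_] :=
      Union @@ Table[Sort[Sort /@ v[w, states]],
        {len, 0, maxlen}, {w, Tuples[inputs, len]}]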
Examples:
(i) Let p_1 be the proposition that the automaton is in state 1. Then p_1 is identified by a partition of the form {…, {1}, …}. p_1 is not identified by a partition of the form {…, {1,2}, …}.
(ii) Let p_1 ∨ p_2 be the proposition that the automaton is in state 1 or in state 2. Then p_1 ∨ p_2 is identified by a partition of the form {…, {1}, {2}, …} or {…, {1,2}, …}.
The order relation can be defined by a properly defined implication, which is motivated on operational grounds, i.e., by the possibility to perform actual experiments to verify the statements, as follows. [Partial order] Let p_j, p_m be propositions of the form ``the automaton is in state a_j,'' etc. A partial order relation p_j ⊑ p_m, or p_j → p_m, is defined by

    p_j → p_m iff p_j ⊆ p_m,    (10.7)

provided that both p_j and p_m are identified by one and the same experiment.
[``Or'' and ``and'' operations] The proposition p_j ∨ p_m (``p_j or p_m'') is identified with the set theoretic union,

    p_j ∨ p_m = p_j ∪ p_m.    (10.8)

The proposition p_j ∧ p_m (``p_j and p_m'') is identified with the set theoretic intersection,

    p_j ∧ p_m = p_j ∩ p_m.    (10.9)

Again, these operations are defined only insofar as the resulting propositions are identified by some experiment.
[Complement operation] The complement ¬p is identified with the set theoretic complement relative to the set of all states,

    ¬p = 1 − p.    (10.10)
Examples:
(i) If 1 = {1,2,3,4,5} and p_1 = {1,2}, then ¬p_1 = {3,4,5}.
(ii) If 1 = {1,2,3,4,5} and p_1 = {5}, then ¬p_1 = {1,2,3,4}.
Remarks:
(i) The above definition of implication (order relation) cannot be consistently applied to all automata. An automaton (counter)example will be given (p. ) whose automaton propositional calculus is not a lattice and whose ``implication'' is not transitive [].
(ii) Not all classes of partitions correspond to Moore-type automata. A more general automaton model is given by Mealy-type automata.
(iii) The partial order relation can be conveniently represented by drawing the Hasse diagram thereof. This can be done by proceeding in two steps. First, the Boolean lattices of propositional structures based on all relevant state partitions v(w) are constructed. [This can be done by generating the set of all subsets of v(w) and identifying the subset relation ``a ⊂ b'' with the implication ``a → b.''] Then, the union of all the Boolean subalgebras generated in that way renders the complete partial order of the automaton propositional calculus. This can also be understood graph theoretically [,]. A Mathematica package for computing the graphical representation of the Hasse diagram of automaton propositional calculi is listed in appendix , p. .
(iv) The lattice relation p_j → p_m can alternatively be defined from the operations p_j ∧ p_m = p_j or p_j ∨ p_m = p_m. The tautology p_j → p_m ↔ (¬p_j) ∨ p_m is valid only if p_j ∨ p_m is defined [].
(v) Classical tautologies for Boolean lattices, such as the laws of distributivity [], need not hold in automaton propositional calculi; nondistributive examples will be encountered below.
Consider first the ``trivial'' (2,2,1)-automaton defined by the following transition and output tables:

          a   b
    δ0    b   a
    δ1    a   a

          a   b
    o     0   0
If the two intrinsic states are labelled a and b, the two inputs 0, 1 and the output 0, this automaton represents a machine which constantly outputs 0, regardless of the input and regardless of its present and past internal states. No experiment distinguishes between the two states a and b, and no conclusion can be drawn from the phenomenology upon the ``hidden machinery'' of the automaton, in particular upon the existence of two separate internal states. The isomorphic minimal-state automaton of the trivial automaton is just a one-state machine. Its intrinsic world is monotonous, and its TRUE propositions are of the form ``the digit `0' appears,'' or equivalently, ``the trivial automaton is in state `a' or in state `b'.'' All state partitions corresponding to all input sequences coincide:
    v(w) = {{a, b}} for all input sequences w.    (10.11)
Extrinsically, i.e., in the multi-automaton configuration, one cannot extract more information from the trivial automaton than in the single-automaton configuration. The extrinsic and intrinsic propositional calculi Δ̃ and Δ are identical, i.e., Δ = Δ̃, and can be represented by the lattice drawn in Fig. .
Δ is isomorphic to the Boolean lattice 2^1.
Consider next the ``eating'' (3,2,3)-automaton defined by the following transition and output tables:

          1   2   3
    δ0    2   3   1
    δ1    2   3   1

          1   2   3
    o     e   a   t

The resulting output sequences are repetitions of ``… eat …,'' regardless of the input.
The output is a one-to-one map of the state space of the automaton. All state partitions corresponding to all input sequences coincide:

    v(w) = {{1}, {2}, {3}} for all input sequences w.    (10.12)
Once again, for the eating automaton, the extrinsic and intrinsic propositional calculi Δ̃ and Δ coincide, i.e., Δ = Δ̃. The eating automaton propositional calculus can be represented by the lattice drawn in Fig. . It is isomorphic to the Boolean lattice 2^3.
Not all automata render a ``propositional calculus'' which is a lattice and whose ``implication'' (``order relation'') is transitive []. Consider the (4,2,2)-automaton defined in table .
          1   2   3   4
    δ0    2   3   4   4
    δ1    2   2   4   1

          1   2   3   4
    o     0   0   1   1
The ``intrinsic propositional calculus'' is defined by the partitions

    v(∅) = {{1,2},{3,4}},  v(0) = {{1},{2},{3,4}},  v(1) = {{1,2},{3},{4}};

input sequences of length greater than one yield no new partitions.
It is obviously not a lattice. The ``implication'' is not transitive either, because 1 → 1∨2 requires input ``0'' and 1∨2 → 1∨2∨3 requires input ``1,'' but 1 → 1∨2∨3 cannot be realised by any experiment. This example shows that in general there is no guarantee that to every automaton there corresponds a properly defined automaton propositional calculus which is a lattice and whose implication is a partial order relation.
One way of approaching the problem is to consider adaptive experiments instead of preset experiments, i.e., to analyse each output symbol separately, beginning with the first output (which, in the case of Moore-type automata, comes for free) and reacting accordingly. In the case of the automaton discussed, the first output (empty input string) of an adaptive experiment would be ``0'' or ``1,'' depending on whether the automaton is in states {1∨2} or {3∨4}, respectively. Input ``0'' or ``1,'' respectively, then distinguishes between the remaining states, and the intrinsic automaton propositional calculus becomes Boolean.
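This adaptive strategy can be spelled out for the (4,2,2)-automaton above; the encoding of its tables into the helpers adelta and alambda is for illustration only:

    (* adaptive experiment: read the free first output b, then feed input b *)
    adelta[q_, s_] := {{2, 3, 4, 4}, {2, 2, 4, 1}}[[s + 1, q]]
    alambda[q_] := {0, 0, 1, 1}[[q]]
    adaptive[q0_] := Module[{b = alambda[q0], q1},
      q1 = adelta[q0, b];
      {b, alambda[q1]}]

    GatherBy[{1, 2, 3, 4}, adaptive]   (* -> {{1},{2},{3},{4}}: all states
      separated, so the adaptive calculus is Boolean *)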
In quantum theory, complementarity is often stated as follows ([], p. 154):
The description of the properties of microscopic objects in classical terms requires pairs of complementary variables; the accuracy in one member of the pair cannot be improved without a corresponding loss in the accuracy of the other member.
Or, stated explicitly:
It is impossible to perform measurements of position x and momentum p with uncertainties (defined by the root-mean-square deviations) Δx and Δp such that the product ΔxΔp is smaller than a constant unit of action ℏ/2.
A historical review of the physical concept of complementarity is given, for instance, in M. Jammer's book The Philosophy of Quantum Mechanics [], chapter 4.
Computational complementarity is a very similar structure which is based on finite automata. There, measurement of one aspect of an automaton makes impossible the measurement of another aspect. E.g., in the case of the Moore automaton (see below) measurement of the proposition {1Ú2} makes impossible measurement of the proposition {1Ú3}. In a sense, computational complementarity is a ``poor man's version'' of diagonalization for finite systems: whereas diagonalization changes an infinite number of entities, in the case of finite automata, computational complementarity affects only a finite number of observables.
Consider the Moore automaton [], defined in table .
          1   2   3   4
    δ0    4   1   4   2
    δ1    3   3   4   2

          1   2   3   4
    o     0   0   0   1
The Moore automaton is an example of a type of automata with the following remarkable feature. [Computational complementarity (E. F. Moore [])] There exists a (finite) automaton such that any pair of its states is distinguishable, but there is no experiment which can determine in what state the automaton was at the beginning of the experiment.
The term ``computational complementarity'' has been introduced by D. Finkelstein []. J. Conway calls this phenomenon ``Moore's uncertainty principle'' []. Computational complementarity is introduced here with Moore's original example. Readers may find the Mealy automaton defined in , p. more comprehensible.
Proof:
Consider the Moore automaton in an arbitrary initial state. Recall that there is only a single copy to play with.
First note that any experiment will distinguish between the state 4 and any other state, since if the automaton is in state 4, any such experiment will begin with ``1'' at the first position of the output sequence. What remains to be looked at are the pairs (1,2), (1,3) and (2,3).
Let's look at the pair (2,3). Any input of length one (i.e., either 0 or 1) will distinguish between these states; i.e., if the Moore automaton were either in state 2 or in state 3 initially, one could see this immediately by reading the second bit of the experimental sequence.
What remains to be looked at are the pairs (1,2) and (1,3).
(i) Let's assume we start our experiment with input ``1.'' If the automaton were either in state 1 or in state 3, this would induce the transitions

    input:     1 …          input:     1 …
    state:   1 3 …          state:   3 4 …
    output:  0 0 …          output:  0 1 …

i.e., the second output bit distinguishes the pair (1,3). If, however, the automaton were either in state 1 or in state 2,

    input:     1 …          input:     1 …
    state:   1 3 …          state:   2 3 …
    output:  0 0 …          output:  0 0 …

both candidates are steered into one and the same state 3 with identical output, and the pair (1,2) can never be distinguished afterwards.
(ii) Let's assume we start our experiment with input ``0.'' If the automaton were either in state 1 or in state 2, this would induce the transitions

    input:     0 …          input:     0 …
    state:   1 4 …          state:   2 1 …
    output:  0 1 …          output:  0 0 …

i.e., the second output bit distinguishes the pair (1,2). If, however, the automaton were either in state 1 or in state 3,

    input:     0 …          input:     0 …
    state:   1 4 …          state:   3 4 …
    output:  0 1 …          output:  0 1 …

both candidates are steered into one and the same state 4 with identical output, and the pair (1,3) can never be distinguished afterwards.
In other words, by a single-copy experiment it may only be possible to obtain partial information about the Moore automaton's internal state. In particular, one has to decide whether one would like to distinguish between the states 1 and 2 - and thus start the experiment with input ``0'' - (exclusive) or between the states 1 and 3 - and thus start the experiment with input ``1.'' Hence, stated differently, if the initial state is chosen at random, the states 1, 2 and 3 cannot be distinguished from one another at the same time, since any distinction between the states 1 and 2 makes impossible a distinction between the states 1 and 3, and any distinction between the states 1 and 3 makes impossible a distinction between the states 1 and 2. Based on this observation, let us consider the automaton propositional calculus next.
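The case distinctions of the proof can be checked mechanically with the partition sketch given earlier, encoding the Moore automaton's tables as delta and lambda:

    delta[1, 0] = 4; delta[2, 0] = 1; delta[3, 0] = 4; delta[4, 0] = 2;
    delta[1, 1] = 3; delta[2, 1] = 3; delta[3, 1] = 4; delta[4, 1] = 2;
    lambda[1] = 0; lambda[2] = 0; lambda[3] = 0; lambda[4] = 1;
    outputs[q_, w_List] := lambda /@ FoldList[delta, q, w]
    v[w_List, states_List] := GatherBy[states, outputs[#, w] &]

    v[{}, {1, 2, 3, 4}]    (* -> {{1,2,3},{4}} *)
    v[{0}, {1, 2, 3, 4}]   (* -> {{1,3},{2},{4}}: starting with 0 merges 1 and 3 *)
    v[{1}, {1, 2, 3, 4}]   (* -> {{1,2},{3},{4}}: starting with 1 merges 1 and 2 *)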
Consider now the propositions obtained from these distinguishing experiments. The extrinsic propositional calculus of the Moore automaton can be obtained from multi-automaton distinguishing experiments, i.e., by allowing experiments on an arbitrary number of identical copies of the automaton. All of the states are pairwise distinguishable. Therefore, Δ̃ is trivially Boolean, and hence it is distributive and thus modular and orthomodular. It is represented by Fig. .
The state partitions of the Moore automaton obtained from the preset experiments analysed above are

    v(∅) = {{1,2,3},{4}},    (10.16)
    v(0) = {{1,3},{2},{4}},    (10.17)
    v(1) = {{1,2},{3},{4}}.    (10.18)
The intrinsic Moore automaton propositional calculus can be derived according to the above definitions. It is the pasting of two blocks 2^3. The atoms of the blocks are the elements of v(0) and v(1). Both blocks share one common element, i.e., {4}. The intrinsic Moore automaton propositional calculus is represented in Fig. .
It is an orthocomplemented lattice; the complements are given by

    ¬{2} = {1,3,4},  ¬{1,3} = {2,4},  ¬{4} = {1,2,3},
    ¬{3} = {1,2,4},  ¬{1,2} = {3,4},  ¬0 = 1.
The intrinsic propositional calculus of the Moore automaton is not distributive and thus not a Boolean lattice. It is, however, a modular lattice, since it does not contain the lattice of Fig. 5.2, p. pageref as a subalgebra. Since modularity implies orthomodularity, the intrinsic propositional calculus of the Moore automaton is an orthomodular lattice; in particular, it does not contain the subalgebra O6 drawn in Fig. 5.3, p. pageref.
A physical interpretation of \D has been suggested by R. Giuntini [], p. 161.
In summary, by comparing the extrinsic and intrinsic Moore automaton propositional calculi, one obtains

    Δ ⊊ Δ̃ = 2^4.    (10.19)
J. H. Conway and W. Brauer, among others, have studied computational complementarity of the Moore type in some detail [,]. In what follows, some results are reviewed.
[(Conway [], Brauer [])] Two distinguishable states of an (i,k,n)-automaton of the Moore type can be distinguished by some input word of length at most i-n. [Two distinguishable states of any (i,k,2)-automaton of the Moore type can be distinguished by an input word of length at most i-2.]
Moore uncertainties cannot occur in Moore type automata with fewer than four internal states.
An arbitrary number of pairwise distinguishable states of any (i,k,n)-automaton of the Moore type can be distinguished by an input word of length at most (i−n+1) i^i. (If n > i, the trivial experiment with no input distinguishes between the states.)
Consider the transition and output tables of a (3,3,2)-automaton.
          1   2   3
    δ1    1   1   1
    δ2    2   2   2
    δ3    3   3   3

          1   2   3
    o1    1   0   0
    o2    0   1   0
    o3    0   0   1
Input of 1, 2 or 3 steers the automaton into the states 1, 2 or 3, respectively. At the same time, the output of the automaton is 1 only if the guess is a ``hit,'' i.e., if the automaton was in that state; otherwise the output is 0. Hence, after the measurement, the automaton is in a definite state; but if the guess is no ``hit,'' the information about the initial automaton state is lost. Therefore, the experimenter has to decide before the actual measurement which one of the following hypotheses should be tested (in short-hand notation, ``{1}'' stands for ``the automaton is in state 1,'' etc.): {1} = ¬{2,3}, {2} = ¬{1,3}, {3} = ¬{1,2}. Measurement of any one of these three hypotheses (or their complements) makes impossible the measurement of the other two.
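In Mathematica, this ``hit'' automaton and the state partition generated by, e.g., input ``1'' can be sketched as follows (the function names are illustrative only):

    (* input s steers the automaton into state s; output 1 exactly on a hit *)
    delta3[q_, s_] := s
    omega3[q_, s_] := Boole[q == s]
    runMealy[q0_, w_List] :=
      Rest[FoldList[{delta3[#1[[1]], #2], omega3[#1[[1]], #2]} &,
        {q0, Null}, w]][[All, 2]]

    GatherBy[{1, 2, 3}, runMealy[#, {1}] &]   (* -> {{1},{2,3}}: the partition v(1) *)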
No input, i.e., the empty input string ∅, identifies all three internal automaton states. This corresponds to the trivial information that the automaton is in some internal state. Input of the symbol 1 (and all sequences of symbols starting with 1) distinguishes between the hypothesis {1} (output ``1'') and the hypothesis {2,3} (output ``0''). Input of the symbol 2 (and all sequences of symbols starting with 2) distinguishes between the hypothesis {2} (output ``1'') and the hypothesis {1,3} (output ``0''). Input of the symbol 3 (and all sequences of symbols starting with 3) distinguishes between the hypothesis {3} (output ``1'') and the hypothesis {1,2} (output ``0''). The intrinsic propositional calculus is thus defined by the partitions

    v(∅) = {{1,2,3}},  v(1) = {{1},{2,3}},  v(2) = {{2},{1,3}},  v(3) = {{3},{1,2}}.
This lattice is of the ``Chinese lantern'' MO3 form. It is nondistributive, because it contains the lattice of Fig. 5.5(d), p. pageref as a sublattice. It is modular, since it does not contain a subalgebra of the form drawn in Fig. 5.2, p. pageref, and hence it is orthomodular.
The obtained intrinsic propositional calculus in many ways resembles the lattice obtained from photon polarisation experiments or from other incompatible quantum measurements. Consider an experiment measuring photon polarisation. Three propositions of the form
For (i,k,n)-Mealy type automata, the number is N = (in)^{ik}: the transition functions map k input symbols times i states onto i states, resulting in i^{ik} possibilities, and k input symbols and i states are mapped onto n output symbols, resulting in n^{ik} possibilities. E.g., for k = n = 2, i ≥ 1, one obtains:
                number of generic (i,2,2)-    number of generic (i,2,2)-
                Moore automata                Mealy automata
    N(1,2,2)    2                             4
    N(2,2,2)    64                            256
    N(3,2,2)    5832                          46656
    N(4,2,2)    1048576                       16777216
    N(5,2,2)    312500000                     10000000000
    N(6,2,2)    139314069504                  8916100448256
    N(7,2,2)    86812553324672                11112006825558016
    N(8,2,2)    72057594037927936             18446744073709551616
    N(9,2,2)    76848453272063549952          39346408075296537575424
    N(10,2,2)   102400000000000000000000      104857600000000000000000000
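These numbers can be reproduced directly: for Moore-type automata the i^{ik} transition tables combine with n^i output assignments (the output depends on the state alone), whence N_Moore(i,k,n) = i^{ik} n^i:

    (* a minimal sketch reproducing the table above *)
    nMoore[i_, k_, n_] := i^(i k) n^i
    nMealy[i_, k_, n_] := (i n)^(i k)
    TableForm[Table[{i, nMoore[i, 2, 2], nMealy[i, 2, 2]}, {i, 1, 10}]]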
The number of non-isomorphic automata grows substantially slower: for Mealy-type automata, M. A. Harrison showed that the number of classes of non-isomorphic automata is asymptotic (i.e., for i → ∞) to (in)^{ik}/i!; i.e., non-isomorphism reduces the number of automata by a factor of 1/i! (cf. [], theorem 6.7).
With regard to the structure of their (intrinsic) propositional calculi, one expects an increasingly complex behaviour as the number of states i is increased. One may indeed speculate that the limit of an unbounded number of internal states yields a model of computation which is at least as powerful as a universal computer. After all, for example, a cellular automaton is nothing but an infinite array of interconnected finite automata. (Using cardinal numbers, one may speculate that there are ℵ1 many infinite-state automata but only ℵ0 many Turing machines.)
Extrinsic automaton propositional calculi, i.e., ones obtained by multi-automaton configurations, are always Boolean and thus, in a sense, trivial. If there are i pairwise distinguishable states, then Δ̃(i) = 2^i. Unless mentioned otherwise, we shall therefore concentrate on the intrinsic automaton propositional calculi in what follows.
The Hasse diagrams of the automaton propositional calculi for all (1,x,x)-, (2,2,2)-, (3,2,2)-, (4,2,2)- Moore type automata and for some (5,2,2)- Moore type automata are listed in Fig. . They can be obtained by ``brute force,'' i.e., without utilising isomorphism, in four steps: (i) generate all automata; (ii) generate the set of all state partitions for these automata; (iii) generate the graphs (of the Hasse diagrams) of the automaton propositional calculi; (iv) apply Compress (for a listing, see the appendix, p. ) to reduce the set to all non-isomorphic graphs. The last step includes the problem of graph isomorphism, in particular the use of St. Skiena's function IsomorphicQ[g_Graph, h_Graph], to generate the set of graphs which are not isomorphic to each other. The graph isomorphism problem is not known to be solvable in polynomial time [].
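Step (i) amounts to enumerating all pairs of transition and output tables; a minimal sketch (the helper name allMoore is not from the package):

    allMoore[i_, k_, n_] :=
      Tuples[{Tuples[Range[i], {k, i}], Tuples[Range[n], i]}]
    Length[allMoore[2, 2, 2]]   (* -> 64, in accordance with the table above *)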
Examples:
Fig. shows F4, the Hasse diagrams of generic intrinsic propositional calculi of Mealy automata up to 4 states. Fig. shows only those propositional calculi which are lattices. The state partitions can be generated not by analysing automata but by permutation (for a Mathematica program, see , p. ).
Examples:
Take, for instance, the set of state partitions V = {{{1},{2,3,4}}, {{1},{2},{3,4}}}. A Mealy-type automaton realising V is defined by the following tables:
          1   2   3   4
    δ0    1   1   1   1
    δ1    2   2   2   2

          1   2   3   4
    o0    0   1   1   1
    o1    0   1   2   2
In order to make all internal automaton states accessible, i.e., reachable by sequences of input symbols from arbitrary internal states, one has to add additional input symbols whose associated output is redundant. This can be demonstrated in another example, a Mealy-type automaton realising the propositional calculus of Fig. 10.5, p. pageref. It is given in tables .
          1   2   3   4
    δ1    1   1   1   1
    δ2    2   2   2   2
    δ3    3   3   3   3
    δ4    4   4   4   4

          1   2   3   4
    o1    0   0   0   1
    o2    0   1   0   2
    o3    0   0   1   2
    o4    0   0   0   1
More generally, assume an arbitrary set of state partitions V with m elements {v_1, …, v_m}, i.e., |V| = m [``|x|'' stands for the cardinality (number of elements) of a set x]. Consider an arbitrary state partition v_j = {a_1, …, a_{l_j}}; then the output function associated with the j-th input symbol can be chosen as

    o_j(q) = l for q ∈ a_l,    (10.24)

so that the input symbol j realises exactly the partition

    v(j) = v_j.    (10.25)
In order for the internal states to be accessible (see the argument above), one needs at least i input symbols, where i stands for the number of internal states. In order to obtain as many state partitions as there are in V, one needs at least m = |V| input symbols. Combining both bounds from below on the number of input symbols yields the requirement that there are at least

    k ≥ i  and  k ≥ m = |V|,    (10.26)

i.e.,

    k ≥ max(i, |V|)    (10.27)

input symbols.
The requirement that there are at least as many output symbols as there are elements in the state partition with the highest number of elements yields a bound from below on the number of output symbols,

    n ≥ max_{1 ≤ j ≤ m} |v_j|.    (10.28)
With a transition, all information about the past state of the automaton should be destroyed. As stated before, in order to make the internal states accessible, there have to be at least i input symbols to steer the automaton into its i internal states. Assume that the internal states as well as the input symbols are labelled by successive positive integers, beginning with 1. Let q denote an arbitrary internal state. Then, the transition function can be chosen as

    δ(q, s) = s,    (10.29)

independently of q.
Note that any propositional calculus of Moore-type automata can be obtained by Mealy-type automata as well (but not vice versa). If the first output of Moore-type automata is omitted, then both automaton types realise the same class of propositional calculi.
The previous sections concentrated on the construction of a suitable propositional calculus from the input/output analysis of an automaton. What is called the inverse problem here is the construction of suitable automata which correspond to particular (orthomodular) lattices, in particular to subalgebras of Hilbert lattices. One could speculate that, stated pointedly, similar to the ``induction'' of the Hilbert space formalism of quantum mechanics from an experimental quantum propositional calculus, a correspondence between a certain class of automaton propositional calculi and subalgebras of Hilbert lattices could be postulated. Stated differently: ``given an arbitrary orthomodular (subalgebra of a Hilbert) lattice L, is it possible to construct an automaton propositional calculus Δ realising L?''
In developing an answer to this question, the notion of ``generic automaton propositional calculus'' has to be made precise. This can be done by the following definition of a partition logic (cf. section 5.4.2, p. pageref). [Partition logic] Consider a set M and a family 𝒫 of partitions of M. Every partition P ∈ 𝒫 generates a Boolean algebra of the subsets in the partition P. As for Boolean algebras, the partial order relation is identified with the subset relation (set theoretic inclusion) and the complement is identified with the set theoretic complement. The pasting of an arbitrary number of these Boolean algebras is called a partition logic. Since any set of partitions can be realised by some automaton propositional calculus and vice versa (cf. p. pageref), the class of partition logics and the class of automaton propositional calculi are equivalent. We are now in the position to state the following result (for the notion of prime, see definition 5.3.6, p. pageref): [[]] Any orthomodular lattice L is isomorphic (1-1 translatable) to some partition logic (automaton propositional calculus) if and only if L is prime.
For a proof, see M. Schaller and K. Svozil [].
Remark:
The notions of two-valued states and prime ideals are equivalent (cf. p. pageref). Therefore, prime orthomodular lattices admit two-valued states.
Examples:
For elementary orthomodular lattices, theorem 10.4 can be verified easily. Recall theorem 5.3.7, p. pageref, stating that every orthomodular lattice is a pasting of its blocks. (Block decomposition may be NP-hard.) ``At face value,'' every automaton state partition v(…) with n elements generates a Boolean algebra 2^n. If these Boolean algebras are identified with blocks, the set of automaton state partitions V represents a complete family of blocks of the automaton propositional calculus.
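The elements of such a pasting can be generated mechanically. The following sketch (with illustrative helper names) forms, for each partition, all unions of its cells, and pastes the resulting Boolean algebras together; applied to the two state partitions v(0) and v(1) of the Moore automaton, it returns the 12 elements of the intrinsic calculus:

    (* all elements of the block generated by one partition: unions of cells *)
    blockElements[p_List] := Union[Sort /@ (Flatten /@ Subsets[p])]
    (* pasting: the set theoretic union of the blocks *)
    partitionLogic[ps_List] := Union @@ (blockElements /@ ps)

    partitionLogic[{{{1, 3}, {2}, {4}}, {{1, 2}, {3}, {4}}}] // Length   (* -> 12 *)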
Some concrete examples of the construction of an automaton state partition from a prime orthomodular lattice will be considered next. In general there will be infinitely many automata whose propositional calculi are isomorphic to L.
For an easy start, consider the pasting of two disjoint blocks, e.g., of two 2^3. Label the atoms of these blocks by 1, 2, 3 and 4, 5, 6. Imagine two separate automata A1 and A2 whose internal states are labelled by 1, 2, 3 and 4, 5, 6, respectively. Assume that all internal automaton states can be distinguished separately, yielding the two state partitions {{1},{2},{3}} and {{4},{5},{6}}. (A (nonunique) construction technique for Mealy automata from arbitrary state partitions has already been described above, p. pageref.) The pasting of these subalgebras can for instance be achieved by substituting the union of the first atom of one algebra and all atoms from the other algebra for the first atom of one algebra, and vice versa. E.g., in the above example,

    v(1) = {{1,4,5,6},{2},{3}},  v(2) = {{1,2,3,4},{5},{6}}.
In a similar way, a pasting of two almost disjoint Boolean algebras (a Greechie logic) with one common atom (and its complement) could be obtained by additionally substituting the union of one atom of one algebra and one atom from the other algebra for the first atom of one algebra, and vice versa. The two respective atoms should no longer occur in other elements of the partition. E.g., in the above example, states 3 & 6 are identified:

    v(1) = {{1,4,5},{2},{3,6}},  v(2) = {{1,2,4},{5},{3,6}}.
A generalisation to arbitrary pastings of an arbitrary number of blocks is straightforward. For example, it is relatively straightforward to construct an automaton which is a pasting of two 2^4 with two common elements; see Fig. . A possible realisation by state partitions is

    v(1) = {{1},{2},{3},{4,5,6}},  v(2) = {{1},{2},{4},{3,5,6}}.
Another ``all-time favourite'' pasting is represented in Fig. ; a realisation by state partitions can be constructed along the same lines.
In summary: One interpretation of pasting is the generation of a new ``product'' automaton A from a family of automata A* = {A1, A2, …, Al}. Let all Ai's have disjoint input alphabets of exactly one symbol per automaton. (More symbols per automaton would do as well.) Let all Ai's have disjoint internal states. Let all Ai's have identical output alphabets (some output symbols may not be needed). The input alphabet of A consists of the union of the disjoint input alphabets from the automata in A*. The internal states of A consist of the union of the disjoint internal states from the automata in A*, with certain ``pasted'' elements identified.
To realise a pasting (in the form described here) of l blocks of order at most 2^j (i.e., l Boolean algebras with at most j atoms), one needs at least l input symbols and j output symbols. These numbers are not optimal, since the construction method employed here is not optimal.
Not all automata propositional calculi correspond to orthomodular lattices. By the loop lemma 5.3.7, p. pageref, any pasting of almost disjoint Boolean subalgebras which contains a loop of order 3 or 4 is not an orthomodular lattice. As an example, consider the following partition:
[Figure: a partition whose Greechie diagram contains a loop of order 3]
The Greechie diagram reveals a loop of order 3. A loop of order 4 is contained in the propositional calculi drawn in Figs. and , which contain the subalgebra O6 drawn in Fig. 5.3, p. pageref, and which are therefore not orthomodular lattices.
[Figures: propositional calculi containing a loop of order 4]
For a review of more general results, see G. Kalmbach, Orthomodular Lattices [], chapter 4.
[R. J. Greechie []] The orthomodular poset given in Fig. does not correspond to any partition logic. It admits no states.
Proof by contradiction: Assume that there exists a partition logic corresponding to the orthomodular poset drawn in Fig. 10.17 (wrong). The set of all internal automaton states is denoted by A. Consider an arbitrary element ai ∈ A. This element has to be contained in exactly one of the atoms of the block x. Without loss of generality one can assume that ai is contained in the atom denoted by 1. ai has also to be contained in exactly one of the atoms of the block y. Without loss of generality one can assume that ai is contained in the atom denoted by 2. ai has also to be contained in exactly one of the atoms of the block z. Without loss of generality one can assume that ai is contained in the atom denoted by 3. Any attempt to associate ai with one of the atoms of the block d yields a contradiction with the assumption that ai has to be contained in one and only one of the atoms of the blocks x, y and z.
This can be traced back to the feature that there exist two disjoint coverings {a,b,c,d} and {x,y,z} of the orthomodular poset by its blocks such that the numbers of blocks in the two coverings are different. An orthomodular lattice which does not correspond to a partition logic is given in P. Pták and S. Pulmannová [], p. 37.
The following corollary is a direct consequence of theorem 10.4: Any prime orthomodular subalgebra of a Hilbert lattice is isomorphic (one-to-one translatable) to some finite automaton propositional calculus.
Every Hilbert lattice is orthomodular, but not every orthomodular lattice is the subalgebra of some Hilbert lattice. For example, consider the Greechie and Hasse diagrams drawn in Fig. of an orthomodular lattice introduced by R. J. Greechie and reviewed by R. Giuntini ([], chapter 15, p. 139). The orthoarguesian law does not hold in this lattice. Yet, by the loop lemma 5.3.7, p. pageref, it is an orthomodular lattice. With the transformations listed below (display omitted), one obtains a realisation by automaton state partitions.
The transition and output tables of a Mealy-type automaton realising the set of state partitions, and thus yielding an orthomodular lattice which is not a subalgebra of some Hilbert lattice, are enumerated in table .
si | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
d1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
d2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
d3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
d4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
d5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
d6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
d7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
d8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
d9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
d10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
d11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
d12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
d13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
d14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
d15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
si | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
o1 | 0 | 1 | 2 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
o2 | 1 | 1 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
o3 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
o4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 2 | 2 | 0 | 0 | 0 | 0 |
o5 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 2 | 1 | 0 | 0 | 0 | 0 |
o6 | 2 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
o7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 0 |
o8 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
o9 | 0 | 1 | 2 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
o10 | 1 | 1 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
o11 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
o12 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 2 | 2 | 0 | 0 | 0 | 0 |
o13 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 2 | 1 | 0 | 0 | 0 | 0 |
o14 | 2 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
o15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 0 |
It remains to be seen whether microscopic phenomena can indeed be represented by Hilbert lattices or, alternatively, by the larger class of orthomodular lattices, or even by the larger class Fn (see below). So far, no experiment has been performed in order to test, e.g., the orthoarguesian property of quantum mechanics in Hilbert space.
Having established a similarity between types of lattices of the subspaces of a Hilbert space (or more general geometric spaces) - i.e., those which are prime (which implies that they admit states) - and certain automaton propositional calculi, one might then proceed by considering (probability) measures on such non distributive structures. It can be expected that by Gleason-type theorems a Hilbert space formalism similar to quantum mechanics is recovered.
The lattice-theoretic answer might be sketched as follows. Let Fi stand for the family of all intrinsic propositional calculi of automata with i states. From the point of view of logic, the intrinsic propositional calculi of a universe generated by universal computation form the limiting class lim_{n→∞} Fn of all automata with n → ∞ states. Since F1 ⊂ F2 ⊂ F3 ⊂ … ⊂ Fi ⊂ Fi+1 ⊂ …, this class ``starts with'' the propositional calculi represented by Fig. 10.8, p. pageref.
It is tempting to speculate that we live in a computer generated universe. But then, if the ``underlying'' computing agent were universal, there would be no a priori reason to exclude propositional calculi even if they do not correspond to an orthomodular subalgebra of a Hilbert lattice. I.e., to test the speculation that we live in a universe created by universal computation, we would have to look for phenomena which correspond to automaton propositional calculi not contained in the subalgebras of some Hilbert lattice - such as, for instance, the one represented by Fig. 10.18, p. pageref.
There is again a difference between the extrinsic and the intrinsic configuration: whereas in the extrinsic setup the experimenters possess an unlimited number of identical copies of the automaton, in the intrinsic setup both experimenters share a single automaton copy. In the latter case, if the automaton features computational complementarity, the automaton's response to one experimenter may affect its response to the other experimenter. It is therefore suggestive to identify the correlation function of the extrinsic experiment with the ``classical'' correlation functions. The intrinsically obtained correlation functions will be ``quantum-like.''
Assume that the automaton outputs ``suitable'' numbers s_j ∈ R. The automaton's initial state is chosen at random. A correlation function (expectation value) can be defined by the average product of the outputs of two (successive) experiments; i.e., by counting the frequency of occurrences for N experiments (here, ``×'' stands for scalar multiplication):

C̃(j,m) = (1/N) ∑_{i=1}^{N} o_j(s_i) × o_m(s_i) .     (10.30)
For example, take the Mealy-type automaton defined in section 10.3.2, p. pageref, but with a different output table: the symbol ``0'' (no hit) is substituted by the number ``-1'' (other output functions may do as well). The output table then is
si | 1 | 2 | 3 |
o1 | 1 | -1 | -1 |
o2 | -1 | 1 | -1 |
o3 | -1 | -1 | 1 |
Assume further that the initial automaton state which has to be guessed is presented at random, i.e., for a large number of experiments the extrinsic frequency of occurrence of the internal states 1, 2 and 3 is 1/3. In order to compute the extrinsic correlation function C̃(j,m), it then suffices to consider table .
si | 1 | 2 | 3 |
1 | 1 | -1 | -1 |
1 | 1 | -1 | -1 |
C̃(1,1) = (1/3)(1+1+1) = 1
si | 1 | 2 | 3 |
1 | 1 | -1 | -1 |
¬1 | -1 | 1 | 1 |
C̃(1,¬1) = (1/3)(-1-1-1) = -1
si | 1 | 2 | 3 |
1 | 1 | -1 | -1 |
2 | -1 | 1 | -1 |
C̃(1,2) = (1/3)(-1-1+1) = -1/3
The correlation function is symmetric in the arguments, i.e., C(j,m)=C(m,j). One can therefore consider C(|j-m|).
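A minimal Mathematica sketch, assuming equidistributed initial states, which evaluates the extrinsic correlation function (10.30) from the output table above:
o = {{ 1, -1, -1},   (* output of experiment 1 on states 1, 2, 3 *)
     {-1,  1, -1},   (* experiment 2 *)
     {-1, -1,  1}};  (* experiment 3 *)
corr[j_, m_] := Sum[o[[j, s]] o[[m, s]], {s, 1, 3}]/3
{corr[1, 1], corr[1, 2]}
(* -> {1, -1/3}, in accordance with the tables above *)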
It is not difficult to see that the following ``Bell-type'' inequality holds:

|C(j,m) − C(j,n)| ≤ 1 + C(m,n) .     (10.31)
In order to evaluate the intrinsic correlation function (expectation value) C(j,m) properly, one would have to evaluate a properly defined probability measure on the non distributive lattice represented in Fig. 10.6. One way of doing this is to establish a link to the lattice of subspaces of a Hilbert space, and then to use the scalar product of Hilbert space for the definition of probability. This would give a correlation function similar to the quantum mechanical one for the spin-1/2 system, resulting in a violation of the Bell inequality (10.31).
One may conceive of the two experimenters as being located at (what amounts, in their intrinsic model, to) ``two spacelike separated points.'' Yet, since the experimenters actually perform measurements on one and the same automaton copy, they could recognise this ``entanglement,'' ``inseparability,'' or ``nonlocality.'' The obvious contradiction with their separateness might cause much puzzlement.
The above considerations on computational complementarity were motivated by modelling automata with ``quantum-like'' behaviour, at least with respect to the quantum mechanical feature of complementarity. These considerations shall now be extended to measurement processes in mechanistic systems of a general kind.
Assume two propositions (or features) of a (mechanistic) system, denoted by F1 and F2. Successive measurements of F1 and F2 do not necessarily ``disturb'' each other, but we are not interested in this trivial case. If, on the other hand, measurement of F1 - caused by some kind of ``interaction'' between the system and the measurement apparatus - makes impossible a successive measurement of F2, then the resulting propositional structure will reflect ``complementarity.'' In particular, the propositional calculus will not be distributive and Boolean. Stated differently, the paradoxical attempt of certain measurement procedures to measure the object while at the same time changing its state yields an experimental propositional calculus which, in some aspects, resembles the quantum mechanical one. - Indeed, one may speculate that quantum theory is the only theory so far which implicitly takes this kind of ``complementarity'' into account.
For example, imagine a dark room with a tennis ball (or a frog!) moving in it. Assume further an experimenter therein, who is not allowed to ``turn on some light,'' and has to touch the ball (frog) in order to measure its position and velocity. As the experimenter touches the ball (frog) in order to measure its position, it will change its velocity.
The above consideration seems to necessitate a non distributive experimental propositional calculus for classical physics as well; at least in those circumstances where, intuitively speaking, the measurement process causes ``disturbances'' which are ``of the same order'' as the measured effect. Classical physics pretends that the interaction can be made arbitrarily small, although in many circumstances this requirement is of little operational relevance. Quantum mechanics postulates a bound from below - Planck's constant - for a transfer of action between a measurement apparatus and the measured object. Similar arguments have already been put forward and formalised in reference []. For earlier formalisations in the context of an ``operational logic,'' see C. H. Randall [] and G. W. Mackey [].
A system ``forecast'' or ``prediction'' will be decomposed into two distinct requirements or phases: the algorithmic representation or description of a system, followed by the actual computation of a prediction based on that representation. It will be assumed that the system is ``sufficiently complex'' to support universal computation. One appropriate representation or description of a computable system is an algorithm or program code which, implemented (extrinsically) on different computers or (intrinsically) within the same system, yields a replica of that system. A mechanistic system can also be considered as a formal system, with the index or Gödel number of the axioms serving as description. In this sense the terms ``(algorithmic) description'' and ``program code,'' as well as ``index'' or ``Gödel number,'' are synonyms. (Historically, such descriptions have been interpreted as ``natural laws'' governing the system.)
We shall now be concerned with the question of inferring ``laws,'' i.e., algorithmic descriptions, from the experimental input/output analysis of a system. Considerations will be restricted to mechanistic, i.e., computable systems. - By definition, for systems which are non mechanistic, no reasonable concept of ``recursive law'' can be given. (Probabilistic laws may still apply.) The system is perceived as a ``black box'' on which experiments are allowed only via some kind of input/output terminal. Classical results reviewed in chapter 9 apply.
Nevertheless, any universal computer U can simulate a mechanistic system once the recursive ``law'' governing the system is known. This can be done by encoding the mechanistic system (as a program) on U and performing the simulation of the system evolution on a ``step-by-step'' basis. In that way it is possible to simulate a mechanistic physical system completely and in finite time. I.e., any entity in the mechanistic system can be brought into a one-to-one correspondence with its simulation. In many cases, this simulation will take longer than the evolution of the original system, since the latter is, stated pointedly, ``the perfect computing agent to simulate itself.'' - Think, for instance, of systems which consist of a great number of processes which are ``local'' in the sense that they are not affected by other processes taking place at a ``great (space, time, …) distance.'' Simulation of such systems on a (universal) computer with only one processor unit would be inefficient. [Performance could be improved by the use of (universal) computers which are adapted to the system; in the case of ``local'' processes, by the use of Cellular Automata instead of sequential machines such as the Turing machine.] Therefore, externally, given an algorithmic description, a prediction of a mechanistic physical system is not so much a problem of principle - let alone the recursive unsolvability of the rule inference problem - but a problem of performance, i.e., of time and other computational resources.
Of course, there exist trivial possibilities of forecast. E.g., one may think of a universal computer simulating the mechanistic system with higher speed of computation than the mechanistic system itself. A speedup can also be realised for periodic systems.
Consider the following Gedankenexperiments.
Example 1: Consider a physical system S producing numbers on a display. Assume an observer A, for whom S is, for all practical purposes, a ``black box''; i.e., apart from the input and output terminals (e.g., keyboard, display), A has no knowledge of S. Assume a second observer B, who, by intuition or other insight, knows that S calculates the digits of π, displays them, and in doing so has arrived at a specific n'th digit. In this case one may ask the following questions.
(i) (Rule inference problem) How does A, without communicating with B, learn about the ``meaning'' of S, i.e., how could A find out that S outputs the digits of π?
(ii) (Halting problem) To what extent is the predictive power of B restricted by finite computational resources? - E.g., what sense does any ``knowledge'' claimed by B make, according to which S has arrived at the n = 10^200'th decimal place of π, which is the digit ``7''? For even if A uses a whole galaxy as computer, and even if A is willing to wait for the result of the computation for a time comparable to the age of the universe, at least with present-day mathematical means it is impossible to confirm this statement and to predict the 10^200'th decimal place of π [,]. Although ideally π can be calculated deterministically to arbitrary precision, one is forced to apply a probabilistic description by restrictions in computational resources and intuition. (This was, after all, the perception of Laplace's ``old'' probability theory.) To put it pointedly: Although A may have no access to a CRAY 2 computer, A might be willing to believe D. H. Bailey's claim [] (see also J. von Neumann et al. [] for a false conjecture) that the next ten digits following the 29 359 000'th digit in the decimal expansion of π are 3, 4, 1, 9, 2, 8, 4, 1, 7, 8; but, on rational grounds, A cannot accept a claim such as ``with a probability greater than 1/10, the 10^200'th digit in a decimal expansion of π is 7.''
Example 2: Assume the same setup as for example 1, with the difference that, for B, S produces Champernowne's sequence []; i.e., the enumeration of the set of all finite ``decimal'' words, i.e., words using the alphabet 0,1,2,3,…,9, in lexicographic order (9 > 8 > 7 > … > 1 > 0): ``0 1 2 3 4 5 6 7 8 9 10 11 12 13 ….'' Assume A performs some statistical tests in order to find some hint on how the sequence is generated. In particular, let us assume that A counts the frequencies of arbitrary fixed sequences such as ``1'', or ``79'', or ``16…0''. D. G. Champernowne has shown that the enumeration is a Bernoulli sequence [], i.e., any arbitrary partial sequence occurs with the expected limiting frequency. The same has been demonstrated numerically [,] for the decimal expansion of π up to 26 million places and for partial sequences up to length 6.
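A minimal Mathematica sketch of such a statistical test on Champernowne's sequence; the cutoff 10^4 and the sampled words are arbitrary illustrative choices:
champernowne = Flatten[IntegerDigits /@ Range[0, 10^4]];   (* 0 1 2 ... digit by digit *)
freq[word_] := N[Count[Partition[champernowne, Length[word], 1], word]/
    (Length[champernowne] - Length[word] + 1)]
{freq[{1}], freq[{7, 9}]}
(* each word of length k should occur with the limiting frequency 10^-k *)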
The above Gedankenexperiments are no exceptions. Due to the recursive unsolvability of the rule inference problem, there are rather few physical systems whose laws can be determined exactly. But even if an initial value and the law governing the system were known precisely it would in general be impossible to forecast the future behaviour of the system: As has already been pointed out, due to the recursive unsolvability of the halting problem and the non recursive enumerability of the maximal halting time, no ``computational shortcut'' would be feasible.
The moral of these examples is the fact that sequences ``looking random'' may stem from a low-complex but hidden and unknown deterministic evolution. Indeed, due to a possible ``creation'' of algorithmic information, in the limit of infinite computation time, a low-complex dynamical system could produce a sequence which is random in the Martin-Löf/Solovay/Chaitin sense (see , p. ), although no recursive bound could be given that would ``guarantee'' that, at any particular moment, the system outputs objects with incompressible algorithmic information content.
Presently, in physics no distinction is being made between randomness, as for instance postulated by quantum mechanics and continuum physics, and undecidability; the difference being that undecidability occurs even for computable systems (i.e., the halting problem etc.), whereas randomness implies uncomputability. One may even speculate that what is presently perceived as ``randomness'' in physics will eventually turn out to be the feature of ``undecidability'' of computable systems. Stated pointedly, ``insofar as randomness is identified with unpredictability, deterministic systems are random, and (as will be argued later) insofar as randomness is defined by indeterminism, it is not operationalisable.''
Since already externally the problems of induction and forecast are in general recursively unsolvable, one may suspect that the situation might not get worse in the intrinsic case. However, whereas in the external setup the experimenter is allowed to ``set aside'' a complete simulation of the observed system without altering the original system, this is not the case in the intrinsic setup. There, any model simulation of the system is necessarily part of that same system. This results in a worsening of the speedup theorem; i.e., of the unpredictability of the system phenomenology. For a systems theory review, see L. Löfgren [].
The question of a description within the same system or process is related to the question of whether an agent can possess an intrinsic ``theory,'' ``description'' or ``blueprint'' of itself. This has been analysed in the context of self-reproduction. Consider the question, ``Can a (universal) computer reproduce itself?'' According to J. von Neumann, there are (at least) two distinct meanings or interpretations of this question ([], pp. 118-126). The answer will therefore be ``yes'' or ``no,'' depending on the way in which self-reproduction is realised.
In the first mode of self-reproduction, which will be called the ``passive mode'' (quotation from A. W. Burks' editorial comments [], pp. 125-126), ``the self-reproducing automaton contains within itself a passive description of itself and reads this description in such a way that the description cannot interfere with the automaton's operations.'' Exactly how the description inducing self-reproduction is obtained is irrelevant to the argument. It comes from a source external to the automaton, presumably from an oracle. (As has been discussed already in chapter 11, p. pageref, any external source trying to obtain the automaton's description has to cope with the recursive unsolvability of the rule inference problem.) See also remark (iii) below.
It is indeed possible to construct a program which includes its own description and, through that description, is able to reproduce itself. The affirmative answer can be intuitively envisioned as follows (see A. W. Burks, [], p. 55): A Gödel number may be regarded as a description of a formula. At least in some cases, the Gödel number of a statement may be described in fewer symbols than the statement, else Gödel's self-referring undecidable statement (which is central for a proof of Gödel's incompleteness theorems) could not exist.
John von Neumann was one of the first to put forward an explicit formal (Cellular Automaton) model of a universal self-reproducing automaton []. A formal proof of the existence of self-reproducing automata is too technical to be included here. See, for instance, von Neumann's original work [], or H. Rogers' Theory of Recursive Functions and Effective Computability [], p. 188, which also contains the following informal proof. It uses a scheme of a self-reproducing machine which has the structure M = (D, C, E, (b,i)), where D is a ``blueprint realiser'' (that can build an object from a given ``blueprint program''), C is a ``program copier,'' E is some supplementary equipment for handling inputs and outputs for C and D, and (b,i) is a ``program'' consisting of b, which is the ``blueprint program'' for D, C, E; i is a set of supplementary instructions. The machine takes its orders from i and operates as follows. b is placed in D, and replicas D′, C′, E′ of D, C, E are produced. Then (b,i) is placed in C, and a copy (b′,i′) is made. The reproduction M′ = (D′, C′, E′, (b′,i′)) is then assembled. (The index ``′'' stands for the copy here.)
Remarks:
(i) There is a difference between a possible self-description and the impossibility of self-prediction. Whereas certain finite algorithms can reproduce a copy of their own past, they are unable to ``catch up'' with their immediate presence and predict their future. This will be discussed below.
(ii) The reproduction circle as it has been conceived does not contain any process step similar to diagonalization. Paradoxical reproduction attempts which originate from diagonalization will be considered below.
(iii) It has been assumed that the ``blueprint program'' b (interpretable as the ``algorithmic description'' of M in terms of ``laws'' and ``system parameters'') is known beforehand, presumably from an oracle. Oracle computation is necessary here because due to the recursive unsolvability of the rule inference problem, no effective computation exists which in general outputs the ``laws'' governing a mechanistic system.
(iv) Note, however, that there exist computer states or configurations, in the Cellular Automaton context called ``Garden of Eden'' configurations, which cannot be produced by any other configuration. Therefore, such configurations cannot reproduce themselves.
Example:
J. H. Conway's Life Cellular Automaton [,] is a universal computer which may exhibit self-reproducing configurations.
The following theorem summarises the possibility of self-reproduction in the passive mode. There exist complete intrinsic theories (algorithmic descriptions) of computable systems which are defined passively, i.e., without self-examination. For a proof, see J. von Neumann [].
The second mode of description, called the ``active mode,'' is more relevant for applications, where intuition or oracles are rarely encountered. In the active mode, the ``self-reproducing automaton examines itself and thereby constructs a description of itself'' (quotation again from A. W. Burks' editorial comments [], p. 126).
Unfortunately, this characterisation may give rise to misunderstandings: if a self-reproducing automaton is specified as a device consisting of elements which can be analysed, identified, and, after this analysis, restored to their previous state, then self-reproduction by self-inspection can indeed be performed. As has been pointed out by R. Laing [], one of many possible strategies is, informally speaking, to divide the automaton into two distinct parts. Each part contains, in some form, an analysing and a constructing element. Initially the first part analyses the second part, which is assumed to be passive, and constructs a copy of it. Then the first part activates the second part and becomes passive. The second part analyses the first part and constructs a copy of it. In that way, a copy of the original automaton is obtained.
Nevertheless, while feasible with specific assumptions, strategies of this type are not generally applicable. One reason why the above strategy fails for a more general type of automata is the feature of computational complementarity encountered in single-automaton experiments (see chapter 10, p. pageref). A ``diagnostic'' stimulus analysing (part of) some automaton in general cannot be made ``nondestructive:'' Assume some (universal) computer consisting of parts which feature computational complementarity. Self-reproduction by self-inspection in the active mode would require that the act of observation of the initial state (distinguishing problem) would not destroy the possibility to measure other aspects of this part and would not make impossible the reconstruction of the original state. As has been pointed out by E. F. Moore [], this is impossible in the single-system configuration; i.e., in a setup where only one copy of the part is available.
In the general case, self-reproduction by self-inspection is not feasible. Even before the publication of E. M. Gold's findings [], J. von Neumann suspected that an argument utilising diagonalization in the form of the Richard paradox may give a hint on this question. In J. von Neumann's own words ([], pp. 121-122):
… In order to copy a group of cells [[J. von Neumann uses a Cellular Automaton model]] … it is necessary to ``explore'' that group to ascertain the state of each one of its cells and to induce the same state in the corresponding cell in the area where the copy is to be placed. This exploration implies, of course, affecting each cell of this group successively with suitable stimuli and observing the reactions. This is clearly the way in which the copying automaton B can be expected to operate, i.e., to take the appropriate actions on the basis of what is found in each case. If the object under observation consists of ``quasi-quiescent'' cells …, then these stimulations can be so arranged as to produce the reactions that B needs for its diagnostic purposes, but no reactions that will affect other parts of the area which has to be explored. If an assembly G, which may itself be an active automaton, were to be investigated by such methods, one would have to expect trouble. The stimulations conveyed to it, as discussed above, for ``diagnostic'' purposes, might actually stimulate various parts of G in such a manner that other regions could also get involved, i.e., have the states of their cells altered. Thus G would be disturbed; it could change in ways that are difficult to foresee, and, in any case, likely to be incompatible with the purpose of observation; indeed, observing and copying presuppose an unchanging original. …
If one considers the existing studies concerning the relationship of automata and logic, it appears very likely that any procedure for the direct copying of a given automaton G, without the possession of a description LG, will fail; otherwise one would probably get involved in logical antinomies of the Richard type.
What is
the Richard paradox? Published in 1905 by Jules Richard [], it is one of the first ``classical'' paradoxes of (meta-)mathematics. Reviews can be found in St. C. Kleene's Introduction to Metamathematics [], pp. 38-39, as well as in J. von Neumann's Theory of Self-Reproducing Automata [], pp. 123-125. Richard's paradox resembles the Berry paradox, which uses a language to describe numbers. Let E1, E2, E3, … be an enumeration of all expressions of the language which define functions of one natural number variable with the two values 0 and 1. For example, the expression ``n is odd'' can be used to define a function which has the value 1 if n is odd and the value 0 if n is even. Let f_i(n) be the function defined by expression E_i, and define -f_i(n) by

-f_i(n) = 1 - f_i(n) = { 1 if f_i(n) = 0 ; 0 if f_i(n) = 1 } .

Let E′ denote an expression defining the diagonal function -f_n(n).
In a proof by contradiction, it is assumed that the enumeration E1, E2, E3, … is complete in the sense that it lists all expressions corresponding to functions, and that, since E′ is obtained by a trivial ``bit switch'' (diagonalization), E′ is expressible in the language (wrong). But then, E′ would have to occur somewhere in the enumeration of all expressions E1, E2, E3, …. Yet, E′ corresponds to the function -f_n(n), which differs from the n'th function in the enumeration at least at the argument n. Consequently, E′ cannot be in the enumeration of expressions E1, E2, E3, …. Yet, E′ is supposed to be an expression which is trivially obtained from the enumeration E1, E2, E3, …!
There is no other consistent alternative than to assume that the expression E′ cannot be expressed by the language and, since E′ is obtained by a trivial ``bit switch'' [(diagonalization) which is surely expressible in any ``reasonably complex'' language] from the supposedly complete enumeration of expressions E1, E2, E3, …, that there does not exist a complete enumeration of expressions; i.e., no enumeration lists all expressions corresponding to functions definable by the language.
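The ``bit switch'' along the diagonal is itself a trivially computable operation, as the following Mathematica sketch illustrates; the finite list fs is an arbitrary stand-in for the enumeration E1, E2, E3, …:
fs = {EvenQ, OddQ, PrimeQ, (# > 10 &)};   (* stand-in for E1, E2, E3, E4 *)
f[i_, n_] := Boole[fs[[i]][n]]            (* value of the i'th expression at n *)
diag[n_] := 1 - f[n, n]                   (* the ``bit switch'' -f_n(n) *)
Table[diag[n] != f[n, n], {n, Length[fs]}]
(* -> {True, True, True, True}: the diagonal function differs from every listed one *)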
The Richard paradox can be utilised to formulate the following theorem
(resembling the non-enumerability of the recursively enumerable reals, p. pageref). [Incompleteness of
intrinsic theories]
In general, no complete
intrinsic theory (algorithmic description) of a universal computable system can
be obtained actively, i.e., by self-examination.
Proof:
(i) Assume a recursive step-by-step enumeration of a mechanistic
system. This recursive enumeration can, for instance, be thought of as an
infinite computation which corresponds to the mechanistic system. Any event
corresponds to some particular output (cf. 4.1, p. pageref). It generates the
entire phenomenology. The recursive enumeration is specified by a computable
total translation function from the natural numbers onto the computable real
numbers ENUM: N → R. Let us call some t a name for p if ENUM derives p from t. (In technical terms, t can be interpreted as the Gödel number or index or code of p.) It is thus possible to enumerate all numbers which occur in {p_i} via

p_i = ENUM(t_i), i = 1, 2, 3, … .     (12.1)
A notion of element of (physical or automaton) reality is used which refers only to outcomes of actual measurements. This concept differs from the EPR terminology [], where the existence of ``elements of physical reality'' is claimed even for experiments which have not been performed. The sets {p_i} or {p_ij} or the matrix [p] defined by [p]_ij = p_ij correspond to the entirety of elements of (physical or automaton) reality.
(ii) Every intrinsically definable theory must be consistently representable from within the system and therefore must be associated with some sequence pi in the enumeration.
(iii) One may ask, ``does there exist a complete intrinsic theory T in the sense that the entire phenomenology of the system is derivable from T?'' The formal translation of such a statement is the existence of a name t_T such that T = ENUM(t_T) encodes all p_i's, including itself. Rather than proving the nonexistence of an intrinsic and complete theory by enumeration of all p_ij's, it suffices to show that no intrinsic theory exists which reproduces the diagonal sequence 0.p_11 p_22 p_33 … of outcomes.
It is indeed possible to enumerate all p_ij's in a single ``universal'' real number u by drawing counterdiagonals from the upper right to the lower left in the matrix [p]; i.e., by the ordering

p_11; p_12, p_21; p_13, p_22, p_31; p_14, … .     (12.2)

In this way one obtains

u = 0.p_11 p_12 p_21 p_13 p_22 p_31 p_14 … .
(iv) [diagonalization]: Now consider the real number d = 0.d_1 d_2 d_3 …, which is constructed from u by taking the successive diagonal elements p_nn and switching their bits; i.e., by substituting

d_n = 1 - p_nn = { 1 if p_nn = 0 ; 0 if p_nn = 1 } .     (12.3)
Notice that d is obtained by a simple function g(u) which would be computable from within the system. (It has been assumed that the system is capable of universal computation.) Evidently, d is different from every one of the ENUM(t) in at least one diagonal element p_nn. This means that there is no t which is a name for d.
(v) This fact results in the following alternative; one of the following statements must be true:
(I) There does not exist any t_T for which ENUM(t_T) = u. This has as its immediate consequence the incompleteness of any intrinsic theory T′ which is representable by some code t_{T′} within the system. This means that there is no intrinsically definable and complete description of how the system operates.
(II) There exists a name for u and thus for d, but a contradiction occurs: if u and thus d had names, say t_u and t_d, then these names would occur somewhere in the (complete) enumeration, say at the m'th and the n'th position. But there exists at least one digit d_n in the binary expansion of d which is not equal to p_nn.
There is no other consistent choice than (I).
Of what use is a passive description? Can a mechanistic system ``comprehend'' itself completely? Indeed, it may be suspected that a finite intelligence which is presented with an (algorithmic) description of itself will never be able to obtain a complete ``self-comprehension'' through that description. In a sense, this is a direct consequence of the recursive unsolvability of the halting problem, stating that, in general, no ``speedup'' is possible. One may still consider predictions obtained by the step-by-step enumeration of a system.
The following argument resembles Zeno's paradox of ``Achilles and the Tortoise'' []. K. Popper has given a similar account [], based on what he calls the ``paradox of Tristram Shandy.'' Think of the attempt of a finitely describable ``intelligence'' or computing agent to understand itself completely. It might first try to describe itself by printing its initial description. (It has been argued above that there is nothing wrong with this attempt per se, and that there indeed exist automata which contain the ``blueprint'' of themselves.) But then it has to describe itself printing its initial description. Then it has to describe itself printing its printing its initial description. Then it has to describe itself printing its printing its printing its initial description … ad infinitum. Any reflection about itself ``steers'' the computing agent into a never-ending vicious circle. In a sense, ``in the limit of an infinity of such circles,'' the agent has completed the task of complete self-comprehension. Yet, for any finite time, this cannot be achieved.
One may state the above argument in terms of the universal self-reproducing automaton M introduced for the informal proof of the existence of such machines. Iteration [which can be encoded by an algorithm of length O(1)] of the process of self-reproduction yields, step by step, the concatenation of the original machine and its copy: MM′, MM′(MM′)′, MM′(MM′)′(MM′(MM′)′)′, … ad infinitum. Since no generation is equal to any previous one, the process never stops. If self-reproduction is performed in parallel, each iteration step takes one (discrete) unit of time. Of course, the similarity to Zeno's paradox of Achilles and the Tortoise is only informal, since one does not have any notion of ``distance'' between two algorithms. [One may count the program size, realising that two successive algorithms increase their sizes by a factor of two. Then, if one associates an algorithmic probability of 2^{-L} with an algorithm of length L, ``in the limit of infinitely many self-reproductions'' a vanishing algorithmic probability is obtained.] Nevertheless, after each reproduction there exists a new algorithm incorporating two copies of the original algorithm. Therefore, at least for finite effective computations and for finite time, the reproduction never yields a complete simulation, even if reproduction is automated: if a recursive algorithm allows a finite code for the automatic reproduction of itself, it would not be able to ``catch up'' with its present form. Stated pointedly: even if a ``mechanistic intelligence'' knows all about itself, it would neither be able to comprehend its own present state nor to predict its own future.
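A toy Mathematica illustration of this growth, with the string ``M'' standing in for the machine's code:
gen = NestList[StringJoin[#, #] &, "M", 6];   (* each step concatenates machine and copy *)
StringLength /@ gen                           (* -> {1, 2, 4, 8, 16, 32, 64}: sizes double *)
N[2^(-StringLength[Last[gen]])]               (* the algorithmic probability 2^-L vanishes *)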
Paradoxical constructions and the associated proofs by contradiction via diagonalization are a main tool for an investigation of undecidability. An attempt has been made to apply the method of diagonalization to physics. Recall the correspondence between physics, algorithmics, mathematics and formal logic, which is summarised in table 2.2, p. pageref. If this correspondence is extended to undecidability, one arrives at table .
physics | algorithmics | mathematics | formal logic |
intrinsic indeterminism | Richard's paradox, Zeno paradox | | |
undecidable increase of entropy measure | Chaitin's bounds on computability of algorithmic information | Berry's paradox | |
unpredictable observables in deterministic systems | halting problem | Gödel incompleteness | Epimenides' paradox |
no effectively computable inference scheme | rule inference problem | | |
Chapter 14
Randomness in mathematics
Does not the mind's possibility to make mistakes result in the chance to think correctly?
from ``The First Surrealistic Manifesto'' by André Breton
Presently, there exist (at least) three generic definitions of randomness:
(i) von Mises ``collectives'';
(ii) Ville type definitions based on statistical tests;
(iii) definitions based on algorithmic information.
Early, intuitive, mathematical concepts of randomness were either too
restrictive, such that no mathematical entity existed which was random, or too
wide, such that ``too regular'' sequences were random. The problem is to define
randomness narrowly enough to exclude regular sequences while on the other hand
broadly enough to assure the existence of sequences characterised as random.
Instead of a comprehensive historical review, I would like to refer briefly
to one of the first intuitive but correct accounts in Philipp Frank's book
Das Kausalgesetz und seine Grenzen []. Frank states that (pp. 156-157):
Es gibt also nur einen Zufall ``in Bezug auf ein bestimmtes Kausalgesetz''. Ein Zufall schlechthin, also gewissermaßen ein absoluter Zufall wäre dann ein Ereignis, das in bezug auf alle Kausalgesetze ein Zufall ist, das also nirgends als Glied eines Kausalgesetzes auftritt. Die Aussage aber, daß ein bestimmtes Ereignis A in keinem Kausalgesetz als Glied auftritt, hätte offenbar nur dann einen Sinn, wenn man ein Verzeichnis aller Kausalgesetze besäße.
[English translation:] Any event can be called random only ``with respect to a specific causal law.'' An event would occur absolutely randomly if it is random with respect to all causal laws, i.e., if it is not the effect of any causal law. But then, the statement that a certain event is the effect of no causal law would make sense only if one could obtain a list of all causal laws.
In recursion theory one could argue that, by identifying ``causal law'' with
recursive or computable function, it would indeed be possible
to enumerate all ``causal laws.'' However, due to the recursive unsolvability of
the inference and the halting problem, it would in general be impossible to
prove that an event is the ``effect'' of a ``causal law'' corresponding to a
recursive function. Unfortunately, as has already been suspected by Frank, a
proof of randomness is ``too strong'' to be decidable by finite computational
resources - a formal proof of randomness of an infinite sequence would
essentially require knowledge of all true universal theorems of number theory
[of the form ∀n A(n), where A(n) is quantifier-free].
The heuristic notion of ``lawlessness,'' or ``indeterminism'' of an entity
has been formalised in the context of algorithmics and algorithmic information
theory by P. Martin-Löf [], C. P. Schnorr [], G. J. Chaitin [,], R. M. Solovay
[], A. N. Kolmogorov [], L. Löfgren [], L. A. Levin [] and C. Calude [] among
others. As a brief reminder and outlook, the table below lists the complexity classes with the associated types of randomness.
complexity | | associated randomness |
static | program size | algorithmic / Martin-Löf/Solovay/Chaitin randomness |
static | loop depth | - |
dynamic | time | computational / T-randomness |
dynamic | storage size | - |
As for algorithmic information theory, G. Chaitin's approach and terminology are adopted. Again, the notation U(p,s) = t will be used for a computer U with program p, input (string) s and output (string) t. ∅ denotes the empty input or output (string). Furthermore, U(p,∅) = U(p).
Sequences of natural numbers may correspond to the codes of time series, measurement results and so on (cf. chapter 4, p. pageref). Without loss of generality, one can consider binary sequences containing 0's and 1's. The symbol ``ω'' stands for the (ordinal) number infinity, and 2^ω represents the set of all infinite binary sequences. Infinite sequences x = x1x2x3… in 2^ω can also be represented as binary reals r in the interval [0,1] if one identifies x with r = 0.x1x2x3…. If one wishes, one can then transform this real into an n-ary (radix n) representation corresponding to an n-ary sequence. This bijective map does not alter the complexity-based definitions of randomness given below. All of these definitions hold with probability one.
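In Mathematica, this identification reads, for instance:
x = {1, 0, 1, 1, 0, 0, 1};      (* a finite initial binary sequence *)
r = FromDigits[{x, 0}, 2]       (* the exact rational 0.x1 x2 ... xn in base 2 *)
N[r]                            (* -> 89/128 = 0.6953125 *)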
Consider some arbitrary sequence; e.g., the sequence of pointer readings from some experimental device. One has to make precise what is meant by ``law-like.'' As has been suggested above, a reasonable translation of ``law-like'' appears to be effectively computable or, by the Church-Turing thesis, recursive.
But then one has to bear in mind that any sequence x(n) of finite length n can be generated by an algorithm; e.g., by the program ``PRINT x(n); END.'' In this sense, there are no random sequences of finite length. Any attempt to grasp the intuitive notion of what amounts to a random finite sequence remains ambiguous. Hence, randomness in the sense of ``lawlessness'' is defined for sequences of infinite length only. Physically, such sequences are impossible to generate and process. Therefore, any formal definition of randomness which is necessarily based on infinite sequences is not operational either.
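Although H is uncomputable, the underlying intuition can be illustrated by a crude compression proxy in Mathematica; Compress merely yields an upper bound on the algorithmic information, and the sequence lengths are arbitrary choices:
periodic  = StringJoin[Table["01", {500}]];                 (* a very regular sequence *)
irregular = StringJoin[ToString /@ RandomInteger[1, 1000]]; (* a coin-toss sequence *)
StringLength /@ {Compress[periodic], Compress[irregular]}
(* the periodic string compresses far below its length; the irregular one hardly at all *)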
There are non recursive sequences containing ``large chunks'' of recursive (e.g., constant) subsequences, such as the sequence

x_1 0 x_2 0 0 x_3 0 0 0 x_4 … ,

in which ever longer constant runs are interspersed with the bits x_i of a non recursive sequence.
One arrives at a more satisfactory concept of a (finite) random sequence by requiring that it be ``maximally incompressible'' by algorithmic means: A sequence is random if it cannot be generated by any shorter algorithm [,,]. Therefore, on the average, any algorithm reproducing a finite initial sequence of a random sequence should be at least of the same length as the sequence itself. I.e., the program amounts to a mere enumeration, at best. The terms ``law'' and ``shorter'' are implemented with the help of the algorithmic notions of ``algorithm'' and ``algorithmic information,'' respectively (cf. chapter 7, p. pageref).
[Randomness / Chaitin randomness, version I] A sequence x ∈ 2^ω is weakly random or Chaitin random if the algorithmic information of the initial sequence x(n) = x1…xn of length n of the base-two expansion of x does not drop arbitrarily far below n:

∃c ∀n: H(x(n)) ≥ n − c .
[Randomness / Chaitin randomness, version II] A sequence x ∈ 2^ω is random (CR) if the static complexity of the initial segment x(n) = x1…xn of length n of the base-two expansion of x eventually becomes and remains arbitrarily greater than n:

∀c ∃N ∀n > N: H(x(n)) > n + c .
Stated differently, in general the average information content per unit length of a random sequence cannot be ``compressed'' into any representation (program code) which is of smaller length than the original sequence itself. As has been shown by G. Chaitin [], the notions of weak randomness and randomness are equivalent.
Since by theorem 9.7, p. pageref, in general H(x(n)) is uncomputable, this definition cannot be readily operationalised. In some particular cases, the definition can be directly applied. I.e., if it ``appears evident'' that a sequence has low algorithmic information, such as with very regular, e.g., periodic, sequences, then randomness can be excluded. However, one must bear in mind that there exist Chaitin random sequences with initial sequences ``looking regular,'' and that there exist very ``irregular looking'' sequences with extremely low complexity.
[Martin-Löf/Solovay/Chaitin randomness] A sequence x ∈ 2^ω is Martin-Löf/Solovay/Chaitin random iff

lim_{n→∞} [H(x(n)) − n] = ∞ .     (14.1)

A sequence x ∈ 2^ω is called normalised random iff

lim_{n→∞} H(x(n))/n = 1 .     (14.2)
One may speculate that the binomial distribution of a fair coin, generated by 2^{-N}(1+u)^N = 2^{-N} ∑_{i=0}^{N} (N choose i) u^i, …
Although Chaitin randomness is the formal analogue of intuitive notions of ``lawlessness'' or indeterminism, there may well be other reasonable definitions of randomness employing different types of complexities. In particular one could define a T-randomness from computational complexity (cf. section 8.1, p. pageref and St. Wolfram []).
In view of the fact that the class P of polynomial-time algorithms is invariant under the variation of ``reasonable'' machine models (``reasonable'' machine models are circularly defined by the requirement of invariance), one could define a problem of the order of N to be T-random as follows: [T-randomness] Given a problem of the order of N, its solution x(N) is T-random if the computational complexity of its solution is not polynomially bounded, i.e., if it is not in P.
Remarks:
(i) The notion of an ``admissible place selection'' is somewhat ambiguous and difficult to implement. The German original [], p. 12, states:
Aus einer unendlichen Folge von Beobachtungen wird durch ``Stellenauswahl'' eine Teilfolge gebildet, indem man eine Vorschrift angibt, durch die über die Zugehörigkeit und Nichtzugehörigkeit der n-ten Beobachtung (n = 1, 2, …) zur Teilfolge unabhängig von dem Ergebnis dieser n-ten Beobachtung und höchstens unter Benutzung der Kenntnis der vorangegangenen Beobachtungsergebnisse entschieden wird.
[English translation:] From an infinite sequence of observations a subsequence is formed by ``place selection,'' by stating a rule which decides the membership or non-membership of the n'th observation (n = 1, 2, …) in the subsequence independently of the outcome of this n'th observation and at most by using knowledge of the preceding observational outcomes.
Von Mises wanted this selection to be as general as possible, thereby not restricting the f's. Intuitively, f would correspond to the system of a gambler who does not believe in probability theory. Von Mises' approach proposes that such a system of gambling is of no avail.
(ii) In Richard von Mises' vision, the collective comes first, then comes probability (``erst das Kollektiv, dann die Wahrscheinlichkeit''): The probability of an attribute in a collective equals the relative frequency of that attribute within the collective. Thus, in von Mises' approach, randomness of collectives is primary to probability. The collective defines the probability function P. This is in contradistinction to the usual approach to probability.
(iii) For a detailed discussion, see M. van Lambalgen [,,,] and C. P. Schnorr [].
A sequence x is called k-distributed if every word w of length k occurs in it with the limiting relative frequency 2^{-k}:

lim_{N→∞} (1/N) #{ i ≤ N : x_i x_{i+1} … x_{i+k-1} = w } = 2^{-k} .

A sequence is ∞-distributed if it is k-distributed for all positive integers k. An ∞-distributed sequence is called a Bernoulli sequence.
Heuristically, this amounts to counting the relative frequency of an arbitrary word in a given sequence, which should be identical with the probability of the word itself. In the spirit of von Mises' definition, these place selections are too weak a criterion for randomness. For instance, it has been shown by D. G. Champernowne [] that the enumeration of the set of all finite ``decimal'' words, i.e., words using the alphabet 0,1,2,3,…,9, in lexicographic order (9 > 8 > 7 > … > 1 > 0), ``0 1 2 3 4 5 6 7 8 9 10 11 12 13 …,'' is Bernoulli random although it is based on a ``very simple'' construction principle.
Bernoulli randomness corresponds to the property of Borel normality; G. J. Chaitin was the first to investigate [] this property of Ω; C. Calude has proved [] that all random sequences have this property. In this context it is interesting to note the example due to von Mises []: take an arbitrary binary sequence x = x1 x2 … xn … and construct the ternary (radix-3) sequence (over the alphabet {0,1,2}) y = y1 y2 … yn …, where y1 = x1 and yn = x_{n-1} + x_n for n ≥ 2. Then y is never Bernoulli random, because the patterns ``02'' and ``20'' do not appear in it. (Cf. C. Calude and I. Chiţescu [], where this question is discussed within the Chaitin-Martin-Löf theory of randomness.)
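A minimal Mathematica sketch of von Mises' example; the check reflects the fact that y_n = 0 forces x_n = 0 and hence y_{n+1} = x_{n+1} ≤ 1 (and dually for y_n = 2):
x = RandomInteger[1, 1000];                       (* an arbitrary binary sequence *)
y = Prepend[ListConvolve[{1, 1}, x], First[x]];   (* y1 = x1, yn = x(n-1) + xn *)
{Count[Partition[y, 2, 1], {0, 2}], Count[Partition[y, 2, 1], {2, 0}]}
(* -> {0, 0} : the patterns ``02'' and ``20'' never occur *)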
This definition of Church neglects all non recursive place selections f. Since the Bernoulli place selection is recursively enumerable, Church randomness (CHR) implies Bernoulli randomness. Again, in the spirit of von Mises' definition, recursive enumerability is too restrictive a criterion.
J. Ville suggested an approach which is, in a sense, dual to the notion of collectives: A random sequence should satisfy `all' properties of probability one. These properties of probability one are probability laws of the form ``μ({x ∈ 2^ω | A(x)}) = 1,'' where μ is a (normalised) measure and A is a formula. Just as for ``allowed place selections'' in von Mises' approach, the problem is to specify the term ``all properties A of probability 1.''
For details, the reader is referred to P. Martin-Löf's original work [], G. Chaitin's book Algorithmic Information Theory [], C. P. Schnorr's book Zufälligkeit und Wahrscheinlichkeit [] and M. van Lambalgen's dissertation [,], among others. Again, some definitions require the construction of a real binary number r = 0.x1x2x3… ∈ [0,1] from the sequence x ∈ 2^ω.
[Martin-Löf randomness] A real r is Martin-Löf random (MLR) if it is not contained in any set of a recursively enumerable infinite sequence A_i of sets of intervals such that the measure μ(A_i) (no double-counting) is always less than or equal to 2^{-i}, i.e., μ(A_i) ≤ 2^{-i}. I.e., r is MLR if, for every such sequence,

r ∉ ⋂_{i=1}^{∞} A_i .
Remarks:
(i) Any recursive evaluation of sequences of sets Ai corresponds to a statistical test with increasing significance levels.
(ii) The choice of the radius of convergence 2^{-i} corresponding to the levels of significance is arbitrary. By substituting 2^{-f(i)} for 2^{-i}, any computable, non decreasing ``regulator of convergence'' f(i) → ∞ for i → ∞ would work just as well.
The following definition is not based upon any ``regulator of convergence'' or ``level of significance,'' as Martin-Löf randomness is. [Solovay randomness] A real r is Solovay random (SR) if, for any recursively enumerable infinite sequence A_i of sets of intervals with the property that the sum of the measures of the A_i converges, i.e.,

∑_{i=1}^{∞} μ(A_i) < ∞ ,

r is contained in at most finitely many of the A_i.
Furthermore [Ville, van Lambalgen [,]], there exist uncountably many Church random sequences which are not Martin-Löf/Solovay/Chaitin random. I.e., for suitable measures μ, the set of Church random sequences {CHR} and the set of Martin-Löf random sequences {MLR} satisfy

{MLR} ⊊ {CHR} .
An alternative construction technique uses the random iterated algorithm [,,]. It uses the iterated function system S1(x) = x/3, S2(x) = x/3 + 2/3 with probabilities p1 = p2 = 1/2 (other values ≠ 0 would do just as well, but less efficiently). This iterated function system has been obtained by identifying the point 0 in the zeroth construction step with the points 0 and 2/3 in the first construction step, and by identifying the point 1 in the zeroth construction step with the points 1/3 and 1 in the first construction step. By starting with an arbitrary ``seed'' 0 ≤ x ≤ 1, the successive iteration of either S1 or S2, chosen at random, yields the Cantor set as an ``attractor'' or ``invariant set'' of the iterated function system: let C stand for the Cantor set; then

C = S1(C) ∪ S2(C) .     (15.1)
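A minimal Mathematica sketch of the random iterated algorithm; the seed 0.5 and the iteration count are arbitrary choices:
s1[x_] := x/3;  s2[x_] := x/3 + 2/3;
orbit = NestList[RandomChoice[{s1, s2}][#] &, 0.5, 3000];   (* random choice at each step *)
ListPlot[Drop[orbit, 50]]   (* after a short transient the orbit settles on the Cantor set *)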
The Cantor set can be brought into a one-to-one correspondence with the
binary (base-2) interval [0,1] by associating the numbers ``0'' and ``1'' with
the remaining left and right subintervals in each construction step,
respectively. The n'th construction step corresponds to the n'th position in the
code of the binary real. The prefix ``0.'' is associated with the zeroth step.
Yet, the Cantor set has Lebesgue measure zero, since lim_{n→∞} (2/3)^n = 0. Moreover, when measured at different resolutions d(n) = 3^{-n}, the Cantor set has different lengths (2/3)^n between 1 (corresponding to n = 0) and 0 (corresponding to n = ∞). It is possible to overcome this ``paradoxical'' feature of a resolution-dependent length by using a ``fractal'' measure lim_{n→∞} 2^n (1/3^n)^D, the price being the introduction of a non integer fractal or similarity dimension D. More generally, if one requires scale invariance of the measure, then

M_D = lim_{n→∞} N(n) d(n)^D     (15.2)

must remain finite and nonvanishing, where N(n) denotes the number of elements of size d(n) at the n'th construction (observation) level; i.e.,

N(n) d(n)^D = N(n+1) d(n+1)^D .     (15.3)
There are several alternative definitions of dimensions; detailed discussions
and comparisons can be found in the books of K. J. Falconer [,] and M. Barnsley
[] and in an article by J. D. Farmer, E. Ott and J. A. Yorke []. By setting N(n+1) = d(n+1) = 1, by defining N(n) = N, d(n) = d and by taking the limit of infinite resolution, one obtains

D = log N / log(1/d) ;     (15.4)

for the Cantor set,

D = log 2 / log 3 ≈ 0.6309 .     (15.5)
The number of elements at the n+1'st construction (observation) level with resolution d(n+1) relative to the number of elements at the n'th construction (observation) level with resolution d(n) is given by

N(n+1)/N(n) = [d(n)/d(n+1)]^D .     (15.6)

The fraction of the d(n)/d(n+1) subintervals of length d(n+1) which are retained within an interval of length d(n) is thus

r(D) = [N(n+1)/N(n)] [d(n+1)/d(n)] = [d(n)/d(n+1)]^{D-1} ;     (15.7)

for the Cantor set, r(D) = 3^{log 2/log 3 - 1} = 2/3 .     (15.8)
A similar approach as for the construction of the Cantor set can be applied for the construction of random fractals. Random fractals of dimension D can be recursively defined by successively cutting out a fraction of 1-r(D) elements of length d(n+1) [corresponding to the n+1'st construction (observation) level] from an interval of length d(n) at random (see also [], chapter 15, p. 224). A Mathematica program for a construction of such structures is listed in the appendix, p. . Fig. shows random fractals of various dimensions between 0 and 1.
For alternative constructions and discussions of random fractals, see K. J. Falconer [,], R. D. Mauldin and S. C. Williams [] and U. Zähle [], among others.
The Morse-Thue sequence features a different type of ``self-similarity.'' The sequence can be constructed by several methods (cf. M. R. Schroeder [], p. 316). A step-by-step enumeration is obtained by taking the sum modulo 2 of the digits of successive integers, written in binary notation; i.e.,
enumeration of integers | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ¼ |
binary representation | 0 | 1 | 10 | 11 | 100 | 101 | 110 | 111 | 1000 | 1001 | ¼ |
sum of digits modulo 2 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | ¼ |
The first 300 bits of the Morse-Thue sequence are drawn in Fig. .
The sequence can also be constructed by appending to each subsequence its complementary sequence (demonstrating its aperiodicity). Still another iteration procedure yielding the Morse-Thue sequence: starting from the ``seed'' 0, at any stage of the construction, one adds the complementary symbol to the right (demonstrating that there are as many 0's as there are 1's). This explains also another one of its features: by discarding every second place, the original Morse-Thue sequence is recovered. This kind of ``self-similarity'' manifests itself in the frequency plot drawn in Fig. (c) below.
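A minimal Mathematica sketch of the digit-sum construction, together with a check of the ``discard every second place'' self-similarity:
morseThue[len_] := Table[Mod[Total[IntegerDigits[n, 2]], 2], {n, 0, len - 1}]
morseThue[16]
(* -> {0,1,1,0,1,0,0,1,1,0,0,1,0,1,1,0} *)
morseThue[16][[1 ;; -1 ;; 2]] == morseThue[8]
(* -> True : discarding every second place recovers the sequence *)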
One may investigate the spectral density S_V(f) of a fractal random signal V(t) by []
$$ S_V(f) \;=\; \lim_{T\to\infty} \frac{1}{2T} \left| \int_{-T}^{T} V(t)\, e^{2\pi i f t}\, dt \right|^{2} . \tag{15.9} $$
Roughly speaking, the graph (not the set of the pulses!) of a signal with a power spectral density S_V(f) ∝ 1/f^β has a fractal (box-counting) dimension of
$$ D = \frac{5 - \beta}{2} . \tag{15.10} $$
Fig. shows plots of log S_V(f) as a function of log f for the random fractals in Fig. 15.2, revealing typical signatures of 1/f noise.
(Figure panels: D=0.8 and D=0.9.)
This compares to the spectral density of white noise, which is a constant function of the frequency; see Fig. (a).
The spectral analysis of the Cantor set (Fig. 15.1) is drawn in Fig. 15.5(b). The spectral analysis of the Morse-Thue sequence is drawn in Fig. 15.5(c).
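The spectra in these figures can be reproduced along the following lines; this is a hedged sketch (the function name is ours), using the discrete Fourier transform as a stand-in for (15.9):
(* log-log power spectrum of a (binary) signal *)
spectrum[signal_List] := Module[{p = Abs[Fourier[N[signal]]]^2, m},
  m = Floor[Length[signal]/2];
  Transpose[{Log[Range[m]], Log[p[[2 ;; m + 1]]]}]]
(* white noise for comparison: its spectrum is flat *)
ListPlot[spectrum[RandomInteger[1, 1024]]]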
The standard model of signal transmission in information theory is drawn in Fig. .
For certain types of signals it is useful to consider signal transmission of a more general type. In what follows, random fractal signals are considered which are transmitted via multiple channels. This setup is drawn in Fig. . (Indeed, adding noise in the standard setup amounts to adding a parallel channel with a noise signal.)
Such signals can be considered as sets of points or sequences of symbols;
e.g., of binary symbols. Special attention will be given to the
intersection of random fractal signals. As has been pointed out by K.
J. Falconer [], theorem 8.2, p. 102, 103, under certain ``mild side
conditions,'' the intersection of two random fractals A1 and
A2 which can be minimally embedded in \RE is again a
random fractal with dimension
$$ D(A_1 \cap A_2) = D(A_1) + D(A_2) - E , \tag{15.11} $$
provided this value is non-negative,
$$ D(A_1 \cap A_2) \ge 0 . \tag{15.12} $$
By induction, this result generalises to the intersection of an arbitrary
number of random fractal sets. The dimension of the intersection of n random
fractals is ``frequently'' given by
$$ D_\cap \;=\; D\!\left( \bigcap_{i=1}^{n} A_i \right) \;=\; \sum_{i=1}^{n} D(A_i) \;-\; (n-1)\,E . \tag{15.13} $$
Consider the coding of a signal by random fractals via their
dimension parameter. An example is the case of just two source symbols
s1 and s2 (cf. 4.1, p. pageref) encoded by (RFP stands
for ``random fractal pattern'')
$$ s_1 \to \mathrm{RFP}(D_1) , \qquad s_2 \to \mathrm{RFP}(D_2) , \qquad D_1 \neq D_2 . \tag{15.14} $$
Let us call the A_i's the primary signals (primary sources), and the intersection ∩_{i=1}^n A_i of the primary signals the secondary signal (secondary source). We shall study interesting special cases of equation (13). The addition of white noise to a random fractal signal, denoted by \Bbb I with D(\Bbb I) ≈ 1, results in the recovery of the original fractal signal with the original dimension; i.e.,
$$ D(A \cap \Bbb{I}) = D(A) + D(\Bbb{I}) - 1 \approx D(A) . \tag{15.15} $$
By assuming that all random fractals have equal dimensions, i.e., D(A_i) = D and D_∩ ≥ 0, equation (13) reduces to
$$ D_\cap(n) = nD - (n-1) = n(D-1) + 1 . \tag{15.16} $$
Fig. shows the theoretical prediction of D_∩(n) versus n for various values of the dimension D.
From these figures it can be seen that, intuitively speaking, the larger the number of channels, the higher the dimension of the primary signal has to be in order to obtain a secondary signal of nonzero dimension. Fig. shows the theoretical prediction of the critical number of channels as a function of the dimension of the primary signal.
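Both predictions follow directly from equation (16); a minimal sketch (the function names are ours):
(* intersection dimension (16) and the critical channel number *)
dcap[n_, d_] := Max[0, n (d - 1) + 1]
ncrit[d_] := 1/(1 - d)   (* Dcap reaches zero at n = 1/(1-D) *)
Table[dcap[n, 0.9], {n, 1, 12}]
Plot[ncrit[d], {d, 0, 0.99}]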
An immediate consequence of (16) is that, for truly fractal signals (D < 1), any variation of the fractal dimension of the secondary signal D_∩ is directly proportional to the number n of the primary signals; i.e.,
$$ \Delta D_\cap = n\, \Delta D . \tag{15.17} $$
Consider, for the moment, as a working hypothesis the following conjecture:
``Chaos in physics corresponds to randomness in mathematics (chaos = randomness).'' With this in mind, it is of greatest physical relevance which concept of randomness is envisaged and whether this abstraction is appropriate for the perception of chaotic motion. One could, for instance, call a number ``random'' if it is irrational (i.e., if it has no periodic representation), if it is Chaitin random, or if it is contained in some other set with non vanishing measure. Another possibility would be to call a number (representing the evolution of a system) ``weakly random'' if it is impossible to infer from previous places the future ones. The complexity-based approach to randomness, which is mainly pursued here, suggests the following statement. (T. Klein's ``every system is a perfect analogue of itself'' has been communicated to me by A. Zeilinger.)
``With respect to computational resources, every chaotic system is an optimal analogue of itself; i.e., one cannot simulate a chaotic system with less than its own resources.''
Depending on the type of computational resources (i.e., algorithmic information or computational complexity), one can identify at least four classes of chaos, which will be discussed below. [From a purely algorithmic point of view, classes I (``deterministic chaos''), II and III are equivalent, because there is no fundamental difference between the ``program'' and the ``input'' (code).]
Chaos I - ``deterministic chaos'' is characterised by two [three] criteria:
Stated pointedly, in ``deterministic chaos,'' the randomness or, in a weaker
dictum, the incomplete information of the initial value ``unfolds''
throughout evolution. A criterion for ``deterministic chaos'' therefore
is a ``suitable'' evolution function capable of ``unfolding'' the information of
a random real associated with the ``true'' but unknown initial value
x0. I.e., either the uncertainty δx0 of the initial value or a corresponding
variation of the initial value increases with time. The Lyapunov
exponent λ can be introduced as a measure of the separation of two distinct
initial values. Consider a discrete time evolution of the form
x_{n+1} = f(x_n) and an uncertainty interval
(x0, x0+ε) of measure ε which, after n iterations, becomes
(f^{(n)}(x0), f^{(n)}(x0+ε)), which is of measure ε exp{nλ(x0)}.
f^{(n)} stands for the n-fold iteration of f. The Lyapunov
exponent λ(x0) is defined for ε → 0 and n → ∞ as
$$ \lambda(x_0) \;=\; \lim_{n\to\infty} \lim_{\epsilon\to 0} \frac{1}{n}\, \ln \left| \frac{f^{(n)}(x_0+\epsilon) - f^{(n)}(x_0)}{\epsilon} \right| \;=\; \lim_{n\to\infty} \frac{1}{n}\, \ln \left| \frac{d f^{(n)}(x_0)}{d x_0} \right| . \tag{16.1} $$
Since the natural unit in algorithmics is the bit, it is more appropriate to define the Lyapunov exponent in terms of base 2 (instead of base e); all logarithms are then binary logarithms log2, yielding Lyapunov exponents which are greater by a factor of 1/log_e 2 than the ones defined in terms of base e.
For continuous maps G(x) = dx/dt, one obtains a change of the uncertainty δx by
$$ \delta x(t) = \delta x_0\, e^{\lambda t} . \tag{16.2} $$
For λ > 0, a linear increase in the precision of the initial value δx0 renders merely a logarithmic increase in the accuracy of the prediction. The above scenario is precisely the signature of ``deterministic chaos'' in classical, deterministic continuum mechanics: the evolution by ``suitable'' (positive Lyapunov exponents) recursive / effectively computable functions whose ``random'' arguments (i.e., initial values) are elements of a continuous spectrum. Such continua serve as a kind of ``pool of random reals.'' Therefore, the ``randomness'' of classical ``deterministic chaos'' resides in its initial configuration. (See also J. Ford [,] for a very clear and illuminating discussion of this topic.) In classical physics, continua are modelled by \Rn or \Cn; and ``randomness'' translates into Martin-Löf/Solovay/Chaitin randomness. Whether these classical continua turn out to be appropriate for physical theories remains to be seen.
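For discrete maps, the Lyapunov exponent (1) can be estimated numerically; the following hedged Mathematica sketch (function names ours) uses the chain-rule form of (1) for the a = 4 logistic map discussed below, where λ = ln 2:
f[x_] := 4 x (1 - x)
(* lambda = (1/n) Sum_k ln|f'(x_k)|, with f'(x) = 4 - 8 x *)
lyapunov[x0_, n_] := Module[{x = x0, s = 0.},
  Do[s += Log[Abs[4 - 8 x]]; x = f[x], {n}]; s/n]
lyapunov[0.3, 10^4]   (* approximately Log[2] = 0.693 *)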
For a much more detailed review as well as for alternative routes to ``deterministic chaos,'' see H. G. Schuster [], J.-P. Eckmann [], P. Cvitanović [], Hao Bai-Lin [], K. J. Falconer [] and others. The symbolic dynamics aspects of nonlinear maps have been worked out in detail by P. Grassberger [] and Ch. D. Moore [] and will not be reviewed here.
The starting point of the Verhulst/Feigenbaum scenario [,] for
``deterministic chaos'' is the observation that many nonlinear systems behave
generically: there exist ``tuning parameters'' a which
determine the periodicity or stochasticity of the state evolution. For example,
the logistic map
$$ x_{n+1} = f(x_n) = a\, x_n\, (1 - x_n) , \qquad x_n \in [0,1] , $$
behaves as follows: (i) for a ∈ (0, a1) there is a single stable fixed point;
(ii) for a ∈ (a1, a∞) there is, depending on
the parameter a, a hierarchy of fixed points and
associated periodic trajectories. By varying a one
notices a succession of fixed point instabilities accompanied by bifurcations at
a_N: if an N'th order fixed point x*_N is defined by its recurrence after N
computing steps (and not before), that is, after N iterations of f,
$$ f^{(N)}(x^*_N) = x^*_N , \qquad f^{(m)}(x^*_N) \neq x^*_N \quad \text{for } 1 \le m < N ; $$
(iv) for a = 4 and after the variable transformation x_n = sin^2(πX_n) one obtains a map f: X_n → X_{n+1} = 2X_n (mod 1), where (mod 1) means that one has to drop the integer part of 2X_n. By assuming a starting value X_0, the formal solution to n iterations is f^{(n)}(X_0) = X_n = 2^n X_0 (mod 1). f is easily computable: if X_0 is in binary representation, f^{(n)} is just n times a left shift of the digits of X_0, followed by a left truncation before the decimal point (see Fig. ). Now assume X_0 ∈ (0,1) is Martin-Löf/Solovay/Chaitin random. Then the computable function f^{(n)}(X_0) yields a Martin-Löf/Solovay/Chaitin random evolution. It should be stressed again that in ``deterministic chaos,'' the evolution function f itself is computable / recursive, X_0 is random, and f ``unfolds'' the ``information'' contained in X_0 in time. (For a ∈ (4, ∞) the evolution for most points x_0 ∈ (0,1) diverges.)
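The conjugacy between the a = 4 logistic map and the binary shift can be checked numerically; a hedged sketch with our own variable names:
logistic[x_] := 4 x (1 - x)
shift[X_] := Mod[2 X, 1]   (* one left shift of the binary digits *)
X0 = N[1/Sqrt[2], 50];     (* high-precision seed *)
x0 = Sin[Pi X0]^2;
{Nest[logistic, x0, 10], Sin[Pi Nest[shift, X0, 10]]^2}   (* agree *)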
One may criticise this scenario for its assumption of a continuum. Indeed, there is no effective computation capable of simulating such a ``chaotic'' evolution exactly. The evolution requires oracle capacity. If the Verhulst/Feigenbaum scenario as well as other types of ``deterministic chaos'' is simulated by a universal or (worse) finite computer such as (insert your favourite brand here:) ``… ,'' in finite time one never obtains a Martin-Löf/Solovay/Chaitin random evolution, i.e., chaos I, but just chaos of type IV (see below). As von Neumann put it [], ``Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.'' One might add though, ``if he thinks he can do so in finite time,'' because computation of Chaitin's Ω (cf. 14.1.2 p. pageref and Chaitin's books [,]) is such a sinful endeavour.
Despite the fact that classical mechanics and electrodynamics (as well as quantum theory) are continuum theories, such an assumption cannot be physically operationalised. One of the greatest advantages of the continuum assumption is, in fact, not physically motivated, but rather stems from the formal convenience by which techniques of calculus can be developed and applied. However, as has been shown by E. Bishop and D. Bridges in Constructive Analysis [], it is possible to develop a ``reasonably'' comprehensive analysis based on constructive concepts, which are, to a certain extent, related to recursion theory (see also Computable Analysis by O. Aberth [] and Computability in Analysis and Physics by M. B. Pour-El and J. I. Richards []).
One may, of course, retreat from the postulate of the actual existence of continua by claiming that the arbitrariness residing in the continuum assumption simply reflects the fact that we do not know the exact initial state of the system. In this pragmatic ``weak chaos'' approach, some random element of the continuum is substituted for an undecidable initial value (cf. chapter 13, p. pageref).
Chaos II or chaos III may correspond to the undecidability of single events on the quantum scale. (One must not confuse the undecidability of single events on the quantum scale with the evolution of the linear time-dependent Schrödinger equation.) Both routes are driven by indeterministic evolutions such as Chaitin's diophantine equation for Ω. They require oracle capacity.
Chaos of classes I, II and III supports static or Martin-Löf/Solovay/Chaitin randomness, but it requires not only infinite means but also very strong forms of non recursivity. The following chaos class can, for finite times, only support T-randomness. It has the advantage of requiring only recursive (for finite strings merely finite) resources.
Chaos IV,1 is characterised by an unfolding of the T-random initial value X_0. The computational complexity of X_0 unfolds in time, just as the algorithmic complexity does in chaos I. A suitable dynamical evolution function is characterised by positive Lyapunov exponents.
Chaos IV,2 is characterised by a non random initial value X_0, say a finite number. Its computational complexity resides in the dynamical law governing the system.
In the infinite time limit, chaos IV may be capable of becoming Martin-Löf/Solovay/Chaitin random. A constructive example for this claim is the program calculating the Chaitin random real Ω in the limit from below of a computable sequence of rational numbers. However, there is no computable regulator of convergence; i.e., one never knows how far to go to get Ω within a given accuracy. In fact, convergence is slower than any computable function.
There are some speculations that biological evolution, resulting in the DNA sequence of nucleic acids as radix-4 symbols, may become Chaitin random as well []. This scenario is driven by the ``creation'' of algorithmic information in the course of evolution, modelled by a non halting computable process.
In table the various aspects of the four classes of chaos are represented schematically. From a purely algorithmic point of view, classes I (``deterministic chaos''), II and III are equivalent, because there is no fundamental difference between a ``program'' and its ``input.''
 | deterministic | indeterministic
computable initial values | chaos IV: T-random (Chaitin random in the infinite time limit), effective computation | chaos II: Chaitin random, oracle
uncomputable initial values | chaos I: Chaitin random, oracle | chaos III: Chaitin random, oracle
In what follows the term ``quantum chaos'' will be very widely understood as chaos in the context of quantum mechanics. This chapter has been written from a very subjective point of view. No attempt has been made to review the present discussion of quantum chaos. For more details, the reader is referred to M. Berry [], B. V. Chirikov, F. M. Izrailev and D. L. Shepelyansky [], G. M. Zaslavskii [] and A. J. Lichtenberg and M. A. Lieberman [], among others.
The discussion of quantum chaos will be divided into two distinct sections: (i) Chaotic phenomena may originate from the evolution of the wave function Ψ. The wave function evolves according to the time-dependent linear Schrödinger equation iℏ (∂/∂t)Ψ = ĤΨ; (ii) Processes related to state preparations and measurements may cause an irreversible and sometimes unpredictable ``reduction'' of the wave function. Not very much attention is paid to the question of how ``classical chaos'' emerges from ``quantum chaos.''
Classically, the partitioning of phase space is in general not invariant with respect to the time flow of a system. One is tempted to speculate that, rather than the classical continuous phase space spanned by position/momentum variables, a space of action variables is an appropriate concept for an ``it-from-bit''-reconstruction of quantum mechanics.
It is interesting to note the following remark by A. Einstein [], p. 163,
``There are good reasons to assume that nature cannot be represented by a continuous field. From quantum theory it could be inferred with certainty that a finite system with finite energy can be completely described by a finite number of (quantum) numbers. This seems not to be in accordance with continuum theory and must lead to attempts to describe reality by purely algebraic means. However, nobody has any idea of how one can find the basis of such a theory.''
The case of a bounded quantized system with a computable evolution function can be modelled by a finite automaton model or a finite digital computer with finite precision, say m, in a binary (radix 2) representation, corresponding to M = 2^m different discrete states.
This scenario is different from the busy beaver scenario, which yields a maximal recurrence time of S(m) ≫ M = 2^m, because it is assumed that the internal storage capacity of a quantized system is bounded by m bits. If we do not restrict the internal storage resources, then the characteristic time scale of the quasi random regime may become ``very large,'' in fact, S(M) (cf. 8.3, p. pageref). The time evolution is also discrete. (In contradistinction, the Schrödinger theory uses continuous space and time parameters as well as continuous expansion coefficients of the eigenstates.) If one cycle time is normalised by Δt = 1, the maximal time for a single state to reach the initial state is T_MAX = M, but on the average, ⟨T⟩ = M^{1/2}. This is the time when (due to the computability of the deterministic evolution) on the average a new period starts.
The above consideration shows that there are essentially two time scales, separated by ⟨T⟩, associated with quasi random and periodic evolution: (i) For times t ≪ ⟨T⟩, the complexity of the system evolution can be of the order of the number of cycles (which is identical to t). Hence this starting segment may be random (not absolutely random, but in the sense of a ``finite random sequence''), but not necessarily so, because aperiodicity is not a sufficient criterion for randomness; (ii) when the periodic regime is reached, the system can no longer be random.
The above finite automaton model for bounded discrete quantum systems [,,] suggests that the time evolution of a quantized system can be divided into two distinct regimes: in the quasi random regime the system evolves according to the Schrödinger theory until it reaches a state arbitrarily close to its initial configuration; afterwards the system is in a Poincaré-recurrent regime.
(i) Quasi random regime: The quasi random behaviour after a state preparation (during measurement) can be motivated by heuristic arguments as follows: the average time period is estimated by ⟨T⟩ ≈ ℏ/ε, where ε is the level spacing. If ΔE defines the energy uncertainty, t ≪ ⟨T⟩, and ΔE·t ≈ ℏ. For ℏ(ΔE)^-1 ≈ t ≪ ⟨T⟩ ≈ ℏ/ε and thus ΔE ≫ ε, ΔE exceeds the average energy level spacing, yielding a kind of ``quasi continuum'' of the energy spectrum associated with quasi randomness.
(ii) Poincaré-recurrent regime: The following theorem has
first been discussed by N. S. Krylov [] and independently by S. Ono [] and by P.
Bocchieri & A. Loinger []. For more detailed discussions see, among others,
L. S. Schulman [], B. Chirikov et al., [,,], T. Hogg & B. A. Huberman
[], A. Peres [] and A. Peres & L. S. Schulman []. [Quantum recurrence
theorem] Let Ψ(0) be a wave function evolving in time
under the Hamiltonian H which has only discrete eigenvalues E_n, n ∈ \N.
Then, for each δ > 0 and for each t, there is a T > t such that (|| · || is the norm)
$$ \| \Psi(T) - \Psi(0) \| < \delta . $$
There are some reasons to perceive the linearity of the wave function in the
Schrödinger equation as an artefact of the bare theory without any
interaction, in particular self-interaction. For instance, taking into account
the nonlinear self-energy terms R[Ψ], the Schrödinger equation is no longer linear [],
$$ i\hbar\, \frac{\partial}{\partial t}\, \Psi = \hat H_0 \Psi + R[\Psi] . $$
With K(m,α;Ψ) = −i(Ĥ_0 Ψ + R[Ψ])/ℏ, the nonlinear Schrödinger equation ∂Ψ/∂t = K may well be able to produce chaos via the Feigenbaum scenario, with the mass m and the coupling strength α as ``tuning'' parameters. A test of this scenario, however, is rather demanding. In principle one can vary both α and m, for instance α(Q^2) in deep inelastic scattering [Q^2 stands for the energy-momentum transfer at the vertex], or by ``sandwiching'' a charged particle of mass m(a) between two parallel conducting plates at a distance a apart. For the time being, the associated time scales and mass changes (e.g. Δm(1 cm)/m ≈ 10^-12) are prohibitively small.
I shall briefly review speculations suggesting that the problems of representing the so-called ``collapse of the wave function'' within the framework of the Schrödinger theory could be resolved by the introduction of a nonlinear Schrödinger equation []. The idea is that nonlinear dynamics gives rise to a reduction of the state function, very much like Feigenbaum's scenario reduces a system's motion to points, limit cycles or strange attractors by restricting its effective degrees of freedom. These nonlinearities may originate in radiative corrections to the bare theory [,], as discussed above, or may have different physical origins []. The rôle of a macroscopic measuring apparatus is the input of large fluctuations. In the language of complexity theory, this is equivalent to saying that due to the enormous (static or dynamic) complexities of macroscopic devices, they disturb the regular deterministic motion (according to the Schrödinger theory) of quantized systems by introducing an evolution which is effectively random (i.e., white noise).
This argument requires essentially two distinct elements: (i) reduction of the state vector due to a reduction of the effective degrees of freedom by attractors in generic nonlinear evolutions (Feigenbaum scenario); and (ii) undecidability of single events on the quantum scale by the interaction with macroscopic measuring apparata and their associated high complexity measures. A deterministic evolution of both the quantum system and the measuring device is not excluded. The argument states that large fluctuations prevent predictions of single events.
Suppose m repeated experiments of the following form: in the i'th experiment the system is prepared to be in an identical initial state Ψ_0. Then it evolves for a time t_i without any external measurement or state preparation. At t_i it is measured to be in a state Ψ_i. Suppose further that the times are ordered such that t_1 < t_2 < … < t_i < … < t_m; then one obtains a radix-N < ∞ sequence of length m+1: y(m) = #(Ψ_0)#(Ψ_1)…#(Ψ_m). This series should not be confused with the time flow of the system, represented by Ψ(t) = exp(−itĤ/ℏ)Ψ_0 = Σ_{n ≤ N < ∞} c_n(t)Ψ_n.
I shall concentrate next on the relationship between the time-ordered actual measurement results in y(m) and the time-dependent Schrödinger function Ψ(t). Since the positions #(Ψ_i) of y(m) represent pure states, ⟨Ψ(t_i)|Ψ_i⟩ = c_{n_i}(t_i). Hence, at a time t_i the system is in a state Ψ_i with probability |c_{n_i}(t_i)|^2. As long as c(t) is a nonsingular distribution, the probabilistic interpretation of |c|^2 allows for arbitrary Martin-Löf/Solovay/Chaitin random sequences y = lim_{m→∞} y(m), even with computable coefficients c. An example, the ``quantum coin toss,'' is discussed in detail next.
For any test of quantum mechanical undecidability it is essential to use signals with no (extrinsic) noise from a controllable source of very low extrinsic complexity. To the author's knowledge the optimal realisation of such a source is a laser emitting coherent and linearly polarised light. All emitted quanta from such a source are in an identical state. The polarised laser light is then directed towards a material with anomalous refraction, such as a CaCO3 crystal, which is capable of separating light of different polarisations. Its separation axis should be arranged at ±45° with respect to the direction of polarisation of the incident laser beam. Then each of the two resulting beams, denoted by 0 and 1, respectively, has a polarisation direction ±45° from the original beam polarisation. A detector is placed in each of the beam paths (see Fig. ).
For an ideal anomalous refractor, the probability that a light quantum from the polarised source will be in either one of the two beams is 1/2.
A binary sequence y(n) can be generated by the time-ordered observation of subsequent quanta. Whenever the quantum is detected in beam 0 or 1, a corresponding digit 0 or 1 is written in the next position of y(n), producing y(n+1). In this way, n observations generate a sequence y(n).
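Pending the actual experiment, the statistics of such a sequence can at least be mimicked classically; a hedged sketch in which RandomInteger merely stands in for the detector clicks:
quantumCoinToss[n_] := RandomInteger[1, n]   (* stand-in for n detected quanta *)
y = quantumCoinToss[300];
N[Count[y, 1]/Length[y]]   (* relative frequency of 1, approximately 1/2 *)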
One perception of this process is the amplification of noise from the vacuum fluctuations of the photon field (cf. R. Glauber []). If, for any reason, this noise were to exhibit regular non random characteristics [rendering, for instance, amplitude oscillations |y_t⟩ = sin(ωt)|0⟩ + cos(ωt)|1⟩ with constant frequency ω], one could detect these regularities and find discrepancies with the postulate of microscopic randomness.
It is suggested [] that such a sequence is published and suitably distributed (e.g. by electronic mail) by a bureau of standards. This sequence could then be taken as a reference for statistical tests, some of which are suggested below, and more generally, as a standard for a generic random sequence. Of course, as has been pointed out by C. Calude [], there is no guarantee that such an initial sequence (or any other sequence) of finite length cannot be algorithmically compressed substantially; this comes from the fact that, for example, a sequence of a thousand 0's should occur with equal probability as a particular ``irregular (i.e., algorithmically incompressible) sequence'' containing a thousand symbols.
Compare y to any pseudo random sequence j, generated by a finite deterministic automaton. Whereas j could be applicable to a great variety of purposes such as numerical integration or optimisation of database retrieval, it will inevitably fail specific statistical tests. Take for example the statistical test corresponding to the generating algorithm of j itself: the law which is encoded by this algorithm is per definitionem capable of generating (``predicting'') all digits of j. Thus, at least with respect to its own generation law, j is provably non random.
The postulate of microphysical indeterminism and randomness asserts that there is no such ``generating'' law and hence no statistical test to ``disprove'' the randomness property of y. Indeed, with this postulate, y should pass all statistical tests with probability one, at least for infinite sequences. Thus y can serve as a generic source for a random bit sequence.
In what follows several statistical and algorithmic tests are suggested which could be applied to y(n).
(i) Frequency counting: for y(n) to pass this test it has to be proven that any arbitrary sequence of m digits occurs in y(n) with a limiting frequency 2^-m. In order to obtain a reasonable confidence level (see D. Knuth [] for details), m has to be smaller than approximately n-7. An infinite sequence passing this test for arbitrary m is called a Bernoulli sequence. As has already been mentioned, this criterion is rather weak. It is satisfied by the enumeration of the binary words in lexicographic order [], and, within finite accuracy, by the decimal expansion of π [,]. Actually, in the above experimental setup, the statistics of 1-digit strings (m=1) should be used for calibration of a suitable angle, which is defined by the requirement that 0 and 1 should occur in y(n) with frequency 1/2.
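A hedged sketch of the frequency count for m-bit blocks (the function name is ours); for a Bernoulli sequence every block frequency should approach 2^-m:
blockFrequencies[y_List, m_] := Module[{blocks = Partition[y, m]},
  {#[[1]], N[#[[2]]/Length[blocks]]} & /@ Tally[blocks]]
blockFrequencies[RandomInteger[1, 10^4], 2]   (* each pair near 1/4 *)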
(ii) Algorithmic compressibility: y(n) could be the input of various compression algorithms (e.g., the Huffman algorithm), which should produce a (compressed) string of length H_c(n) with H(y(n)) ≤ H_c(n) ≤ n. On the average, H_c(n) should increase as n increases, i.e., ⟨ΔH_c(n)/Δn⟩ = 1. Every compression algorithm is a kind of ``code breaking device'' based upon a hypothesis on ``laws'' governing sequences. Some of them are used for commercial applications and are readily available.
(iii) Spectral test: This is a critical test at least for linear congruential sequences. For a detailed discussion see D. Knuth []. The idea is to investigate the ``granular'' structure of y(n) in D-dimensional space in the following way. Split y(n) into N ≡ n/k subsequent partial sequences y(n,i) of length k. Generate N binary numbers 0 ≤ x_i < 1 by x_i ≡ y(n,i)/2^k. For a D-dimensional analysis, arrange subsequent x_i's into M ≡ N/D D-tuples X_j. The X_j's could be perceived as points in \RD. Consider further all families of (D-1)-dimensional parallel hyperplanes containing the points X_j. If 1/ν(D) denotes the maximal distance of these hyperplanes, ν(D) is called the D-dimensional ``accuracy'' of y(n). ν(D) should on the average be independent of the dimension, i.e., ⟨Δν(D)/ΔD⟩ = 0. For statistical reasons, one cannot achieve a D-dimensional accuracy of more than about 2^{k/D} and M^{1/D}. Thus the spectral test is reliable only for ν(D) < 2^{k/D} and sequence length n > kD(ν(D))^D.
(iv) High-dimensional integration: Assume an analytically computable D-dimensional integral F(D) ≡ ∫_0^1 … ∫_0^1 dx_1 … dx_D f(x_1,…,x_D). Consider again a representation of y(n) by M = n/(kD) points X_j in the D-dimensional unit interval. Define F′(D) ≡ (1/M) Σ_j f(X_j). Then for arbitrary test functions f and with probability 1, the discrepancy |F(D) - F′(D)| ∝ M^{-1/2} only depends on the number of points and not on the dimension. With the Simpson method, one needs at least M^{D/8} points to obtain the same order of discrepancy M^{-1/2}. There the number of points depends on the dimension.
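A hedged sketch of test (iv), with the bits of y(n) mapped to k-bit coordinates (the function name is ours); the example integrates x_1 x_2 x_3 over the unit cube, whose exact value is 1/8:
mcIntegrate[f_, bits_List, d_, k_] := Module[
  {xs = FromDigits[#, 2]/2^k & /@ Partition[bits, k], pts},
  pts = Partition[xs, d];         (* M = n/(k d) points in the unit cube *)
  Mean[Apply[f, pts, {1}]]]
N[mcIntegrate[Times, RandomInteger[1, 3*16*2000], 3, 16]]   (* near 0.125 *)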
The proposed tests are not independent. Certain compression algorithms use tables of repeating sequences and are thus connected to frequency counting methods. The spectral test analyses the distribution of points generated from sequences in a unit interval of high-dimensional space. It is thus a criterion for the quality of approximation in numerical integration.
There are other fairly strong statistical tests, such as the law of the iterated logarithm [], but many of them turn out not to be practical because of their low confidence levels in applications.
Presently there seems to be an overwhelming evidence for what A. Shimony calls a ``peaceful coexistence'' of quantum mechanics and relativity theory [,], despite the fact that they evolved from quite different contexts. Although there are difficulties originating in the ``collapse of the Schrödinger wave function'' [], and even with sophisticated setups using delayed choice [], the Aharonov-Bohm effect [], variations of boundary conditions [], ``haunted'' measurements [] and photon cloning [,], nonrelativistic quantum mechanics remains relativistically causal.
This ``peaceful coexistence'' is amazing when viewed in the context of inseparability or non-locality for entangled states []: One important feature of quantum inseparability is the ``stronger-than-classical'' correlation (cf. A. Peres []) between events from an entangled state; i.e., the correlations between such states are higher than can be accounted for by local classical models representable by Bell-type inequalities [].
Attempts to construct local classical models which feature these ``stronger-than-classical'' quantum correlations with abstract set theoretical concepts have been made in the context of nonpreservation of the probability measure and the Banach-Tarski ``paradox'', see I. Pitowsky [], and in the context of the application of random ultrafilters as ``hidden variables'', see W. Boos [].
In accordance with the conjecture of ``peaceful coexistence'', it is commonly accepted that these ``stronger-than-classical'' quantum correlations cannot give rise to any faster-than-light signalling and thus cannot violate relativistic causality [,,]. This is ultimately guaranteed by the assumption of unpredictability (or, more strongly: of randomness) of single events, resulting in ``uncontrollable non-localities'' []: inseparability establishes itself only after the recollection of entangled events, which, perceived separately, occur randomly. By this bailout, nonrelativistic quantum mechanics violates the locality assumption without violating relativistic causality. To put it pointedly []: quantum theory predicts event dependence but parameter independence for entangled states.
Obedience to relativistic causality for nonrelativistic quantum mechanics is amazing enough by itself, and even more so if one recalls that consistency remains only conjectural if ``manifestly covariant'' terms (such as tensors and spinors) are the entities in which relativistic quantum field theories are expressed []. Let me briefly review the folklore belief that such a procedure ensures relativistic causality.
The usual implementation of what is called ``local causality'' in relativistic quantum field theory requires independence of the field amplitudes at spatially separated points. Local causality is then guaranteed by a proper connection between spin and statistics. For instance, in the case of a massive scalar field, the commutator is given by the Pauli-Jordan function and vanishes for spacelike separations t^2 - x⃗^2 < 0, i.e., outside the light-cone. [However, this renders causal Green's functions with nonvanishing contributions for spacelike separated points, i.e., D_c(x) = ⟨0|Tφ(0)φ(x)|0⟩ ∝ θ(-x^2)(m/|x|)K_1(m|x|) + …. For the massless case and for small x^2 = t^2 - r⃗^2 ≠ 0 (close to the light-cone), D_c can be expanded, yielding D_c(x) ∝ x^2.] This presupposes the invariance of the speed of light for arbitrary operational configurations, in particular in propagation processes in which light quanta are exchanged. Such processes have to be described in the framework of quantum field theory itself. In other words, what is treated as a prerequisite here actually has to be an outcome of relativistic quantum field theory. But in the spirit of quantum field theory it is not unreasonable to consider the velocity of light inserted in the ``bare'' theory as a parameter which becomes renormalised en route to the full model, very much like mass or charge. This is exactly the theme of an ongoing debate whether for instance quantum electrodynamics may give rise to acausal effects in the regime of finite space-time processes [,,], or for negative vacuum energy densities such as in cavity quantum electrodynamics [,], in the charged vacuum state [] and with wormholes []. Possible violations of Lorentz invariance have also been discussed for very short distances [,].
More generally, one could speculate about a breakdown of Lorentz invariance without causality violation in relativistic field theory. As for nonrelativistic quantum mechanics, this yields a scenario of violation of locality by uncontrollable events, associated with the preservation of relativistic causality, specifying a principle of ``peaceful coexistence'' of relativistic quantum field theory and relativity theory.
All physical knowledge has to be (re-)constructed from outcomes of a finite number of elementary experiments, i.e., from finite sequences of TRUE-FALSE alternatives. One of the most important problems in statistics is the problem of the ``best'' representation of an incomplete state of knowledge about a physical system.
Intuitively speaking, the probability of an event (i.e., a particular outcome) in an experiment or trial can be interpreted either (i) subjectively, as a rational measure of belief, expressed before or, if the outcome is not yet known, after the experiment (F. P. Ramsey []); or (ii) empirically, as frequency of occurrence (R. von Mises [], K. Popper []); or (iii) by an inductive logic approach (J. M. Keynes [], R. Carnap []). For an interesting discussion of these interpretations with respect to quantum mechanics, see I. Pitowsky [], p. 182.
Let e_i be an event in a manual \M and let h stand for some rational hypothesis; then P(e_i,h) is the probability of obtaining the event e_i given the hypothesis h. After an infinite number of experiments, the inner probability P(e_i,h_∞) = P(s_i) = p_i would be the maximum prior knowledge about a system which is otherwise unpredictable.
According to A. N. Kolmogorov [], one can axiomatise probability theory by requiring that for all e_i ⊂ \M and a certain event 1,
$$ P(e_i) \ge 0 , \qquad P(\mathbf{1}) = 1 , \qquad P(e_i \cup e_j) = P(e_i) + P(e_j) \ \text{ for } e_i \cap e_j = \emptyset . $$
Since infinite experimental series are impossible, one has to rely upon data and probabilities from finite observations. This results in an arbitrariness of the definition of the probability distribution P(e_i,h_{N < ∞}), corresponding to choices of different hypotheses. The arbitrariness has to be eliminated by additional constraints on P. These constraints, in particular Jaynes' principle, use the notions of ``Shannon information'' and ``information theory entropy,'' which will be introduced next.
One may ask, ``is it possible to quantify information by a function I(pi) which measures the amount of information in the occurrence of an event with probability pi?'' It is reasonable to require that I satisfies three properties:
(i) I(p) is a continuous, monotonically decreasing function of the probability p (less probable events carry more information);
(ii) I is additive for independent events;
(iii) I is normalised by I(1/2) = 1.
These requirements are satisfied by
$$ I(p) = -\log_2 p , \tag{18.1} $$
$$ I(p_i\, p_j) = I(p_i) + I(p_j) . \tag{18.2} $$
A similar argument shows that (1) is a unique choice: assume
there is another function g(p) satisfying (i)-(iii); then
$$ g(p) = -g(1/2)\, \log_2 p = -\log_2 p = I(p) . $$
Recall that pi was defined as the probability for the occurrence
of the source code symbol si and therefore the probability for
obtaining the information I(pi). From this follows that on the
average, i.e., over the whole alphabet of symbols S={ si}, one
obtains an average information per symbol of
$$ H(S) = \sum_{i=1}^{q} p_i\, I(p_i) = -\sum_{i=1}^{q} p_i \log_2 p_i . \tag{18.3} $$
The following theorem is stated without proof. For details, see R. W. Hamming
([], page 119). For instantaneous (prefix) encoding, the information theory
entropy yields a bound from below for the average codeword length by
$$ \langle L \rangle = \sum_{i=1}^{q} p_i\, L_i \;\ge\; H(S) , \tag{18.4} $$
where L_i denotes the length of the codeword for s_i.
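For illustration, a hedged sketch (the function name is ours) comparing (18.3) with the average codeword length of a prefix code; for dyadic probabilities the bound (18.4) is attained:
entropy[p_List] := -Total[# Log[2, #] & /@ Select[p, # > 0 &]]
p = {1/2, 1/4, 1/8, 1/8};
lengths = {1, 2, 3, 3};        (* e.g. the prefix code 0, 10, 110, 111 *)
{entropy[p], p . lengths}      (* both equal 7/4 *)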
As mentioned before, after a finite number of experiments one
obtains only incomplete knowledge of the inner probability. This case will be
treated next. The rational measure of ``information'' and ``uncertainty'' will
now be used in a reformulation expressing the ``relative information content''
and the ``relative uncertainty'' of a measurement series. Assume we have
performed M experiments and ask ourselves, ``what is the information gain
after N ≥ M experiments?'' Motivated by equation (2), the relative information content can be modelled by I(M,N) = I(P(s_i,N)) - I(P(s_i,M)), and a relative Shannon information can be defined as follows. [Relative Shannon information, uncertainty] Let
$$ I(M;N) = \sum_i P(s_i,N)\, \log_2 \frac{P(s_i,N)}{P(s_i,M)} . \tag{18.5} $$
The uncertainty is the missing information relative to the maximum information from ∞ trials:
$$ U(N) = I(N;\infty) . \tag{18.6} $$
As has been proved in chapter 4, p. pageref, the functional form (5) of I is uniquely determined by certain ``reasonable'' requirements, such as continuity, I(N;N)=0 et cetera. Reviews of these requirements as well as a proof of the uniqueness of I can be found in R. W. Hamming ([], p. 103), A. Hobson ([], p. 35) and A. Katz ([], p. 14). For a discussion of E. T. Jaynes' maximum entropy principle and its relation to other concepts, such as R. A. Fisher's maximum likelihood principle, see an article by M. Li and P. M. B. Vitányi [].
Since infinite experimental series are not realisable, one has to guess the hypothesis hN with only a finite number N of experiments performed. Such guesses do not yield unique choices. Therefore, one has to assume ``reasonable'' requirements in order to specify the choice of the hypothesis hN further.
One such ``reasonable'' side condition is the requirement that the
probability distribution should not contain any additional
``information'' which is not suggested by the experimental data and
therefore should maximise the ``amount of uncertainty.'' The above
considerations can be summarised in [Jaynes' principle] Assume the experimental
outcomes e_i and functions f_j(e_i) = f_j(i). Suppose N additional experimental data are given. The hypothesis
h_N has to be adjusted such that the probabilities P(e_i,N)
obey
$$ -\sum_i P(e_i,N)\, \log_2 P(e_i,N) \;\to\; \max , \tag{18.12} $$
subject to the constraints
$$ \sum_i P(e_i,N) = 1 \tag{18.13} $$
and, for the measured mean values ⟨f_j⟩,
$$ \sum_i P(e_i,N)\, f_j(i) = \langle f_j \rangle . \tag{18.14} $$
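A hedged numerical sketch of this maximisation for a loaded die whose observed mean face value is assumed to be 4.5 (the constraint function is f(i) = i; all names are ours):
vars = Array[p, 6];
sol = NMaximize[{-vars . Log[vars],
    Total[vars] == 1 && Range[6] . vars == 4.5 && And @@ Thread[vars > 0]},
  vars];
vars /. Last[sol]   (* a Gibbs-type exponential distribution over the faces *)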
Jaynes' principle is similar to an approach by R. A. Fisher [,], who added a
second criterion of efficiency by requiring that the variance
of the estimating statistic (at least for large sample spaces) should not exceed
that of any other consistent statistic estimating the same parameter. Roughly
speaking, whereas for probability the event e_i is treated as the variable and the
hypothesis h is held constant, for likelihood the hypothesis h is the variable and
the events are held constant. More precisely, the likelihood
L(h,e_i) of the hypothesis h given data e_i is defined by
$$ L(h, e_i) = P(e_i, h) . $$
Summhammer's principle: Whereas Jaynes' principle fixes the arbitrariness in the guessing of the hypothesis, there is yet another undesirable feature of certain statistics. It may happen that the variance, i.e., the confidence uncertainty, increases with an increasing data sample.
Consider the following example []: two physicists perform coin tosses (for
quantum ones, cf. p. pageref) and have collected
a total of N=1000 events, K_0=800 for ``head,'' coded by #(head)=0, and K_1=200 for ``tail,'' coded by #(tail)=1. They hypothesise that the distribution is
binomial,
$$ P(K_1) = \binom{N}{K_1}\, p^{K_1} (1-p)^{N-K_1} , $$
with p estimated by K_1/N = 0.2; the associated uncertainty ΔK_1 = [N p(1-p)]^{1/2} grows with the number N of experiments.
This somewhat pathological feature of the binomial distribution can be
overcome by a transformation [,,,] T: P → χ. χ is called a ``phase''
for reasons which will become clearer below. We require that the uncertainty of χ only depends on the number of experiments N, and
not on the outcomes K_0, K_1, i.e.,
$$ \Delta\chi = \Delta\chi(N) . \tag{18.16} $$
By error propagation,
$$ \Delta\chi = \left| \frac{d\chi}{dP} \right| \Delta P , \tag{18.17} $$
$$ \Delta P = \left[ \frac{P(1-P)}{N} \right]^{1/2} . \tag{18.18} $$
Combining (18.16)-(18.18) yields
$$ \frac{d\chi}{dP} \propto \frac{1}{\sqrt{P(1-P)}} , $$
which integrates to
$$ \chi = T(P) = \arcsin \sqrt{P} , \tag{18.21} $$
$$ \Delta\chi = \frac{1}{2\sqrt{N}} . \tag{18.22} $$
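That Δχ depends on N alone can be checked numerically; a hedged sketch (the function name is ours) with two different biases p:
chiSpread[p_, n_, trials_] := StandardDeviation[
  ArcSin[Sqrt[N[#/n]]] & /@ RandomVariate[BinomialDistribution[n, p], trials]]
{chiSpread[0.8, 1000, 10^4], chiSpread[0.2, 1000, 10^4], 1/(2 Sqrt[1000.])}
(* all three values are approximately 0.0158 *)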
One generalisation of J. Summhammer's approach is the requirement that,
besides the variance, all moments of a distribution should depend only
on the number of experiments, i.e., the size of the sample space. [Generalised
Summhammer principle] Let P(e_i,h_N) be a probability
distribution. A new distribution χ is obtained by
requiring that all moments about the mean ⟨P(h_N)⟩ = Σ_i P(e_i,h_N) are independent
of P and only dependent on the size of the sample space N, i.e., for r > 1,
$$ \left\langle \left( \chi - \langle \chi \rangle \right)^{r} \right\rangle = g_r(N) . \tag{18.23} $$
If the computer U is perceived as an information source, then the algorithmic
probability P(s) can be identified with the probability of the occurrence of an
object s from that source. As has been shown by G. Chaitin [,], there is a
connection between the algorithmic information H(s) and its algorithmic
probability P(s) of an object such as a symbol or a sequence of symbols s [a
notable exception are infinite computations; cf. equation (36), p. pageref]:
$$ H(s) = -\log_2 P(s) + O(1) . $$
At this point technical obstacles appear.
(I) One did not require the most efficient encoding, for which the Kraft sum in (1) holds with equality. Moreover, for universal machines, not all allowed (instantaneous) codes halt. Since Ω = Σ_s P(s) has been defined to be the sum of the halting probabilities P(s), for prefix code, P(∅,1,2,3,…) = Ω < 1, and there is no unit element 1 such that P(1)=1. I.e., one cannot directly associate an interpretation as probability measure to P, as would be necessary for identifying P with the probability in the Shannon information.
There is yet another difficulty, since there exist programs which halt but output nothing, i.e., the empty list ∅. This problem can be circumvented by identifying no event from the information source with some additional or existing source symbol #(∅).
A probability measure interpretation for P could be restored at least in two ways.
(i) include a non-halting element ``NH,'' then define P(NH) = 1 - Ω and
1 = {NH, ∅, 1, 2, …};
(ii) divide P(i) by Ω, such that P′(i) = P(i)/Ω and thus 1 = {∅, 1, 2, 3, …}.
At first glance, (ii) looks more appealing for physical applications, since a program which does not halt and outputs nothing corresponds to no event. It seems strange to ascribe a probability to non-occurrence, as suggested by (i). A second thought, however, might support (i), since this view reflects a physical system's inability to perform a given task, say, to produce a specific object i, and/or its redundancy in producing one and the same object by one or more processes.
As is often done in information theory (cf. R. W. Hamming [], p. 119), one
may assume (ii) and normalise P(s) by the Kraft sum, i.e.,
$$ P'(s) = \frac{P(s)}{\Omega} . \tag{18.24} $$
(i) The important identity ``H = -log P + O(1),'' relating the algorithmic information content H of a single object to the probability of its occurrence, is not changed by transformation (24), since the additional constant -log Ω can be absorbed into O(1), i.e., H(x) = -log_2(Ω P′(x)) + O(1) = -log_2 P′(x) - log_2 Ω + O(1) = -log_2 P′(x) + O(1).
(ii) Unlike instantaneous codes for source information, which contribute to the Kraft sum with certainty, only the halting programs, and therefore not all allowed codes, contribute to the halting probabilities P(s). Substitution (24) therefore not only normalises an inefficient encoding, but also normalises the capacity of the computer (the physical system) to produce output (states).
(iii) Σ_{i=∅,1,2,…} P′(i) = 1 can no longer be calculated in the limit from below, as Ω can, but may oscillate [] if it is computed by Chaitin's techniques; cf. equation (2), p. pageref.
(II) The algorithmic information H(x(n)) of ``almost all'' sequences
x(n) grows ``maximally'' in the sense that [cf. equation (30), p. pageref]
$$ \lim_{n\to\infty} \frac{H(x(n))}{n} = 1 . $$
(III) As has been pointed out already in section 7.2.2, p. pageref, a redefinition of P makes it non subadditive.
The following nomenclature is taken mainly from V. M. Alekseev & M. V. Yakobson [], K. Petersen, Ergodic Theory [], M. van Lambalgen [,] and A. I. Khinchin []; see also chapter 4, p. pageref.
[Dynamical system] Let (X,f) denote a dynamical system. X is a manifold which can be interpreted as a generalised phase space (with points x Î X) and f is a measurable transformation on X, representing a discrete evolution of the system.
Consider some measurable regions E_i ⊂ X of X, which, intuitively speaking, contain all the regions which can be resolved by measurements. The higher the measurement resolution, the smaller the diameter of the regions and the more such regions will be necessary to cover all of X. X = ∪_{i=1}^q E_i defines a covering ξ = {E_1,…,E_q} of X. The constituents E_i need not be disjoint, i.e. E_i ∩ E_j ≠ ∅ is allowed. If the covering is disjoint, we call it a partition.
[Evolution, orbit] The map f: X → X represents a discrete time evolution. Let t=0 be some time at which the system is in some x0 ∈ X, and let ``f^{(n)}'' denote the n-fold application of f. If f is invertible, then f generates a cascade or orbit x of x0 by
$$ x = \{ f^{(n)}(x_0) \mid n \in \Bbb{Z} \} = \{ \ldots, f^{(-1)}(x_0),\, x_0,\, f^{(1)}(x_0),\, f^{(2)}(x_0), \ldots \} . \tag{18.25} $$
We next turn to the source coding of (X,f). [Source coding] To every measurement region E_i ∈ ξ associate a pointer reading s^i. The set of pointer readings S = {s^1,…,s^q} constitutes a source alphabet with q symbols. Let S be represented by numbers, encoded by the numerals 0,1,2,…,#(s^q) of basis q (radix-q). (For simplicity, we shall restrict ourselves to q=2 below.)
Let
$$ Y(x_0) = \cdots s_{-1} s_0 s_1 \cdots \tag{18.26} $$
denote the sequence of pointer readings generated by the orbit, where
$$ s_n = s^i \iff f^{(n)}(x_0) \in E_i , \tag{18.27} $$
and let
$$ Y(x_0, n) = s_0 s_1 \cdots s_{n-1} \tag{18.28} $$
denote the finite segment of the first n future pointer readings.
Remarks:
(i) Superscripts are used to denote the i'th symbol s^i, whereas subscripts are used to denote the place of the symbol in the sequence of pointer readings Y. For example, if the system is at t=0 in x0 ∈ E2, at t=1 in x1 ∈ E4, at t=2 in x2 ∈ E1, at t=3 in x3 ∈ E4, …, then Y(x0) = …s^2 s^4 s^1 s^4….
(ii) f induces a left shift T of one time unit on q^ω, such that
$$ Y(f(x_0)) = T\, Y(x_0) . \tag{18.29} $$
(iii) Y can be perceived as a word composed from a finite sequence of characters of an alphabet s^i ∈ S. Define ξ_n = { E_{s_0…s_{n-1}} | E_{s_0…s_{n-1}} = E_{s_0} ∩ f^{(-1)}E_{s_1} ∩ … ∩ f^{(-n+1)}E_{s_{n-1}} = ∩_{i=1}^n f^{(-i+1)}E_{s_{i-1}} }. The word is admissible if the corresponding set E_{s_0…s_{n-1}} ∈ ξ_n contains at least one x0. In this case there exists a Y, which can be perceived as a possible sequence of pointer readings, or a symbolic cascade, representing the ``history'' of the system from an initial value x0.
(iv) From now on, we shall disregard the past history of the system, and consider only future events symbolised by Y(x0,n), n ∈ \N.
If (X,f) is equipped with a probability P(Y), it induces a measure μ on ξ_n by
$$ \mu(E_{s_0 \cdots s_{n-1}}) = P(s_0 \cdots s_{n-1}) , \tag{18.30} $$
normalised such that
$$ \sum_{s_0 \cdots s_{n-1}} \mu(E_{s_0 \cdots s_{n-1}}) = 1 , \tag{18.31} $$
and stationary with respect to the shift,
$$ \mu(f^{(-1)} E) = \mu(E) . \tag{18.32} $$
In the following we shall use a binary source alphabet (i.e., q=2) with the two symbols 0 and 1.
[Metric entropy] The metric entropy h̄ is defined by
$$ \bar h = \lim_{n\to\infty} \left( -\frac{1}{n} \sum_{E \,\in\, \xi_n} \mu(E) \log_2 \mu(E) \right) . \tag{18.33} $$
Recall that, if p_i is the probability of occurrence of symbol s^i ∈ S, in (3), page pageref, h = -Σ_{i=1}^q p_i log p_i + O(1) has been interpreted as the average amount of information gain per symbol. Thus, interpreted in terms of the Shannon information, the metric entropy is the average amount of information gained per experiment if one performs infinitely many experiments. A positive value of h̄ indicates that each experiment contributes a ``substantial'' amount of information. In this sense, h̄ reflects a system's unpredictability and randomness.
[(van Lambalgen [,])] Let μ be an ergodic computable
measure on 2^ω with metric entropy h̄. Then for μ-almost all
x0 ∈ 2^ω, the normalised algorithmic information of a single trajectory is
identical to the metric entropy,
$$ \lim_{n\to\infty} \frac{H(Y(x_0,n))}{n} = \bar h . \tag{18.34} $$
In what follows I state some theorems relating Lyapunov exponents defined in (1) on page pageref to entropy measures (see also L.-S. Young []).
Let C^n stand for n times continuously differentiable; let diffeomorphisms be differentiable functions with a
differentiable inverse. [(Pesin [])] For
C^2-diffeomorphisms f preserving an ergodic measure μ which is equivalent to Lebesgue measure, the metric entropy
equals the sum over the positive Lyapunov exponents,
$$ \bar h = \sum_{\lambda_i > 0} \lambda_i . \tag{18.35} $$
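A hedged numerical illustration of the entropy-Lyapunov exponent relation (18.35), applied loosely to the (non-invertible) a = 4 logistic map with the partition E1 = [0,1/2), E2 = [1/2,1] (all names are ours); the block entropy per symbol should approach the Lyapunov exponent ln 2:
orbit = NestList[4 # (1 - #) &, 0.3, 10^5];
symbols = If[# < 1/2, 0, 1] & /@ orbit;
blockEntropy[m_] := Module[{q},
  q = N[(Last /@ Tally[Partition[symbols, m, 1]])/(Length[symbols] - m + 1)];
  -Total[q Log[q]]/m]
{blockEntropy[8], Log[2.]}   (* both near 0.693 *)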
[(Ruelle [])] For C^1-diffeomorphisms f preserving an ergodic
measure μ, the metric entropy is smaller than or
equal to the sum over the positive Lyapunov exponents,
$$ \bar h \le \sum_{\lambda_i > 0} \lambda_i . \tag{18.36} $$
[(Young [,])] For C^2-diffeomorphisms f: \R^2 → \R^2 preserving an ergodic measure with Lyapunov exponents λ_1 > 0 > λ_2, the
Hausdorff dimension of the attracting set can be related to the metric
entropy and the reciprocal sum over the non vanishing Lyapunov exponents by
$$ D_H = \bar h \left( \frac{1}{\lambda_1} + \frac{1}{|\lambda_2|} \right) . \tag{18.37} $$
The following scenarios are a subjective selection of the author. For other discussions, see E. T. Jaynes' Information Theory and Statistical Mechanics [], Complexity, Entropy and the Physics of Information, ed. by W. H. Zurek, and R. Schack & C. M. Caves []. For a discussion on the thermodynamics of computation, see R. Landauer [,], C. H. Bennett [], E. Fredkin [] and T. Toffoli []. (Cf. the authors of Physical Review Letters 53, 1202-1206 (1984) for an interesting discussion.)
As has been pointed out by G. Chaitin [], W. H. Zurek [] and the author [,], there exists the seemingly paradoxical ability of computable processes to produce objects with higher algorithmic information content than themselves. Indeed, as W. H. Zurek points out, a program which, for example, counts and outputs integers in successive order will eventually end up with an integer of higher algorithmic information content than its length.
This scenario features computable initial states and a computable evolution. Examples for physical systems which realise this scenario are computers which count. For more details, see chapter 7, p. pageref, as well as chapter 9, p. pageref.
This scenario features computable initial states and an uncomputable evolution. The entropy increase originates from an indeterministic system evolution.
A slight variant of this scenario is the assumption that entropy is increased by elementary events in the quantum domain, some of which occur randomly and contribute to the algorithmic information increase [].
As has been pointed out by G. Chaitin, it is impossible to exactly measure
large values of algorithmic information. Any approximation of this measure
yields a bound from above on the algorithmic information, since it relies on
suboptimal program code, i.e.,
$$ H(s) \le H_{\text{approx}}(s) . \tag{18.38} $$
For a summary of the discussed scenarios for entropy increase, see table .
process type/initial value | computable | uncomputable
computable | ``creative'' processes & uncomputability of H | oracles
uncomputable | ``deterministic chaos'' | all other entries combined
Computer scientists as well as artists creating virtual realities and physicists interested in epistemology find themselves confronted with very similar questions:
``What does the (virtual) world look like from within?''
``Which properties does the (virtual) interface have?''
``How does a (virtual) intelligence make sense of its (virtual) environment?''
``Is there any way to distinguish between a `virtual' reality and a `real' reality?''
``How can (virtual) theories be created by mere self-referential methods?''
``What kind of `truth' can be ascribed to such (virtual) theories?''
and so on.
It is not unreasonable to suspect that many of those asking these questions would like to base their investigations on what they consider as ``solid foundations;'' i.e., some ``final'' methods which remain unaffected as time goes by and science progresses. This includes the possession of some ``Archimedean,'' external, point, from which a universe could be described without changing it. They may desperately grab for something more real than paradigms which are constantly revised.
This turns out to be impossible. At least in our own universe, all scientific entities - phenomena, theories, methods et cetera - are intrinsic by definition. Indeed, as we cannot ``step outside'' of our own universe, all our perceptions are self-referential. - As The Doors put it, ``no one here gets out alive.''
Acceptance of the non existence of Archimedean points (extrinsic descriptions) results in the suspicion that a theoretical modelling which is solely based on self-referential methods is weakly founded. Our primary interface seems to be our body and its senses, which can be connected or related to secondary interfaces, such as measurement apparata or virtual reality ``eye phones,'' and also to theoretical models organising phenomena. If we change the interface, we create a different view of the world, which is sometimes incongruent with earlier views. Plato's cave metaphor sounds a little overoptimistic: we do not merely perceive shadows through an interface, but shadows cast, perhaps, by a light which we may have created ourselves. Shakespeare's ``… like the baseless fabric of this vision … We are such stuff / As dreams are made of …'' suits this picture pretty well. I would like to call this feeling of suspense, followed by the tendency to turn to absoluteness, the ``horror vacui'' of self-referential perception.
Yet, a priori, there is nothing wrong with self-referential perception. A typical example of a formal self-referential statement is ``this statement contains five words.'' Some self-referential statements are not well-founded, e.g., ``this statement is true,'' and others may be self-contradictory, e.g., ``this statement is false.'' Could something be learned from inconsistencies and paradoxa? The message of metamathematics and recursion theory seems to be that paradoxical statements can be utilised: they can be reformulated to express the limits of self-referential methods.
In the spirit of algorithmic physics - the perception of the world as a (universal) computer - any preparation or manipulation of a (physical) system can be perceived as a programming task. Stated differently, experimental acts by an observer might be considered as the self-programming of a pseudo-autonomous agent from within the system. In this sense, such acts or interventions may be viewed as the ``creation'' of potential phenomena. Cf. I. Hacking's remarks in Representing and Intervening ([], p. 226-229):
… I suggest, Hall's effect did not exist until, with great ingenuity, he had discovered how to isolate, purify it, create it in the laboratory. …
But the phenomena of physics - the Faraday effect, the Hall effect, the Josephson effect - are the keys that unlock the universe. People made the keys - and perhaps the locks in which they turn.
…
Talk about creating phenomena is perhaps made most powerful when the phenomenon precedes any articulated theory, but that is not necessary. Many phenomena are created after theory.
One may even speculate that - as we are living in a universal computational surrounding - any task which could be performed on a universal computer could be translated into physical phenomena and vice versa.
A. Einstein has expressed the following opinion []:
Insofar as mathematical theorems refer to reality, they are not certain, and insofar as they are certain, they do not refer to reality.
In a talk entitled ``The Unreasonable Effectiveness of Mathematics in the Natural Sciences'' [], Eugene P. Wigner has stated a somewhat different view:
… the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it.
Probably it will be difficult to specify exactly what the term ``rational'' means in this context. Nevertheless, one explanation for the enormous usefulness of mathematics in the natural sciences might be the assumption that nature ``actually'' is a machine! For then, our formal machine concept can be adapted to perfectly suit nature; the only problem remaining is to find the machine specifications, which are interpreted as ``natural laws'' in physics. In this view, the arena of natural phenomena is nothing but an ``inner view'' of a complex automaton. One could even argue that, by the nature-as-machine metaphor, E. P. Wigner's statement becomes a tautology: mathematical modelling, such as the concept of what actually is mechanically computable / recursive, has been tailored to fit the natural sciences. Cf. Oswald Wiener's remark ([], p. 631),
Introspection clearly supports the hypothesis that understanding natural processes amounts to being capable of simulating them by some ``internal'' Turing machines - the clearest indication might perhaps be seen in our arrival at the Turing machine concept itself.
Unlike in mathematics, whose domain is not finitely axiomatisable [,,,,,,], in physics one could still attempt to argue that although we presently do not know every natural law, there is only a finite number of them. Yet, once we have found them all, we may just stop experimenting because we have found the ultimate & true laws of nature, and there is nothing left but deriving their consequences, which is a purely syntactic task of deduction. Thus in physics one could still cling to the idea of ultimate finite truth, the ultimate theory of everything, a Hilbertean utopia which is no longer conceivable for mathematics. - Indeed, people, among them many scientists, who can barely bear the painful recognition of the temporal, historic status of physical modelling tend to believe that we are indeed almost at the very edge of such a theory of everything! This ``over-stretch'' of scientific methods, which are applied to extreme (space-time-matter) domains, characterises the beginning of the creation of ``scientific fairy tales'' which enjoy huge publicity, in particular when the public relations campaign manages to convey a mysterious image of both author & subject.
Mathematics seems to contain an infinity of possible universes, most of which (e.g., those containing arithmetic) are too rich in structure to be finitely axiomatisable. The physical laws specify the mathematical universe we live in. I.e., whereas there is no a priori criterion for selecting one universe over another, there is only one physical universe; see Fig. , based on the alternative set theory view of mathematics [].
This seems strange.
One way to avoid this arbitrariness in the selection of one physical universe is to assume that there is an (uncountable?) infinity of parallel universes, all based on different mathematical models. This is somewhat related to the Everett interpretation of quantum mechanics.
It is as if some external minds were shopping in a huge fair, an exhibition of all possible (virtual) universes, much like J. L. Borges' Library of Babel. The mind chooses to get ``a taste'' of some particular universe by ``entering'' it: by being born there with neither memory of its past nor knowledge of its future. (Otherwise the scenery would be only half the fun.)
Another way of eliminating the arbitrariness is the following. One cannot be sure whether the multitudes of mathematical universes are consistent. Probably there is only one consistent mathematical universe, and this is the same universe in which we physically live.
It is not entirely unreasonable to speculate that the continuity of space and/or time, or of some action variable, is an illusion. Almost all colour prints consist of ``microscopic'' pixels or colour dots. To an ``uninformed audience,'' images in the cinema, on television or on computer screens, although built out of discrete frames of events, appear continuous - in fact, they have been constructed to be perceived in that way. It is only through some context-abnormal experience, e.g., waving one's hand in front of a television screen, that discreteness reveals itself. Even the brain appears to organise the seemingly continuous flow of consciousness into discrete time frames of approximately 70 ms [].
<<cellular.m;
(* initial configuration: three "-" cells and one ">" signal,
   followed by 60 quiescent cells *)
a = Flatten[Append[{"-","-","-",">"}, Table["_",{i,60}]]];
(* every cell copies its left neighbour, so the pattern drifts to
   the right; evolve for 19 time steps *)
TableForm[EvolveCA[a, {{l_,_,_} -> l}, 19],
  TableSpacing -> {0,0}, TableDirections -> {Column}]
<<cellular.m;
(* a signal bouncing between two walls I: ">" moves right,
   "<" moves left, "_" is the quiescent state *)
a = {"_","_","_",I,">","_","_","_","_","_","_",I,"_","_","_"};
Print[TableForm[EvolveCA[a, {
    {">","_",_} -> ">",   (* signal propagates to the right *)
    {_,"_","<"} -> "<",   (* signal propagates to the left *)
    {"_","_","_"} -> "_",
    {_,"_",">"} -> "_",
    {"<","_",_} -> "_",
    {"_",">","_"} -> "_",
    {"_","<","_"} -> "_",
    {"_",">",I} -> "<",   (* reflection at the right wall *)
    {I,"<","_"} -> ">",   (* reflection at the left wall *)
    {_,I,_} -> I,         (* walls persist *)
    {"_","_",I} -> "_",
    {I,"_","_"} -> "_",
    {I,">","_"} -> "_",
    {"_","<",I} -> "_"
  }, 19],
  TableSpacing -> {0,0}, TableDirections -> Column]];
(* iterated logarithm sums: LogS adds log2 x + log2 log2 x + ... until
   the iterate drops below 1; LgS is the integer (Floor) variant *)
LogS[x_] := Module[{z = x, y, t = 0},
  While[(y = N[Log[2,z]]) > 0, t = t + y; z = y]; t]
LgS[x_] := Module[{z = x, y, t = 0},
  While[(y = Floor[N[Log[2,z]]]) > 0, t = t + Floor[N[Log[2,z]]]; z = y]; t]
(* Plot[{LgS[x],LogS[x]},{x,0.01,20}] *)
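For orientation - a hypothetical check, assuming the two definitions above have been evaluated - the exact and the integer variants can be compared on a power of two:

LogS[16]   (* 4 + 2 + 1 = 7, since log2 16 = 4, log2 4 = 2, log2 2 = 1 *)
LgS[16]    (* also 7; Floor changes nothing when all iterates are integers *)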
A very elegant implementation of functions for automaton analysis has been created by Ch. F. Strnadl. It makes use of the package ``DiscreteMath`Combinatorica`'' which is distributed with Mathematica, version 2. ``DiscreteMath`Combinatorica`'' is documented in Implementing Discrete Mathematics by St. Skiena [], its creator. Caution: although ``DiscreteMath`Combinatorica`'' is ingenious in many other respects, the function HasseDiagram contains a bug and has to be modified. The corrected function, called HasseD, is contained in the automaton package. A copy of the automaton package can be obtained via anonymous ftp: ftp.univie.ac.at, directory packages/mathematica, file automata.m; or from the author by e-mail: e1360dab@awiuni11.bitnet, or e1360dab@awiuni11.edvz.univie.ac.at, or by sending in a formatted MS-DOS diskette.
(*
 * Package Automata.m
 *
 * Mathematica 2.x version
 *
 * Package implementing the Automaton Propositional Calculus
 * for Moore and Mealy automata.
 *
 * Literature: K. SVOZIL, Randomness and Undecidability in Physics,
 * 1992, p137-150
 *
 * Christoph F. Strnadl (strnadl@tph01.tuwien.ac.at)
 * Institute for Theoretical Physics, TU Vienna
 * Wiedner Hauptstrasse 8-10, A-1040 AUSTRIA
 *
 * 25-Sep-92 Started implementing Automaton Propositional Calculus
 *           functions.
 * 29-Sep-92 Changed from the Xxxx[AM] functions to a generic version
 *           which differentiates between the automata via the
 *           Automat[[1]] entry.
 * 30-Sep-92 Finished the implementation of the generic version.
 *           A final test with a (532) Moore automaton succeeded (in
 *           comparison with the original Moore.m package).
 * 02-Oct-92 A test with Mathematica v2.0 on a 386-PC showed that
 *           Wolfram has once again changed the order of context
 *           search rules. Automata.m won't work. We have naming
 *           conflicts with Path[] and Partitions[]. These two
 *           have been renamed to PathA[] and PartitionsA[] to
 *           reflect the Automaton-nature inherent to them :-)
 * 08-Oct-92 K. Svozil suggested the possibility of an error in the
 *           construction of the Hasse-Diagram as implemented in
 *           St. SKIENA's HasseDiagram[]. I confirmed that error,
 *           which actually lies in the fact that one cannot use
 *           RankGraph[] to construct the ranking of the vertices.
 *           An overworked HasseD[] function is developed in this
 *           package. But now we face the additional difficulty of
 *           having connected vertices where no edges are defined --
 *           this being due to a special geometrical aliasing problem.
 *           With HasseD[g,factor] we are able to deal with such types
 *           of graphs.
 * 09-Oct-92 Finished implementation of HasseD[]. And again we had to
 *           face some difficulties regarding the $ContextPath
 *           dependencies. Changed the functionality of PropCalc[]: it
 *           now generates the Graph[] object of the Propositional
 *           Calculus. To display the Hasse-Diagram one now uses
 *           ShowPropCalc[]. This behaviour is in accordance with the
 *           big classification task undertaken by K. SVOZIL, who needs
 *           the Graph[] object and not the picture :-)
 *)
BeginPackage["Automata`","DiscreteMath`Combinatorica`"]

Automat::usage := "Automat[ type, transition-table, output-table, intern,
{i1,i2,..} ] is the generic form in which an automaton is stored.
<type> = Moore | Mealy for a Moore or Mealy automaton.
<transition-table> = a list of rules in the form
{ {state1,input1} -> next1, {state2,input2} -> next2, ... }
which specifies the transition of the automaton from state1 into state
next1 upon reading input-symbol input1, and so on.
<output-table> is different for the Moore and Mealy automata:
Moore: { i1 -> o1, i2 -> o2, ... } a list of rules specifying for each
internal state (e.g. i1) the corresponding output symbol (e.g. o1).
Mealy: { {i1,s1} -> o1, {i2,s2} -> o2, ... } a list of rules specifying
for each internal state i1 *and* input symbol s1 the corresponding
output symbol o1.
<intern> = the number of internal states of the automaton.
{i1,i2,...} = the list of all recognized input symbols of the automaton."

aMoore::usage := "The most well known (422) Moore automaton."

aCounter::usage := "A Moore automaton whose automaton propositional
calculus is no lattice."

aMealy::usage := "aMealy is a generic (332) Mealy automaton."

MooreQ::usage := "MooreQ[aut] is True if automaton aut is a Moore
automaton, False otherwise."

MealyQ::usage := "MealyQ[aut] is True if automaton aut is a Mealy
automaton, False otherwise."

Feed::usage := "Feed[a,e,{s1,s2,...}] returns the list {i1, i2, ...} of
internal states automaton a is in when reading the input string
{s1, s2,..} from initial internal state e."

PathA::usage := "PathA[a,e,{s1,s2,...}] returns the list {o1, o2,...} of
output symbols automaton a produces when reading the input string
{s1,s2,...} from initial state e.
PathA[a,{e1,e2,...},{s1,s2,..}] returns a list of the output-lists
automaton a produces when reading input string {s1, s2,..} from initial
states {e1, e2,...}."

StateFromInput::usage := "StateFromInput[a,{s1,s2,...}] returns the
partition v({s1,s2,..}) of all input states which can be identified by
automaton a upon feeding a with input string {s1,s2,..}."

PropsFromInput::usage := "PropsFromInput[a,{s1,s2,..}] returns a list of
all propositions which can be identified by feeding automaton a with
input string {s1, s2,..}."

StateFromLevel::usage := "StateFromLevel[aut,l] computes all possible
partitions of input states which can be identified with automaton aut
upon performing an experiment of length l on it."

PartitionsA::usage := "PartitionsA[aut,l] computes all possible
partitions of input states which can be identified by automaton aut,
when performing all possible experiments of input strings up to (and
including) the length l."

Propositions::usage := "Propositions[aut,l] generates the set of all
propositions which can be identified by automaton aut upon performing
experiments up to the length l with it.
Propositions[partlist] generates the set of all propositions for the
given partition partlist. partlist may be generated by PartitionsA[]."

PropCalc::usage := "PropCalc[v,len] computes the Graph[] object for the
propositional calculus of the automaton with partition list v. Input
strings up to the length len are considered. The graph is the simple
graph; no reduction to a Hasse-Diagram is done!
PropCalc[aut,len] computes the same as above, but for automaton aut."

ShowPropCalc::usage := "ShowPropCalc[aut,l] makes the Graph[] of the
automaton propositional calculus for automaton aut. Only experiments up
to length l are considered. The graph itself is rendered as the Hasse
Diagram of the partial order induced by the automaton propositional
calculus.
ShowPropCalc[aut,l,fac] makes the same as above, but with each point's
position stretched by a factor fac according to the stage the point is
in the hierarchy.
ShowPropCalc[partlist] renders the Graph[] for the given list of
partitions.
ShowPropCalc[partlist,fac] renders the Graph[] for the given
partition-list; the positions of the vertices on the graph are stretched
by factor fac to avoid geometrical aliasing."

HasseD::usage := "HasseD[g] renders the Hasse-Diagram of graph g.
HasseD[g,fac] renders the Hasse-Diagram of graph g with each point's
location stretched by a factor fac according to its stage."

Begin["`Private`"]

(*
 * aMoore is the MOORE automaton
 *
 * Syntax is:
 * Automat[ type, transition-table, output-table, internal-states,
 *          input-symbols ]
 *
 * type = Moore | Mealy
 *
 * For the sake of convenience there is no MakeAutomaton[]
 * function, so we cannot easily convert between the Global`Moore
 * (or Global`Mealy) symbols and private Automata`Private`Moore.
 * So we just use the Global context to keep user efforts to a minimum.
 *)
aMoore := Automat[Global`Moore,
  {{1, 0} -> 4, {1, 1} -> 3, {2, 0} -> 1, {2, 1} -> 3,
   {3, _} -> 4, {4, _} -> 2},
  {1 -> 0, 2 -> 0, 3 -> 0, 4 -> 1},
  4, {0,1} ]

(*
 * aCounter is the automaton of SVOZIL 1992, p150, which has a
 * propositional structure which is NOT a lattice and whose implication
 * is NO partial ordering. It is the so-called counter example.
 *)
aCounter := Automat[Global`Moore,
  {{1, _} -> 2, {2, 0} -> 3, {2, 1} -> 2, {3, _} -> 4,
   {4, 0} -> 4, {4, 1} -> 1},
  {1 -> 0, 2 -> 0, 3 -> 1, 4 -> 1},
  4, {0,1} ]

(*
 * aMealy is a generic Mealy automaton
 *)
aMealy := Automat[Global`Mealy,
  {{1, _} -> 1, {2, _} -> 2, {3, _} -> 3},
  {{1, 1} -> 1, {1, 2} -> 0, {1, 3} -> 0,
   {2, 1} -> 0, {2, 2} -> 1, {2, 3} -> 0,
   {3, 1} -> 0, {3, 2} -> 0, {3, 3} -> 1},
  3, {1, 2, 3}]

(*
 * MooreQ[aut] is True if aut is a Moore automaton.
 * MealyQ[aut] is True if aut is a Mealy-type automaton.
 *
 * If we don't use Global`Moore here, we'd have Automata`Private`Moore,
 * which means that the user would have to type the fully qualified name
 * if he/she wants to create an automaton of his/her own!
 *)
MooreQ[a_Automat] := a[[1]]===Global`Moore
MealyQ[a_Automat] := a[[1]]===Global`Mealy

(*
 * Feed[a,e,i] returns the list of internal states the automaton a
 * is in when reading the input string i from initial state e.
 *)
Feed[m_Automat, i_, l_List] := FoldList[{#1, #2} /. m[[2]] & , i, l]

(*
 * PathA[a,e,i] returns the list of output-symbols the automaton a
 * emits when reading input string i starting at initial state e.
 * PathA[a,{e1,e2,...},i] returns the list of output-symbols for
 * different initial states e1, e2, ...
 *)
PathA[a_Automat, e_Integer, input_List] :=
  (Feed[a, e, input] /. a[[3]] ) /; MooreQ[a]
PathA[a_Automat, e_List, i_List] := (PathA[a, #1, i] & ) /@ e

(* There is a bug in the Apollo Domain/OS Mathematica 1.2 implementation,
 * which causes Inner[List,{a,b,c},{1,2,3},List] not to work correctly.
 * Instead of { {a,1},{b,2},{c,3} } we get { {a,b,c},{1,2,3} }.
 * This workaround does produce the correct result even on the Apollos.
 *)
PathA[a_Automat, e_Integer, in_List] :=
  ( Apply[ List, Inner[f, Drop[Feed[a, e, in], -1], in, List], 2 ]
    /. a[[3]] ) /; MealyQ[a]

(* Correct and simpler version follows:
PathA[a_Automat, e_Integer, in_List] :=
  Inner[ List, Drop[ Feed[a,e,in], -1], in, List] /; MealyQ[a]
*)

(*
 * StateFromPath[io,z] gives the set of initial states which
 * generate the output-string z. The list io must contain an internal
 * representation of the input/output analysis of an automaton
 * (actually, the output of FullPath[] will do). So,
 * do not use this function from outside. Use StateFromInput[]
 * instead!
 *)
StateFromPath[io_, z_] :=
  (#1[[1]] & ) /@ io[[(#1[[1]] & ) /@ Position[io, z]]]

(*
 * FullPath[m,e,i] returns a list { {e1,e1path},{e2,e2path},...}
 * where together with the initial state ei the corresponding output
 * string eipath for an input list i is displayed. This form is used
 * by StateFromPath[].
 *)
FullPath[m_Automat, e_List, i_List] := Thread[{e, PathA[m, e, i]}]

(*
 * StateFromInput[m,e,z] returns the partition v(z) (cf. SVOZIL 1992,
 * p140) of initial states which can be identified by automaton m
 * upon feeding m with the input list z.
 *
 * Note: A Mealy automaton produces output only when feeding it with at
 * least one input symbol.
 *)
StateFromInput[m_Automat,e_List,{}] := { {} } /; MealyQ[m]
StateFromInput[m_Automat, e_List, z_List] :=
  Block[ {iopath, zpath},
    iopath = FullPath[m, e, z];
    zpath = Union[(#1[[2]] & ) /@ iopath];
    (StateFromPath[iopath, #1] & ) /@ zpath
  ]

(*
 * StateFromInput[m,z] is the same as above, but implicitly
 * taking all possible input-states of the automaton into account.
 *)
StateFromInput[m_Automat, z_List] := StateFromInput[m, Range[ m[[4]] ], z]

(*
 * PropsFromInput[m,e,z] generates the list of all discernible initial
 * states which can be identified by performing the experiment z
 * (i.e. input string z) on the automaton m with initial states
 * e. The output is in the form { {}, {e1}, {e1,e2},...}.
 *)
PropsFromInput/: PropsFromInput[m_Automat, e_List, i_List] :=
  PowerSet[StateFromInput[m, e, i]]

(*
 * PropsFromInput[m,i] generates a list of all discernible initial
 * states of automaton m when feeding it input-list i.
 *)
PropsFromInput[m_Automat, i_List] := PropsFromInput[m, Range[ m[[4]] ], i]

(*
 * PowerSet[ { s1, s2, ..}] produces the power set of all combinations
 * of unions of sets s1, s2,.. : { {1},{2},{3,4},.. } ->
 * { {1},{2},{3,4},{1,2},{1,3,4},{2,3,4},{1,2,3,4},... }.
 *)
PowerSet[e_List] := Union[ Apply[Union, Subsets[e], {1}] ]

(*
 * MakeInput[ level, {s1, s2, ...} ] generates all lists of input-
 * sequences with level input-symbols in each list. This function
 * is the same as St. SKIENA's Strings[l,n] function, but implemented
 * differently.
 *)
MakeInput[0,_] := {}
MakeInput[level_Integer, s_List] :=
  Flatten[Apply[Outer, Join[{List}, Table[s, {i, level}]]], level - 1]

(*
 * StateFromLevel[m,l] computes all different partitions available
 * for any input string of length l of automaton m.
 *)
StateFromLevel[m_Automat,0] := { StateFromInput[m,{}] } /; MooreQ[m]
(* consistency with higher StateFromLevel[] list output needs
   surrounding braces *)
StateFromLevel[m_Automat,0] := { {} } /; MealyQ[m]
StateFromLevel[m_Automat, l_Integer] :=
  Union[(StateFromInput[m, #1] & ) /@ MakeInput[l, m[[5]] ] ]

(*
 * PartitionsA[m,l] determines all initial state partitions which
 * can be identified experimentally with input-strings up to length
 * l for automaton m.
 * The difference between the MOORE and MEALY automata is more subtle
 * for this function: a MOORE automaton emits an output symbol even when
 * presented with no input (therefore we have an iterator {j,0,l}).
 * A MEALY automaton only shows output when eating an input symbol, so
 * the iterator starts at 1: {j,1,l}.
 *
 * The Union[ Sort /@ ... ] construct eliminates identical state
 * partitions by first bringing all the state partitions into canonical
 * (= ordered) form (Sort[]) and then taking the union thereof.
 *)
PartitionsA[m_Automat, l_Integer] :=
  Union[ Sort /@
    Flatten[ Union[(StateFromLevel[m, #1] & ) /@ Table[j, {j, 0, l}]], 1 ]
  ] /; MooreQ[m]
PartitionsA[m_Automat, l_Integer] :=
  Union[ (* sort out multiple entries *)
    Sort /@
    Flatten[ Union[(StateFromLevel[m, #1] & ) /@ Table[j, {j, 1, l}]], 1]
  ] /; MealyQ[m]

(*
 * Propositions[v] generates a list of all possible automaton
 * propositions. v is the PartitionList as returned by PartitionsA[].
 *)
Propositions[v_List] :=
  Union[Map[Sort, Flatten[(Flatten /@ Subsets[#1] & ) /@ v, 1], 1]]

(*
 * Propositions[m,l] generates the automaton propositional calculus
 * for automaton m with a maximum length of input strings of l.
 *)
Propositions[m_Automat, l_Integer] := Propositions[PartitionsA[m, l]]

(*
 * PropCalc[v] renders the Automaton Propositional Calculus for the
 * partition list v.
 * v is in the same format as returned from PartitionsA[].
 *)
PropCalc[v_List] :=
  Block[{vfull, vp},
    vp = Propositions[v];
    vfull = Map[Sort, (Flatten /@ Subsets[#1] & ) /@ v, 2];
    DiscreteMath`Combinatorica`MakeGraph[vp,
      Intersection[#1, #2] === #1 &&
      Intersection[first[Position[vfull, #1]],
                   first[Position[vfull, #2]]] =!= {} & ]
  ]
PropCalc[m_Automat,i_Integer] := PropCalc[ PartitionsA[m,i] ]

(*
 * ShowPropCalc[] generates the Hasse-Diagram of the Automaton
 * Propositional Calculus, which is then displayed. fac is a factor
 * (default = 1) with which each vertex's x-position is multiplied
 * according to its stage to avoid geometrical aliasing.
 *)
ShowPropCalc[v_List,fac_:1] :=
  Block[ {g, vp},
    vp = Propositions[v]; (* needed only for labelling the graph *)
    g = PropCalc[v];
    DiscreteMath`Combinatorica`ShowLabeledGraph[ HasseD[g,fac],vp]
  ]
ShowPropCalc[m_Automat,i_Integer,fac_:1] :=
  ShowPropCalc[ PartitionsA[m,i], fac]

(*
 * first[{ {a,1,..},{b,2,...},{c,...},...}] generates the list
 * {a,b,c,...} consisting of all the first elements of the sublists of
 * the list.
 *)
first[l_List] := (#1[[1]] & ) /@ l

(*
 * SetLevel[{p1,p2,...},lvl,rank] sets the positions p1, p2, .. of
 * list rank to the level lvl, if the old entry at that position
 * is less than lvl.
 *)
SetLevel[l_List,lvl_,rank_List] :=
  Block[ {r=rank},
    If[ r[[#]] < lvl, r[[#]] = lvl ] & /@ l;
    r
  ]

(*
 * MakeLevel[l,level,adjm,rank] constructs recursively the ranks of
 * each vertex according to the adjacency matrix adjm of the graph.
 * rank is the current ranking, level the new level to assign and
 * l = {v1,v2,..} the list of vertices to be set to level.
 *)
MakeLevel[{},_,_,rank_] := rank
MakeLevel[l_List,lvl_,adjm_List,r_List] :=
  Block[ {rank=r, v, lst=l },
    rank = SetLevel[lst,lvl,rank]; (* make this level ready *)
    While[ lst != {},
      v = First[lst];
      rank = MakeLevel[adjm[[v]], lvl+1,adjm,rank];
      lst = Rest[lst];
    ];
    rank
  ]

(*
 * HasseD[g] renders a graph corresponding to the Hasse-Diagram of
 * the partial order induced by the directed graph g.
 * HasseD[g,fac] renders the Hasse-Diagram in which each vertex's
 * position is stretched by factor fac. In each stage that factor
 * is taken to the power of the distance to the 1 element.
 *
 * This function also uses some functions of Combinatorica.
 * Unfortunately, St. SKIENA's implementation HasseDiagram[] is faulty
 * for certain types of posets!
 *)
HasseD[g_,fak_:1] :=
  Block[{r, rank, m, stages,
         freq=Table[0,{DiscreteMath`Combinatorica`V[g]}], adjm, first},
    r = DiscreteMath`Combinatorica`TransitiveReduction[
          DiscreteMath`Combinatorica`RemoveSelfLoops[g] ];
    adjm = DiscreteMath`Combinatorica`ToAdjacencyLists[r];
    rank = Table[ 0,{ DiscreteMath`Combinatorica`V[g]} ];
    first = Select[ Range[ DiscreteMath`Combinatorica`V[g]],
      DiscreteMath`Combinatorica`private`InDegree[r,#]==0& ];
    rank = MakeLevel[ first, 1, adjm, rank];
    first = Max[rank];
    stages = DiscreteMath`Combinatorica`Distribution[ rank ];
    DiscreteMath`Combinatorica`Graph[
      DiscreteMath`Combinatorica`Edges[r],
      Table[ m = ++ freq[[ rank[[i]] ]];
        { ((m-1) + (1-stages[[rank[[i]] ]])/2) fak^(first-rank[[i]]),
          rank[[i]] },
        {i, DiscreteMath`Combinatorica`V[g]} ]
    ]
  ] /; DiscreteMath`Combinatorica`AcyclicQ[
         DiscreteMath`Combinatorica`RemoveSelfLoops[g],
         DiscreteMath`Combinatorica`Directed ]

End[] (* `Private` *)
EndPackage[] (* Automata.m *)
(* Automata.m --------------------------------------------------------*)
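To give a flavour of the package in use - a hypothetical session, assuming Automata.m and ``DiscreteMath`Combinatorica`'' are on the load path - one may inspect the Moore automaton aMoore and draw the Hasse diagram of its propositional calculus:

<<Automata.m;
(* internal states visited when reading input {0,1} from initial state 1 *)
Feed[aMoore, 1, {0,1}]
(* the corresponding output symbols *)
PathA[aMoore, 1, {0,1}]
(* Hasse diagram of the propositional calculus, experiments up to length 2 *)
ShowPropCalc[aMoore, 2]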
Fig. 10.8 shows the Hasse diagrams of generic logics for automata with up to 4 states. They have been obtained by a procedure similar to the one above, except that the set of state partitions has been generated by permutations; e.g., by the following Mathematica code.
$DefaultFont={"Helvetica",20};
<<\math\packages\Discrete\Combinat.m ;
<<Automata.m;
Print["----"];
(* Compress[h] eliminates isomorphic duplicates from the list of
   graphs h, using IsomorphicQ from Combinatorica *)
Compress[h_]:=
Block[{n,index,iset,i1,in,i,j},
n=Length[h];
hh={h[[1]]};
index={1};
iset={1};
Do[
If[Intersection[index,{i}]==={i}, index,
Do[
If[Intersection[index,{j}]==={j}, index,
If[i===j,
index=Append[index,j];
iset=Append[iset,j];
hh=Append[hh, h[[j]] ],
If[
ii=False; in=Length[iset];
Do[ii=Or[ii,IsomorphicQ[ h[[j]],hh[[i1]] ]],
{i1,in}];ii,
index=Append[index,j], index
];
];
];
,{j,i,n}];
];
Print[i, " // ",index];
Print[i, " ",iset];
,{i,n}];
hh]
Print["----"];
a=Map[ToCycles,Permutations[{1,2,3,4}]];
Print[a,Length[a]];
a1=Rest[a];a=a1;
b={};
Do[If[First[Dimensions[a[[ii]]]]!=1,b1=Append[b,a[[ii]]];b=b1],{ii,
Length[a]}];
a=b;
b=Table[{},{Length[a]}];
Do[b[[i]]=Map[Sort,a[[i]]],{i,Length[a]}];
aa=Union[b];
(* reduction of trivial state partitions*)
Print[b];
sa=Subsets[aa];
Print["Length=",Length[sa]];
(*
g = Union[PropCalc[#]& /@ sa];
*)
g={};
Do[g=Union[Append[g,PropCalc[sa[[i]]]]],
{i,Length[sa]}];
(* graphs of the propositional calculi *)
Print["(* graphs of the propositional calculi *)"];
h=Table[HasseD[g[[i]],1.2],{i,2,Length[g]}];
(* construction of Hasse diagrams *)
Print["(* construction of Hasse diagrams *)"];
hh=Compress[h];
(* construction of set of nonisomorphic Hasse diagrams *)
hh >> result. ;
hh=Prepend[hh,
Graph[{{0, 1}, {0, 0}}, {{0, 1}, {0, 2}}]
];
(*
* Generate the pictorial representation of all nonisomorphic graphs
*)
hhg=Table[ShowGraph[ShakeGraph[hh[[i]],0.00]],{i,Length[hh]}];
(*
* Generate a partition of these pictures with mm numbers of rows
*)
mm=5;
hhp=Partition[hhg,mm];
(*
* Generate an array of graphics objects
*)
hh1=Append[hhp,Complement[hhg,Flatten[hhp]]];
If[MemberQ[hh1,{}],hh2=Delete[hh1,Length[hh1]],hh2=hh1];
xxa=Show[GraphicsArray[hh2]];
(*
* Generate PostScript description of array of graphics objects on file
*)
HardcopyF[xxa,"ga.ps"];
(* random iteration algorithm for the Cantor set:
   iterate nit times, starting from the ``seed'' xin *)
cantor[nit_,xin_]:=
  Block[{i},
    (* iterated function system data *)
    a={1/3, 1/3};
    b={0, 2/3};
    x=xin;
    (* transient of 20 steps to settle onto the attractor *)
    Do[
      ra=Random[Integer,{1,2}];
      (* apply affine transformation *)
      newx=a[[ra]]*x + b[[ra]];
      x=newx;
    ,{i,20}];
    (* iteration starts, nit times! *)
    gr={};
    Do[
      ra=Random[Integer,{1,2}];
      (* apply affine transformation *)
      newx=a[[ra]]*x + b[[ra]];
      x=newx;
      (* append point to graphics *)
      ge=Append[gr,Graphics[{AbsoluteThickness[1],Line[{{x,0},{x,1}}]}]];
      gr=ge;
      ge=Flatten[gr];
    ,{i,nit}];
    (* display graphics *)
    Show[ge,Frame->True,FrameTicks->None
      (*,PlotLabel->d, DisplayFunction->Identity*)]
  ];
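A hypothetical invocation - assuming the definition above has been read in - draws an approximation of the Cantor set from 500 random iterations, starting from the seed 1/2:

cantor[500, 1/2]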
The following Mathematica programs generate random fractal patterns; see chapter 15.
(*
* Bursts.m
*
* Package implements one-dimensional fractals, both randomly
* and sequentially generated.
*
* Christoph F. Strnadl (strnadl@tph01.tuwien.ac.at)
* Institute for Theoretical Physics
* Wiedner Hauptstrasse 8 - 10, TU Vienna
* A-1040 VIENNA, AUSTRIA / Europe
*
* HISTORY:
* 09-Dec-92 First implementation on Apollo Domain/OS Mma v1.2
* 10-Dec-92 Added functionality for the generation of
* Sequential bursts (like the Cantor Dust).
*
* 03-Mar-93 Added functionality for evaluation of the
* ``box-counting'' fractal dimension:
* function Dim[ ] by Franz Pichler, TU-Wien
*
*
*)
BeginPackage["Bursts`"]
UniqueRandom::usage =
"UniqueRandom[typ, n, {from,to}] generates a list of n different
random items of type typ in the range of {from,..., to}.
typ is the same as for the built-in Random[Integer,...] etc.
functions."
RandomBursts::usage =
"RandomBursts[dim, ninter, npart] generates a one-dimensional random
burst of (approximate) fractal dimension dim by dividing the unit
intervall into ninter sub-intervalls and choosing (according to the
fractal dimension dim) sub-intervalls which are again divided, and
so on... for npart times.The resulting random fractal is then
displayed as a Graphic[]s object.
Default values: ninter = 10, npart = 3."
RandomBurstsList::usage =
"RandomBurstsLists[] functions as RandomBursts[] but it returns the
list of the subintervalls in the last level instead of the Graphics-
object."
ShowBursts::usage =
"ShowBursts[{e1, e2,...}] shows the list of values ei = 0 | 1 as a
Burst-Graph."
SequentialBurstsList::usage =
"SequentialBurstsList[lst,patt,lvl] makes a list corresponding to a
sequential burst pattern, which is generated by applying the
substitution pattern patt to the initial list lst for lvl times.
SequentialBurstsList[patt] has a default initial list {1} and goes
3 levels deep.
SequentialBurstsList[patt,lvl] has a default initial list {1}."
SequentialBursts::usage =
"SequentialBursts[] has the same functionality of SequentialBurstsList[]
the only difference being the fact that the final list of intervalls
is displayed as a Graphic[]s object and not outputted in form of a
list."
Dim::usage=
"Dim[ ldat ] evaluates the ``box-counting'' fractal dimension of the
list ldat."
Begin["`Private`"]
(*
* UniqueRandom[typ,n,{from,to}] makes a list of n different random
* items of type typ in the range from ... to.
*
* Of course, one could have written UniqueRandom[] more functional
* like, but -- again -- the SameTest-Option for FixedPoint[] is
* missing in v1.2 :-(
*)
UniqueRandom[type_,n_Integer,{from_,to_}] :=
Block[ { r = {} },
While[ Length[r] =!= n,
r = Union[ r, {Random[ type, {from,to} ]} ]
];
r
]
(*
* zoom[lst, n, r] expands the list lst according to the (integer)
* ratio r into sublists consisting of n elements.
* Every '0' in lst -- representing a discarded interval -- is replaced
* by n empty subintervals {0,....,0}.
* Every '1' in lst -- representing an interval which has been kept --
* is replaced by a list of subintervals, of which exactly r of the n
* intervals are kept, the others being discarded.
*)
zoom[l_List,n_Integer,r_Integer] :=
( # /. { 0 -> Table[0,{n}],
1 -> ReplacePart[ Table[0,{n}], (* list to be replaced *)
1, (* replace with '1' *)
{#}& /@ UniqueRandom[Integer, r,{1,n}]
(* positions at which to replace *)
]
} )& /@ l //Flatten
(*
* RandomBursts[dim,n,lvl] generates a random burst of approximate
* dimension dim in which the unit interval is divided into n
* (default 10) subintervals for lvl (default = 3) times.
*)
RandomBursts[d_,n_Integer:10,lvl_Integer:3] :=
ShowBursts[ RandomBurstsList[d,n,lvl] ]
(*
* RandomBurstsList[] as RandomBursts[] but returns the list of
* intervalls instead of the Graphic[]s object.
*)
RandomBurstsList[d_,n_Integer:10,lvl_Integer:3] :=
Block[ {dim, r, l},
r = Round[ n^d ]; (* number of substitutions *)
dim = Log[r] / Log[n]; (* correct dimension *)
Print["Exact similarity dimension is ",dim//N];
Print["Substitution ratio r = ",r,":",n," (",r/n//N,")"];
l = Nest[ zoom[#, n, r]&, {1}, lvl]
]
(*
* ShowBursts[]
*
* Note that we must set the PlotRange for the x-axis to be the
* full length of the list. Otherwise a lot of trailing (or leading)
* empty stripes wouldn't be displayed, which is not what any user
* would expect.
*
* Of course, one could have written a much more elegant and stream-
* lined functional (tail-recursive ... insert your favorite style here)
* version of ShowBursts[] ;-)
*)
ShowBursts[l_List] :=
Block[ {gr={}, i},
Do[ If[ l[[i]]===1,
gr = Append[gr,Line[{{i,0},{i,1}}] ]
],
{i, Length[l] }
];
Show[Graphics[ gr ],PlotRange->{{0,Length[l]},Automatic} ]
]
(*
* szoom[lst,p] expands the list lst in the following way: Each '0'
* in lst is replaced by a list {0,0,...,0} of length Length[p],
* each '1' is replaced with the pattern p. After these substitutions
* the inner lists are Flatten'ed out.
*
* We use a szoom1[] function just for more computational speed, which,
* at least in the 2.x versions, could be boosted much more efficiently
* by Compiling[] the code. BTW the 's' stands for sequential.
*)
szoom[lst_List,p_] := szoom1[lst,p,Length[p] ]
szoom1[l_,p_,n_] :=
( # /. { 0 -> Table[0,{n}], 1 -> p } )& /@ l //Flatten
(*
* SequentialBursts[lst,patt,lvl] generates a sequential burst pattern
* out of starting list lst, wherein each '1' is replaced by the
* pattern patt. This process is repeated for lvl times.
*)
SequentialBursts[p_,lvl_Integer:3] := SequentialBursts[{1},p,lvl]
SequentialBursts[l_List,p_,lvl_Integer:3] :=
ShowBursts[ SequentialBurstsList[ l,p,lvl ] ]
SequentialBurstsList[p_,lvl_Integer:3] := SequentialBurstsList[{1},p,lvl]
SequentialBurstsList[l_List,p_,lvl_Integer:3] :=
Nest[ szoom[#,p]&, l, lvl ]
(*
* Dim[list,s,name] evaluates the ``box-counting'' fractal dimension
* of the list 'list', with optional cut-off parameter 's'
* and plot label 'name'.
*)
Dim[list_,s_Real:.3,name_String:"plot"]:=
Module[{n,werte,anz,jmx,jg,delta},
$DefaultFont={"Helvetica",20};
n=Length[list];
If[n<2,Print["bad data !"];Abort[] ];
werte={};
jmx=Floor[Log[2 n/3]/Log[4/3]]//N; (* number of steps *)
jg=Floor[Log[n/9]/Log[4/3]]//N; (* parameter for low & high contrast *)
Do[If[j<=jg,
delta=Round[E^.4*(4/3)^j], (* length of the counting intervals *)
delta=Floor[n/(jmx-j)]
];
anz=0;i=1;
While[i<=n,If[Part[list,i]>s,anz++;i+=delta,i++]]; (* box-counting *)
AppendTo[werte,{Log[delta],Log[anz]}], (* plotpoints for graphics *)
{j,0,jmx-1}
];Clear[x];
gerade=Fit[werte,{1,x},x]; (* interpolation *)
dim=-Coefficient[gerade,x,1];
sum=0;q=Length[werte];
Do[x=werte[[j,1]]//N;
y=werte[[j,2]]//N;
sq=(gerade-y)^2;sum+=sq, (* sum of errors^2 *)
{j,1,q}
];
sigq=Sqrt[sum/q]*Cos[ArcTan[dim]]; (* confidence interval *)
Print[dim,"+-",sigq];
grp=ListPlot[werte,DisplayFunction->Identity];
grg=Plot[gerade,{x,0,Log[n]},DisplayFunction->Identity]
;
Show[grp,grg,
Graphics[Text[ToString[StringForm[ (* graphics *)
" dim = `` +- ``",dim,sigq]],
{Log[n]/2,.9*Log[n]}]],
DisplayFunction->$DisplayFunction,
AxesLabel->{"log(delta)","log(N)"},
AspectRatio->Automatic,
PlotRange->{0,Log[n]},
PlotLabel->ToString[StringForm[" `` , s = ``",name,s]]
]
]
End[] (* Private *)
EndPackage[] (* Bursts.m *)
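For illustration - a hypothetical session, assuming Bursts.m is on the load path - the classical Cantor dust appears as a sequential burst with substitution pattern {1,0,1}, and a statistically self-similar counterpart of roughly the same dimension is obtained from RandomBursts:

<<Bursts.m;
(* deterministic Cantor dust: every kept interval is replaced by the
   pattern {1,0,1}; similarity dimension log 2 / log 3 = 0.6309... *)
SequentialBursts[{1,0,1}, 5]
(* random analogue: Round[3^0.63] = 2 of the 3 subintervals are kept *)
RandomBursts[0.63, 3, 5]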
Chaos is characterised by both undecidability and randomness. Gödel incompleteness is translated into the physical context: for example, the behaviour of deterministic processes which are computable on a step-by-step basis is in general impossible to predict, even if the evolution laws and initial states are exactly known. Complementarity is investigated by very elementary automaton models. Several mathematical concepts of randomness are discussed with respect to their applicability to a characterisation of chaotic physical motion.
Algorithmic physics: the Universe as a computer
Algorithmics and recursive function theory
Mechanism and determinism
Discrete physics
Source coding
Lattice theory
Extrinsic-intrinsic concept
Algorithmic information
Computational complexity
Undecidability
Classical results
Complementarity
Extrinsic indeterminism
Intrinsic indeterminism
Weak physical chaos
Randomness
Randomness in mathematics
Random fractals and 1/f noise
Chaotic systems are optimal analogues of themselves
Quantum chaos
Algorithmic entropy
Epilogue: Afterthoughts, speculations & metaphysics
Born 1956 in Vienna, Austria, the author studied physics in Vienna and Heidelberg. He was a visiting researcher at, among other institutions, the University of California at Berkeley and the Lawrence Berkeley Laboratory, Moscow State University and the Lebedev Institute. He holds a tenured position at the Institute of Theoretical Physics of the Technische Universität Wien.
1 There exist theories with an infinite number of axioms, e.g., the Peano axioms or the induction rules.
2 Formally, one may define [] an experiment on an automaton as a function e: O* → I ∪ Δ, where O* is the set of all output sequences generated by the automaton, I is the set of all input symbols, and Δ is the set of all ``conclusions,'' or ``answers,'' or propositions. This can be understood as follows. The effect of some output sequence is either some ``conclusion'' or ``answer'' in Δ about the automaton, or the further input of a symbol in an attempt to reach a ``conclusion'' or ``answer'' in Δ. I.e., the experimenter concludes from a particular sequence of outputs certain propositions, or ``statements,'' about the automaton.
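A hypothetical illustration, not from the text: for the Moore automaton aMoore of the Automata.m package above, such an experiment may be sketched as a list of Mathematica rules mapping observed output sequences either to a further input symbol or to a conclusion in Δ.

(* hypothetical sketch of an experiment e: O* -> I ∪ Δ on aMoore *)
e = {
  {1}    -> "initial state was 4",      (* only state 4 outputs 1 *)
  {0}    -> 0,                          (* still ambiguous: input 0, go on *)
  {0, 1} -> "initial state was 1 or 3",
  {0, 0} -> "initial state was 2"
};
{0, 1} /. e    (* yields "initial state was 1 or 3" *)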
3 The ``implies'' is understood here as the usual ``implies'' operation of the classical propositional calculus.
4 The ``or'' is understood here as the usual ``or'' operation of the classical propositional calculus.
5 The ``and'' is understood here as the usual ``and'' operation of the classical propositional calculus.
6 … let these cells lie ``outside,'' in the external, otherwise
quiescent, region of the [[CA]] crystal … such that
they will normally not disturb (stimulate or otherwise transform) each other or
the surrounding quiescent cells.