Turing Machines

Turing Machines (TMs)

We saw that finite automata recognize regular languages and pushdown automata recognize context-free languages, but both models have fundamental limitations. The Turing machine, introduced by Alan Turing, removes these limitations by providing an infinite tape that serves as both input and unbounded read/write memory. It is the standard model for defining what it means to compute.

Definition: Turing Machine

A Turing machine is a 7-tuple \((Q, \Sigma, \Gamma, \delta, q_0, q_{\mathrm{accept}}, q_{\mathrm{reject}})\) where \(Q, \Sigma, \Gamma\) are finite sets and

\(Q\) is the set of states,
\(\Sigma\) is the input alphabet, not containing the blank symbol \(\sqcup\),
\(\Gamma\) is the tape alphabet, where \(\sqcup \in \Gamma\) and \(\Sigma \subseteq \Gamma\),
\(\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}\) is the transition function,
\(q_0 \in Q\) is the start state,
\(q_{\mathrm{accept}} \in Q\) is the accept state,
\(q_{\mathrm{reject}} \in Q\) is the reject state, where \(q_{\mathrm{accept}} \neq q_{\mathrm{reject}}\).

The key differences between a Turing machine and a finite automaton are:

The tape has infinite length. Initially it contains the input string, with blank symbols (\(\sqcup\)) filling all remaining cells.
A Turing machine can both read symbols from the tape and write symbols on it.
The read-write head can move both left (L) and right (R). To store information, the machine writes it on the tape; to retrieve it later, the head moves back to where it was written.
The accept and reject states take effect immediately upon entry. A computation therefore has three possible outcomes: accept, reject, or loop (run forever without halting).

A Turing machine is strictly more powerful than any finite automaton or pushdown automaton, and is believed to capture the full power of any real computer. If a problem cannot be solved by a Turing machine, it is beyond the theoretical limits of computation.

Definition: TM Configuration, Computation, and Acceptance

Let \(M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\mathrm{accept}}, q_{\mathrm{reject}})\) be a Turing machine. A configuration of \(M\) is a triple \((q, t, i)\) consisting of the current state \(q \in Q\), the current tape contents \(t \in \Gamma^{\mathbb{N}}\) (with all but finitely many cells equal to the blank symbol \(\sqcup\)), and the current head position \(i \in \mathbb{N}\) (using 1-indexed positions throughout, in keeping with the site convention \(\mathbb{N} = \{1, 2, 3, \ldots\}\)). We write configurations in the linear notation \(u\, q\, v\), meaning the tape contents are \(uv\) (followed by infinitely many blanks), the machine is in state \(q\), and the head is positioned on the first symbol of \(v\). For example, the configuration \(1001\, q_5\, 0110\) represents a tape containing \(10010110\), with \(M\) in state \(q_5\) and the head on the fifth symbol (the first \(0\) after the state label).

Configuration \(C\) yields configuration \(C'\), written \(C \vdash C'\), if \(C'\) is obtained from \(C\) by a single application of \(\delta\) to the current state and the symbol under the head: writing the new symbol, moving the head one cell left or right (clamped at the left end of the tape), and updating the state. The relation \(\vdash\) is undefined when the current state is \(q_{\mathrm{accept}}\) or \(q_{\mathrm{reject}}\): the machine halts immediately upon entering either, regardless of the value of \(\delta\) at that configuration.

The start configuration of \(M\) on input \(w \in \Sigma^*\) is \(q_0 w\) (state \(q_0\), tape contents \(w\) followed by blanks, head on the first symbol of \(w\)). An accepting configuration is any configuration whose state is \(q_{\mathrm{accept}}\); a rejecting configuration is any whose state is \(q_{\mathrm{reject}}\). \(M\) accepts input \(w\) if there exists a finite sequence of configurations \(C_1, C_2, \ldots, C_k\) with \(C_1\) the start configuration on \(w\), \(C_i \vdash C_{i+1}\) for each \(i\), and \(C_k\) accepting. The language recognized by \(M\) is \[ L(M) = \{ w \in \Sigma^* \mid M \text{ accepts } w \}. \]

\(M\) halts on \(w\) if it reaches an accepting or rejecting configuration in finitely many steps; otherwise \(M\) loops on \(w\). \(M\) rejects \(w\) if it halts on \(w\) by reaching a rejecting configuration.

Definition: Turing-Recognizable and Turing-Decidable

A language \(L\) is Turing-recognizable (or recursively enumerable) if some Turing machine recognizes it - that is, the machine accepts every string in \(L\), but may loop forever on strings not in \(L\).

A language \(L\) is Turing-decidable (or simply decidable) if some Turing machine decides it — that is, the machine halts on every input, accepting strings in \(L\) and rejecting strings not in \(L\).

Every decidable language is Turing-recognizable, but the converse does not hold - some Turing-recognizable languages are not decidable. The distinction hinges on whether the machine is guaranteed to halt.

Example:

Consider a language \[ L_1 = \{0^{2^n} \mid n \geq 0\} \] which generates all strings of \(0\)s whose length is a power of 2.

A Turing machine \(M_1\) must decide \(L_1\):

\(M_1 = \) "On input string \(w\): "

Read left to right across the tape, crossing out every other \(0\).
If in Stage 1, the tape contained a single \(0\), accept.
If in Stage 1, the tape contained more than a single \(0\) and the number of \(0\)s was odd, reject.
Return the head to the left-hand end of the tape.
Go back to Stage 1."

Note: in the state diagram below, Stage 1's "crossing out every other \(0\)" is realized by first replacing the leftmost \(0\) with a blank \(\sqcup\) (which serves as the left-end marker the head later returns to in Stage 4) and then replacing every other subsequent \(0\) with an "x". The trace that follows illustrates exactly this behavior.

\(Q = \{q_1, q_2, q_3, q_4, q_5, q_{\mathrm{accept}}, q_{\mathrm{reject}}\}, \, \Sigma = \{0\}, \, \Gamma = \{0, \text{ x }, \sqcup\}\), and the state diagram is as follows:

For example, if input is \(0000\), the sequence of configurations is as follows \[ \begin{align*} &q_1 0 0 0 0 \vdash \sqcup q_2 0 0 0 \vdash \sqcup \text{x} q_3 0 0 \vdash \sqcup \text{x} 0 q_4 0 \\\\ &\vdash \sqcup \text{x} 0 \text{x} q_3 \sqcup \vdash \sqcup \text{x} 0 q_5 \text{x} \sqcup \vdash \sqcup \text{x} q_5 0 \text{x} \sqcup \\\\ &\vdash \sqcup q_5 \text{x} 0 \text{x} \sqcup \vdash q_5 \sqcup \text{x} 0 \text{x} \sqcup \vdash \sqcup q_2 \text{x} 0 \text{x} \sqcup \\\\ &\vdash \sqcup \text{x} q_2 0 \text{x} \sqcup \vdash \sqcup \text{x} \text{x} q_3 \text{x} \sqcup \vdash \sqcup \text{x} \text{x} \text{x} q_3 \sqcup \\\\ &\vdash \sqcup \text{x} \text{x} q_5 \text{x} \sqcup \vdash \sqcup \text{x} q_5 \text{x} \text{x} \sqcup \vdash \sqcup q_5 \text{x} \text{x} \text{x} \sqcup \\\\ &\vdash q_5 \sqcup \text{x} \text{x} \text{x} \sqcup \vdash \sqcup q_2 \text{x} \text{x} \text{x} \sqcup \vdash \sqcup \text{x} q_2 \text{x} \text{x} \sqcup \\\\ &\vdash \sqcup \text{x} \text{x} q_2 \text{x} \sqcup \vdash \sqcup \text{x} \text{x} \text{x} q_2 \sqcup \vdash \sqcup \text{x} \text{x} \text{x} \sqcup q_{\mathrm{accept}}. \end{align*} \]

Multitape Turing Machines

A natural extension of the basic Turing machine model is to allow several tapes, each with its own independent read/write head. Initially the input appears on tape 1, and the remaining tapes start out blank. The transition function is generalized to allow simultaneous reading, writing, and head movement across all tapes.

Definition: Multitape Turing Machine

A multitape Turing machine with \(k\) tapes is a Turing machine whose transition function has the form \[ \delta\colon Q \times \Gamma^k \to Q \times \Gamma^k \times \{L, R, S\}^k, \] where \(k \geq 1\) is the number of tapes. The expression \[ \delta(q_i, a_1, \ldots, a_k) = (q_j, b_1, \ldots, b_k, L, R, \ldots, L) \] means that if the machine is in state \(q_i\) and the heads on tapes \(1\) through \(k\) are reading symbols \(a_1\) through \(a_k\), then the machine transitions to state \(q_j\), writes \(b_1\) through \(b_k\) on the corresponding tapes, and directs each head to move left (\(L\)), right (\(R\)), or stay in place (\(S\)) as specified.

Multitape Turing machines appear more powerful than the single-tape model — for instance, a two-tape machine can recognize the language \(\{0^k 1^k \mid k \geq 0\}\) in linear time, whereas any single-tape machine requires \(\Omega(n \log n)\) time. Yet in terms of what can be computed (recognizability and decidability), the two models are equivalent.

Theorem: Multitape–Single-Tape Equivalence

Every multitape Turing machine has an equivalent single-tape Turing machine — that is, one that recognizes the same language.

Proof sketch.

Given a \(k\)-tape Turing machine \(M\), we construct a single-tape machine \(S\) that simulates \(M\). The single tape of \(S\) represents the contents of all \(k\) tapes of \(M\) by storing them consecutively, separated by a fresh delimiter symbol \(\#\): \[ \# \, \text{tape}_1 \, \# \, \text{tape}_2 \, \# \, \cdots \, \# \, \text{tape}_k \, \#. \] To track each head's position, \(S\) marks the cell where \(M\)'s head currently sits on each tape with a "dotted" version of that symbol (a new symbol added to the tape alphabet). For instance, if \(M\)'s tape \(i\) contains \(010\) with the head on the middle \(1\), \(S\) records \(\#0\dot{1}0\#\).

On input \(w\), \(S\) first formats its tape into this representation (with \(w\) on tape 1 and the remaining tapes blank). To simulate one step of \(M\), \(S\) scans across its tape from the leftmost \(\#\) to the \((k+1)\)-st \(\#\) to read the symbols under all virtual heads, then makes a second pass to update the contents and head positions according to \(M\)'s transition function. If a virtual head moves rightward onto a delimiter \(\#\), \(S\) shifts the tape contents to the right of that delimiter one cell further right, opening up a fresh blank cell for the head to occupy. \(S\) accepts (or rejects) precisely when \(M\) does. Hence \(S\) recognizes the same language as \(M\).

This equivalence justifies our continued use of the single-tape model as the "standard" Turing machine: multitape machines may be more convenient to design, but they offer no additional computational power. However, when we turn to time complexity, the choice of model does matter — see single-tape vs. multitape simulation, which quantifies the time overhead of this simulation.

Algorithms

An algorithm is an effective procedure - a finite sequence of precise instructions that, given an input, produces an output in a finite number of steps. Although algorithms have been used throughout the history of mathematics (Euclid's algorithm for GCD dates to ~300 BCE), they were not given a precise mathematical definition until 1936, when Alonzo Church (via the \(\lambda\)-calculus) and Alan Turing (via the Turing machine) independently formalized the concept.

Thesis: Church-Turing Thesis

Every effective algorithm can be implemented as a Turing machine. Equivalently, any function computable by an algorithm (in the intuitive sense) is computable by a Turing machine.

This is not a theorem in the formal sense: it cannot be proved, because "algorithm" in the intuitive sense is not a formal concept. It is, however, universally accepted, supported by the fact that every proposed alternative model of computation — \(\lambda\)-calculus, recursive functions, register machines, cellular automata, modern programming languages — has turned out to be equivalent to the Turing machine in power.

Now we have a fundamental question: are there problems that no algorithm can solve? Remarkably, the answer is yes. Studying unsolvability is valuable for two reasons. First, recognizing that a problem is algorithmically unsolvable guides us to modify or relax it into a tractable variant. Second, understanding the absolute limits of computation provides deep insight into the structure of problems themselves.

We summarize the key facts about the decidability hierarchy. These results establish the relationships among the language classes we have studied.

Theorem: Decidability Hierarchy

Every context-free language (CFL) is decidable.
Some Turing-recognizable languages are undecidable.
A language \(L\) is decidable if and only if both \(L\) and its complement \(L^c = \Sigma^* \setminus L\) are Turing-recognizable. (A language whose complement is Turing-recognizable is called co-Turing-recognizable; thus the condition is that \(L\) is simultaneously Turing-recognizable and co-Turing-recognizable.)

Proof Sketch.

We sketch each claim in turn.

Claim 1 (every CFL is decidable). Given a CFL \(L\) generated by a context-free grammar \(G\), the membership question "is \(w \in L\)?" can be decided by the Cocke–Younger–Kasami (CYK) algorithm: assuming \(G\) has been pre-converted into Chomsky Normal Form (a static, design-time transformation, not part of the input-processing), the algorithm uses dynamic programming to build up the set of variables that can derive each substring of \(w\) and accepts iff the start variable derives \(w\) itself. The procedure halts on every input in time \(O(|w|^3)\), so the Turing machine implementing it decides \(L\).

Claim 2 (some Turing-recognizable languages are undecidable). The canonical example is the acceptance problem \[ A_{\mathrm{TM}} = \{ \langle M, w \rangle \mid M \text{ is a Turing machine that accepts } w \}. \] \(A_{\mathrm{TM}}\) is Turing-recognizable: a universal Turing machine takes \(\langle M, w \rangle\) as input, simulates \(M\) on \(w\), and accepts whenever \(M\) accepts. However \(A_{\mathrm{TM}}\) is undecidable. The proof, classically due to Turing, uses a diagonalization argument analogous to the one we develop in the next section: assuming a decider for \(A_{\mathrm{TM}}\) exists, one constructs a Turing machine that contradicts itself when run on its own description.

Claim 3 (decidable iff Turing-recognizable and co-Turing-recognizable). (\(\Rightarrow\)) If \(L\) is decidable, a decider \(M\) for \(L\) recognizes \(L\) (so \(L\) is Turing-recognizable); swapping \(q_{\mathrm{accept}}\) and \(q_{\mathrm{reject}}\) in \(M\) yields a decider, hence recognizer, for \(L^c\). (\(\Leftarrow\)) Suppose \(M_1\) recognizes \(L\) and \(M_2\) recognizes \(L^c\). Build a new Turing machine \(M\) that, on input \(w\), simulates \(M_1\) and \(M_2\) in parallel step by step (advancing each machine by one step in alternation, on a shared input). Since every \(w\) lies in exactly one of \(L\) or \(L^c\), exactly one of \(M_1, M_2\) eventually accepts; \(M\) halts and accepts (resp. rejects) when \(M_1\) (resp. \(M_2\)) accepts. Thus \(M\) decides \(L\).

Note: Modern computers are essentially finite versions of Turing machines (more precisely, random-access machines (RAM) with bounded memory). Any algorithm that runs on a real computer can, in principle, be translated into a Turing machine, although it may be inefficient.

Unsolvability

The existence of undecidable languages still leaves open the possibility that every language is at least recognizable. The following theorem shows that even this weaker condition fails - there are languages beyond the reach of any Turing machine whatsoever.

The proof below uses Cantor's diagonalization method, a technique for revealing the limits of any system by constructing an object that disagrees with every candidate the system can produce. We first develop the method in its purest set-theoretic form.

Theorem: Cantor's Theorem (Power Sets)

For any set \(X\), there is no bijection between \(X\) and its power set \(\mathcal{P}(X)\). In particular, \(\mathcal{P}(\mathbb{N})\) is uncountable.

Proof.

Suppose for contradiction that \(f: X \to \mathcal{P}(X)\) is a bijection. Define the set \[ D = \{ x \in X \mid x \notin f(x) \} \in \mathcal{P}(X). \] Since \(f\) is surjective, there exists \(d \in X\) with \(f(d) = D\). Now ask: is \(d \in D\)?

If \(d \in D\), then by definition of \(D\) we have \(d \notin f(d) = D\) — contradiction.
If \(d \notin D\), then by definition of \(D\) we have \(d \in f(d) = D\) — contradiction.

Either case is contradictory, so no such bijection \(f\) exists. Taking \(X = \mathbb{N}\) gives \(\mathcal{P}(\mathbb{N})\) uncountable, since by definition a set is countable if and only if it is in bijection with \(\mathbb{N}\) (or a subset thereof).

We now apply the same cardinality contrast to Turing machines: there are only countably many machines but, by the theorem above, uncountably many languages.

Theorem: Existence of Non-Recognizable Languages

Some languages are not Turing-recognizable.

Proof.

First, we show that the set of all Turing machines is countable. Each Turing machine \(M\) can be encoded as a finite string \(\langle M \rangle\) over some fixed alphabet \(\Sigma\) — for instance, by serializing its 7-tuple specification with appropriate delimiters, in such a way that distinct machines produce distinct strings (i.e., the encoding is injective). The set of all strings \(\Sigma^*\) over the alphabet \(\Sigma\) can be written as \[ \Sigma^* = \bigcup_{n \geq 0} \Sigma^n, \] where \(n\) is any nonnegative integer, and \(\Sigma^n\) is the set of strings of length \(n\). Since each \(\Sigma^n\) is finite and a countable union of finite sets is countable, it follows that \(\Sigma^*\) is countable. The encoding \(M \mapsto \langle M \rangle\) is injective, so the set of all Turing machines is in bijection with a subset of \(\Sigma^*\), and hence is also countable.

Next, we show that the set of all languages over \(\Sigma\) is uncountable. A language over \(\Sigma\) is, by definition, any subset of \(\Sigma^*\), so the set of all languages is exactly \(\mathcal{P}(\Sigma^*)\). Since \(\Sigma^*\) is countably infinite, it is in bijection with \(\mathbb{N}\); transporting Cantor's theorem (above) along this bijection gives that \(\mathcal{P}(\Sigma^*)\) is uncountable.

Therefore, since the set of all Turing machines is countable and the set of all languages is uncountable, there exist languages that cannot be recognized by any Turing machine.

This is one of the most profound results in mathematics: there exist well-defined problems that no algorithm can ever solve. However, many practically important problems that are undecidable or intractable in the worst case can still be addressed through approximation algorithms and heuristics, which find good (if not provably optimal) solutions in reasonable time.

Having established what is and is not computable, we next ask: among the decidable problems, how efficiently can they be solved? This is the subject of time complexity.

Loading...

Turing Machines (TMs)

Multitape Turing Machines

Algorithms

Unsolvability