All posts in this series.
A PDF version of this post will be published with a delay, but if you’d like to have it soon let me know.
Contents
1.1 Information bottleneck: Palindromes requires quadratic time on TMs
1.2 Counting: impossibility results for non-explicit functions
1.3 Diagonalization: Enumerating machines
1.3.1 On TMs, time $o(n\log n)$ equals time $cn$
1.4 Circuits
1.4.1 The circuit won’t fit in the universe: Non-asymptotic, cosmological results
1.5 Problems
As mentioned in Chapter ??, our ability to prove impossibility results related to efficient computation appears very limited. We can now express this situation more precisely with the models we’ve introduced since then. In particular, it is consistent with our knowledge that every problem in this book can be solved:

1. in Time $cn^{2}$ on a TM, and
2. in Time $cn$ on a 2-TM, and
3. by circuits of size $cn$.

Note that 2. implies 1. by Theorem ??.
In this chapter we begin to present several impossibility results, covering a variety of techniques which will be used later as well. As hinted above, they appear somewhat weak. However, jumping ahead, there is a flip side to all of this:
- At times, contrary to our intuition, stronger impossibility results are actually false. One example appears in Chapter ??. A list will be given later.
- Many times, the impossibility results that we can prove turn out to be, surprisingly, just “short” of proving major results. Here by “major result” I mean a result that would be phenomenal and that was in focus long before the connection was established. We will see several examples of this (Section ??, Section ??).
- Yet other times, one can identify broad classes of proof techniques, and argue that impossibility results can’t be proved with them (Section ??).
Given this situation, I don’t subscribe to the general belief that stronger impossibility results are true and we just can’t prove them.
1.1 Information bottleneck: Palindromes requires quadratic time on TMs
Intuitively, the weakness of TMs is the bottleneck of passing information from one end of the tape to the other. We now show how to formalize this and use it to show that deciding if a string is a palindrome requires quadratic time on TMs, which is tight and likely matches the time in Exercise ??. The same bound can be shown for other functions; palindromes just happen to be convenient to obtain matching bounds.
More precisely, the following holds.

Theorem 1.1. [1] For every $s$ and $n$, an $s$-state TM that decides if an $n$-bit input is a palindrome requires time $n^{2}/(c\log s)$.
The main concept that allows us to formalize the information bottleneck mentioned above is the following.
Definition 1.1. A crossing sequence of a TM $M$ on input $x$ and boundary $i$, abbreviated $i$-CS, is the sequence of states that $M$ is transitioning to when crossing cell boundary $i$ (i.e., going from Cell $i$ to Cell $i+1$ or vice versa) during the computation on $x$.
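For example, if on input $x$ the head crosses boundary $i$ exactly three times during the whole computation, transitioning to state $q_{3}$, then $q_{7}$, then $q_{2}$, the $i$-CS of $M$ on $x$ is the sequence $(q_{3}, q_{7}, q_{2})$.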
The idea in the proof is very interesting. If $M$ accepts inputs $x$ and $y$, and those two inputs have the same $i$-CS for some $i$, then we can “stitch together” the computation of $M$ on $x$ and $y$ at boundary $i$ to create a new input $z$ that is still accepted by $M$. The input $z$ is formed by picking bits from $x$ to the left of cell boundary $i$ and bits from $y$ to the right of $i$:
$$z := x_{1}x_{2}\cdots x_{i}\,y_{i+1}y_{i+2}\cdots y_{n}.$$
The proof that $z$ is still accepted is left as an exercise.

Now, for many problems, the input $z$ should not be accepted by $M$, and this gives a contradiction. In particular this will be the case for palindromes. We are going to find two palindromes $x$ and $y$ that have the same $i$-CS for some $i$, but the corresponding $z$ is not a palindrome, yet it is still accepted by $M$. We can find these two palindromes if $M$ takes too little time. The basic idea is that if $M$ runs in time $t$, because $i$-CSs for different $i$ correspond to different steps of the computation, for every input there is a value of $i$ such that the $i$-CS is short, namely has length at most $t/n$. If $t$ is much less than $n^{2}$, the length of this CS is much less than $n$, from which we can conclude that the number of CSs is much less than the number of inputs, and so we can find two inputs with the same CS.
Proof. Let $n$ be divisible by four, without loss of generality, and consider palindromes of the form
$$p(x) := x\,0^{n/2}\,x^{R},$$
where $x\in\{0,1\}^{n/4}$ and $x^{R}$ is the reverse of $x$.

Assume there are $x\ne y$ in $\{0,1\}^{n/4}$ and a boundary $i$ in the middle part, i.e., $n/4\le i<3n/4$, so that the $i$-CS of $M$ on $p(x)$ and $p(y)$ is the same. Then we can define $z$ to be the first $i$ bits of $p(x)$ followed by the last $n-i$ bits of $p(y)$, which is not a palindrome (as $x\ne y$) but is still accepted by $M$, concluding the proof.
There remains to prove that if $M$ runs in time less than the bound in Theorem 1.1 then the assumption in the previous paragraph holds. Suppose $M$ runs in time $t$. Since crossing sequences at different boundaries correspond to different steps of the computation, for every $x$ there is a value of $i$ in the middle part such that the $i$-CS of $M$ on $p(x)$ has length $\le t/(n/2)$. This implies that there is an $i$ in the middle s.t. there are $\ge 2^{n/4}/(n/2)$ strings $x$ for which the $i$-CS of $M$ on $p(x)$ has length $\le t/(n/2)$.

For fixed $i$, the number of $i$-CSs of length $\le \ell$ is $\le (s+1)^{\ell}$.

Hence there are $x\ne y$ for which $p(x)$ and $p(y)$ have the same $i$-CS whenever $(s+1)^{2t/n} < 2^{n/4}/(n/2)$. Taking logs, one sees that this holds whenever $t \le n^{2}/(c\log s)$. QED
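Unpacking the “taking logs” step, with $c$ absorbing constants as elsewhere in this book:
$$(s+1)^{2t/n} < \frac{2^{n/4}}{n/2} \iff \frac{2t}{n}\log_{2}(s+1) < \frac{n}{4}-\log_{2}\frac{n}{2} \iff t < \frac{n\left(n/4-\log_{2}(n/2)\right)}{2\log_{2}(s+1)},$$
and the right-hand side is at least $n^{2}/(c\log s)$ for all large enough $n$.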
Exercise 1.1. For every $s$ and $n$ describe an $s$-state TM deciding palindromes in time $cn^{2}/\log s$ (matching Theorem 1.1).

One may be tempted to think that it is not hard to prove stronger bounds for similar functions. In fact, as mentioned above, this has resisted all attempts!
1.2 Counting: impossibility results for non-explicit functions
Proving the existence of hard functions is simple: Just count. If there are more functions than efficient machines, some function is not efficiently computable. This is applicable to any model; next we state it for TMs for concreteness. Later we will state it for circuits.
Theorem 1.2. There exists a function $f:\{0,1\}^{n}\to\{0,1\}$ that cannot be computed by a TM with $s$ states unless $cs\log s \ge 2^{n}$, regardless of time.

Proof. The number of TMs with $s$ states is $\le 2^{cs\log s}$, and each TM computes at most one function (it may compute none, if it does not stop). The number of functions on $n$ bits is $2^{2^{n}}$. Hence if $2^{cs\log s} < 2^{2^{n}}$ some function cannot be computed. QED
Note this bound is not far from that in Exercise ??.
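As a sanity check, one can tabulate how large $s$ must be before counting stops giving hardness. The following is a minimal Python sketch; the tape-alphabet size and the exact machine-count bound $(2\cdot 3\cdot s)^{3s}$ are assumptions of the sketch, not the book’s constants.

```python
import math

ALPHA = 3  # assumed tape alphabet size

def log2_num_machines(s: int) -> float:
    # log2 of an upper bound on the number of s-state TMs: each of the
    # ALPHA*s (symbol, state) pairs maps to one of at most 2*ALPHA*s
    # (direction, symbol, state) triples.
    return ALPHA * s * math.log2(2 * ALPHA * s)

def min_uncounted_s(n: int) -> int:
    # Smallest s at which the counting argument no longer proves hardness,
    # i.e. the machine count reaches 2^(2^n), the number of functions on n bits.
    s = 1
    while log2_num_machines(s) < 2.0 ** n:
        s += 1
    return s

for n in range(4, 14):
    print(n, min_uncounted_s(n))
```

The printed threshold grows like $2^{n}/n$, in line with Theorem 1.2.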
It is instructive to present this basic result as an application of the probabilistic method:
Proof. Let us pick $f:\{0,1\}^{n}\to\{0,1\}$ uniformly at random. We want to show that
$$\Pr_{f}\left[\exists\text{ an }s\text{-state TM }M\text{ computing }f\right] < 1.$$
Indeed, if the probability is less than $1$ then some function exists that cannot be computed. By a union bound we can say that this probability is
$$\le \sum_{M}\Pr_{f}[M\text{ computes }f],$$
where the sum is over all $s$-state machines. Each probability in the sum is $2^{-2^{n}}$, since $M$ is fixed. The number of $s$-state machines is $\le 2^{cs\log s}$. So the sum is $\le 2^{cs\log s}\cdot 2^{-2^{n}}$, and we can conclude as before taking logs. QED
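In one line, the calculation in this proof reads:
$$\Pr_{f}\left[\exists M\text{ computing }f\right] \;\le\; \sum_{M}\Pr_{f}[M\text{ computes }f] \;\le\; 2^{cs\log s}\cdot 2^{-2^{n}} \;<\; 1 \quad\text{whenever } cs\log s < 2^{n}.$$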
1.3 Diagonalization: Enumerating machines
Can you compute more if you have more time? For example, can you write a program that runs in time $n^{2}$ and computes something that cannot be computed in time $cn$? The answer is yes for trivial reasons if we allow for non-boolean functions, since in time $n^{2}$ one can write down longer outputs than in time $cn$.

The answer is more interesting if the functions are boolean. Such results are known as time hierarchies, and a generic technique for proving them is diagonalization, applicable to any model.

We first illustrate the result in the simpler case of partial functions, which contains the main ideas. Later we discuss total functions.

Theorem 1.3. Let $t(n)\ge n$ be a function. There is a partial function that can be computed by a TM in time $c\,n\,t(n)$ but cannot be computed by any TM in time $t(n)$.

In other words, $\text{Time}(t(n)) \subsetneq \text{Time}(c\,n\,t(n))$, for partial functions.
Proof. Consider the TM $M$ that on input $(\langle N\rangle, x)$ of length $n$ runs $N$ on $(\langle N\rangle, x)$ until it stops and then complements the answer. (We can use a simple encoding of these pairs, for example every even-position bit of the description of $N$ is a $1$, and the first even-position $0$ marks the end of the description.)

Now define $X$ to be the subset of pairs $(\langle N\rangle, x)$ s.t. $N$ runs in time $\le t(n)$ on inputs of length $n$, and the pair is encoded as above.
On these inputs, $M$ runs in time $c\,n\,t(n)$, as desired. To accomplish this, $M$ can begin by making a copy of $\langle N\rangle$ in time $cn^{2}$. Then every step of the computation of $N$ can be simulated by $M$ with $cn$ steps, always keeping the description of $N$ to the left of the head.
Now suppose $N$ computes the same function as $M$ in time $t(n)$. Note that $(\langle N\rangle, x)$ falls in the domain $X$ of the function, for $x$ sufficiently large, using that $N$ runs in time $t(n)$. Now consider running $N$ on $(\langle N\rangle, x)$. We have
$$N((\langle N\rangle, x)) = M((\langle N\rangle, x))$$
by supposition, but $M((\langle N\rangle, x))$ is the complement of $N((\langle N\rangle, x))$, a contradiction. QED
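To make the pairing and the diagonal step concrete, here is a minimal Python sketch. The encoding follows the parenthetical in the proof; `run_tm` is a hypothetical simulator passed in as a parameter, since this sketch does not implement Turing machines.

```python
def encode(desc: str, x: str) -> str:
    # Each description bit is preceded by a 1, a 0 marks the end, then x
    # follows; so every even-position bit of the description part is a 1.
    return "".join("1" + b for b in desc) + "0" + x

def decode(z: str) -> tuple[str, str]:
    desc, i = [], 0
    while z[i] == "1":        # even positions: 1 means "one more bit"
        desc.append(z[i + 1]) # odd positions: the description bits
        i += 2
    return "".join(desc), z[i + 1:]

def diagonal(z: str, run_tm) -> int:
    # The machine M: recover <N>, run N on the whole input z, flip the
    # answer. run_tm(desc, inp) -> 0 or 1 is a hypothetical simulator.
    desc, _x = decode(z)
    return 1 - run_tm(desc, z)
```

By construction, `diagonal(encode(desc, x), run_tm)` differs from `run_tm(desc, encode(desc, x))` on every pair, which is exactly the diagonal step.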
This proof is somewhat unsatisfactory; in particular we have no control on the running time of $M$ on inputs not in $X$. It is desirable to have a version of this fundamental result for total functions. Such a version is stated next. It requires additional assumptions on $t$ and a larger gap between the running times. Perhaps surprisingly, as we shall discuss, both requirements are essential.
Theorem 1.4. Let $t(n)\ge n$ be a function. Suppose that the function mapping $1^{n}$ to $t(n)$, written in binary, is in $\text{Time}(t(n))$.

There is a total function in $\text{Time}(c\,t(n)\log t(n))$ that cannot be computed by any TM in time $t(n)/\log t(n)$.
The assumption about $t$ is satisfied by all standard functions, including all those in this book. (For example, take $t(n) := n^{2}$. The corresponding function then maps $1^{n}$ to $n^{2}$, written in binary. To compute this function on an input $1^{n}$ of $n$ bits we can first compute $n$ in binary, in time $cn\log n$ (Exercise ??). This is a number of $c\log n$ bits. We can then square this number in time $c\log^{2} n$. Note that the time to compute $t(n)$ is dominated by the $cn\log n$ term coming from computing $n$, which does not depend on $t$ and is much less than the $t(n) = n^{2}$ in the assumption.) The assumption cannot be removed altogether because there exist pathological functions $t$ for which the result is false.
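In symbols, for $t(n) = n^{2}$ the cost of the two stages is
$$\underbrace{cn\log n}_{\text{compute } n \text{ in binary}} \;+\; \underbrace{c\log^{2} n}_{\text{square it}} \;\le\; cn\log n \;\le\; n^{2} = t(n)$$
for every sufficiently large $n$, so the assumption of Theorem 1.4 holds.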
The proof is similar to that of Theorem 1.3. However, to make the function total we need to deal with arbitrary machines, which may not run in the desired time or even stop at all. The solution is to clock the simulation in a manner similar to the proof of the universal machine, Lemma ??.

Also, we define a slightly different function to give a stronger result – a unary language – and to avoid some minor technical details (the possibility that the computation of $N$ erases the input).
We define a TM $D$ that on input $1^{n}$ obtains a description of a TM $N$, computes $t(n)$, and then simulates $N$ on input $1^{n}$ for $t(n)$ steps in a way similar to Lemma ??; if $N$ stops then $D$ outputs the complement of the output of $N$, and if $N$ does not stop then $D$ stops and outputs anything. Now $D$ computes a function in time about $t(n)\log t(n)$. We argue that this function cannot be computed in much less time as follows. Suppose some TM $N$ computes the function in time somewhat less than $t(n)$. Then we can pick an $n$ for which $D$ obtains the description of this very $N$, and the simulation always stops. Hence, for that $n$ we would obtain $D(1^{n}) = 1 - N(1^{n}) = 1 - D(1^{n})$, which is a contradiction.
However, there are interesting differences with the simulation in Lemma ??. In that lemma the universal machine was clocking the steps of the machine $N$ being simulated. Now instead we need to clock the steps of $D$ itself, even while $D$ is parsing the description of $N$ to compute its transition function. This is necessary to guarantee that $D$ does not waste time on big TM descriptions.
Whereas in Lemma ?? the tape was arranged as
$$(\langle N\rangle,\; q,\; x),$$
it will now be arranged as
$$(t',\; \langle N\rangle,\; q,\; x),$$
which is parsed as follows. The description of $N$ is $\langle N\rangle$, $N$ is in state $q$, the tape of $N$ contains $x$, and the head is on the left-most symbol of $x$. The integer $t'$ is the counter decreased at every step.
Proof. Define a TM $D$ that on input $1^{n}$ works as follows.

- Compute $t(n)$.
- Compute $\langle N\rangle$. Here $\langle N\rangle$ is obtained from the binary expansion of $n$ by removing all left-most zeroes until the first $1$; i.e., if the expansion of $n$ is $0^{j}1x$ then $\langle N\rangle = 1x$, so that infinitely many values of $n$ yield the same description. This is similar to the fact that a program does not change if you add, say, empty lines at the bottom.
- Simulate $N$ on $1^{n}$, reducing the counter $t'$ (initialized to $t(n)$) at every step, including those parsing $\langle N\rangle$, as explained before. (A toy rendering of this loop in code follows the list.)
- If $N$ stops before the counter reaches $0$, output the complement of its output. If the counter reaches $0$, stop and output anything.
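The following Python sketch mirrors the simulation loop above. Everything here is an assumption of the sketch rather than the book’s construction: `t` is the time bound, `parse_cost` models the steps $D$ spends scanning $\langle N\rangle$ to find one transition, and `init`, `halted`, `step`, `out` stand for a TM simulator that is not implemented here.

```python
def D(n, t, parse_cost, init, halted, step, out):
    # Clocked diagonalizer D on input 1^n (toy rendering).
    counter = t(n)                    # the counter t', initialized to t(n)
    # <N> from n: the binary expansion with padding zeroes stripped
    # (rstrip is one concrete way to realize the padding trick, so that
    # n and 2n yield the same description).
    desc = bin(n)[2:].rstrip("0")
    cfg = init(desc, "1" * n)
    while not halted(cfg):
        # Clock D's own steps: parsing the description and the simulated
        # step itself all decrease the counter, so a huge description
        # cannot make D exceed its budget.
        counter -= parse_cost(desc) + 1
        if counter <= 0:
            return 0                  # clock ran out: output anything
        cfg = step(desc, cfg)
    return 1 - out(cfg)               # complement N's answer
```

Note how the budget is charged even while parsing, so that a huge $\langle N\rangle$ cannot stall $D$; this is the point emphasized above.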
Running time of $D$:

- Computing $t(n)$ is similar to Exercise ??. By assumption $t(n)$ is computable in time $t(n)$. Our definition of computation allows for erasing the input, but we can keep $1^{n}$ around spending at most another factor of $c$. Thus we can compute $(1^{n}, t(n))$ in time $c\,t(n)$. Finally, in case it was erased, we can re-compute $1^{n}$ in time $cn\log n$ by keeping a counter (initialized to a copy of $n$).
- This takes time $cn$: simply scan the input and remove zeroes.
- Decreasing the counter takes $c\log t(n)$ steps. Hence this simulation will take $c\,t(n)\log t(n)$ time.

Overall the running time is $c\,t(n)\log t(n)$.
Proof that the function computed by $D$ requires much time. Suppose some TM $N$ computes the same function as $D$ in time $t(n)/\log t(n)$. Consider inputs $1^{n}$ where $n$ is such that the second step of $D$ yields exactly $\langle N\rangle$; there are infinitely many such $n$. Parsing the description of $N$ to compute its transition function takes time $c_{N}$, a value that depends on $N$ only and not on $n$. Hence $D$ will simulate $\ge t(n)/c_{N}$ steps of $N$. If $N$ stops within that time (which requires $t(n)/c_{N}$ to be larger than $t(n)/\log t(n)$, and so $t(n)$ and $n$ sufficiently large compared to $c_{N}$) then the simulation terminates and we reach a contradiction as explained before. QED
The extra logarithmic factor cannot be reduced because of the surprising result presented in Theorem 1.5 showing that, on TMs, time $o(n\log n)$ equals time $cn$ for computing total functions.
However, tighter time hierarchies hold for more powerful models, like RAMs. Also, a time hierarchy for total functions for $\text{BPTime}$ is… an open problem! But a hierarchy is known for partial functions.

Exercise 1.4. (1) State and prove a tighter time hierarchy for Time (which recall corresponds to RAMs) for total functions. You don’t need to address simulation details, but you need to explain why a sharper separation is possible.

(2) Explain the difficulty in extending (1) to $\text{BPTime}$.

(3) State and prove a time hierarchy for $\text{BPTime}$ for partial functions.
1.3.1 On TMs, time $o(n\log n)$ equals time $cn$
In this subsection we prove the result in the title, which we also mentioned earlier. First let us state the result formally.

Theorem 1.5. [2] Let $f$ be a total function computable by a TM in time $t(n) = o(n\log n)$. Then $f$ is computable by a TM in time $cn$.

Note that time $cn$ is barely enough to scan the input. Indeed, the corresponding machines in Theorem 1.5 will only move the head in one direction.
The rest of this section is devoted to proving the above theorem. Let $M$ be a machine for $f$ witnessing the assumption of the theorem. We can assume that $M$ stops on every input (even though our definition of time only applies to large enough inputs), possibly by adding $cn$ to the time, which does not change the assumption on $M$. The theorem now follows from the combination of the next two lemmas.

Lemma 1.1. Suppose $M$ runs in time $o(n\log n)$. Then there is a constant $b$ such that on every input $x$, every $i$-CS of $M$ on $x$ has length $\le b$.
Proof. For an integer $b$ let $x_{b}$ be a shortest input such that there exists $i$ for which the $i$-CS in the computation of $M$ on $x_{b}$ has length $\ge b$. Let $n_{b} := |x_{b}|$. If $x_{b}$ does not exist for some $b$ we are done, so suppose it exists for every $b$; note that $n_{b}$ grows with $b$, since $b \le t(n_{b})$ and $M$ stops on every input.

There are $n_{b}+1$ tape boundaries within or bordering $x_{b}$. If we pick a boundary uniformly at random, the average length of a CS on $x_{b}$ is $\le t(n_{b})/n_{b} = o(\log n_{b})$. Hence there are $\ge 0.9\,n_{b}$ choices for $i$ s.t. the $i$-CS on $x_{b}$ has length $\le 10\,t(n_{b})/n_{b} = o(\log n_{b})$.

The number of such crossing sequences is
$$\le (s+1)^{o(\log n_{b})} \le n_{b}^{0.1},$$
where $s$ is the number of states of $M$. Hence, the same crossing sequence occurs at $\ge 4$ positions $i$, using that $n_{b}$ is large enough.

Of these four, one could be the CS of length $\ge b$ from the definition of $x_{b}$. Of the other three, two are on the same side of it. We can remove the corresponding interval of the input without removing the CS of length $\ge b$. Hence we obtained a shorter input with a CS of length $\ge b$. Contradiction. QED
Lemma 1.2. Suppose $f$ is computable by a TM $M$ such that on every input $x$, every $i$-CS of $M$ on $x$ has length $\le b$. Then $f$ is computable in time $cn$ by a TM with $c_{M,b}$ states that only moves the head in one direction.
1.4 Circuits
The situation for circuits is similar to that of 2-TMs, and by Theorem ?? we know that proving bounds on circuits is harder than proving them for 2-TMs. Even a bound of $10n$ is unknown. The following is a circuit version of Theorem 1.2. Again, the bound is close to what we saw in Theorem ??.
Theorem 1.6. [3] There are functions $f:\{0,1\}^{n}\to\{0,1\}$ that require circuits of size $\ge 2^{n}/(cn)$, for every $n \ge c$.
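The proof is again by counting, in the style of Theorem 1.2; the comparison reads as follows, where the bound $2^{cs\log s}$ on the number of circuits of size $s$ is the standard counting estimate, stated here as an assumption:
$$\#\{\text{circuits of size } s\} \;\le\; 2^{cs\log s} \;<\; 2^{2^{n}} \;=\; \#\{\text{functions on } n \text{ bits}\} \quad\text{whenever } s \le 2^{n}/(cn).$$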
One can prove a hierarchy for circuit size by combining Theorem 1.6 and Theorem ??. As stated, the results give that circuits of size $c2^{n}/n$ compute more functions than those of size $2^{n}/(cn)$. In fact, the “$c$” in the theorems is small, so one can prove a sharper result. But a stronger and more enjoyable argument exists.
Theorem 1.7. [Hierarchy for circuit size] For every $n$ and every $s \le c2^{n}/n$ there is a function $f:\{0,1\}^{n}\to\{0,1\}$ that can be computed by circuits of size $s + cn$ but not by circuits of size $s$.
Proof. Consider a path from the all-zero function to a function which requires circuits of size $> s$, guaranteed to exist by Theorem 1.6, changing the output of the function on one input at each step of the path. Let $f_{i}$ be the first function in the path that requires circuits of size $> s$, and let $f_{i-1}$ be the function right before that in the path. Note that $f_{i-1}$ has circuits of size $\le s$, and $f_{i}$ can be computed from $f_{i-1}$ by changing the value on a single input. The latter change can be implemented by circuits of size $cn$. QED
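Explicitly, if $f_{i}$ and $f_{i-1}$ differ only at the input $a \in \{0,1\}^{n}$, then
$$f_{i}(x) = f_{i-1}(x) \oplus \bigwedge_{j=1}^{n} \ell_{j}(x), \qquad \ell_{j}(x) := \begin{cases} x_{j} & \text{if } a_{j} = 1,\\ \neg x_{j} & \text{if } a_{j} = 0, \end{cases}$$
and the AND of $n$ literals plus one XOR gate has size $cn$.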
Exercise 1.7. Prove a stronger hierarchy result for alternating circuits, where the “$cn$” in Theorem 1.7 is replaced with “$c$.”

In fact, this improvement is possible even for (non-alternating) circuits, see Problem 1.2.
1.4.1 The circuit won’t fit in the universe: Non-asymptotic, cosmological results
Most of the results in this book are asymptotic, i.e., we do not explicitly work out the constants because they become irrelevant for larger and larger input lengths. As stated, these results don’t say anything for inputs of a fixed length. For example, any function on $100$ bits is in $\text{Time}(c)$.

However, it is important to note that all the proofs are constructive and one can work out the constants, and produce non-asymptotic results. We state next one representative example where this was done. It is about a problem in logic, an area which often produces very hard problems.
On an alphabet of size $63$, the language used to write formulas includes first-order variables that range over $\mathbb{N}$, second-order variables that range over finite subsets of $\mathbb{N}$, the predicates “$y = x+1$” and “$x \in S$” where $x$ and $y$ denote first-order variables and $S$ denotes a set variable, and standard quantifiers, connectives, constants, binary relation symbols on integers, and set equality. For example one can write things like “every (nonempty) finite set has a maximum:”
$$\forall S\;\big(\exists z\,(z\in S)\;\to\;\exists x\,(x\in S \wedge \forall y\,(y\in S \to y \le x))\big).$$
Theorem 1.8. [4] To decide the truth of logical formulas of length at most $610$ in this language requires a circuit containing at least $10^{125}$ gates. So even if each gate were the size of a proton, the circuit would not fit in the known universe.
Their result applies even to randomized circuits with error $1/3$, if $610$ is replaced with $614$. (We can define randomized circuits analogously to BPTime.)
1.5 Problems
Problem 1.1. Hierarchy Theorem 1.4 only gives a function that cannot be computed fast on all large enough input lengths: it is consistent with the theorem that the function can be computed fast on infinitely many input lengths.

Give a function $f$ mapping $\{0,1\}^{*}$ to $\{0,1\}^{2}$, computable in time $c\,t(n)\log t(n)$ (with $t$ as in Theorem 1.4), such that for any TM $N$ running in time $t(n)/\log t(n)$ the following holds: for every $n \ge c_{N}$ and every $x \in \{0,1\}^{n}$ we have $N(x) \ne f(x)$.

Hint: Note the range of $f$. Can this result hold as stated with range $\{0,1\}$?
Problem 1.2. Replace “$cn$” in Theorem 1.7 with “$c$.”

Problem 1.3. Give a total function that is computable on a TM in time $cn\log n$ but not in time $cn$. This shows Theorem 1.5 is tight, and gives an explicit problem witnessing the time-hierarchy separation in Theorem 1.4, for a specific time bound. For the negative result, don’t use pumping lemmas or other characterization results not covered in this book.
Problem 1.4. The following argument contradicts Theorem 1.4; what is wrong with it?

“By Theorem 1.5, $\text{Time}(n\sqrt{\log n}) \subseteq \text{Time}(cn)$. By padding (Theorem 1.5), $\text{Time}(t(n)\sqrt{\log t(n)}) \subseteq \text{Time}(c\,t(n))$ for every $t(n) \ge n$. Hence, applying the latter twice, $\text{Time}(t(n)\log t(n)) \subseteq \text{Time}(c\,t(n))$.”
References
[1] F. C. Hennie. One-tape, off-line Turing machine computations. Information and Control, 8(6):553–578, 1965.
[2] Kojiro Kobayashi. On the structure of one-tape nondeterministic Turing machine time hierarchy. Theor. Comput. Sci., 40:175–193, 1985.
[3] Claude E. Shannon. The synthesis of two-terminal switching circuits. Bell System Tech. J., 28:59–98, 1949.
[4] Larry Stockmeyer and Albert R. Meyer. Cosmological lower bound on the circuit complexity of a small problem in logic. J. ACM, 49(6):753–784, 2002.