Intermediate Form

Reduction of Problems

Previous Entry | Home | Next Entry

A useful technique for proving the hardness of problems is reduction. This technique consists of showing that, if a hard problem can be easily mapped onto a problem of unknown hardness, the unknown problem must be at least as hard to solve as the problem of known hardness.


Unfortunately, actually applying this technique is an exercise fraught with peril. In a post dated yesterday, super-blogger Steven Den Beste made a mistake in his use of a reduction. In this post, I'll try to show how to properly think about and apply reduction, and then critique some of the flaws in his post.

Reduction occurs between a known hard problem (we'll call this K) and a problem of unknown hardness (U). What we define as hard here varies with what we're trying to prove. For a proof of NP-completeness, we consider a problem hard if we know that it is NP-complete. For a proof of uncomputability (that is, that a problem cannot be solved at all), we consider a problem hard if it is not computable.

A problem is considered easy if it is not hard.

A proof by reduction consists of showing that, if we had an easy solution to U, we could easily solve K. That is, we need to give an easy way of mapping the known hard problem onto the problem of unknown hardness.

We can then reason that, if we could easily solve U, then we could easily solve K. But we know that K is hard to solve, so U must be at least as hard to solve as K.

This does assume that we've proven, or at least accepted, that K is actually hard to solve. For NP-completeness this has not actually been done: no one has proven that NP-complete problems cannot be solved efficiently (that is, that P differs from NP), so such proofs establish only relative hardness. But for computability, the halting problem for arbitrary Turing machines has been shown to be uncomputable, so any problem that it can be reduced to must also be uncomputable.

In short, to reason by reduction one must show that if one could easily solve an unknown problem, one could also easily solve a problem that is known hard.
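The direction of that reasoning can be made concrete with a classic toy example, sketched below in Python. Here K is the halting problem and U is the question "does this program ever print HELLO?" The function names, the string-based program encoding, and the marker string are all inventions for this sketch, not anything from the post; the point is only the shape of the argument.

```python
# Sketch of a correct reduction, assuming a hypothetical solver for U.
# K = the halting problem; U = "does this program ever print HELLO?"
# We show: an easy solver for U would yield an easy solver for K,
# so U must be at least as hard as K.

MARKER = "HELLO"

def transform(program_source: str) -> str:
    """Map an instance of K (does `program_source` halt?) onto an
    instance of U. The transformed program prints the marker exactly
    when the original program halts."""
    return program_source + f"\nprint({MARKER!r})"

def solve_halting(program_source: str, solve_u) -> bool:
    """If `solve_u` could easily answer U, this would easily answer
    K. Since K is known to be unsolvable, no such `solve_u` exists."""
    return solve_u(transform(program_source))
```

Note that the easy work is all in `transform`; the hypothetical solver for U does the impossible part, which is exactly why it cannot exist.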

In his post on analyzing the genome, Den Beste writes:

So what this means is that you have to simulate the activation state machine in order to determine whether a given regulator gene gets activated, thus activating the end product gene you're interested in.

And that means you run right smack into Turing's Stopping Problem, if you're trying to do it with computers. It can be shown that the x86 I'm using now can be simulated by a Turing machine (though it would be painful to actually try doing such a thing, and no one ever actually will do it). So we need to construct a hierarchy of simulators, each containing the previous one.

First, we assume the existence of a program for the x86 which can process the genome and simulate the regulator gene state machine and can determine if a suspected gene is ever activated. Second, we assume that there exists a program for the Turing Machine which simulates the x86.

We then modify that program so that it knows where the code exists in that x86 program which decides that the gene is indeed active, and we make it so that the Turing Machine program halts in that case.

Now it's just a matter of answering this question: will that Turing Machine program ever halt? Sad to say, there's no general way for a Turing Machine to solve that problem.

Here, he reduces the problem of processing the genome into the problem of determining if a Turing machine will halt. (This is done through the easy steps of writing a program for the x86 that will halt if the gene is activated, and then simulating the x86 on a Turing machine.)

Unfortunately, this is a reduction of a problem of unknown hardness into a problem of known hardness. U is the genome processing problem, and K is the halting problem. He's shown that a problem can be embedded in a harder one, which is always possible and says nothing about the actual hardness of the original problem.

A correct argument would show that, if one could determine which genes are active, one could determine whether an arbitrary Turing machine halts. That would mean the gene activation problem is at least as hard as the halting problem, and therefore uncomputable.

The halting problem only says that there is no general procedure for deciding whether an arbitrary Turing machine will halt. It's quite possible to analyze individual Turing machine programs and prove that they halt.

Indeed, it's possible to look at any x86 program running on any x86 computer and determine if it will halt. The defining characteristic of a Turing machine is that it has an infinite amount of storage. By contrast, any x86 computer has a finite amount of storage, including the processor state, cache, memory, and disk. We can consider the contents of all this storage as the state of this system. It's quite a bit of data, but a Turing machine can store an infinite amount.

We can think of the execution of an x86 program as a series of transitions between these states. To determine if an x86 program will halt, we can simply store each of these states in the storage of the Turing machine, and check to see if the same state has been reached twice. If it has, the program will never halt. Otherwise, we continue stepping through the program until it either halts or repeats a state.

There is a finite (if very large) number of states any given x86 computer can be in, and each state can occur at most once before the simulation ends, so it will be possible, in finite time, to determine if such a program halts. For the purposes of determining computability, any finite running time is fine. (Of course, such a method is almost certainly not practical.)
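The state-cycle argument above can be sketched directly. The toy "machine" below is an invention for illustration (its whole state is a single integer, and a step function plays the role of the processor), but the decision procedure is exactly the one described: record every state seen, and report non-halting as soon as a state repeats.

```python
# Deciding halting for a machine with finitely many states,
# by cycle detection. A real x86 has the same property, just
# with astronomically more states.

def halts(step, initial_state):
    """step(state) returns the next state, or None when the machine
    halts. Returns True iff the machine eventually halts."""
    seen = set()
    state = initial_state
    while state is not None:
        if state in seen:
            return False       # repeated state: deterministic loop forever
        seen.add(state)
        state = step(state)
    return True                # reached the halt condition

# A machine that counts down to zero, then halts:
def countdown(state):
    return None if state == 0 else state - 1

# A machine that cycles through 0, 1, 2 and never halts:
def mod3(state):
    return (state + 1) % 3
```

Here `halts(countdown, 5)` returns True and `halts(mod3, 0)` returns False. This works only because the machine is deterministic and its state space is finite; a true Turing machine, with unbounded storage, can keep producing fresh states forever without ever repeating one.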

I'm not sure if it's possible, in principle, to simulate the way genes are activated. I suspect that it may be possible, if the universe is finite and discrete, but it may be a computationally intractable problem. That one can reduce the gene activation problem to the halting problem, however, says nothing about the hardness of the gene activation problem.

(For more on this, check out the Wikipedia entry for the Halting Problem.)

- Tom | permalink | changelog | Last updated: 2003-09-20 00:36


