Clockhands For Faster CPU Execution

When you design your first homebrew CPU, you are probably happy if it works, and you don’t worry much about performance. But, eventually, you’ll start thinking about how to make things run faster. For a single CPU, the standard strategy is to execute multiple instructions at the same time. This is feasible because the CPU can work on different parts of different instructions at the same time. But like most solutions, this one comes with a new set of problems. Japanese researchers are proposing a novel way to work around some of those problems in a recent paper about a technique they call Clockhands.

Suppose you have a set of instructions like this:

LOAD A, 10
LOAD B, 20
SUB A,B
LOAD B, 30
JMPZ DONE
INC B

If you do these one at a time, you have no problem. But if you try to execute them all together, you run into a variety of problems. First, the subtract has to wait for A and B to have the proper values in them. Also, the INC B may or may not execute, and unless we know the values of A and B ahead of time (which, of course, we do here), we can’t tell until run time. But the biggest problem is that the subtract has to use B before B contains 30, and the increment has to use it afterward. If everything is running together, it can be hard to keep that straight.

The normal way to do this is register renaming. Instead of using A and B as registers, the CPU uses physical registers that it can call A or B (or something else) as it sees fit. So, for example, the subtraction won’t really be SUB A,B but — internally — something like SUB R004,R009. The LOAD instruction for 30 writes to B, but it doesn’t really. It actually assigns a currently unused register to B and loads 30 into that (e.g., LOAD R001,30). Now the SUB instruction will still use 20 (in R009) when it gets around to executing.
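If it helps to see renaming in action, here is a minimal Python sketch that runs the listing above through a toy renamer. The tuple format, the rename function, and the P0, P1, ... physical register names are all made up for illustration; a real CPU does this in hardware and also has to recycle physical registers, which this sketch ignores.

```python
from itertools import count

def rename(program):
    """Give every register write a fresh physical register (toy model)."""
    fresh = (f"P{n}" for n in count())   # hypothetical, endless supply of names
    current = {}                         # logical register -> newest physical register
    renamed = []
    for op, dest, sources in program:
        # Read sources through the mapping *before* the destination is remapped.
        # Immediates and labels are not in the mapping, so they pass through.
        srcs = [current.get(s, s) for s in sources]
        if dest is not None:
            current[dest] = next(fresh)
            dest = current[dest]
        renamed.append((op, dest, srcs))
    return renamed

# The example listing as (opcode, destination, sources) tuples.
PROGRAM = [
    ("LOAD", "A", [10]),
    ("LOAD", "B", [20]),
    ("SUB", "A", ["A", "B"]),
    ("LOAD", "B", [30]),
    ("JMPZ", None, ["DONE"]),
    ("INC", "B", ["B"]),
]

renamed = rename(PROGRAM)
for op, dest, srcs in renamed:
    print(op, dest, srcs)
```

In the output, the SUB reads the physical register that holds 20, while the later LOAD of 30 lands in a different physical register, so the two no longer fight over B.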

This is a bit of an oversimplification, but the point is there’s plenty of circuitry in a modern CPU thinking about which registers are in use and which one corresponds to a logical register for this particular instruction. One proposed way to do this is to stop referring to registers directly and, instead, refer to them by how far away they are in the code (e.g., SUB A-2, B-1). This can be easier for the hardware, but more difficult for the compiler.
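To see why distance-based operands push work onto the compiler, consider this rough Python sketch. It is not the encoding of any real instruction set: the run function and the tuple format are assumptions, and each operand is simply "the result produced this many instructions ago."

```python
def run(program):
    """Execute a toy program whose operands are distances back in the code."""
    results = []                                  # one result per instruction, in order
    for op, *args in program:
        if op == "LOAD":
            value = args[0]                       # immediate value
        elif op == "SUB":
            a, b = (results[-d] for d in args)    # look back d instructions
            value = a - b
        else:
            raise ValueError(f"unknown opcode {op}")
        results.append(value)
    return results

# 10 was produced two instructions back, 20 one instruction back.
straight = run([("LOAD", 10), ("LOAD", 20), ("SUB", 2, 1)])

# Slip one unrelated instruction in front of the SUB and both distances change.
shifted = run([("LOAD", 10), ("LOAD", 20), ("LOAD", 30), ("SUB", 3, 2)])

print(straight[-1], shifted[-1])
```

The hardware only needs to count backward, but the compiler has to recompute every distance whenever it inserts or moves an instruction.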

Where Clockhands differs is that it refers to the number of writes to the register, not the number of instructions. It is somewhat like using a stack for each register and allowing the instructions to refer to a specific value on the stack. The hardware becomes simpler, and there is less for the compiler to do. This could potentially reduce power consumption as well.
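Here is a small Python sketch of that stack analogy. It is based only on the description above, not on the design in the paper: the WriteHistory class, its depth of four, and the back parameter are all hypothetical names chosen for illustration.

```python
from collections import deque

class WriteHistory:
    """Toy model: each register keeps its last few written values."""

    def __init__(self, depth=4):
        self.depth = depth        # assumed history length per register
        self.history = {}         # register name -> deque, newest value first

    def write(self, name, value):
        # A write never overwrites; it pushes, and the oldest value falls off.
        self.history.setdefault(name, deque(maxlen=self.depth)).appendleft(value)

    def read(self, name, back=0):
        # back=0 is the newest value, back=1 is the write before that, and so on.
        return self.history[name][back]

regs = WriteHistory()
regs.write("A", 10)
regs.write("B", 20)
regs.write("B", 30)                               # LOAD B, 30 runs early

diff = regs.read("A") - regs.read("B", back=1)    # SUB still sees B's older value
regs.write("A", diff)
incremented = regs.read("B") + 1                  # INC B sees the newest B

print(diff, incremented)
```

Because operands count writes to a given register rather than instructions, adding unrelated instructions in between does not change how the SUB names its operands.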

Confused? Read the paper if you want to know more. Some background from Wikipedia might help, too. It reminded us of a CPU architecture from way back called The Mill (dead link inside, but there’s always a copy). If you didn’t know your CPU registers aren’t what you think they are, it is even worse than you think.
