HRMc optimizations: jump compression

tl;dr: this post explains a small optimization, that I call "jump compression", that is implemented in the HRM compiler.

the issue

When I started using the HRMc compiler, I noticed that there were several useless jumps and labels all over the place, sometimes leading to huge chains of jumps that could just be removed in the generated code.

Let’s start with this simple program that puts in the output tape every non-zero item from the input tape.

# example.hrm
start:
# read from inbox
emp = inbox
# put in the output queue if emp is non-zero
if nz then
    outbox
endif
# start again
jmp start

The program is then compiled (without optimizations!) to this “assembly” snippet. As you can see, the compiler here generates some ad-hoc labels to implement the if operation with jumps.

The little ASCII diagram of the jumps lets us understand two things about the flow of the program

  1. the jumps both reach the same instruction (the last jump), because it’s tagged with two labels (_hrm_1, _hrm_endif_1)
  2. the last jump goes back to the first instruction, tagged start
# hrmc example.hrm --no-unreachable --no-jump-compression
start:        inbox      <----(2)------|
              jez _hrm_1 ----------|   |
              outbox               |   |
              jmp _hrm_endif_1 --| |   |
_hrm_1,                          | |   |
_hrm_endif_1: jmp start  <--(1)--|-|   |
                     |-----------------|

The problem with this code here is that the game CPU would spend nearly a third of its time following jumps. Running it with hrm-interpreter, with random inputs we’d notice that jumps are more than half of all instructions executed from the CPU. Isn’t it a waste of time? What if we could avoid the “jump chain” and go directly to the right instruction?

the algorithm

Let’s define a “jump chain”: if a jump A, either conditional or unconditional, refers to a label that tags another jump instruction B, then we have a jump chain. An example of a jump chain is (1) in the previous example: it’s safe to replace jmp _hrm_endif_1 with jmp start, as they’re clearly equivalent in that context.

The idea is then to shrink/compress the jump chain: every time we find a jump, either conditional or unconditional, we find the first non-jump operation directly reachable from that node in the AST, then take the label associated with that instruction and replace the original jump’s label with it.

Let’s then look at a commented version of the algorithm, as implemented in conversion.py of hrm-compiler

# compress_jumps: Instruction[] -> Instruction[]
def compress_jumps(ast):
    # labels_in_ast : Instruction[] -> Map<label: string, position: number>
    # labels_in_ast maps every label to the position of the instruction tagged by the label
    labels_positions = labels_in_ast(ast)

    compressed_ast = []

    for index, ast_item in enumerate(ast):
        if type(ast_item) in [p.JumpOp, p.JumpCondOp]:
            jump_going_nowhere = False
            try:
                # get the first position executable by `_label`
                _label = ast_item.label_name
                next_pos = labels_positions[_label]
            except KeyError:
                # `_label` does not tag any instruction.
                # save the instruction: removing jumps, either conditional or
                # unconditional, modified the semantics of the source program
                compressed_ast.append(ast_item)
                # go to the next instruction
                continue

            # let's visit the jump chain:
            # a jump chain stops when
            # (1) the current instruction is not an unconditional jump,
            # (2) the instruction was already visited (countermeasure to loops)
            # (3) the jump instruction refers to a label that does not tag any instruction.
            # the algorithm updates `_label` until the jump chain stops.
            visited = [False for i in ast]
            while type(ast[next_pos]) == p.JumpOp and \
                not visited[next_pos] and \
                not jump_going_nowhere:
                visited[next_pos] = True
                _label = ast[next_pos].label_name
                try:
                    next_pos = labels_positions[_label]
                except KeyError:
                    jump_going_nowhere = True

            # no matter why the previous loop stopped,
            # _label is guaranteed to contain the label that is located
            # nearest to the first non-`jmp` instruction.
            # let's then create add the new jump instruction to compressed_ast
            if type(ast_item) == p.JumpOp:
                compressed_ast.append(p.JumpOp(_label))
            else:
                compressed_ast.append(p.JumpCondOp(_label, ast_item.condition))
        else:
            compressed_ast.append(ast_item)

    return compressed_ast

As hinted by the previous ASCII diagram, the algorithm exploits the graph-like features of the instruction array (named AST here, but it's more similar to a compiler-specific Intermediate Representation, or IR): the algorithm does a depth-first visit of the graph defined by the IR: the visit stops when one of three conditions become true:

  1. the currently visited instruction is not a conditional jump
  2. the instruction was already visited
  3. it is not possible to visit the next instruction

When the visit stops, _label is the name of the label that is nearest to the non-jmp instruction that the CPU is going to execute at the end of the jump chain.

There are two possible cases:

  1. we reached the non-jmp instruction: _label is the name of the label tagging that instruction
  2. we couldn't reach the non-jmp instruction: the algorithm stopped at the jmp instruction, and we can get the name of the label from it

the result

If we enable this optimization in the compiler, it will generate the following code:

# hrmc example.hrm --no-unreachable # jump-compression is enabled
start:        inbox     <------|
              jez start  ---->-|
              outbox           |
              jmp start  ---->-|
_hrm_1,                        |
_hrm_endif_1: jmp start  ---->-|

Now we see that every jump instruction now refers to the right label: it means we are not going to waste time jumping around!

Conclusions

In this article, I tried to explain a simple optimization I've implemented in HRMc, a simple compiler for the Human Resource Machine game's assembly language. Thank you for spending your time reading these notes.

Did you enjoy this explanation? Does this optimization have a real/academic name? Feel free to tell me on Twitter ! If you liked this article, please consider offering me a Ko-fi !