tl;dr: this post explains a small optimization, that I call "jump compression", that is implemented in the HRM compiler.
When I started using the HRMc compiler, I noticed that there were several useless jumps and labels all over the place, sometimes leading to huge chains of jumps that could just be removed in the generated code.
Let’s start with this simple program that puts in the output tape every non-zero item from the input tape.
# example.hrm start: # read from inbox emp = inbox # put in the output queue if emp is non-zero if nz then outbox endif # start again jmp start
The program is then compiled (without optimizations!) to this “assembly” snippet. As you can see, the compiler here generates some ad-hoc labels to implement the if operation with jumps.
The little ASCII diagram of the jumps lets us understand two things about the flow of the program
- the jumps both reach the same instruction (the last jump), because it’s tagged with two labels (_hrm_1, _hrm_endif_1)
- the last jump goes back to the first instruction, tagged start
# hrmc example.hrm --no-unreachable --no-jump-compression start: inbox <----(2)------| jez _hrm_1 ----------| | outbox | | jmp _hrm_endif_1 --| | | _hrm_1, | | | _hrm_endif_1: jmp start <--(1)--|-| | |-----------------|
The problem with this code here is that the game CPU would spend nearly a third of its time following jumps. Running it with hrm-interpreter, with random inputs we’d notice that jumps are more than half of all instructions executed from the CPU. Isn’t it a waste of time? What if we could avoid the “jump chain” and go directly to the right instruction?
Let’s define a “jump chain”: if a jump A, either conditional or unconditional, refers to a label that tags another jump instruction B, then we have a jump chain. An example of a jump chain is (1) in the previous example: it’s safe to replace jmp _hrm_endif_1 with jmp start, as they’re clearly equivalent in that context.
The idea is then to shrink/compress the jump chain: every time we find a jump, either conditional or unconditional, we find the first non-jump operation directly reachable from that node in the AST, then take the label associated with that instruction and replace the original jump’s label with it.
Let’s then look at a commented version of the algorithm, as implemented in conversion.py of hrm-compiler
# compress_jumps: Instruction -> Instruction def compress_jumps(ast): # labels_in_ast : Instruction -> Map<label: string, position: number> # labels_in_ast maps every label to the position of the instruction tagged by the label labels_positions = labels_in_ast(ast) compressed_ast =  for index, ast_item in enumerate(ast): if type(ast_item) in [p.JumpOp, p.JumpCondOp]: jump_going_nowhere = False try: # get the first position executable by `_label` _label = ast_item.label_name next_pos = labels_positions[_label] except KeyError: # `_label` does not tag any instruction. # save the instruction: removing jumps, either conditional or # unconditional, modified the semantics of the source program compressed_ast.append(ast_item) # go to the next instruction continue # let's visit the jump chain: # a jump chain stops when # (1) the current instruction is not an unconditional jump, # (2) the instruction was already visited (countermeasure to loops) # (3) the jump instruction refers to a label that does not tag any instruction. # the algorithm updates `_label` until the jump chain stops. visited = [False for i in ast] while type(ast[next_pos]) == p.JumpOp and \ not visited[next_pos] and \ not jump_going_nowhere: visited[next_pos] = True _label = ast[next_pos].label_name try: next_pos = labels_positions[_label] except KeyError: jump_going_nowhere = True # no matter why the previous loop stopped, # _label is guaranteed to contain the label that is located # nearest to the first non-`jmp` instruction. # let's then create add the new jump instruction to compressed_ast if type(ast_item) == p.JumpOp: compressed_ast.append(p.JumpOp(_label)) else: compressed_ast.append(p.JumpCondOp(_label, ast_item.condition)) else: compressed_ast.append(ast_item) return compressed_ast
As hinted by the previous ASCII diagram, the algorithm exploits the graph-like features of the instruction array (named AST here, but it's more similar to a compiler-specific Intermediate Representation, or IR): the algorithm does a depth-first visit of the graph defined by the IR: the visit stops when one of three conditions become true:
- the currently visited instruction is not a conditional jump
- the instruction was already visited
- it is not possible to visit the next instruction
When the visit stops, _label is the name of the label that is nearest to the non-jmp instruction that the CPU is going to execute at the end of the jump chain.
There are two possible cases:
- we reached the non-jmp instruction: _label is the name of the label tagging that instruction
- we couldn't reach the non-jmp instruction: the algorithm stopped at the jmp instruction, and we can get the name of the label from it
If we enable this optimization in the compiler, it will generate the following code:
# hrmc example.hrm --no-unreachable # jump-compression is enabled start: inbox <------| jez start ---->-| outbox | jmp start ---->-| _hrm_1, | _hrm_endif_1: jmp start ---->-|
Now we see that every jump instruction now refers to the right label: it means we are not going to waste time jumping around!
In this article, I tried to explain a simple optimization I've implemented in HRMc, a simple compiler for the Human Resource Machine game's assembly language. Thank you for spending your time reading these notes.