I really like programming languages as tools to express thoughts and intention. It also means I'd love to write a compiler, but didn't want to create it just for the sake of it.
The need for a compiler arose when a friend of mine suggested to try Human Resource Machine, a simple game when you have to program a... human employee to get things from an inbox tape, do something useful with them and put the result on an outbox tape. Human Resource Machine (HRM) is fun, but I don't like the fact you have to use a visual tool to write some sort of simplified assembly language. I don't really like assembly, and I feel lost when the execution flow starts to jump here and there. I needed something to let me write the algorithm I have in mind and then convert it to code. "So" - I thought - "why don't I create a stupid simple language and compile to this stupid version of assembly?"... and I started creating this little compiler. The result -still a work in progress!- is a simple "compiler" that I used to solve some of the problems/exercises in the game (checkout the examples folder!).
How does it work?
The source language is a superset of the target language, so the translation is easy. The translation from the source to the target language is done in multiple passes, and each pass removes a part of the superset language or rewrites part of the AST to make the next pass' work simpler. At the end of the process, the AST that the "code generation" routines work on is only made of target language's instructions and statements, making the "code generation" phase very simple and straightforward.`
Let me explain what I mean by looking at this simple example: put in the OUTBOX only the items in the INBOX that are not ZERO, and keep a counter of non-zero items.
# let's define an alias for a memory cell, to make it easier # to understand what the code does. The cell is already # set to zero by the game. alias 1 counter pickone: # retrieve an item from the INBOX emp = inbox # if `emp` is not ZERO, then put the value stored # inside `emp` into the OUTBOX if nz then incr counter outbox endif jmp pickone
The compiler would parse it, generate the IR and perform two conversions. The first one, in our case, changes all the if nz (if nonzero value in emp) to if ez (if emp contains a zero value), to normalize the IR for the second pass.
alias 1 counter pickone: emp = inbox # transformation here! if ez then # nothing to do here! else incr counter outbox endif jmp pickone
The second conversion, well, converts the if to target-language structures: conditional and unconditional jumps. The previous pass makes the work of this one easier, making the code simpler, as it has to handle just this case.
alias 1 counter pickone: emp = inbox # if -> jump here! simplify code generation! jez _hrm_if_1 incr counter outbox jmp _hrm_endif_1 _hrm_if_1: _hrm_endif_1: # the `if` is fully transformed, and ends here. jmp pickone
After that, the code generation phase converts our IR to the target language. Again, the code generation' implementation is simpler because we translated higher level IR into equivalent IR that is nearer to the target language and thus easier to translate.
As it is a tool to help me think about the problem I'm trying to solve, this language is far from being a real, complete language with all the usual things you may be used to find in other languages. If you look at the code in the repository, you may discover I didn't implement a while construct: it's not a problem, you can build it with if and jmp, or conditional jumps. Instead, I preferred to create some helpers to let me think about a problem without worrying about "oh I don't have jnz, just jez, so I have to write this thing differently", I can just throw a if nz and let the compiler handle that.
Things to work on
The compiler can generate better code, to fix some problems due to the new instructions and operations added to the base language. After this post, I wrote a couple of optimizations, such as jump compression (to remove useless unconditional jumps) and unreachable code removal (to remove instructions that will be never executed and make the generated code smaller), but I'm sure there are other optimizations I can implement.
Another thing to consider is that the "emulated processor" the gamer is programming in Human Resource Machine is very limited: it only has one register, and its instruction set is very small. Those limitations make it difficult to create a richer language, but I think it'll be fun to overcome these challenges.