HRM compiler (or: Winter wants to build a compiler)

HRM Compiler on github

I really like programming languages as tools to express thoughts and intention. It also means I'd love to write a compiler, but didn't want to create it just for the sake of it.

The need for a compiler arose when a friend of mine suggested to try Human Resource Machine, a simple game when you have to program a... human employee to get things from an inbox tape, do something useful with them and put the result on an outbox tape. Human Resource Machine (HRM) is fun, but I don't like the fact you have to use a visual tool to write some sort of simplified assembly language. I don't really like assembly, and I feel lost when the execution flow starts to jump here and there. I needed something to let me write the algorithm I have in mind and then convert it to code. "So" - I thought - "why don't I create a stupid simple language and compile to this stupid version of assembly?"... and I started creating this little compiler. The result -still a work in progress!- is a simple "compiler" that I used to solve some of the problems/exercises in the game (checkout the examples folder!).

How does it work?

The source language is a superset of the target language, so the translation is easy. The translation from the source to the target language is done in multiple passes, and each pass removes a part of the superset language or rewrites part of the AST to make the next pass' work simpler. At the end of the process, the AST that the "code generation" routines work on is only made of target language's instructions and statements, making the "code generation" phase very simple and straightforward.`

Let me explain what I mean by looking at this simple example: put in the OUTBOX only the items in the INBOX that are not ZERO, and keep a counter of non-zero items.

# let's define an alias for a memory cell, to make it easier
# to understand what the code does. The cell is already
# set to zero by the game.
alias 1 counter
# retrieve an item from the INBOX
emp = inbox
# if `emp` is not ZERO, then put the value stored
# inside `emp` into the OUTBOX
if nz then
    incr counter
jmp pickone

The compiler would parse it, generate the IR and perform two conversions. The first one, in our case, changes all the if nz (if nonzero value in emp) to if ez (if emp contains a zero value), to normalize the IR for the second pass.

alias 1 counter
emp = inbox
# transformation here!
if ez then
    # nothing to do here!
    incr counter
jmp pickone

The second conversion, well, converts the if to target-language structures: conditional and unconditional jumps. The previous pass makes the work of this one easier, making the code simpler, as it has to handle just this case.

alias 1 counter
emp = inbox
# if -> jump here! simplify code generation!
jez _hrm_if_1
incr counter
jmp _hrm_endif_1
# the `if` is fully transformed, and ends here.
jmp pickone

After that, the code generation phase converts our IR to the target language. Again, the code generation' implementation is simpler because we translated higher level IR into equivalent IR that is nearer to the target language and thus easier to translate.

As it is a tool to help me think about the problem I'm trying to solve, this language is far from being a real, complete language with all the usual things you may be used to find in other languages. If you look at the code in the repository, you may discover I didn't implement a while construct: it's not a problem, you can build it with if and jmp, or conditional jumps. Instead, I preferred to create some helpers to let me think about a problem without worrying about "oh I don't have jnz, just jez, so I have to write this thing differently", I can just throw a if nz and let the compiler handle that.

Things to work on

The compiler can generate better code, to fix some problems due to the new instructions and operations added to the base language. After this post, I wrote a couple of optimizations, such as jump compression (to remove useless unconditional jumps) and unreachable code removal (to remove instructions that will be never executed and make the generated code smaller), but I'm sure there are other optimizations I can implement.

Another thing to consider is that the "emulated processor" the gamer is programming in Human Resource Machine is very limited: it only has one register, and its instruction set is very small. Those limitations make it difficult to create a richer language, but I think it'll be fun to overcome these challenges.

Do you like this project? Whisper (or shout!) your opinions via Twitter, or consider offering me a coffee on Ko-fi !

Experimenting with Rust and CloudFoundry buildpacks

In this post I'll write some notes about CloudFoundry buildpacks, which files are needed to make them work, and in general how I used an Heroku buildpack to deploy a simple Rust application to GE Predix (that uses CF).

Note: for those who did not heard of CloudFoundry, it is an open source platform that you can deploy on your own servers, AWS or other IaaS services to run your services. Using it lets you migrate applications from Predix to Bluemix or other PaaSes easily.

So, let's start with the problem I wanted to solve: I needed a simple, hackable backend to create some random data for a single-page Javascript application I'm building. Wouldn't it be the perfect time to test how Rust (and its web stack) performs in webdev?

Unfortunately, Rust is not supported "out of the box" in CloudFoundry. CF has some so-called system buildpacks for Java, Golang, Javascript single-page applications and other languages, but there is not one of them for Rust: it means we need to look at some custom buildpacks.

Before we go further... what is a buildpack?

A buildpack is a collection of scripts that prepares a virtual machine (dyno in Heroku slang) to build and run your application in the cloud. In this article we're going to see how to build one of these.

Ok, let's see...

The documentation about custom buildpack explains how a buildpack works, and which files are necessary to successfully build and run a project in the cloud (foundry).

Custom buildpacks need three (executable) files:

  • bin/detect

    This script determines whethever or not to apply the buildpack to an application. Basically it checks if you are using the wrong buildpack for your project, and stops you before you try to use the Java buildpack for a Rust application. Buildpacks can also be "framework-specific", so it makes sense to check if you're trying to run a Gradle buildpack to build an application meant to be built by Maven.

    It is run with only one parameter: the path to the build directory, where the source code and the files loaded during cf push are loaded. In a Rust application, it will contain your source code.

    CloudFoundry expects it to return 0 if everything is ok and the buildpack can continue, 1 otherwise.

  • bin/compile

    This script is in charge of configuring the environment and generating a runnable binary from sources. A Rust buildpack would download rustup, install it and use cargo build to build your application.

    This script is run with two parameters: the build directory and the cache directory. The last one will contain the assets created during the build process. A Rust buildpack we will speak about later used to store the compiled libraries there, in order to "cache" dependencies and make deploys faster.

  • bin/release

    This script must generate a .yml file that describes how the application should be executed. An example of such a .yml follows:

        web: ./target/release/YOUR_APPLICATION_NAME # in Rust, this suffices.

    This script only takes one parameter, the build directory. Please note: some developers use this script as a simple template, in order to automatically generate the correct .yml file during the execution of bin/compile.

An example of a buildpack that has these three scripts is the Rust buildpack built by Aaron Santavicca. If we look at it, we can see that:

  • bin/detect checks if our project has a Cargo.toml file. If it doesn't, it cannot be built with cargo.
  • bin/compile installs rustup, builds your application with cargo and generates the real bin/release script.
  • bin/release is a template to be modified when running bin/compile.

...and that's it. CloudFoundry runs these scripts, reads the .yml file and runs everything as requested. Yay!

Hey, I'm used to Heroku, do I need to change my trusted buildpack?

No, you don't need it.

CloudFoundry buildpacks are compatible with Heroku ones: the good guy HoverBear provides a working buildpack for Rust that relies on multirust and caches build artifacts (mostly dependencies), so you don't have to lose five minutes to build the world every time you launch cf push (at least, it used to do that before RustFest, to brutally fix an unexpected problem with removed dependencies and broken slug size limits).

However, making them run is a bit trickier. When I was evaluating Rust buildpacks, I looked at the HoverBear one, and... bin/release script wasn't there. What kind of magic was going here? How did it even work?

After some testing (and reading the docs), I found out that there are... two ways to make buildpacks missing bin/release work.

The first one is adding the command attribute to the manifest.yml file you wrote for your application. A command: ./target/release/YOUR_APPLICATION_NAME should suffice.

The second one is to add a Procfile to the project. A Procfile is a file that tells CloudFoundry (or Heroku) which command has to be executed to start your application. A simple Procfile for a Rust app may just contain the line

web: ./target/release/YOUR_APPLICATION_NAME


My team and I are developing web apps on Predix (a CloudFoundry instance managed by GE), and I use Rust to build some quick-and-dirty test backends to generate random responses for our frontend. To run Rust on CF, I needed to understand how to build a new application on it, and I spent some time to fully understand how things work. I hope these notes can be useful for you too!

__attribute__((packed)) on Windows is ignored (with MinGW)!

tl;dr by default, MinGW on win32 behaves as if it ignores __attribute__((packed)), you need to add -mno-ms-bitfields to make it work as expected.

Some days ago, I was tasked with a port of a simple networked application to Windows 7. These messages are just... a packed structure put into a buffer and then thrown into the TCP stack. Don't blame me, I trusted a bad advice, I was young and stupid!

We may get away with it (it works, somehow!). The big problem here is that everything falls apart if the structure layout is not like we expect it to be! In the best case, the application segfaults because it tried to access some memory we should not access, or some internal checks fire because the values are so wrong they simply are out of range. In the worst case... I don't know and don't want to discover it.

Let's say I'm passing this packed structure around...

typedef struct _mypackedstruct {
  uint32_t magic_number;
  uint8_t  version;
  uint32_t interesting_property;
  uint64_t creation_time_sec;
  uint64_t creation_time_usec;
  uint32_t first_info_no;
  uint32_t second_info_no;
} __attribute__((packed)) MyPackedStructure;

... and the code receives and parses the serialized structure, as in the following pseudo-C++ example

// a very simplified sample of the code that highlighted the problem
MyPackedStructure destination;
char* buffer = new char[sizeof(destination)];

// read some data from a previously created socket
// and push them into a buffer

// push the unparsed data into destination
// to perform an automatic "deserialization".
// Don't try this at home. Please, use serialization libraries.
memcpy(&destination, buffer, sizeof(destination));

// perform validity check on the structure's content.

On various Linux distros, everything works fine. Once I built the project on Windows, the is_packet_ok assert fired. Why?

After looking at the code, trying to understand the problem, I had the idea of checking the size of the structure. On Windows (using MinGW), sizeof(MyPackedStructure) returns 40, while on Linux (gcc 4.6.3, gcc 4.9.2, clang/llvm 3.8) the same expression returns 33. Seven bits sure do a big difference! I want you to focus your attention on that 40. It's easy to recognize that 40 mod 8 = 0: the structure was aligned to 8 bytes. Why was the packing attribute ignored?

A simple search on the Wired showed me that this strange behavior is known since April 2012, but the ticket is still marked as new. In the comments, you'll find a workaround: set -mno-ms-bitfields.

What is this -mno-ms-bitfiels option? The 6.36.5 i386 Variable Attributes section of the manual says

"If packed is used on a structure, or if bit-fields are used, it may be that the Microsoft ABI lays out the structure differently than the way GCC normally does. Particularly when moving packed data between functions compiled with GCC and the native Microsoft compiler (either via function call or as data in a file), it may be necessary to access either format."


"2. Every data object has an alignment requirement. The alignment requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either the aligned attribute or the pack pragma), whichever is less. For structures, unions, and arrays, the alignment requirement is the largest alignment requirement of its members. Every object is allocated an offset so that: offset % alignment_requirement == 0"

What does it mean? It means that, by default, MinGW uses the Microsoft algorithm to calculate packed structures requirements, and it doesn't work as expected.

The expected behavior (as described in the same link!) is:

The packed attribute specifies that a variable or structure field should have the smallest possible alignment—one byte for a variable, and one bit for a field, unless you specify a larger value with the aligned attribute.

To explain it, I will write a simple program that creates a packed structure and initializes it with known values. The program will be later inspected with GDB in order to check the actual memory layout of the structure.

// file: main.cpp
// MyPackedStructure is the same structure we defined at the start of the blog post
int main(){
  // print the size of the structure
  std::cout << "sizeof(MyPackedStructure) = " << sizeof(MyPackedStructure) << std::endl;

  // prepare a structure variable...
  MyPackedStructure obj;
  obj.magic_number         = 0xa5a61ff5;
  obj.version              = 2;
  obj.interesting_property = 0x33;
  obj.creation_time_sec    = 1462224873;
  obj.creation_time_usec   = 4141457;
  obj.first_info_no        = 5;
  obj.second_info_no       = 2;

  // ... and put a breakpoint here to inspect its memory!
  return 0;

I built and ran the code on ArchLinux (x86, GCC 5.3.0, GDB 7.11) and...

[~/projects/experimental/sizeof_packed_struct]> g++ -o test main.cpp -g # let's build it with the latest GCC on ArchLinux, x86
[winter@timeofeve] [/dev/pts/3] [master]
[~/projects/experimental/sizeof_packed_struct]> gdb ./test
Reading symbols from ./test...done.
(gdb) break main.cpp:29
Breakpoint 1 at 0x804875c: file main.cpp, line 29.
(gdb) r
Starting program: /home/winter/projects/experimental/sizeof_packed_struct/test
sizeof(MyPackedStructure) = 33

Breakpoint 1, main () at main.cpp:29
29        return 0;
(gdb) x /33x &obj  # let's dump 33 bytes of memory from the address of `obj`
0xbffff5af:     0xf5     0x1f    0xa6    0xa5    0x02   0x33    0x00    0x00
0xbffff5b7:     0x00     0xe9    0xc7    0x27    0x57   0x00    0x00    0x00
0xbffff5bf:     0x00     0x91    0x31    0x3f    0x00   0x00    0x00    0x00
0xbffff5c7:     0x00     0x05    0x00    0x00    0x00   0x02    0x00    0x00
0xbffff5cf:     0x00

... the memory is laid out as we'd expect it: we can see the magic number in the first 4 bytes, version occupies only the fifth byte. interesting_property is unaligned: the field is 4 bytes long, but the first 3 bytes are in the first row (0xbffff5af), and the last one lies in the second row (0xbffff5b7). Please note that MyPackedStructure, on ArchLinux is 33 bytes long.

If we run it on Windows 7 (x86_64, mingw 4.9.3, gdb 7.6.1, PowerShell x86), we get...

PS C:\Users\WinterHarrison\Desktop> g++ main.cpp -o test -g
PS C:\Users\WinterHarrison\Desktop> gdb .\test
Reading symbols from C:\Users\WinterHarrison\Desktop\test.exe...done.
(gdb) break main.cpp:29
Breakpoint 1 at 0x401498: file main.cpp, line 29.
(gdb) r
Starting program: C:\Users\WinterHarrison\Desktop/.\test.exe
[New Thread 164.0x5c8]
sizeof(MyPackedStructure) = 40

Breakpoint 1, _fu0___ZSt4cout () at main.cpp:29
29        return 0;
(gdb) x /40x &obj
0x28ff08:        0xf5    0x1f    0xa6    0xa5    0x02   (0xff    0x28    0x00)
0x28ff10:        0x33    0x00    0x00    0x00   (0x53    0x40    0x0e    0x64)
0x28ff18:        0xe9    0xc7    0x27    0x57    0x00    0x00    0x00    0x00
0x28ff20:        0x91    0x31    0x3f    0x00    0x00    0x00    0x00    0x00
0x28ff28:        0x05    0x00    0x00    0x00    0x02    0x00    0x00    0x00

... that MyPackedStructure is now 40 bytes long. The fact that my Arch is 32bit doesn't mean anything, as I'm using fixed-size integers.

Please also note that there are no unaligned fields: interesting_property now starts at the first byte of the second row (0x28ff10).

This fact is the real cause of the problem I described at the start of this post: memcpy writes important (and correctly packed!) data inside the padding bytes, that are then not considered nor accessible. When you try to read from the structure, you will get some junk with no meaning at all. Just a simple example: after the memcpy in the pseudocode at the top of the article, we'd expect obj.interesting_property to be 0x33, but it's 0 on Windows. 0x33 would be where we now can see 0xff (first row, sixth byte), but that particular address is not accessible by the client code. Well, I may access it doing some pointer arithmetics, but isn't the point of using a packed structure to avoid pointer arithmetics at all?

Let's now apply the suggested workaround: tell GCC "whatever, I don't care about this MS bitfields algorithm, please do as you do on Linux" using the -mno-ms-bitfields compiler flag.

PS C:\Users\WinterHarrison\Desktop> g++ main.cpp -o test -g -mno-ms-bitfields
PS C:\Users\WinterHarrison\Desktop> gdb .\test
Reading symbols from C:\Users\WinterHarrison\Desktop\test.exe...done.
(gdb) break main.cpp:29
Breakpoint 1 at 0x401498: file main.cpp, line 29.
(gdb) r
Starting program: C:\Users\WinterHarrison\Desktop/.\test.exe
[New Thread 832.0xaf8]
sizeof(MyPackedStructure) = 33

Breakpoint 1, _fu0___ZSt4cout () at main.cpp:29
29        return 0;
(gdb) x /33x &obj
0x28ff0f:       0xf5    0x1f    0xa6    0xa5    0x02    0x33    0x00    0x00
0x28ff17:       0x00    0xe9    0xc7    0x27    0x57    0x00    0x00    0x00
0x28ff1f:       0x00    0x91    0x31    0x3f    0x00    0x00    0x00    0x00
0x28ff27:       0x00    0x05    0x00    0x00    0x00    0x02    0x00    0x00
0x28ff2f:       0x00

Both the size and the memory dump look like the Linux size and dump. This is exactly what we wanted! Everything works now! Mission accomplished, let's call it a day!

So, what did I learn from this experience?

  • Use -mno-ms-bitfields if you want to use packed structures on Windows (using MinGW)
  • Serialization libraries (cereal, messagepack, protocolbuffers, ...) exist for a reason: to avoid writing similar blog posts and to ensure that messages are serialized and deserialized in the same way on every platform
  • Don't think that your code will run the same everywhere: always have a test suite (and don't make it a pain to build and run on a new target platform!)
  • Read your compiler's documentation, especially if you're using language extensions

That's all for today. I hope you will always use the right tool for the job, even if your boss(!) or a teammate strongly advices to adopt a suboptimal solution just because it's fast to build it, or "a memcpy is faster than whatever routine you can implement" (</rant>).