Stadia: the future of gaming?

Google unveiled its new game streaming service: Stadia.

Stadia, whose tagline is "gather around", recognizes that there are two "disconnected universes": streamers -people who play games for their audience- and viewers, who may not be able to play the same games, or who just enjoy watching someone else's performances.

The company is trying to combine both worlds by creating a game streaming service that is also integrated with Youtube.

Interesting parts

Other websites have already covered the conference, so I will just write down the things I found interesting or exciting.

AMD designed a custom GPU just for Stadia. Judging by its 10.7 teraflops, it is more powerful than the GPUs in current-gen consoles. Developers can also use more than one GPU per game, making games even more detailed in a way that is transparent to the user.

Stadia promises an "up to 4K 60fps" experience for the player, and all plays will be streamed on Youtube. The special "share" button on the custom controller should let creators (or random players) share their play by creating a "state share" link. Creators can also use Crowd Play to let viewers join their games.

Every game on Stadia will be playable with existing controllers and on every device the user already owns - the user just interacts with a stream, so no powerful device is needed.

What's missing?

Stadia is not the first game streaming service on the market, and it won't be the last. Hopefully it won't fail as hard as OnLive, but there are several issues that should be resolved, or at least mitigated, before the launch (in the US, UK, Canada and most of Europe).

Let's start with the one I find most pressing: PS Now launched a week ago in Italy, with varying results. Dadobax (an Italian videogame youtuber) experienced very noticeable input lag while testing Bloodborne on a 100Mbps fiber connection. Will Stadia suffer from the same problem? During the presentation, the Stadia representative said there will be a direct link between the ISP and the Stadia data centers, but I won't believe that everything works fine until I can try it.

Another issue is access to the service: we don't know how much it will cost, or which titles will be there. At least we know that Doom Eternal, Assassin's Creed Odyssey, NBA 2K19 and Shadow of the Tomb Raider will be playable there. We also don't know whether users must buy titles again on the Stadia store even if they already own them on other stores - such as Odyssey on the Uplay store.

I'm also worried that Youtube is going to fill up with a lot of digital waste: no one is interested in my gameplays (I'm bad at videogames lol), so that footage won't ever be seen. I hope the service won't store every gameplay ever played on Stadia.

Your blog should be rendered on the server

"You probably don't need a single-page application" - Plausible"

This blog post from Plausible just reminded me of a pet peeve of mine: blogs that are built as single-page applications. I don't like being greeted with a blank page just because I don't want to execute whatever code you send to my browser.

There are several use cases for single-page applications, but blogs are not one of them, for several reasons. Some of them are already explained in the article, and I'm going to reiterate them too, but I want to add a different perspective to Plausible's short essay.

I just want to read content

A blog just contains text and some media (images, audio or video). There is no need to download some Javascript, execute it, let it request the real content and then show it to the user - why can't I just download the content?

Another advantage is that you don't need a lot of work to let search engines index your work. Do you think it is a non-issue? Hulu would like to have a word with you. Due to some problems with their client-side-only rendering, Google could not index them anymore, destroying Hulu's previous work on their targeted keywords.

I don't care about the tool you use to generate your blog: whether you build it on your laptop and push it via FTP or use Wordpress to write and publish your new essays, it's just fine - all those workflows are valid, because they let your readers (me!) just get the content they want to read.

If you really need to build your entire website using Javascript, then please, I beg you: set up server-side rendering (SSR) to create a static "base" version of the website. Most libraries/frameworks support SSR, so just study your framework's documentation and set it up.

Javascript should enhance the experience

I really love New York Times-style interactive data visualizations: they are part of the content the authors want to show you, so it's OK to enable Javascript to enjoy those applets. Even New York Times articles, though, are just text that is _enhanced_ by the interactive applets - you still get the text. It's OK to require Javascript to load comments - especially if you use third-party services such as Disqus - because, y'know, comments are not a core part of the experience.

So, please, stop forcing Javascript on your blog readers. Just let me read your content.


I hope you liked this new issue of Interesting Links, a column where I highlight interesting articles, with my own comments and thoughts. If you appreciated this small rant, tell me on Twitter, or support me on Ko-fi.

Emptying Cashless Keys with MiniZinc

/images/20190126_105723_HDR-min.jpg

Since I bought my first cashless key for the coffee vending machine at my company, I have never managed to spend every cent on it - I always fall short by 3 or 5 cents. I have always wondered: which drinks, and how many, should I buy to reach exactly zero? We will use MiniZinc to answer this question.

What is MiniZinc?

MiniZinc is a free and open source constraint modeling language. It can be used to formally describe (model) constraint satisfaction and optimization problems in a high-level, solver-independent way. It is available for all major platforms, as a command-line tool or bundled with an IDE and several solvers.

In our case, we're going to solve an optimization problem! An optimization problem is the problem of finding the best solution from all solutions that respect the problem's constraints (feasible solutions). How do we define the best solution, though? We use an objective function: a function that, given a feasible solution, returns its value in the system we're studying.

Don't worry if you don't understand every single word right now, I'm going to explain them in a bit!

Let's write the model!

So, let's start by defining the parameters of our problem. We have a given amount in our key, and a list of beverages and their cost.

% the initial amount in our cashless key
int: START_AMOUNT;
% the list of beverages (coffee, tea, chocolate, ...)
enum BEVERAGES;
% how much does a beverage cost?
array[BEVERAGES] of int: COST;

Let's then define how we should spend our money! Our solution is defined as an array of integers - for each beverage b, quantity[b] is the number of times I have to drink b.

/* how many times do I have to drink a beverage B? */
array[BEVERAGES] of var int: quantity;

After defining the structure of the solution, we need to limit the values we can assign to the solution. Assignments to quantity must follow some rules - in this case, we must drink all beverages at least zero times, and we must spend every single cent we have. Every assignment that follows those rules is a feasible solution for the problem.

/* constraint: cannot drink a negative amount of drinks! */
constraint forall(b in BEVERAGES)(quantity[b] >= 0);

/* constraint: we must spend exactly START_AMOUNT */
var int: spent = sum(b in BEVERAGES)(quantity[b] * COST[b]);
constraint START_AMOUNT - spent = 0;

At the end, we want to decide how we are going to spend our money. Our objective function is the number of drinks we are going to buy: we want to find an assignment of quantity that maximizes the number of drinks we get - hard-earned money must not be wasted!

/* the objective function! */
var int: how_many_drinks;
constraint how_many_drinks = sum(b in BEVERAGES)(quantity[b]);

% use `maximize` to drink as many coffees as possible
% use `minimize` to cut down on coffees and get to zero euro as fast as possible
solve maximize how_many_drinks;

Results

Let's feed our model a simplified instance of the problem: it will tell us that we can buy 8 coffees and 2 chocolates, for a total of 10 drinks. If we want to cut down on coffees, we can instead order 3 cups of tea and 4 ginseng coffees.

shell % cat data.dzn
START_AMOUNT = 260; % 2.60 euro

BEVERAGES = { coffee, tea, ginseng, chocolate };
COST      = [     25,  40,      35,        30 ];
shell % minizinc model.mzn --data data.dzn
quantity = array1d(BEVERAGES, [8, 0, 0, 2]);
how_many_drinks = 10;
----------
==========
shell % # update the model to solve the minimization problem
shell % minizinc model.mzn --data data.dzn --all
quantity = array1d(BEVERAGES, [8, 0, 0, 2]);
how_many_drinks = 10;
----------
quantity = array1d(BEVERAGES, [2, 0, 0, 7]);
how_many_drinks = 9;
----------
quantity = array1d(BEVERAGES, [0, 0, 4, 4]);
how_many_drinks = 8;
----------
quantity = array1d(BEVERAGES, [0, 3, 4, 0]);
how_many_drinks = 7;
----------
==========

Nothing stops us from adding more constraints, such as "I need to drink at least one chocolate", or "I don't want to buy more than one cup of tea", or "the number of drinks must be even, because I offer coffees to a colleague". Try to experiment, as sketched below: add your own constraints, add more beverages and modify the prices!
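
For example, here is a sketch of those extra constraints, to be appended to the model above (it assumes the data file from before, where chocolate and tea are members of the BEVERAGES enum):

% at least one chocolate, at most one cup of tea,
% and an even number of drinks (half of them are offered to a colleague!)
constraint quantity[chocolate] >= 1;
constraint quantity[tea] <= 1;
constraint how_many_drinks mod 2 = 0;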


If you liked this article, tag me in a photo of your favorite morning beverage, or consider offering me a Ko-fi!

About "Programmer as wizard, programmer as engineer"

Programmer as wizard, programmer as engineer - Tedinski.com

We have different design goals and constraints when we’re doing one [wizarding] or the other [engineering]. With engineering, we’re tasked with the long term maintenance of a piece of software, and we also become concerned with many lower-priority properties, such as its performance. With wizarding, we’re often best served by designing for ease of throwing code away rather than maintaining it. Well-engineered code is irreplaceable, but well-cast spells should be transient.

In this post, Ted Kaminski talks about two different styles of programming: wizarding and engineering. At first, all startup prototypes/Minimum Viable Products are created as fast as possible: we need to see if our startup idea is viable first. If the business grows, the developers need to transform this prototype into a stable product, one able to grow and include new features. Developers need to shift from a wizarding attitude (everything is temporary) to an engineering mindset (long-term maintenance: the software needs to work for a long time and is used by many users).

These two "styles" -and the transition between them- reminds me of the Makers vs Menders difference. A Maker is a developer who love to create stuff quickly and are excited by deadlines and time pressures. On the other hand, a Mender likes older codebases, is satisfied by fixing longstanding bugs and making the whole system more stable and more secure.

How do we switch from one style to the other, though? The author suggests we should write enough tests to be sure everything is working, separate big applications into microservices when it makes sense, and rewrite critical parts of the system in languages that offer static (or gradual) typing and good tooling.

Separating concerns into their own services -when it makes sense and does not lead to a distributed monolith- can help us reason about the system as a whole, and make each service faster and more stable, especially when paired with the right dose of testing.

I'm also a fan of static languages such as Rust (even in the browser!), which features good tools such as cargo and an expressive type system that helps developers implement critical systems with fewer errors.

I'd also like to suggest learning to recognize when a project should move from wizarding to engineering. Failing to understand when the project needs to transition to the engineering state, and to plan for it, is probably the most underappreciated problem I have encountered in my career so far. If the team can't slow down and fix growing pains, those pains will only get stronger and mutate into outages, and the team won't stop firefighting production issues. My suggestion is to keep note of small issues, or tasks that are taking longer than expected for technical reasons, and expose them during a retrospective. Having senior engineers on the team also helps, especially if they are experienced with the project's technology stack.


So, I hope you liked the first post of Interesting Links, a new column where I highlight interesting articles, with my own comments and thoughts. If you liked this post, you should totally share it on Twitter, or support me on Ko-fi.

Rust and WebAssembly - what I learnt from porting hrm-interpreter to wasm

Some time ago, I wrote a compiler for a fictional assembly-like language. As this compiler (named hrm-compiler) performs some optimizations on the code, I wanted to know if the optimizations were breaking the compiled code. To test them, I wrote a small interpreter (hrm-interpreter) to run a JSON version of the compiled code, and then embedded its webassembly version in the bare-bones debugger I wrote as a web application.

If you want to try, the best way to get started is the Rust and WebAssembly Book, a tutorial that walks you through writing a Rust crate that can interact with Javascript and publishing it to npm. I followed the tutorial, and managed to create a wasm wrapper for hrm-interpreter, named (such fantasy! amaze!) hrm-interpreter-wasm. Let me explain what I learnt while creating it.

Write a different crate to support wasm

If you need to convert Javascript types to Rust ones, I'd suggest writing a specific crate. It's better to think about it as if you were writing a wrapper. This new crate holds the wasm-specific code, so you don't have to modify your library to support the new platform.

Not every library needs an ad-hoc crate, though: handlebars-rust offers users a flag to turn off filesystem-based I/O. This approach may work if your main library does not expose complex types, or if it already handles platform-specific stuff.

Don't force file I/O in the main library

What does an interpreter do?

  1. read a file
  2. convert its content to a list of operations
  3. execute each operation until the program ends or some error is raised
  4. write the output of the program to a file (or a terminal)

Unfortunately, you cannot read or write files in a wasm environment - you cannot perform file I/O! So we need to find another way: let's look at the list again.

  1. read a file and extract its text as a huge string
  2. convert the content of the string to a list of operations
  3. execute each operation until the program ends or some error is raised
  4. capture the output of the program into a string
  5. write the output string to a file (or a terminal)

So, we may notice that the focus is now on strings, not files.

As Javascript can perform I/O (read files via FileReader, generate new files via FileWriter, or even make HTTP requests!), our wrapper crate may just take a string, pass it to the main crate and return the output string. In my case, I had to create some new methods to sidestep file I/O.
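
As an illustration, the boundary can be as simple as a function that takes a string and returns a string. This is just a sketch with a made-up function name, not the actual hrm-interpreter-wasm code:

use wasm_bindgen::prelude::*;

// hypothetical entry point: Javascript hands us the JSON-encoded
// program as a string, and gets the JSON-encoded output back
#[wasm_bindgen]
pub fn run_program(program_json: &str) -> String {
    // the real wrapper would call into the main crate here:
    // parse the JSON, execute the operations, serialize the output
    format!("{{\"received\": {}}}", program_json)
}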

There are other cases where you may need to modify your crate to port it to wasm: I suggest reading Which Crates Work with Wasm? to learn more.

Try to expose as little state as possible - use functions instead

Javascript cannot manipulate complex Rust types, such as Option, so you cannot export whatever you want. Before wasting several hours trying to understand why you cannot "export" your beautiful struct with your awesome Option<Result<Beautiful<Type>>>, check out wasm-bindgen's Supported Types documentation chapter. Trust me :).

To work around it, in hrm-interpreter-wasm...

  • the main structure has no public fields
  • the internal state is serialized to a JSON-formatted string
  • data from Javascript are also JSON-formatted strings

This is not the best interface, but it works for now. "Now" means that I'll fix it once I integrate some work-in-progress stuff, such as the new Rust-based compiler and sourcemap support... If you can, try to avoid serialization, as it's pretty expensive... and we're using Rust in the browser to run code faster! ;)
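
To make the three points above concrete, here is a minimal sketch of such an interface. The field and method names are made up for illustration, and it assumes wasm-bindgen, serde and serde_json as dependencies:

use wasm_bindgen::prelude::*;
use serde::Serialize;

// simplified, hypothetical internal state: the real one differs
#[derive(Serialize, Default)]
struct State {
    program_counter: usize,
    outbox: Vec<i64>,
}

// no public fields: wasm-bindgen can only expose supported types
#[wasm_bindgen]
pub struct Machine {
    state: State,
}

#[wasm_bindgen]
impl Machine {
    #[wasm_bindgen(constructor)]
    pub fn new() -> Machine {
        Machine { state: State::default() }
    }

    // the internal state crosses the boundary as a JSON string
    pub fn state_json(&self) -> String {
        serde_json::to_string(&self.state).unwrap()
    }
}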

Browser-side: use webpack 4!

If you want to test your new crate in a real application, you may want to use webpack. Starting from version 4, Webpack ships an internal wasm loader: it makes integrating wasm-based libraries a breeze!

create-react-app offers a very simple way to get started. If you have an older project, I recommend switching to react-scripts ^2.1.1 and webpack ^4.26. I am not sure you need to do this in new projects, but I had to "eject" the webpack configuration (webpack.config.dev.js) and exclude ".wasm" files from the extensions managed by the "file" loader.

// "file" loader makes sure those assets get served by WebpackDevServer.
// When you `import` an asset, you get its (virtual) filename.
// In production, they would get copied to the `build` folder.
// This loader doesn't use a "test" so it will catch all modules
// that fall through the other loaders.
{
  // Exclude `js` files to keep "css" loader working as it injects
  // its runtime that would otherwise be processed through "file" loader.
  // Also exclude `html` and `json` extensions so they get processed
  // by webpack's internal loaders.
  // PLEASE NOTE THE .wasm REGEX HERE!!!
  exclude: [/\.(js|mjs|jsx|ts|tsx)$/, /\.html$/, /\.json$/, /\.wasm$/],
  loader: require.resolve('file-loader'),
  options: {
    name: 'static/media/[name].[hash:8].[ext]',
  },
}
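
After that, loading the crate from application code is a dynamic import away. Webpack 4 compiles WebAssembly modules into async chunks, so a dynamic import() is the easiest way to pull them in (a hypothetical usage sketch):

// webpack turns this into an async chunk that instantiates the wasm module
import("hrm-interpreter-wasm").then(wasm => {
  // call the exported wrapper functions from here
  console.log("wasm module loaded", wasm);
});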

If you found those notes useful, or you want to suggest something else, please tell me on Twitter, or consider offering me a Ko-fi!

New release: ScreenOn v1.1

A new version of ScreenOn is now available! ScreenOn is an application to measure and evaluate how much you are using your smartphone.

The main change in this version is a revamp of the "distribution view", as seen in the screenshot below. I decided to replace the ugly scatter chart of the previous version: it was hard to read, and the "logarithmic/normal" stuff was annoying and unclear.

This new chart separates usage intervals into several classes. It lets the user understand some patterns in their phone usage: a lot of very short sessions may mean they were anxious, while several long sessions (over 5 minutes / 300 seconds) may mean that they spend a lot of time reading on their phone, scrolling timelines/feeds or watching videos.

/images/screenon_v1.1_new_distribution_chart.png

This version also features some bug fixes in the data manipulation functions, and handles the "no data" case better. There is some work left to do - we have to teach new users how to feed data to the application from the Automate applet - but it's a first step nonetheless!

We also have a Hacktoberfest contribution! A new contributor integrated ScreenOn with TravisCI, a platform for continuous integration. It will help us understand if some new code introduces regressions or does not build, catching annoying problems before they have a chance to slip into releases!

Changelog

Introduced by Pull Request #2 (thanks @angelbarrera92):

  • TravisCI integration, based on Docker

Introduced by Pull Request #3:

  • (database) Add index on _screenon_ table
  • (frontend) automatically load data from a week ago to today
  • (frontend) handle "no data" case in distribution view and daily usage view
  • (frontend) write tests for client-side data manipulation functions
  • (frontend) rework distribution view: replace scatter chart with horizontal bar chart to show classes of usage (under 5 seconds, 5-10 seconds, 10-30 seconds, 30 seconds to one minute, 1-5 minutes, over 5 minutes)

If you want to test ScreenOn, read its source code and contribute bugfixes... head over to Github! You can also help this project by offering me a Ko-Fi!

How to set timezone in px-vis-timeseries

/images/px-vis-timeseries-example.png

px-vis-timeseries is one of the most used web components in the Predix toolkit set, as it lets you chart time-based data and events! Unfortunately, it's not so easy to configure it to show the correct timezone. Let's see how to set it right, then.

Disclaimer: in this post, we are discussing px-vis-timeseries, version 4.0.0. This post may not apply to future versions of the components discussed here.

Gotcha #1: the default x-axis timezone is UTC

The default timezone of the X axis is UTC. While it may be the norm in engineering disciplines and for technical users, it may confuse normal users: they are used to the timezone they live in (if they're even aware of timezones at all!).

To see the time in the browser's timezone, you need to set x-axis-type="timeLocal". By default, it's set to x-axis-type="time".

<px-vis-timeseries
    x-axis-type="timeLocal"
    chart-data="[[chartData]]"
    height="450"
    show-tooltip
    hide-register
    series-config="[[seriesConfig]]">
</px-vis-timeseries>

Gotcha #2: tooltip and register timezones can be configured separately!

The tooltip and the register are independent components: px-vis-timeseries does not configure them automatically, nor does it offer a single configuration point.

While this design choice makes the parent implementation simpler and allows for more flexibility, users have to remember to configure each sub-component.

The configuration can be provided by setting tooltip-config and register-config, which configure the tooltip and the register respectively.

<px-vis-timeseries
    tooltip-config="[[timezoneConfig]]"
    register-config="[[timezoneConfig]]"
    chart-data="[[chartData]]"
    height="450"
    show-tooltip
    hide-register
    series-config="[[seriesConfig]]">
</px-vis-timeseries>

To set the timezone, you can either use a timezone string (such as "Europe/Rome") or use the browser-provided Intl global object to retrieve the current timezone.

this.timezoneConfig = {
    // timezone: "Europe/Rome"
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone
};

Proof of Concept

I've prepared a minimal proof of concept to experiment with. This PoC is hosted on wintermade.it: it lets you test the code for both gotchas, all in one component. If you want to modify it, there is a live-editable version on Glitch, a code editing service that lets you host your experiments/small websites. In both cases, you need to enable javascript to see the example :D.

Did these notes save your day? Offer me a Ko-Fi! Or drop me a tweet if you find errors or typos!

ScreenOn: Measuring phone usage with Automate and Python

Some time ago, I was a bit nervous, so I started to browse Twitter on my phone. I didn't start using it because I had received a notification or a friend had sent me an interesting post, but to divert my attention from the task I should have been working on. After some minutes spent scrolling the timeline, I "woke up" and understood what I was doing. I also understood that it was not the first time it had happened: it was just the latest occurrence in a long series of "procrastination attacks".

I needed to stop wasting my time reading rants from randos on the internet. Before working on a solution, though, I wanted to know how much time I was actually spending on the phone.

So, I started working, and created ScreenOn. The solution is composed of three parts:

  1. an Automate flow to read screen status and save every change to a local sqlite3 database
  2. an Automate flow to push screen changes from sqlite3 to a Python web application
  3. a web application (Python+javascript) to store and see visualizations/stats

The code and the applet are available on Github.

Automate flows

/images/screenon_automate_flow.jpg

Automate is an Android application that lets you develop your own programs via a visual language. To create a program, you add, configure and connect blocks, where each block corresponds to a single action or condition. In a single program, you can create different flows, which are executed in parallel.

As you can see, there are different flows:

  1. Start: detect screen status changes (when it turns on or off), and save them
    to an sqlite database
  2. Backup: send screen change records to a Python webservice on the same network
  3. Config: configures the database

When the screen turns on or off, the applet saves the current time as a unix timestamp, and the status of the screen (1 = on, 0 = off).
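
These (timestamp, status) records are all we need to compute usage intervals. Here is a sketch of the idea - not the actual ScreenOn code - that pairs each "screen on" event with the following "screen off" event:

# sketch: derive usage intervals (in seconds) from (timestamp, status)
# records, where status is 1 = screen on, 0 = screen off
def usage_intervals(events):
    intervals = []
    turned_on_at = None
    for timestamp, status in sorted(events):
        if status == 1:
            turned_on_at = timestamp
        elif status == 0 and turned_on_at is not None:
            intervals.append(timestamp - turned_on_at)
            turned_on_at = None
    return intervals

# usage_intervals([(100, 1), (130, 0), (200, 1), (500, 0)]) == [30, 300]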

Web application

ScreenOn has a web application, with a Polymer frontend. The data is provided by the Python webservice that is also invoked by the Backup Automate flow.

The web application lets us interact with some charts, so let's see them!

/images/screenon_usage_per_day.png

Usage per Day is the default view: for each day in the selected period, it shows the time spent on the phone and how many times the screen was turned on and off that day.

/images/screenon_screen_changes.png

Screen Changes is a cumulative chart of the phone usage time in the selected day, together with the number of screen status changes that happened. The "screen status changes" line should help users spot short bursts of phone usage.

/images/screenon_distribution.png

Distribution of Usage aims to highlight the duration of each usage session, using an xy-scatter chart. On the x-axis, we find how long the phone screen stayed on; on the y-axis, we find how many times an interval of that length "happened".

There are two "modes": the default one (Logarithmic) aims to highlight the short-timed usages, by applying the log operation to the time intervals. This operation compresses the larger values, while leaving more space to the smaller values.

The other one, Normal, does not apply any compression to the data, and is better suited when we need to analyze longer time intervals.

Analysis

/images/screenon_analysis_usage_april_to_june.png

This is the daily usage chart, from April to June 2018. We can see that the behavior is more or less the same every week: on work days (Monday to Friday) I use the phone less than during the weekend (Saturday and Sunday). The only exception is the 25th of April, which is a national holiday in Italy.

/images/screenon_analysis_screenchanges_20180606.png

This is an example of a Wednesday, a normal work day. There are some things I can show you:

  • My work hours are from 9 AM to 6 PM, so you can see that I don't use the phone very much during that time. The increase near 2 PM is my lunch break: I scroll Twitter, check my private email account, and show small videos to friends.
  • At the time, I was using Youtube as my music player when on the road. It was before the Premium stuff, so you had to keep your screen on even if the phone was in your pocket and you weren't looking at it.
  • Every Wednesday evening, a streamer I follow on Twitch.tv streams a lore walk-through of the whole Metal Gear Solid saga, and every stream is ~3 hours long...

/images/screenon_analysis_screenchanges_20180617_masszoom.png

This screenshot will help us see what the Screen Changes chart looks like when there are a lot of small changes.

As we can see, between 11 AM and 1 PM I used the phone several times in a row, for short periods of time. We can tell because the black line (the time the screen was turned on) grows slowly, while the orange line (the number of changes) grows very fast in the same interval.

Next Steps

No software is ever finished. ScreenOn can benefit from additional work. Some of the next steps are purely technical, others aim to capture more data in order to better understand why the user is using their phone and how they're spending their time.

1) CSV Exports

Sure, a user could just stop the web application, dump the database and restart the application, but why should they do that? What if they want to perform some other kind of analysis on the data? I want to empower the users and let them create their own analyses.
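
As a sketch, the export could be as simple as dumping the screenon table to a CSV file (the column names here are an assumption based on what the Automate flow records):

# sketch: dump every screen change record to a CSV file
import csv
import sqlite3

def export_csv(db_path, csv_path):
    connection = sqlite3.connect(db_path)
    rows = connection.execute("SELECT timestamp, status FROM screenon")
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["timestamp", "status"])
        writer.writerows(rows)
    connection.close()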

2) Handle timezones better

Right now, the Automate flow that records the screen changes only stores a unix timestamp. This works only under narrow conditions, such as the phone never changing timezone and the browser living in the same timezone as the phone. ScreenOn can do better: the user should be able to set a preferred timezone, or set the chart to use the data's timezone!

3) Twitch/Youtube is not Twitter/Mastodon scrolling!

In the analysis, we saw that the phone was used for long periods of time during evenings and nights. Personally, I prefer to turn on Youtube or Twitch on my mobile device, either as a TV-like device or to actively watch my favourite content creators on those platforms. I don't know how to implement this distinction yet, but it should not be very difficult.

4) Ask why you are using the phone, if using it for more than X minutes or X times in an hour

There is a reason if someone is using their phone so much. Maybe they're stuck waiting in traffic, or waiting for a build to finish. Or maybe they're just anxious, trying not to think about something that makes them nervous.

If this is the case, "setting a goal to use the phone less" is just treating the symptoms instead of curing the disease: if the phone has become an anxiety-relief device, nervous people will simply use something else instead. A tracker like ScreenOn cannot register context, so the Automate applet may periodically ask the user how they feel and why they are using the phone.

If you want to propose a new kind of chart, or point out some errors and typos, don't hesitate to let me know! Contact me on Twitter. If you liked this article, why don't you offer me a Ko-Fi?

HRMc optimizations - unreachable code removal

tl;dr: this post explains a compiler optimization, based on graph visits, that removes code that will never be executed from the generated program.

In HRMc I try to keep every transformation as small as possible, with a minimum set of assumptions about the input, to avoid having to think about future transformations. This philosophy means that every pass does just what it should, but it also implies that some code that becomes "useless" after a transformation is not removed by it.

Let’s look at the jump compression optimization: after the execution of that transformation, we can see that the last jmp start is not going to be executed.

# hrmc example.hrm --no-unreachable # jump-compression is enabled, unreachable-code is not
start:        inbox     <------|
              jez start  ---->-|
              outbox           |
              jmp start  ---->-|
_hrm_1,
_hrm_endif_1: jmp start  <-------- won't be executed! ever!

Obviously, an optimization named “jump compression” should not bother removing that instruction. We’d need a different pass to deal with that.

traversing the execution graph

Let’s talk about instructions, the execution flow of a program and graphs.

# A simple program in HRM that for each item
# in the inbox, filters and writes all the "zero"s.
start:  inbox
        jez write
        jmp start
        copyto 1   # this won't be executed! pay attention!
write:  outbox
        jmp start

In most programming languages, instructions are executed sequentially, from start to end. To be able to express complex programs, a programming language must provide a way to select different instructions according to a specific state (if <condition>) and a way to repeat instructions until some condition happens (while <condition>). In lower-level languages, they are implemented using jumps. Conditional jumps (j<condition> <label>) set the next instruction to the instruction tagged with <label> only if the <condition> is respected, and continue to the successive instruction otherwise. Unconditional jumps (jmp <label>) always set the next instruction to the one tagged with <label>.

start
+------------+
| inbox      <----------+
+------------+   |      |
      |          |      |
+-----v------+   |      |
| jez write  |   |      |
+------------+   |      |
   |      |false |      |
   |      |      |      |
   |   +--v---------+   |
   |   | jmp start  |   |
   |   +------------+   |
   |true                |
   |                    |
   |   +------------+   |
   |   | copyto 1   |   |
   |   +------------+   |
   |                    |
   |     write          |
+--v---------+          |
|  outbox    |          |
+------------+          |
    |                   |
+---v--------+          |
| jmp start  +----------+
+------------+

This is the execution flow of the program written above: we can see it looks like a graph, and in fact it is! Most boxes have one outgoing arrow to the next box, while the box containing a "conditional jump" (jez write) has two arrows pointing to two different boxes! It is clear that there is a one-to-one correspondence between boxes and instructions.

Let’s notice a small thing: there is a box with no arrows pointing to it. It means that the instruction inside the box (copyto 1) won’t be executed! Why can I say so? Because that box is not reachable from the first box: there does not exist a path from inbox to copyto 1. Instead, outbox is reachable: the path is inbox -> jez write -> outbox. copyto 1 is marked as unreachable code, or dead code because it won’t be executed. Removing such instructions does not modify what a program does!

How can we understand if an instruction will be executed? The execution flow graph is… a graph, so we can perform a depth-first search on it and see which nodes are visited or not.

There is a more interesting fact, though: we can use that algorithm to generate a new graph without unreachable instructions! The rule to follow is: when the algorithm visits a conditional jump instruction, always visit first the branch that would be executed if the condition evaluated to false. This rule helps us generate equivalent code when the graph is linearized back into an instruction list.

The non-recursive version of the depth-first search algorithm implemented in conversion.py of hrm-compiler identifies the visited nodes and keeps track of the labels associated with them.
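
Here is a simplified sketch of that visit, reusing the p.JumpOp/p.JumpCondOp node types and the labels_in_ast helper you can find in hrm-compiler. It is not the actual conversion.py code (which also rebuilds the instruction list and its labels), and it assumes every jump label tags an instruction:

# sketch: mark the instructions reachable from the entry point
def reachable_instructions(ast):
    labels_positions = labels_in_ast(ast)
    visited = [False for i in ast]
    stack = [0]  # execution starts at the first instruction
    while stack:
        position = stack.pop()
        if position >= len(ast) or visited[position]:
            continue
        visited[position] = True
        ast_item = ast[position]
        if type(ast_item) == p.JumpOp:
            # unconditional jump: the only successor is the target
            stack.append(labels_positions[ast_item.label_name])
        elif type(ast_item) == p.JumpCondOp:
            # conditional jump: visit the target too, but push the
            # fallthrough last, so the "false" branch is visited first
            stack.append(labels_positions[ast_item.label_name])
            stack.append(position + 1)
        else:
            # any other instruction falls through to the next one
            stack.append(position + 1)
    return visited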

Did you enjoy this explanation? Tell me what you think about it. If it helped you, please consider offering me a Ko-fi!

HRMc optimizations: jump compression

tl;dr: this post explains a small optimization, which I call "jump compression", implemented in the HRM compiler.

the issue

When I started using the HRMc compiler, I noticed that there were several useless jumps and labels all over the place, sometimes leading to huge chains of jumps that could just be removed in the generated code.

Let’s start with this simple program that puts in the output tape every non-zero item from the input tape.

# example.hrm
start:
# read from inbox
emp = inbox
# put in the output queue if emp is non-zero
if nz then
    outbox
endif
# start again
jmp start

The program is then compiled (without optimizations!) to this “assembly” snippet. As you can see, the compiler here generates some ad-hoc labels to implement the if operation with jumps.

The little ASCII diagram of the jumps lets us understand two things about the flow of the program:

  1. the jumps both reach the same instruction (the last jump), because it’s tagged with two labels (_hrm_1, _hrm_endif_1)
  2. the last jump goes back to the first instruction, tagged start
# hrmc example.hrm --no-unreachable --no-jump-compression
start:        inbox      <----(2)------|
              jez _hrm_1 ----------|   |
              outbox               |   |
              jmp _hrm_endif_1 --| |   |
_hrm_1,                          | |   |
_hrm_endif_1: jmp start  <--(1)--|-|   |
                     |-----------------|

The problem with this code is that the game CPU would spend nearly a third of its time following jumps. Running it with hrm-interpreter on random inputs, we'd notice that jumps account for more than half of all instructions executed by the CPU. Isn't it a waste of time? What if we could avoid the "jump chain" and go directly to the right instruction?

the algorithm

Let’s define a “jump chain”: if a jump A, either conditional or unconditional, refers to a label that tags another jump instruction B, then we have a jump chain. An example of a jump chain is (1) in the previous example: it’s safe to replace jmp _hrm_endif_1 with jmp start, as they’re clearly equivalent in that context.

The idea is then to shrink/compress the jump chain: every time we find a jump, either conditional or unconditional, we find the first non-jump operation directly reachable from that node in the AST, then take the label associated with that instruction and replace the original jump’s label with it.

Let’s then look at a commented version of the algorithm, as implemented in conversion.py of hrm-compiler

# compress_jumps: Instruction[] -> Instruction[]
def compress_jumps(ast):
    # labels_in_ast : Instruction[] -> Map<label: string, position: number>
    # labels_in_ast maps every label to the position of the instruction tagged by the label
    labels_positions = labels_in_ast(ast)

    compressed_ast = []

    for index, ast_item in enumerate(ast):
        if type(ast_item) in [p.JumpOp, p.JumpCondOp]:
            jump_going_nowhere = False
            try:
                # get the first position executable by `_label`
                _label = ast_item.label_name
                next_pos = labels_positions[_label]
            except KeyError:
                # `_label` does not tag any instruction.
                # save the instruction: removing jumps, either conditional or
                # unconditional, modified the semantics of the source program
                compressed_ast.append(ast_item)
                # go to the next instruction
                continue

            # let's visit the jump chain:
            # a jump chain stops when
            # (1) the current instruction is not an unconditional jump,
            # (2) the instruction was already visited (countermeasure to loops)
            # (3) the jump instruction refers to a label that does not tag any instruction.
            # the algorithm updates `_label` until the jump chain stops.
            visited = [False for i in ast]
            while type(ast[next_pos]) == p.JumpOp and \
                not visited[next_pos] and \
                not jump_going_nowhere:
                visited[next_pos] = True
                _label = ast[next_pos].label_name
                try:
                    next_pos = labels_positions[_label]
                except KeyError:
                    jump_going_nowhere = True

            # no matter why the previous loop stopped,
            # _label is guaranteed to contain the label that is located
            # nearest to the first non-`jmp` instruction.
            # let's then add the new jump instruction to compressed_ast
            if type(ast_item) == p.JumpOp:
                compressed_ast.append(p.JumpOp(_label))
            else:
                compressed_ast.append(p.JumpCondOp(_label, ast_item.condition))
        else:
            compressed_ast.append(ast_item)

    return compressed_ast

As hinted by the previous ASCII diagram, the algorithm exploits the graph-like features of the instruction array (named AST here, but it's more similar to a compiler-specific Intermediate Representation, or IR): it performs a depth-first visit of the graph defined by the IR, and the visit stops when one of three conditions becomes true:

  1. the currently visited instruction is not an unconditional jump
  2. the instruction was already visited
  3. it is not possible to visit the next instruction

When the visit stops, _label is the name of the label that is nearest to the non-jmp instruction that the CPU is going to execute at the end of the jump chain.

There are two possible cases:

  1. we reached the non-jmp instruction: _label is the name of the label tagging that instruction
  2. we couldn't reach the non-jmp instruction: the algorithm stopped at the jmp instruction, and we can get the name of the label from it

the result

If we enable this optimization in the compiler, it will generate the following code:

# hrmc example.hrm --no-unreachable # jump-compression is enabled
start:        inbox     <------|
              jez start  ---->-|
              outbox           |
              jmp start  ---->-|
_hrm_1,                        |
_hrm_endif_1: jmp start  ---->-|

Every jump instruction now refers to the right label: it means we are not going to waste time jumping around!

Conclusions

In this article, I tried to explain a simple optimization I've implemented in HRMc, a compiler for the Human Resource Machine game's assembly language. Thank you for spending your time reading these notes.

Did you enjoy this explanation? Does this optimization have a real/academic name? Feel free to tell me on Twitter! If you liked this article, please consider offering me a Ko-fi!