How to write generic dissectors in Wireshark

Wireshark is a flexible network analyzer that can also be extended via plugins, such as dissectors.

A dissector is a kind of plugin that lets Wireshark understand a protocol - in our case, a protocol that is only used by a certain application. There are several reasons to create your own (application-level) protocol over UDP/IP or TCP/IP, such as efficiency (by sending only binary data, formatted in an application-specific way).

Wireshark is a very helpful tool during system integration tests, or while developing a networked application. A dissector helps developers and testers check whether the applications under test are sending (or receiving) data correctly: whether the structure of a certain message matches the protocol definition, whether some fields have invalid values, whether an application is sending more (or fewer) messages than expected in a certain timeframe.

Wireshark Generic Dissectors - a declarative approach

Wireshark Generic Dissectors (WSGD) is a plugin that lets you define a dissector for your custom protocol, in a declarative manner.

Being declarative is a cool idea: by just stating what the protocol looks like, the content of the dissector is clear to a technical, but non-developer, user. Such protocol descriptions can also be used as documentation, without having to manage different Wireshark API versions (as can happen with Lua-based dissectors). It's not all fun and games, though: this plugin has some (reasonable) limitations, such as not handling text protocols, or requiring a header common to every kind of message described in the protocol.

Let's write a generic dissector

Let's start with the Wireshark Generic Dissector file: it contains the metadata about the protocol. This metadata - the protocol name, the structure that sketches the header shared by all messages, and the main message type - is necessary for the plugin to parse captured messages efficiently.

# file custom.wsgd

# protocol metadata
PROTONAME Custom Protocol over UDP
PROTOSHORTNAME Custom
PROTOABBREV custom

# conditions on which the dissector is applied:
# the protocol will be applied on all UDP messages with port = 8756
PARENT_SUBFIELD udp.port
PARENT_SUBFIELD_VALUES 8756

# the name of the header structure
MSG_HEADER_TYPE                    T_custom_header
# the field that identifies the message type
MSG_ID_FIELD_NAME                  msg_id
# the main message type - usually a fake message, built from one
#    of the possible messages
MSG_MAIN_TYPE                      T_custom_switch(msg_id)

# this token marks the end of the protocol description
PROTO_TYPE_DEFINITIONS

# refer to the description of the data format
include custom.fdesc;

The second file is the data format description: it describes the messages of the protocol we're writing a dissector for.

# file custom.fdesc

# here, we define an enumerated type to list the types of messages
#   defined in our protocol
enum8 T_custom_msg_type
{
    word_message   0
    number_message 1
}

# here, we define the structure of the header.
# The header (the same for each message type) must...
struct T_custom_header
{
    # ... define the order of the data
    byte_order big_endian;
    uint32 counter;
    uint8  size_after_header;
    # ... contain the field defined as MSG_ID_FIELD_NAME
    T_custom_msg_type msg_id;
}

struct T_word_message
{
    T_custom_header header;
    uint8           word_len;
    # array of characters
    char[word_len]  word;
    # "word" messages will always have some unused trailing bytes:
    #   they can be marked as raw(*) - the size is calculated at runtime
    raw(*)          spare;
}

struct T_number_message
{
    T_custom_header header;
    uint8           number;
    bool8           is_even;
}

# T_custom_switch is the main message (as defined in the protocol description):
# according to the parameter msg_id (of type T_custom_msg_type), the main
# message is defined by a single message: either T_word_message or T_number_message.
switch T_custom_switch T_custom_msg_type
{
case T_custom_msg_type::word_message:   T_word_message "";
case T_custom_msg_type::number_message: T_number_message "";
}

Generating some network traffic...

To verify that the dissector we've written is correct, we are going to build a small client to send some UDP messages to a very simple server.

Let's start with the server: it just receives UDP messages on port 8756, and prints the contents of those messages.

import socketserver

class CustomHandler(socketserver.DatagramRequestHandler):
    def handle(self):
        # self.request[0] holds the raw datagram: don't strip() it, since
        # our messages are binary and may legitimately end with whitespace bytes
        data = self.request[0]
        print(data)

if __name__ == "__main__":
    serv = socketserver.UDPServer(("127.0.0.1", 8756), CustomHandler)
    serv.serve_forever()
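
To double-check the traffic even without Wireshark, the server could also decode the common header with Python's struct module. Here's a minimal sketch (a hypothetical helper of my own, mirroring the T_custom_header layout defined above):

import struct

# ">LBB" mirrors T_custom_header: big-endian uint32 counter,
# uint8 size_after_header, uint8 msg_id
HEADER = struct.Struct(">LBB")

def describe_header(data):
    counter, size_after_header, msg_id = HEADER.unpack_from(data)
    kind = "word" if msg_id == 0 else "number"
    return "#%d: %s message, %d bytes after header" % (counter, kind, size_after_header)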

The client sends some data to our server - we just need it to generate some traffic to sniff on Wireshark.

import socket
import struct
import random
import string
import time

HOST, PORT = "localhost", 8756

# SOCK_DGRAM is the socket type to use for UDP sockets
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# refer to `pydoc struct`
HEADER_STRUCT = "".join([
    ">",  # network byte order
    "L",  # counter
    "B",  # message size
    "B",  # message type (0: word, 1: number)
])

PAYLOAD_WORD_TYPE = HEADER_STRUCT + "".join([
    "B",    # word length
    "100s", # string (at most 100 characters)
])
word_struct = struct.Struct(PAYLOAD_WORD_TYPE)

PAYLOAD_NUMBER_TYPE = HEADER_STRUCT + "".join([
    "B",  # number
    "B",  # 0: even, 1: odd
])
number_struct = struct.Struct(PAYLOAD_NUMBER_TYPE)

msg_counter = 0
while True:
    msg_counter += 1

    # prepare data to send
    if random.random() < 0.70:
        num = random.choice(range(256))
        is_even = 1 - (num & 1)  # 1 if num is even, matching the is_even field
        data = number_struct.pack(msg_counter, 2, 1, num, is_even)
    else:
        string_len = random.choice(range(100))
        the_string = bytes("".join(random.choice(string.ascii_letters+" ") for i in range(string_len)), "ascii")
        data = word_struct.pack(msg_counter, 101, 0, string_len, the_string)

    # send the message
    sock.sendto(data, (HOST, PORT))

    # wait 200ms
    time.sleep(0.2)
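
As a sanity check, we can compute by hand the exact bytes the client sends. For example, a number message with counter 1 and num 42 (an even number, so is_even is 1) packs to:

# counter=1, size_after_header=2, msg_id=1 (number), number=42, is_even=1
print(number_struct.pack(1, 2, 1, 42, 1).hex())  # prints "0000000102012a01"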

Set it up

Wireshark Generic Dissector is a binary plugin, distributed as a .so file - please read the installation procedure. I've summarized what I did to install the plugin and the files we've written so far:

# download the plugin - be sure it's the right one for
# the version of Wireshark installed on your system
wget http://wsgd.free.fr/300X/generic.so.ubuntu.64.300X.tar.gz
# extract the file generic.so
tar -xzf ./generic.so.ubuntu.64.300X.tar.gz
# install the shared object globally by putting it in the right folder
sudo cp generic.so  /usr/lib/wireshark/plugins/3.0/epan
# install the dissector files in the same folder as the shared object
sudo cp custom.wsgd /usr/lib/wireshark/plugins/3.0/epan
sudo cp custom.fdesc /usr/lib/wireshark/plugins/3.0/epan

Test drive

/images/wireshark-wsgd-with-dissector.png

As we can see from the screenshot, we are now able to see the content of the messages our application is sending to the server, without writing a single line of code (other than our application, obviously).

If you feel that this article helped you, feel free to share it! If you have questions, ask on Twitter, or offer me a coffee to let me keep writing these notes!

Stadia: the future of gaming?

Google unveiled its new game streaming service: Stadia.

Stadia, whose tagline is "gather around", recognizes that there are two "disconnected universes": streamers (people who play games for their audience) and viewers, who maybe cannot play the same games, or just enjoy watching someone else's performance.

The company tries to combine both worlds by creating a game streaming service that is also integrated with Youtube.

Interesting parts

Other websites already covered the conference: I will write down something I found interesting or exciting.

You can access the platform by just pressing a "Play" button at the end of a videogame Youtube video - if you're using Google Chrome, obviously.

AMD designed a custom GPU just for Stadia. Judging by its 10.7 teraflops, it is more powerful than the GPUs in current-gen consoles. Developers can also use more than one GPU for their games, making them even more detailed, transparently to the user.

Stadia promises an "up to 4k 60fps" experience for the player, and all plays will be streamed on Youtube. The special "share" button on the custom controller should let creators (or random players) share their play and create a "state share" link, to let other people play the same portion of the creator's gameplay. Creators can also use Crowd Play to let their Youtube viewers join their games and better interact with them.

Every game on Stadia will be playable with existing controllers and every device the user already owns: they will just interact with a stream, so they won't need a powerful device to use the service.

This new platform finally explains two major features of Google products I never really understood: Youtube's videogame channels (containers that automatically gather and categorize videos about specific games - see Sekiro's automatically generated channel as an example) and the WebUSB standard, which is only implemented by Chrome.

What's missing?

Stadia is not the first game streaming service on the market, and it won't be the last. Hopefully it won't fail as hard as OnLive, but there are several issues that should be resolved, or at least mitigated, before the launch (in the US, UK, Canada and most of Europe).

Let's start with the one I find most pressing: PS Now launched a week ago in Italy, with mixed results. Dadobax (an Italian videogame youtuber) experienced significant input lag while testing Bloodborne on a 100Mbps fiber connection. Will Stadia suffer from the same problem? During the presentation, the Stadia representative said there will be a direct link between ISPs and the Stadia data centers, but I won't believe everything works fine until I can try it myself. Other commenters note that, even without input lag, there is a risk of video artifacts due to the stream's compression algorithms.

Another issue is access to the service: we don't know how much it will cost, nor which titles will be available. At least we know that Doom Eternal, Assassin's Creed Odyssey, NBA 2K19 and Shadow of the Tomb Raider will be playable on the platform. We also don't know whether users must buy titles on the Stadia Store even if they already own them on other stores - such as Uplay, in Odyssey's case.

I'm also worried that Youtube is going to fill up with a lot of digital waste: no one is interested in my gameplays (I have to admit I'm bad at videogames), so that footage will never be seen, but will still take up space on a random hard drive in the cloud. I hope the service won't store every gameplay ever played on Stadia.

Did you like the conference? Are you hyped? Are you critical? Let me know on Twitter!

Your blog should be rendered on the server

"You probably don't need a single-page application" - Plausible"

This blog post from Plausible just reminded me of a pet peeve of mine: blogs that are built as single-page applications. I don't like being greeted by a blank page just because I don't want to execute whatever code you send to my browser.

There are several use cases for single-page applications, but blogs are not one of them, for several reasons. Some of them are already explained in the article, and I'm going to reiterate them too, but I want to add a different perspective to Plausible's short essay.

I just want to read content

A blog just contains text and some media (images, audio or video). We shouldn't need to download some Javascript, execute it to request the real content, and only then show it to the user - why can't I just download the content?

Another advantage is that you don't need to do a lot of work to let search engines index your work. Do you think this is a non-issue? Hulu would like to have a word with you: due to some problems with their client-side-only rendering, Google could no longer index them, destroying Hulu's previous work on their targeted keywords.

I don't care about the tool you use to generate your blog - whether you build it on your laptop and push it via FTP, or use Wordpress to write and publish your new essays - it's all fine, all those workflows are valid: they let your readers (me!) just get the content they want to read.

If you really need to build your entire website using Javascript, then please, I beg you: set up Server-Side Rendering (SSR) to create a static "base" version of the website. Most libraries/frameworks support SSR, so just study your framework's documentation and set it up.

Javascript should enhance the experience

I really love New York Times-style interactive data visualizations: they are part of the content itself, so it's OK to enable Javascript to enjoy those applets. Even New York Times articles, though, are just text, _enhanced_ by the interactive applets - you still get the text. It's OK to require Javascript to load comments - especially if you use third-party services such as Disqus - because, y'know, comments are not a core part of the experience.

So, please, stop forcing javascript on your blog readers. Just let me read your content.


I hope you liked this new issue of Interesting Links, a column where I highlight interesting articles, with my own comments and thoughts. If you appreciated this small rant, tell me that on Twitter, or support me on Ko-fi.

Emptying Cashless Keys with Minizinc

/images/20190126_105723_HDR-min.jpg

Since I bought my first cashless key to buy coffee at the company's vending machine, I have never managed to spend every cent on the key - I always fell short by 3 or 5 cents. I always wondered: which (and how many) drinks should I buy to reach exactly zero? We will use Minizinc to answer this question.

What is Minizinc?

Minizinc is a free and open source constraint modeling language. It can be used to formally describe (model) constraint satisfaction and optimization problems in a high-level, solver-independent way. It is available for all major platforms, as a command line tool or bundled with an IDE and several solvers.

In our case, we're going to solve an optimization problem! An optimization problem is the problem of finding the best solution among all the solutions that respect the problem's constraints (the feasible solutions). How do we define the best solution, though? We use an objective function: a function that, given a feasible solution, returns its value in the system we're studying.

Don't worry if you don't understand every single word right now, I'm going to explain them in a bit!

Let's write the model!

So, let's start by defining the parameters of our problem. We have a given amount on our key, and a list of beverages with their costs.

% the initial amount in our cashless key
int: START_AMOUNT;
% the list of beverages (coffee, tea, chocolate, ...)
enum BEVERAGES;
% how much does a beverage cost?
array[BEVERAGES] of int: COST;

Let's then define how we should spend our money! Our solution is defined as an array of integers - for each beverage b, quantity[b] is the number of times I have to drink b.

/* how many times do I have to drink a beverage B? */
array[BEVERAGES] of var int: quantity;

After defining the structure of the solution, we need to limit the values we can assign to it. Assignments to quantity must follow some rules - in this case, we must drink each beverage at least zero times, and we must spend every single cent we have. Every assignment that follows those rules is a feasible solution of the problem.

/* constraint: cannot drink a negative amount of drinks! */
constraint forall(b in BEVERAGES)(quantity[b] >= 0);

/* constraint: we must spend exactly START_AMOUNT */
var int: spent = sum(b in BEVERAGES)(quantity[b] * COST[b]);
constraint START_AMOUNT - spent = 0;

At the end, we want to check how we are going to spend our money. Our objective function is the number of drinks we are going to buy: we want to find an assignment of quantity that maximizes it - hard-earned money must not be wasted!

/* the objective function! */
var int: how_many_drinks;
constraint how_many_drinks = sum(b in BEVERAGES)(quantity[b]);

% use `maximize` to drink as many coffees as possible
% use `minimize` to cut down on coffees and get to zero euro as fast as possible
solve maximize how_many_drinks;
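
In mathematical terms, the whole model is a tiny integer program (writing q_b for quantity[b]):

\max \sum_{b \in \mathrm{BEVERAGES}} q_b
\quad \text{subject to} \quad
\sum_{b \in \mathrm{BEVERAGES}} q_b \cdot \mathrm{COST}_b = \mathrm{START\_AMOUNT},
\qquad q_b \in \mathbb{Z},\ q_b \ge 0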

Results

Let's feed our model a simplified instance of the problem: it tells us that we can buy 8 coffees and 2 chocolates, for a total of 10 drinks. If we want to cut down on coffees, we can order 3 cups of tea and 4 ginseng coffees instead.

shell % cat data.dzn
START_AMOUNT = 260; % 2.60 euro

BEVERAGES = { coffee, tea, ginseng, chocolate };
COST      = [     25,  40,      35,        30 ];
shell % minizinc model.mzn --data data.dzn
quantity = array1d(BEVERAGES, [8, 0, 0, 2]);
how_many_drinks = 10;
----------
==========
shell % # update the model to solve the minimization problem
shell % minizinc model.mzn --data data.dzn --all
quantity = array1d(BEVERAGES, [8, 0, 0, 2]);
how_many_drinks = 10;
----------
quantity = array1d(BEVERAGES, [2, 0, 0, 7]);
how_many_drinks = 9;
----------
quantity = array1d(BEVERAGES, [0, 0, 4, 4]);
how_many_drinks = 8;
----------
quantity = array1d(BEVERAGES, [0, 3, 4, 0]);
how_many_drinks = 7;
----------
==========

Nothing stops us from adding more constraints, such as "I need to drink at least one chocolate", or "I don't want to buy more than one cup of tea", or that the number of drinks must be even because I always offer a coffee to a colleague. Try to experiment: add your own constraints (see the sketch below for one way to do it from Python), add more beverages, and modify the prices!
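
If you'd rather experiment from code, the MiniZinc team also publishes Python bindings (the minizinc package on PyPI). The sketch below is my own, based on that package's documented API (double-check it against the version you install): it loads the model and data we wrote earlier, bolts an extra constraint on, and solves.

import minizinc  # pip install minizinc; also requires the MiniZinc toolchain

# load the model and data files from this post
model = minizinc.Model("model.mzn")
model.add_file("data.dzn")
# experiment: force at least one chocolate before solving
model.add_string("constraint quantity[chocolate] >= 1;")

solver = minizinc.Solver.lookup("gecode")
instance = minizinc.Instance(solver, model)
result = instance.solve()
print(result["quantity"], result["how_many_drinks"])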

If you liked this article, tag me in a photo of your favorite morning beverage, or consider offering me a Ko-fi!

About "Programmer as wizard, programmer as engineer"

Programmer as wizard, programmer as engineer - Tedinski.com

We have different design goals and constraints when we’re doing one [wizarding] or the other [engineering]. With engineering, we’re tasked with the long term maintenance of a piece of software, and we also become concerned with many lower-priority properties, such as its performance. With wizarding, we’re often best served by designing for ease of throwing code away rather than maintaining it. Well-engineered code is irreplaceable, but well-cast spells should be transient.

In this post, Ted Kaminski talks about two different styles of programming: wizarding and engineering. At first, all startup prototypes/Minimum Viable Products are created as fast as possible: we need to see whether the startup idea is viable first. If the business grows, the developers need to transform the prototype into a stable product, able to grow and include new features. Developers need to shift from the wizarding attitude (everything is temporary) to an engineering mindset (long-term maintenance: the software needs to work for a long time, and is used by many users).

These two "styles" -and the transition between them- reminds me of the Makers vs Menders difference. A Maker is a developer who love to create stuff quickly and are excited by deadlines and time pressures. On the other hand, a Mender likes older codebases, is satisfied by fixing longstanding bugs and making the whole system more stable and more secure.

How do we switch from one style to the other, though? The author suggests we should write enough tests to be sure everything is working, separate big applications into microservices when it makes sense, and rewrite critical parts of the system in languages that offer static (or gradual) typing and good tooling.

Separating concerns into their own services - when it makes sense and does not lead to a distributed monolith - can help us reason about the system as a whole, and make each service faster and more stable, especially when paired with the right dose of testing.

I'm also a fan of static languages such as Rust (even in the browser!), which features good tools such as cargo, and an expressive type system that helps developers implement critical systems with fewer errors.

I'd also like to suggest learning to recognize when a project should move from wizarding to engineering. Failing to understand when the project needs to transition to the engineering state, and to plan for it, is probably the most underappreciated problem I have encountered in my career so far. If the team can't slow down and fix growing pains, those pains will only get stronger and mutate into outages, and the team won't stop firefighting production issues. My suggestion is to keep note of small issues, or of tasks that take longer than expected for technical reasons, and to raise them during a retrospective. Having senior engineers on the team also helps, especially if they are experienced with the project's technology stack.


So, I hope you liked the first post of Interesting Links, a new column where I highlight interesting articles, with my own comments and thoughts. If you liked this post, you should totally share it on Twitter, or support me on Ko-fi.

Rust and WebAssembly - what I learnt from porting hrm-interpreter to wasm

Some time ago, I wrote a compiler for a fictional assembly-like language. As this compiler (named hrm-compiler) performs some optimizations on the code, I wanted to know whether the optimizations were breaking the compiled code. To test them, I wrote a small interpreter (hrm-interpreter) to run a JSON version of the compiled code, and then embedded the webassembly version in the bare-bones debugger I wrote as a web application.

If you want to try, the best way to get started is the Rust and Webassembly Book, a tutorial that walks you through writing a Rust crate that can interact with Javascript, and publishing it to npm. I followed the tutorial, and managed to create a wasm wrapper of hrm-interpreter, named (such fantasy! amaze!) hrm-interpreter-wasm. Let me explain what I learnt while creating it.

Write a different crate to support wasm

If you need to convert Javascript types to Rust ones, I'd suggest writing a specific crate: it's better to think about it as if you were writing a wrapper. This new crate lets you write wasm-specific code without modifying your library to support the new platform.

Not every library needs an ad-hoc crate, though: handlebars-rust, for example, offers users a flag to turn off filesystem-based I/O. This approach may work if your main library does not expose complex types, or if you are already handling platform-specific stuff.

Don't force file I/O in the main library

What does an interpreter do?

  1. read a file
  2. convert its content to a list of operations
  3. execute each operation until the program ends or some error is raised
  4. write the output of the program to a file (or a terminal)

Unfortunately, you cannot read or write files in a wasm environment - you cannot perform I/O! So we need to find another way: let's look at the list again.

  1. read a file and extract its content as a huge string
  2. convert the content of the string to a list of operations
  3. execute each operation until the program ends or some error is raised
  4. capture the output of the program into a string
  5. write the output string to a file (or a terminal)

So, we may notice that the focus is now on strings, not files.

As Javascript can perform I/O (read files via FileReader, generate new files via FileWriter, or even make HTTP requests!), our wrapper crate may just take a string, pass it to the main crate, and return the output string. In my case, I had to create some new methods to sidestep file-based I/O.

There are other cases where you may need to modify your crate to port it to wasm: I suggest reading Which Crates Work with Wasm? to learn more.

Try to expose as little state as possible - use functions instead

Javascript cannot manipulate complex Rust types, such as Option values, so you cannot export whatever you want. Before wasting several hours trying to understand why you cannot "export" your beautiful struct with its awesome Option<Result<Beautiful<Type>>>, check out wasm-bindgen's Supported Types documentation chapter. Trust me :).

To work around it, in hrm-interpreter-wasm...

  • the main structure has no public fields
  • the internal state is serialized to a JSON-formatted string
  • data from Javascript are also JSON-formatted strings

This is not the best interface, but it works for now. ("Now" means that I'll fix it once I integrate some work-in-progress stuff, such as the new Rust-based compiler and sourcemap support.) If you can, try to avoid serialization: it's pretty expensive... and we're using Rust in the browser to run code faster! ;)

Browser-side: use webpack 4!

If you want to test your new crate in a real application, you may want to use webpack. Starting from version 4, Webpack ships an internal wasm loader: it makes integrating wasm-based libraries a breeze!

create-react-app offers a very simple way to get started. If you have an older project, I recommend switching to react-scripts ^2.1.1 and webpack ^4.26. I am not sure you need to do this in new projects, but I had to "eject" the webpack configuration (webpack.config.dev.js) and exclude ".wasm" files from the extensions managed by the "file" loader.

// "file" loader makes sure those assets get served by WebpackDevServer.
// When you `import` an asset, you get its (virtual) filename.
// In production, they would get copied to the `build` folder.
// This loader doesn't use a "test" so it will catch all modules
// that fall through the other loaders.
{
// Exclude `js` files to keep "css" loader working as it injects
// its runtime that would otherwise be processed through "file" loader.
// Also exclude `html` and `json` extensions so they get processed
// by webpacks internal loaders.
// PLEASE NOTE THE .wasm REGEX HERE!!!
exclude: [/\.(js|mjs|jsx|ts|tsx)$/, /\.html$/, /\.json$/, /\.wasm$/],
loader: require.resolve('file-loader'),
options: {
    name: 'static/media/[name].[hash:8].[ext]',
  },
}

If you found those notes useful, or you want to suggest something else, please tell me on Twitter, or consider offering me a Ko-fi!

New release: ScreenOn v1.1

A new version of ScreenOn is now available! ScreenOn is an application to measure and evaluate how much you are using your smartphone.

The main change in this version is a revamp of the "distribution view", as seen in the screenshot below. I decided to replace the ugly scatter chart of the previous version: it was hard to read, and the "logarithmic/normal" toggle was annoying and unclear.

This new chart groups usage intervals into several classes. It lets users spot some patterns in their phone usage: a lot of very short sessions may mean they were anxious, while several long sessions (over 5 minutes / 300 seconds) may mean they spend a lot of time reading on their phone, scrolling timelines/feeds or watching videos.

/images/screenon_v1.1_new_distribution_chart.png

This version also features some bug fixes in the data manipulation functions, and handles the "no data" case better. There is some work left to do - we have to teach new users how to feed data to the application from the Automate applet - but it's a first step nonetheless!

We also have a Hacktoberfest contribution! A new contributor integrated ScreenOn with TravisCI, a platform for continuous integration. It will help us understand whether new code introduces regressions or fails to build, catching annoying problems before they have a chance to slip into releases!

Changelog

Introduced by Pull Request #2 (thanks @angelbarrera92):

  • TravisCI integration, based on Docker

Introduced by Pull Request #3:

  • (database) Add index on _screenon_ table
  • (frontend) automatically load data from a week ago to today
  • (frontend) handle "no data" case in distribution view and daily usage view
  • (frontend) write tests for client-side data manipulation functions
  • (frontend) rework distribution view: replace scatter chart with horizontal bar chart to show classes of usage (under 5 seconds, 5-10 seconds, 10-30 seconds, 30 seconds to one minute, 1-5 minutes, over 5 minutes)

If you want to test ScreenOn, read its source code and contribute bugfixes... head over to Github! You can also help this project by offering me a Ko-Fi!

How to set timezone in px-vis-timeseries

/images/px-vis-timeseries-example.png

px-vis-timeseries is one of the most used web components in the Predix toolkit. This must-know component is essential, as it lets you chart time-based data and events! Unfortunately, it's not so easy to configure it to show the correct timezone. Let's see how to set it right, then.

Disclaimer: in this post, we are discussing px-vis-timeseries, version 4.0.0. This post may not apply to future versions of the components discussed here.

Gotcha #1: the default x-axis timezone is UTC

The default timezone of the X axis is UTC. While this may be the norm in engineering disciplines and for technical users, it may puzzle normal users: they are used to the timezone they live in, and may not even be aware of UTC!

To see the time in the browser's timezone, you need to set x-axis-type="timeLocal". By default, it's set to x-axis-type="time".

<px-vis-timeseries
    x-axis-type="timeLocal"
    chart-data="[[chartData]]"
    height="450"
    show-tooltip
    hide-register
    series-config="[[seriesConfig]]">
</px-vis-timeseries>

Gotcha #2: tooltip and register timezones can be configured separately!

The tooltip and the register are independent components: px-vis-timeseries neither configures them automatically, nor offers a single configuration point.

While this design choice makes the parent implementation simpler and allows for more flexibility, users have to remember to configure each sub-component.

The configuration can be provided via tooltip-config and register-config, which configure the tooltip and the register respectively.

<px-vis-timeseries
    tooltip-config="[[timezoneConfig]]"
    register-config="[[timezoneConfig]]"
    chart-data="[[chartData]]"
    height="450"
    show-tooltip
    hide-register
    series-config="[[seriesConfig]]">
</px-vis-timeseries>

To set the timezone, you can either use a timezone string (such as "Europe/Rome") or use the browser-provided Intl global object to retrieve the current timezone.

this.timezoneConfig = {
    // timezone: "Europe/Rome"
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone
};

Proof of Concept

I've prepared a minimal proof of concept to experiment with. This PoC is hosted on wintermade.it: it lets you test the code for both gotchas, all in one component. If you want to modify it, there is a live-editable version on Glitch, a code editing service that lets you host your experiments/small websites. In both cases, you need to enable javascript to see the example :D.

Did these notes save your day? Offer me a Ko-Fi! Or drop me a tweet if you find errors or typos!

ScreenOn: Measuring phone usage with Automate and Python

Some time ago, I was a bit nervous, so I started browsing twitter on my phone. I hadn't received a notification, and no friend had sent me an interesting post: I just wanted to divert my attention from the task I should have been working on. After some minutes spent scrolling the timeline, I "woke up" and understood what I was doing. I also understood that it was not the first time it had happened - it was just the latest occurrence in a long series of "procrastination attacks".

I needed to stop wasting my time reading rants from randos on the internet. Before working on a solution, though, I wanted to know how much time I was actually spending on the phone.

So, I started working, and created ScreenOn. The solution is composed of three parts:

  1. an Automate flow to read screen status and save every change to a local sqlite3 database
  2. an Automate flow to push screen changes from sqlite3 to a Python web application
  3. a web application (Python+javascript) to store and see visualizations/stats

The code and the applet are available on Github.

Automate flows

/images/screenon_automate_flow.jpg

Automate is an Android application that lets you develop your own programs via a visual language. To create a program, you add, configure and connect blocks, where each block corresponds to a single action or condition. A single program can contain different flows, which are executed in parallel.

As you can see, there are different flows:

  1. Start: detect screen status changes (when it turns on or off), and save them
    to an sqlite database
  2. Backup: save screen changes records to a Python webservice on the same network
  3. Config: configures the database

When the screen turns on or off, the applet saves the current time as a unix timestamp, and the status of the screen (1 = on, 0 = off).
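
To give an idea of what the web application's data manipulation functions work with, here's a simplified sketch (my own, not ScreenOn's actual code) of how those (timestamp, status) rows can be paired into usage intervals:

def usage_intervals(rows):
    # rows: (unix_timestamp, status) tuples ordered by time,
    # with status 1 = screen turned on, 0 = screen turned off
    intervals = []
    turned_on_at = None
    for timestamp, status in rows:
        if status == 1:
            turned_on_at = timestamp
        elif turned_on_at is not None:
            # screen turned off: close the current usage interval
            intervals.append(timestamp - turned_on_at)
            turned_on_at = None
    return intervals

# example: two sessions, 30 seconds and 300 seconds long
print(usage_intervals([(100, 1), (130, 0), (200, 1), (500, 0)]))  # [30, 300]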

Web application

ScreenOn has a web application, with a Polymer frontend. The data are provided by the Python webservice that is also invoked by the Backup Automate flow.

The web application lets us interact with some charts, so let's see them!

/images/screenon_usage_per_day.png

Usage per Day is the default view: for each day in the selected period, it shows the time spent on the phone and how many times the screen was turned on and off that day.

/images/screenon_screen_changes.png

Screen Changes is a cumulative chart of the phone usage time in the selected day, together with the number of screen status changes. The "screen status changes" line should help users spot short-timed phone usages.

/images/screenon_distribution.png

Distribution of Usage aims to highlight the duration of usage sessions, using an xy-scatter chart. On the x-axis, we find the interval of time the phone screen stayed on; on the y-axis, how many times that interval "happened".

There are two "modes": the default one (Logarithmic) aims to highlight the short-timed usages, by applying the log operation to the time intervals. This operation compresses the larger values, while leaving more space to the smaller values.

The other one, Normal, does not apply any compression to the data, and is better suited when we need to analyze longer time intervals.

Analysis

/images/screenon_analysis_usage_april_to_june.png

This is the daily usage chart, from April to June 2018. We can see that the behavior is more or less the same every week: on work days (Monday to Friday) I use the phone less than during the weekend (Saturday and Sunday). The only exception is the 25th of April, which is a national holiday in Italy.

/images/screenon_analysis_screenchanges_20180606.png

This is an example of a Wednesday, a normal work day. There are some things I can show you:

  • My work hours are from 9 AM to 6 PM, so you can see that I don't use the phone very much during that time. The increase near 2 PM is my lunch break: I scroll Twitter, check my private email account, and show small videos to friends.
  • At the time, I was using Youtube as my music player when on the road. It was before the Premium stuff, so you had to keep your screen on even if the phone was in the pocket and you weren't looking at it.
  • Every Wednesday evening, a streamer I follow on Twitch.tv streams a lore walk-through of the whole Metal Gear Solid saga, and every stream is ~3 hours long...

/images/screenon_analysis_screenchanges_20180617_masszoom.png

This screenshot will help us see what the Screen Changes chart looks like when there are a lot of small changes.

As we can see, between 11 AM and 1 PM I used the phone several times in a row, for short periods of time. We can tell because the black line (the time the screen was turned on) grows slowly, while the orange line (the number of changes) grows very fast in the same interval.

Next Steps

No software is ever finished, and ScreenOn can benefit from additional work. Some of the next steps are purely technical; others aim to capture more data, to better understand why users are using their phone and how they're spending their time.

1) CSV Exports

Sure, a user could just stop the web application, dump the database and restart the application, but why should they have to? What if they want to perform some other kind of analysis on the data? I want to empower users and let them create their own analyses.

2) Handle timezones better

Right now, the Automate flow that records the screen changes only saves a unix timestamp. This works in a narrow set of conditions: the phone never changes timezone, and the browser is in the same timezone as the phone. ScreenOn can do better: the user should be able to set a preferred timezone, or set the charts to use the data's timezone!

3) Twitch/Youtube is not Twitter/Mastodon scrolling!

In the analysis, we saw that the phone was used for long periods of time during evenings and nights. Personally, I prefer to turn on Youtube or Twitch on my mobile device, either as a TV-like device or to actively watch my favourite content creators on those platforms. I don't know how to implement this distinction yet, but it should not be very difficult.

4) Ask why you are using the phone, if using it for more than X minutes or X times in an hour

There is a reason if someone is using their phone so much. Maybe they're stuck waiting in traffic, or waiting for a build to finish. Or maybe they're just anxious, and don't want to think about something that makes them nervous.

If that is the case, "setting a goal to use the phone less" just treats the symptoms instead of curing the disease: the phone has become an anxiety-relief device, and nervous people will simply reach for something else instead. A tracker like ScreenOn cannot register context, so the Automate applet could periodically ask the user how they feel and why they are using the phone.

If you want to propose a new kind of chart, or point out some errors and typos, don't hesitate to let me know! Contact me on Twitter. If you liked this article, why don't you offer me a Ko-Fi?

HRMc optimizations - unreachable code removal

tl;dr: this post explains a compiler optimization, based on graph visits, that removes code that will never be executed.

In HRMc I try to keep every transformation as small as possible, make a minimal set of assumptions about the input, and avoid having to think about future transformations. This philosophy means that every pass does just what it should do, but it also implies that stuff that becomes "useless" after a transformation is not removed by that transformation.

Let’s look at the jump compression optimization: after the execution of that transformation, we can see that the last jmp start is not going to be executed.

# hrmc example.hrm --no-unreachable # jump-compression is enabled, unreachable-code is not
start:        inbox     <------|
              jez start  ---->-|
              outbox           |
              jmp start  ---->-|
_hrm_1,
_hrm_endif_1: jmp start  <-------- won't be executed! ever!

Obviously, an optimization named “jump compression” should not bother removing that instruction. We’d need a different pass to deal with that.

traversing the execution graph

Let’s talk about instructions, the execution flow of a program and graphs.

# A simple program in HRM that for each item
# in the inbox, filters and writes all the "zero"s.
start:  inbox
        jez write
        jmp start
        copyto 1   # this won't be executed! pay attention!
write:  outbox
        jmp start

In most programming languages, instructions are executed sequentially, from start to end. To be able to express complex programs, a programming language must provide a way to select different instructions according to a specific state (if <condition>) and a way to repeat instructions until some condition happens (while <condition>). In lower-level languages, both are implemented using jumps. Conditional jumps (j<condition> <label>) set the next instruction to the one tagged with <label> only if <condition> holds, and continue with the next instruction otherwise. Unconditional jumps (jmp <label>) always set the next instruction to the one tagged with <label>.

start
+------------+
| inbox      <----------+
+------------+   |      |
      |          |      |
+-----v------+   |      |
| jez write  |   |      |
+------------+   |      |
   |      |false |      |
   |      |      |      |
   |   +--v---------+   |
   |   | jmp start  |   |
   |   +------------+   |
   |true                |
   |                    |
   |   +------------+   |
   |   | copyto 1   |   |
   |   +------------+   |
   |                    |
   |     write          |
+--v---------+          |
|  outbox    |          |
+------------+          |
    |                   |
+---v--------+          |
| jmp start  +----------+
+------------+

This is the execution flow of the program written above: we can see that it looks like a graph - and in fact it is! Most boxes have one outward arrow to the next box, while the box containing a "conditional jump" (jez write) has two arrows pointing to two different boxes. It is clear that there is a one-to-one correspondence between boxes and instructions.

Let’s notice a small thing: there is a box with no arrows pointing to it. It means that the instruction inside the box (copyto 1) won’t be executed! Why can I say so? Because that box is not reachable from the first box: there does not exist a path from inbox to copyto 1. Instead, outbox is reachable: the path is inbox -> jez write -> outbox. copyto 1 is marked as unreachable code, or dead code because it won’t be executed. Removing such instructions does not modify what a program does!

How can we understand whether an instruction will be executed? The execution flow graph is... a graph, so we can perform a depth-first search on it and see which nodes are visited and which are not.

There is a more interesting fact, though: we can use the same visit to generate a new graph without unreachable instructions! The rule to follow is: when the algorithm visits a conditional jump instruction, always visit first the branch that would be executed if the condition evaluated to false. This rule helps us generate equivalent code when the graph is linearized back into a sequence of instructions.

The non-recursive version of the depth-first search algorithm implemented in conversion.py of hrm-compiler identifies the visited nodes, and keeps track of the labels associated with them.
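
To make the idea concrete, here is a simplified sketch of such an iterative visit (my own rewrite for this post - the real conversion.py also tracks labels and rebuilds the program):

def reachable_instructions(instructions):
    # instructions: list of (opcode, target) pairs, where target is the
    # index a jump points to (None for non-jump instructions)
    visited = set()
    stack = [0]  # execution starts at the first instruction
    while stack:
        index = stack.pop()
        if index in visited or index >= len(instructions):
            continue
        visited.add(index)
        opcode, target = instructions[index]
        if opcode == "jmp":
            stack.append(target)  # unconditional: the only successor
        else:
            if opcode.startswith("j"):
                stack.append(target)  # conditional jump: "true" branch
            # pushed last, popped first: the false/fall-through branch
            # is visited before the jump target, as the rule above requires
            stack.append(index + 1)
    return visited

# the example program from above: index 3 (copyto 1) is unreachable
program = [("inbox", None), ("jez", 4), ("jmp", 0),
           ("copyto", None), ("outbox", None), ("jmp", 0)]
print(sorted(reachable_instructions(program)))  # [0, 1, 2, 4, 5]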

Did you enjoy this explanation? Tell me what you think about it. If it helped you, please consider offering me a Ko-fi!