In a distributed system, understanding whether an event happened before another event is a difficult task, but it's necessary to better understand how the system is behaving: what event caused another event?
One solution was proposed by Lamport in his paper Time, Clocks and the Ordering of Events in a Distributed System: a clock that is updated only via messages sent inside the system, without using external sources (eg physical time).
What is a distributed system?
A distributed system consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages. The paper, while acknowledging that the remarks of this paper are more general, considers a system as distributed if the message transmission delay is not negligible compared to the time between events in a single process.
Events can be anything that can be considered important in the system, for example running a certain subroutine, or sending/receiving messages from other processes.
Why do we need to know the order of events?
The knowledge of causal precedence relation among the events of processes helps solve a variety of problems in distributed systems, such as distributed algorithms design, tracking of dependent events, knowledge about the progress of a computation, and concurrency measures. For example, CockroachDB is using a variant of Lamport's timestamps (the logical clocks explained in this article) to order database transactions. In this very article, Lamport shows how his timestamps let him solve a distributed variant of the mutual exclusion problem - we will not discuss it in this blog post, though.
- From the first paragraph of the article:
The concept of the temporal ordering of events pervades our thinking about systems. [...] However, we will see that this concept must be carefully reexamined when considering events in a distributed system.
The first observation is: in distributed systems, we cannot use our intuition about time to decide if something happened before something else, or if some action can be accepted by the system. There is no "now" in distributed systems, especially in the geographically distributed ones.
In a distributed system, it is sometimes impossible to say that one of two events occurred first. We cannot say that something happened in a specific moment, because what your "specific moment" and my "specific moment" may not be the same. We can only say that "happened before" is a partial ordering of the events in the system.
How can we fix, or at least work around, this uncertainty? Let's use something else in place of physical clocks: logical clocks. The rest of the article explains what is it, and the rules behind them.
Defining "Happened before"
It's now time to define the "happened before" relation.
The relation "happened before" → on the set of events of a system is defined as the smallest relation satisfying the following conditions:
- If a and b are events in the same process, and a comes before b, then a → b
- If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b
- If a → b and b → c, then a → c. Events are said to be concurrent if both a → b and b → a are false.
Giving a number to an event
We now have a relation between events that defines if one of two events happened before the other one. In order to use it, we need to assign a number to an event - let's call this function Clock. Each process Pi has its own Clock Ci. The entire systems of Clocks is represented by the function C, which assigns to any event b the number C(b), where C(b) = Cj(b) if b is an event in the process Pj.
The C function (clock of the system) has to respect the Clock Condition.
- Clock Condition:
for any event a, b, if a → b then C(a) < C(b)
We cannot expect the converse condition to hold as well, since that would imply that any two concurrent events must occur at the same time.
From our definition of happened before, the Clock Condition is satisfied if the following two conditions hold:
- C1: If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b)
- C2: If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b)
The first condition, C1, helps us order two events in the same process: if one happened before the other, then the event that happened before must be assigned a lower number. The second condition, C2, tells us how we should handle the communication between processes. Sending or receiving a message is an event too: sending a message must have a lower number than the receipt of the same message -can you receive a message before it is sent?
To guarantee that the system of clocks satisfies the Clock Condition, we need two implementation rules:
- IR1: Each process Pi increments Ci between any two successive events (the increment itself is not an event!)
- IR2 (a): If event a is the sending of a message m by process Pi, then then message m contains a timestamp Tm = Ci(a)
- IR2 (b): Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm
IR1 insures that the clock is updated by every process (and C1 is satisfied). IR2 insures that C2 is satisfied by describing the value that should be associated to each message and how receiving processes should handle the timestamp in received messages.
From a partial to a total ordering
The Clock function described in the previous paragraph lets us order the events in a partial order. Why is it a problem? It may be a problem because events in different processes may have the same number - which one happened before?
To break ties, we can use any arbitrary total ordering ≺ of the processes. It let us define a new happened before relation, described by the symbol ⇒.
- happened before ⇒:
If a is an event in process Pi and b is an event, then a ⇒ b if and only if either
- Ci(a) < Cj(b)
- Ci(a) = Cj(b) and Pi ≺ Pj
Drawbacks of Lamport's timestamps
So... we just solved all of our problems, right? Unfortunately, no. This kind of logical clocks have some problems. Let's look at the first one: a → b ⟹ C(a) < C(b), but the converse is not true! We cannot use the clock values to order events! We will discuss about a solution of this, named Vector Clock, in a different post.
Anomalous Behavior, aka "out-of-band messages mess up with the ordering!"
There is a different problem, too. The new happened before relation ⇒ does not protect us from anomalous behavior, if the ordering obtained by this algorithm differs from that perceived by the user.
Lamport asks to imagine two friends, A and B, using the same distributed computer system. Let's suppose they perform the following steps:
- Friend A issues a request a on their computer
- Friend A telephones Friend B, telling them to issue a new request
- Friend B issues a request b on their computer (different from Friend A's)
Our system may order the request named b before request a, and it's not even wrong! The problem, in this thought experiment, is that the phone call is not an event we recorded in our system, so we did not assign it a number, so we cannot use it to order the two requests.
Formally, we have two sets of events. L is the set of all system events. Unfortunately, the phone call is not in L, but in L′, the set of events which contains the events in L together with all other relevant external events. Let then ⮕ denote an "happened before" relation on L′. We can see that a → b is false, but a⮕b is true!
It is impossible to avoid anomalous behavior when the event ordering system we set up only uses events in L, but doesn't use the ones in L′. What can we do, then? We have two possibilities: either we explicitely introduce into the system the necessary information about the ordering ⮕, or we construct a system of clocks that satisfy a stronger condition.
In the first case, we give the user the responsibility for avoiding anomalous behavior - in this case, at step 2 Friend B should have asked the timestamp Ta of request a, and specify that the request b should be given a timestamp later than Ta.
We will explore the second possibility in a later post, exploring we can define a stronger clock condition and how we can use physical clocks to build a clock function that satisfies it.
Did you like this article? Did you find an error? Don't hesitate and let me know! Contact me on Twitter, or send me an email. Subscribe to the RSS feed to read the next article! If you want to read more articles like this one, consider offering me a Ko-Fi.
A relation is "irreflexive" if it does not relate any element to itself - an example is "greater than" on the real numbers: a real number cannot be greater than itself↩︎
A "partial ordering" is a relation that defines an order between some, not all, of the elements in the set the relation is defined over.↩︎