.. title: "Time, Clocks and the Ordering of Events in a Distributed System"
.. slug: logical-clocks-lamport-timestamps
.. date: 2019-09-23 23:35:00 UTC+02:00
.. tags: logical clocks, distributed systems
.. category: interesting links
.. link: https://www.microsoft.com/en-us/research/publication/time-clocks-ordering-events-distributed-system/
.. description: Let's talk about time, clocks and using counters to order events in a distributed system
.. type: text
In a distributed system, understanding whether an event happened before another event is a difficult task, but it's necessary
to better understand how the system is behaving: what event caused another event?
One solution was proposed by Lamport in his paper `Time, Clocks and the Ordering of Events in a Distributed System`_:
a clock that is updated only via messages sent inside the system, without using external sources (eg physical time).
What is a distributed system?
-----------------------------
A distributed system consists of a collection of distinct processes which are spatially separated,
and which communicate with one another by exchanging messages. The paper, while acknowledging that the remarks of
this paper are more general, considers a system as *distributed* if the message transmission delay
is *not negligible* compared to the time between events in a single process.
Events can be anything that can be considered important in the system,
for example running a certain subroutine, or sending/receiving messages from other processes.
Why do we need to know the order of events?
-------------------------------------------
The knowledge of causal precedence relation among the events
of processes helps solve a variety of problems in distributed
systems, such as distributed algorithms design, tracking
of dependent events, knowledge about the progress of a computation,
and concurrency measures.
For example, CockroachDB is using a variant of Lamport's timestamps (the logical clocks explained in this
article) to order database transactions. In this very article, Lamport shows how his timestamps
let him solve a distributed variant of the mutual exclusion problem - we will not discuss it in this blog post, though.
About time
----------
From the first paragraph of the article:
The concept of the temporal ordering of events pervades our thinking about systems.
[...]
However, we will see that this concept must be carefully reexamined when considering
events in a distributed system.
The first observation is: in distributed systems, we cannot use our intuition about time
to decide if something happened before something else, or if some action can be accepted
by the system. `There is no "now"`_ in distributed systems, especially in the geographically
distributed ones.
In a distributed system, it is sometimes impossible
to say that one of two events occurred first.
We cannot say that something happened in a specific moment, because
what your "specific moment" and my "specific moment" may not be the same.
We can only say that **"happened before"** is a partial ordering of the events in the system.
How can we fix, or at least work around, this uncertainty? Let's use something else in place of
physical clocks: logical clocks. The rest of the article explains what is it, and the rules behind them.
Defining "Happened before"
--------------------------
It's now time to define the **"happened before"** relation.
The relation "happened before" :math:`\rightarrow` on the set of events of a system is defined as the smallest relation satisfying
the following conditions:
1. If *a* and *b* are events in the same process, and *a* comes before *b*, then :math:`a \rightarrow b`
2. If *a* is the sending of a message by one process and *b* is the receipt of the same message by
another process, then :math:`a \rightarrow b`
3. If :math:`a \rightarrow b` and :math:`b \rightarrow c`, then :math:`a \rightarrow c`.
Events are said to be *concurrent* if both :math:`a \rightarrow b` and :math:`b \rightarrow a` are false.
Assuming that events cannot happen before themselves, the relation is an irreflexive [#f1]_ partial ordering [#f2]_
on the set of all events in the system.
Giving a number to an event
---------------------------
We now have a relation between events that defines if one of two events happened before the other one.
In order to use it, we need to assign a number to an event - let's call this function **Clock**.
Each process :math:`P_i` has its own Clock :math:`C_i`.
The entire systems of Clocks is represented by the function C, which assigns to any event b the number :math:`C(b)`,
where :math:`C(b) = C_j(b)` if :math:`b` is an event in the process :math:`P_j`.
The C function (clock of the system) has to respect the *Clock Condition*.
*Clock Condition*:
for any event *a, b*, if :math:`a \rightarrow b` then :math:`C(a) < C(b)`
We cannot expect the converse condition to hold as well, since that would imply that any two concurrent events
must occur at the same time.
From our definition of *happened before*, the Clock Condition is satisfied if the following two conditions hold:
* **C1**: If *a* and *b* are events in process :math:`P_i`, and *a* comes before *b*, then :math:`C_i(a) < C_i(b)`
* **C2**: If *a* is the sending of a message by process :math:`P_i` and *b* is the receipt of that message by process :math:`P_j`, then :math:`C_i(a) < C_j(b)`
The first condition, **C1**, helps us order two events in the same process: if one happened before the other,
then the event that happened before must be assigned a lower number.
The second condition, **C2**, tells us how we should handle the communication between processes. Sending or receiving
a message is an event too: sending a message must have a lower number than the receipt of the same message -
*can you receive a message before it is sent?*
To guarantee that the system of clocks satisfies the Clock Condition, we need two implementation rules:
* **IR1**: Each process :math:`P_i` increments :math:`C_i` between any two successive events (the increment itself is **not** an event!)
* **IR2 (a)**: If event *a* is the sending of a message *m* by process :math:`P_i`, then then message *m* contains a timestamp :math:`T_m=C_i(a)`
* **IR2 (b)**: Upon receiving a message *m*, process :math:`P_j` sets :math:`C_j` greater than or equal to its present value and greater than :math:`T_m`
IR1 insures that the clock is updated by every process (and C1 is satisfied).
IR2 insures that C2 is satisfied by describing the value that should be associated
to each message and how receiving processes should handle the timestamp in received messages.
From a partial to a total ordering
----------------------------------
The **Clock** function described in the previous paragraph lets us order the events in a partial order.
Why is it a problem? It may be a problem because events in different processes may have the same number - which one
happened before?
To break ties, we can use any arbitrary total ordering :math:`\prec` of the processes. It let us define
a new *happened before* relation, described by the symbol :math:`\Rightarrow`.
*happened before* :math:`\Rightarrow`:
If *a* is an event in process :math:`P_i` and *b* is an event, then
:math:`a \Rightarrow b` if and only if either
1. :math:`C_i(a) < C_j(b)`
2. :math:`C_i(a) = C_j(b)` and :math:`P_i \prec P_j`
Drawbacks of Lamport's timestamps
---------------------------------
So... we just solved all of our problems, right? Unfortunately, no.
This kind of logical clocks have some problems.
Let's look at the first one:
:math:`a \rightarrow b \implies C(a) < C(b)`, but the converse is not true!
We cannot use the clock values to order events!
We will discuss about a solution of this, named `Vector Clock`_, in a different post.
Anomalous Behavior, aka "out-of-band messages mess up with the ordering!"
-------------------------------------------------------------------------
There is a different problem, too.
The new *happened before* relation :math:`\Rightarrow` does not protect us from anomalous behavior, if
the ordering obtained by this algorithm differs from that perceived by the user.
Lamport asks to imagine two friends, A and B, using the same distributed computer system.
Let's suppose they perform the following steps:
1. Friend A issues a request :math:`a` on their computer
2. Friend A telephones Friend B, telling them to issue a new request
3. Friend B issues a request :math:`b` on their computer (different from Friend A's)
Our system may order the request named :math:`b` before request :math:`a`, and it's not even wrong!
The problem, in this thought experiment, is that the phone call is not an event we recorded in our system,
so we did not assign it a number, so we cannot use it to order the two requests.
Formally, we have two sets of events. :math:`L` is the set of all system events. Unfortunately,
the phone call is not in :math:`L`, but in :math:`L′`, the set of events which contains the events in :math:`L`
together with all other relevant external events. Let then ⮕ denote an "happened before" relation on :math:`L′`.
We can see that :math:`a \rightarrow b` is false, but :math:`a ⮕ b` is true!
It is impossible to avoid anomalous behavior when the event ordering system we set up only uses events in :math:`L`,
but doesn't use the ones in :math:`L′`. What can we do, then?
We have two possibilities: either we *explicitely introduce into the system the necessary information about the ordering ⮕*,
or we construct a system of clocks that satisfy a stronger condition.
In the first case, we give the user the responsibility for avoiding anomalous behavior - in this case, at step 2 Friend B
should have asked the timestamp :math:`T_a` of request :math:`a`, and specify that the request :math:`b` should
be given a timestamp later than :math:`T_a`.
We will explore the second possibility in a later post, exploring we can define a stronger clock condition
and how we can use physical clocks to build a clock function that satisfies it.
-----------------------------------------------------------------
*Did you like this article? Did you find an error? Don't hesitate and let me know! Contact me on* Twitter_, *or send me an* email_.
*Subscribe to the* `RSS feed`_ *to read the next article!*
*If you want to read more articles like this one, consider offering me a* Ko-Fi_.
.. [#f1] A relation is "irreflexive" if it does not relate any element to itself - an example is "greater than" on the real numbers: a real number cannot be greater than itself
.. [#f2] A "partial ordering" is a relation that defines an order between some, not all, of the elements in the set the relation is defined over.
.. _`Vector Clock`: https://en.wikipedia.org/wiki/Vector_clock
.. _`Time, Clocks and the Ordering of Events in a Distributed System`: https://www.microsoft.com/en-us/research/publication/time-clocks-ordering-events-distributed-system/
.. _`There is no "now"`: https://queue.acm.org/detail.cfm?id=2745385
.. _Twitter: https://twitter.com/alfateam123
.. _email: mailto:winter@wintermade.it
.. _RSS feed: https://wintermade.it/blog/rss.xml
.. _Ko-fi: https://www.ko-fi.com/alessandrobalzano