Concurrent Computing Summary

CS-453, 2023 Autumn, EPFL

This is the quick summary from course material Concurrent Computing.

Registers

A register has two operations: read() and write()

Sequential specification

read():
  return(x)

write(v):
  x <- v; return(ok)

Simplifications

We assume that registers contain only integers

Unless explicitly stated otherwise, registers are initially supposed to contain 0

Dimensions

2 decades of hard work

Theorem: A multivalued MRMW atomic register can be implemented with binary SRSW safe registers

Conventions (1)

The process executing the code is implicitly assumed to be pi

We assume a system of n processes

NB. We distinguish base and high-level registers

Conventions (2)

The operations to be implemented are denoted Read() and Write()

Those of the base registers are denoted read() and write()

We omit the return(ok) instruction at the end of Write() implementations

(1) From (binary) SRSW safe to (binary) MRSW safe

We use an array of SRSW registers Reg[1,…,N]

Read():
  return(Reg[i].read());

Write(v):
  for j = 1 to N 
	  Reg[j].write(v);

The transformation works also for multi-valued registers and regular ones

It does not however work for atomic registers

SRSW safe -> MRSW safe ✅
SRSW regular -> MRSW regular ✅
SRSW atomic -> MRSW atomic ❌

(2) From binary MRSW safe to binary MRSW regular

We use one MRSW safe register

Read():
  return(Reg.read());

Write(v):
  if old  v then 
    Reg.write(v);
    old := v;

The transformation works for single reader registers

It does not work for multi-valued registers

It does not work for atomic registers

(3) From binary to M-valued MRSW regular

We use an array of MRSW registers Reg[0,1,…,M] init to [1,0,…,0] (Unary Coding)

Read():
  for j = 0 to M
	  if Reg[j].read() = 1 then return(j)

Write(v):
  Reg[v].write(1);
    for j = v-1 downto 0
	  Reg[j].write(0);

The transformation would not work if the Write() would first write 0s and then 1

The transformation works for regular but NOT for atomic registers

(4) From SRSW regular to SRSW atomic

We use one SRSW register Reg and two local variables t and x

Read():
  (t,x) := Reg.read();
  if t > t then t := t; x := x; 
  return(x)

Write(v):
  t := t+1;
  Reg.write(v,t);

The transformation would not work for multiple readers

The transformation would not work without timestamps (variable t represents logical time, i.e., timestamp)

(5) From SRSW atomic to MRSW atomic

We use N*N SRSW atomic registers RReg[(1,1),(1,2),…,(k,j),…,(N,N)] to communicate among the readers

Write(v):
  t1 := t1+1;
  for j = 1 to N
    WReg[j].write(v,t1);

Read():
  for j = 1 to N do
    (t[j],x[j]) := RReg[i,j].read();
  (t[0],x[0]) := WReg[i].read();
  (t,x) := highest(t[..],x[..]); // Value with highest timestamp
  for j = 1 to N do
    RReg[j,i].write(t,x);
  return(x)

The transformation would not work for multiple writers

The transformation would not work if the readers do not communicate (i.e., if a reader does not write)

(6) From MRSW atomic to MRMW atomic

We use N MRSW atomic registers Reg[1,…,N]; the writer of Reg[j] is pj

Write(v):
  for j = 1 to N do
    (t[j],x[j]) := Reg[j].read();
  (t,x) := highest(t[..],x[..]);
  t := t+1;
  Reg[i].write(t,v);

Read():
  for j = 1 to N do
    (t[j],x[j]) := Reg[j].read();
  (t,x) := highest(t[..],x[..]);
  return(x)

Transformation Table

The entry in the table shows the base component and the method to form the corresponding register.

(1) Dimension 1 - Binary

Binary SRSW MRSW MRMW
Safe   (1) Safe SRSW  
Regular   (1) Regular SRSW
(2) Safe MRSW
 
Atomic (4) Regular SRSW (5) Atomic SRSW (6) Atomic MRSW

(2) Dimension 1 - M-valued

M-valued SRSW MRSW MRMW
Safe   (1) Safe SRSW  
Regular   (1) Regular SRSW
(3) Binary Regular MRSW
 
Atomic (4) Regular SRSW (5) Atomic SRSW (6) Atomic MRSW

The Power of Registers

Question 1: what objects can we implement with registers?

Question 2: what objects we cannot implement?

Wait-free implementations of atomic objects

Counter

A counter has two operations inc() and read() and maintains an integer x init to 0

Sequential Spec

read():
  return(x)

inc():
  x := x+1; return(ok)

Naive implementation

The processes share one register Reg

read():
  return(Reg.read())

inc():
  temp := Reg.read()+1;
  Reg.write(temp);
  return(ok)

Atomic implementation

The processes share an array of registers Reg[1,…,N]

inc():
  Reg[i].write(Reg[i].read()+1);
  return(ok)

read():
  sum := 0;
  for j = 1 to n do
    sum := sum + Reg[j].read();
  return(sum)

Snapshot

A snapshot has operations update() and scan() and maintains an array x of size N

Sequential Spec

scan():
  return(x)

update(i,v):
  x[i] := v;
  return(ok)

Naive implementation

The processes share one array of N registers Reg[1,…,N]

scan():
  for j = 1 to N do
    x[j] := Reg[j].read();
  return(x)

update(i,v):
  Reg[i].write(v);
  return(ok)

Non-atomic vs atomic snapshot

What we implement here is some kind of regular snapshot:

A scan returns, for every index of the snapshot, the last written values or the value of any concurrent update

We call it collect

Key idea for atomicity

To scan, a process keeps reading the entire snapshot (i.e., it collect), until two results are the same
This means that the snapshot did not change, and it is safe to return without violating atomicity

Same value vs. Same timestamp

Same value ↛ Same timestamp
Same value ← Same timestamp

Enforcing atomicity

The processes share one array of N registers Reg[1,…,N]; each contains a value and a timestamp

We use the following operation for modularity

collect():
  for j = 1 to N do
    x[j] := Reg[j].read();
  return(x)

scan():
  temp1 := self.collect();
  while(true) do
    temp2 := self.collect();
    if (temp1 = temp2) then
      return(temp1.val)
    temp1 := temp2;

update(i,v):
  ts := ts + 1;
  Reg[i].write(v,ts);
  return(ok)

Wait-freedom?

Key idea for atomicity & wait-freedom
The processes share an array of registers Reg[1,…,N] that contains each:

To scan, a process keeps collecting and returns a collect if it did not change, or some collect returned by a concurrent scan
Timestamps are used to check if the collect changes or if a scan has been taken in the meantime

To update, a process scans and writes the value, the new timestamp and the result of the scan

Snapshot implementation

Every process keeps a local timestamp ts

update(i,v):
  ts := ts+1;
  Reg[i].write(v,ts,self.scan());
  return(ok)

scan():
  t1 := self.collect(); 
  t2 := t1
  while(true) do
    t3 := self.collect();
    if (t3 = t2) then return (t3[..].value); // Return value for 1 to N in t3
    for j = 1 to N do
      if (t3[j].timestamp  t1[j].timestamp+2) then // Have update after scan() begin
        return (t3[j].scancopy)
    t2 := t3

Possible execution?

Writing while reading registers

We have seen how to:
Transform Safe SRSW to Atomic MRMW registers:

  1. By writing (the underlying registers) while reading
  2. By using infinite memory/registers (timestamps are forever increasing)

Can we do better?

Bound on SWSR atomic register implementations

Theorem 1. There is no wait-free algorithm that:

Proving an impossibility result

General strategy

Build a “general” counter-example (adversarial model)

For any algorithm (given the hypotheses), you (the adversary) can always create a scenario/example that violates (one of) the properties it is meant to achieve

Building an adversarial counter-example

Good strategies/steps:

  1. Simplify the problem as much as possible (without loss of generality)
  2. Carefully analyse the hypotheses to get ideas

Simplify the problem

Let us assume (without loss of generality) that (Hypothesis):

  1. The higher-level register is binary (i.e., the atomic SWSR register)
  2. Instead of finitely many SWSR regular registers, there is only one SWSR regular register (called reg)

Analyzing the “ingredients” (hypotheses)

We are the adversary crafting a general counter-example. Consider any algorithm that implements the atomic SWSR register.

  1. Wait free → any operation must eventually complete if run infinitely long
  2. Finite number of bounded registers → the underlying register reg can only take a finite (limited) number of different values
  3. Reg is regular → reg.read() can return its previous value when if there is a concurrent write to reg
  4. The base registers can only be written by the writer → the Read() operation does not change reg → the state of reg is controlled entirely by the writer

Constructing the adversarial example

The writer alternates between writing 0 and 1 on the atomic register, forever.

What is the state of reg after each Write(0)?

In Hypothesis 2: Finite number of bounded registers → the underlying register reg can only take a finite (limited) number of different values
There is a value v0 that appears an infinite number of times after Write(0)
There is an infinite number of Write(1) operations, starting after reg in state v0, in which reg ends in state vn after the Write(1).
A Write(1) might perform multiple writes to reg, so reg might not change from v0 to vn immediately.

Claim: there are values v0, v1, … vn, such that reg changes directly from v(i) to v(i+1) infinitely often.

Bound on SWMR atomic register implementations

Theorem 2. There is no wait-free algorithm that:

The Limitations of Registers

Question 1: what objects can we implement with registers?

Question 2: what objects we cannot implement?

Fetch&Inc

The consensus object

One operation propose() which returns a value. When a propose operation returns, we say that the process decides

2-Consensus with Fetch&Inc

Uses two registers R0 and R1, and a Fetch&Inc object C (with one fetch&inc() operation that returns its value) (NB. The value in C is initialized to 0)

Process pI:

propose(vI):
	R[I].write(vI)
	val := C.fetch&inc()
	if(val = 1) then return(vI) // Implying self gets initial increment (winner), decides on own proposed value
	else return(R[1-I].read()) // Implying other gets initial increment (loser), decides on other's proposed value

Impossibility [FLP85,LA87]

Queue

2-Consensus with queues

Uses two registers R0 and R1, and a queue Q; Q is initialized to {winner, loser}

Process pI:

propose(vI):
	R[I].write(vI)
	item := Q.dequeue()
	if item = winner return(vI)
	else return(R[1-I].read())

Correctness

Proof (algorithm):

More consensus implementations

Sequential spec:

test&set():
	y := x; x: = 1; return(y)

Sequential spec:

compare&swap(old,new):
	if x = old then x := new; return(x)

2-Consensus with Test&Set

Uses two registers R0 and R1, and a Test&Set object T

Process pI:

propose(vI):
	R[I].write(vI)
	val := T.test&set()
	if(val = 0) then return(vI)
	else return(R[1-I].read())

N-Consensus with Compare&Swap

Uses a Compare&Swap object C

Process pI:

propose(vI):
	val := C.compare&swap(, vI)
	if(val = ) then return(vI)
	else return(val)

Impossibility (Proof)

Elements of the model

Distributed computing is a game

Consensus

Impossibility (elements)

  1. a (initial) configuration C is a set of (initial) values of p0 and p1 together with the values of the registers: R1 … Rk …;
  2. a step is an elementary action executed by some process pI: it consists in reading or writing a value in a register and changing pI’s state according to the algorithm A;
  3. a schedule S is a sequence of steps; S(C) denotes the configuration that results from applying S to C.

Impossibility (structure)

Lemma 1

The initial configuration C(0,1) is bivalent.

Proof:
Consider C(0,0) and p1 not taking any step:

Hence the bivalency.

Lemma 2

Given any bivalent configuration C, there is an arbitrarily long schedule S such that S(C) is bivalent.

Proof (by contradiction):
Let S be the schedule with the maximal length such as D = S(C) is bivalent; p0(D) and p1(D) are both univalent: one of them is 0-valent (say p0(D)) and the other is 1-valent (say p1(D))

Thus both p0 and p1 write in R to go from D to p0(D) (resp., p1(D)). But then p0(p1(D)) = p0(D) (resp. p1(p0(D)) = p1(D)) — a contradiction.

Conclusion

Reference

Encyclopedia of Algorithms, Second Edition, Page 152, Asynchronous Consensus Impossibility, 1985; Fischer, Lynch, Paterson

Universal constructions

Universality [Her91]

Consensus

Theorem 1: Consensus is universal [Her91]

Shared memory model = Registers (read-write) + Consensus objects

The consensus object

One operation propose() which returns a value. When a propose returns, the process decides.

Universality

Example

Universal construction (1)

Example (FIFO Queue) Non-linearizable execution

Universal algorithm (1)

Shared objects

We use an ordered list of consensus objects:

The algorithm combines the shared registers Lreq[I] and the consensus object list Lcons to ensure that:

Linearization (FIFO Queue)

Local data structures

Every process also uses two local data structures:

Every request is uniquely identified

Tasks

Every process pI executes three // tasks:

While (lInv – lPerf) is not empty

Example (FIFO Queue)

Correctness

Proof (sketch):
Assume by contradiction that pI does not return from that invocation;
pI puts req into Lreq (Task 1);
eventually, every proposed lInv - lPerf contains req (Task 2);
and the consensus decision contains req (Task 3);
the result is then eventually returned (Task 3)

Proof (sketch): the processes agree on the same total order for sets of requests and then use the same order within every set of requests (the linearization order is determined by the integers associated with the consensus)

Proof (sketch): it directly follows from the algorithm that the result of req2 is based on the state of req1

Why not?

Every process pI executes three // tasks:

Universality (1 + 2)

A restricted deterministic type

The type defined by $\delta’$ is deterministic

It is sufficient to implement a type defined by $\delta’$ !

Task 3 (Preserving non-determinism)
While lInv–lPerf is not empty

Implementing the Consensus Object with Timing Assumptions

A modular approach

We implement Wait-free Consensus (Consensus) through:
Lock-free Consensus (L-Consensus) and _Registers

We implement L-Consensus through:
Obstruction-free Consensus (O-Consensus) and <>Leader (encapsulating timing assumptions and sometimes denoted by \(\Omega\))

Consensus

L-Consensus

O-Consensus

O-Consensus

Idea

Data

Functions

To simplify the presentation, we assume two functions applied to Reg[1,…,N]

Algorithm

propose(v): 
  while(true) do
    Reg[i].T.write(ts); // (1)
    val := Reg[1,,n].highestTspValue(); // (2)
    if val =  then val := v;
    Reg[i].V.write(val,ts); // (3)
    if ts = Reg[1,,n].highestTsp() then return(val) // (4)
    ts := ts + n

L-Consensus

We implement L-Consensus using <>leader (leader()) and the O-Consensus algorithm

The idea is to use <>leader to make sure that, eventually, one process keeps executing steps alone, until that process decides

<> Leader

Property: If a correct process invokes leader, then the invocation returns and eventually, some correct process is permanently the only leader

Algorithm

propose(v):
  while(true) do
    if leader() then
      Reg[i].T.write(ts);
      val := Reg[1,,n].highestTspValue();
      if val =  then val := v;
      Reg[i].V.write(val,ts);
      if ts = Reg[1,,n].highestTsp() then return(val)
      ts := ts + n

From L-Consensus to Consensus (helping)

Consensus

propose(v):
  while (Dec.read() = ) do
    if leader() then
      Reg[i].T.write(ts);
      val := Reg[1,,n].highestTspValue();
      if val =  then val := v;
      Reg[i].V.write(val,ts);
      if ts = Reg[1,,n].highestTsp() then Dec.write(val)
      ts := ts + n;
  return(Dec.read())

<> Leader

Property: If a correct process invokes leader, then the invocation returns and eventually, some correct process is permanently the only leader

Algorithm

Shared Variables

Local Variables

Initialization

Algorithm

leader():
  clock := 0;
  while(true) do
    if (leader=self) then
      Reg[i].write(Reg[i].read()+1);
    clock := clock+1;
    if (clock = check) then elect();

elect():
  noLeader := true;
  for j = 1 to (i-1) do
    if (Reg[j].read() > last[j]) then
      last[j] := Reg[j].read();
      if (leader  pj) then delay := delay*2;
      leader:= pj;
      noLeader := false; 
      break (for);
  check := check + delay
  if (noLeader) then leader := self;

Consensus = Registers + <> Leader

<>Leader encapsulates the following synchrony assumption:
There is a time after which a lower and an upper bound hold on the time it takes for every process to execute a step (eventual synchrony)

Minimal Assumptions

Computing with anonymous processes

Counter

Sequential Spec

A counter has two operations inc() and read() and maintains an integer x init to 0

read():
  return(x)
  
inc():
  x := x+1;
  return(ok)

Atomic Implementation

The processes share an array of SWMR registers Reg[1,…,n] ; the writer of register Reg[i] is pi

inc():
  temp := Reg[i].read()+1;
  Reg[i].write(temp);
  return(ok)

read():
  sum := 0;
  for j = 1 to n do
    sum := sum + Reg[j].read();
  return(sum)

Weak Counter

A weak counter has one operation wInc()

wInc():
  x := x+1;
  return(x)

Correctness:

  1. if op1 precedes another op2, then op2 returns a value that is larger than op1;
  2. the value returned does not exceed the number of invocations

NB. Resembles a regular Fetch&Inc object

Lock-free Implementation

The processes share an (infinite) array of MWMR registers Reg[1,…,n,…], init to 0

wInc():
  i := 0;
  while (Reg[i].read()  0) do
    i := i+1;
  Reg[i].write(1);
  return(i);

Wait-free Implementation

The processes also use a MWMR register L

wInc():
  i := 0;
  while (Reg[i].read()  0) do
    // if L has been updated n times then return the largest value seen in L
    i := i+1;
  L.write(i);
  Reg[i].write(1);
  return(i);

Organized Vers:

wInc():
  t := l := L.read();
  i := k := 0;
  while (Reg[i].read()  0) do
    i : = i+1;
    if L.read()  l then
      l := L.read();
      t := max(t,l);
      k := k+1;
      if k = n then return(t);
  L.write(i);
  Reg[i].write(1);
  return(i);

Snapshot

Sequential Spec

A snapshot has operations update() and scan() and maintains an array x of size n

scan():
  return(x)

NB. No component is devoted to a process

update(i,v):
  x[i] := v;
  return(ok)

Key idea for atomicity & wait-freedom

Implementation

Every process keeps a local timestamp ts

update(i,v):
  ts := Wcounter.wInc();
  Reg[i].write(v,ts,self.scan());
  return(ok)

scan():
  ts := Wcounter.wInc();
  while(true) do
    // If some Reg[j] contains a collect with a higher timestamp than ts, then return that collect
    // If n+1 sets of reads return identical results then return that one

Consensus

Obstruction-free

We consider binary consensus

The processes share two infinite arrays of registers: Reg0[i] and Reg1[i]

Every process holds an integer i init to 1

Idea: to impose a value v, a process needs to be fast enough to fill in registers Reg{v}[i]

propose(v):
  while(true) do
    if Reg{1-v}[i] = 0 then
      Reg{v}[i] := 1;
      if i > 1 and Reg{1-v}[i-1] = 0 then return(v);
    else v := 1-v;
    i := i+1;

solo process & lock-step

Binary

propose(v):
  while(true) do
    if Reg{1-v}[i] = 0 then
      Reg{v}[i] := 1;
      if i > 1 and Reg{1-v}[i-1] = 0 then return(v);
    else if Reg{v}[i] = 0 then v := 1-v;
    if v = 1 then wait(2*i)
    i := i+1;

Transactional Memory

Introduction

Locking is “history”

Lock-freedom is “difficult”

Wanted: A synchronisation abstraction that is simple, robust and efficient

Back to the sequential level

atomic {
  accessing object 1;
  accessing object 2;
}

Semantics (serialisability)

Every transaction appears to execute at an indivisible point in time (linearizability of transactions)

TM

The TM Topic has been a VERY HOT topic

The TM API

(a simple view)

Two-phase locking

Idea

To write or read O, T requires a lock on O;
T waits if some T’ acquired a lock on O;
At the end, T releases all its locks.

Every object O, with state s(O) (a register), is protected by a lock l(O) (a c&s)
Every transaction has local variables wSet and wLog
Initially: l(O) = unlocked, wSet = wLog = ∅

Algorithm

Upon op = read() or write(v) on object O
  if O  wSet then
    wait until unlocked = l(O).c&s(unlocked,locked)
    wSet := wSet U O
    wLog := wLog U S(O).read()
  if op = read() then return S(O).read()
  S(O).write(v)
  return ok
Upon commit()
  cleanup()
  return ok
Upon abort()
  rollback()
  cleanup()
  return ok
Upon rollback()
  for all O  wSet do S(O).write(wLog(O))
  wLog := 
Upon cleanup()
  for all O  wSet do l(O).write(unlocked)
  wSet := 

Why two phases?

Original Idea

Read-write Lock

(1) Wait

(2) Better dead than wait

(3) Better kill than wait

(4) Better kill than wait, revised

Visible Read

(SXM, RSTM, TLRW)

Two-phase locking with invisible reads

Invisible reads

Upon write(v) on object O
  if O  wSet then
    wait until unlocked = l(O).c&s(unlocked,locked)
    wSet := wSet U O
    wLog := wLog U S(O).read()
  (*,ts) := S(O).read()
  S(O).write(v,ts)
  return ok
Upon read() on object O
  (v,ts) := S(O).read()
  if O  wSet then return v
  if l(O) = locked or not validate() then abort()
  if rset(O) = 0 then rset(O) := ts
  return v
Upon validate()
  for all O s.t rset(O) > 0 do
    (v,ts) := S(O).read()
    if ts  rset(O) or (O  wset and l(O) = locked) then 
      return false
    else return true
Upon commit()
  if not validate() then abort()
  for all O  wset do
    (v,ts) := S(O).read()
  S(O).write(v,ts+1)
  cleanup()
Upon rollback()
  for all O  wSet do S(O).write(wLog(O))
  wLog := 
Upon cleanup()
  for all O  wset do l(O).write(unlocked)
  wset := 
  rset(O) := 0 for all O

DSTM (SUN)

More efficient algorithm?

Apologizing versus asking permission

Example

Invariant: 0 < x < y

Initially: x := 1; y := 2

Division by zero

T1: x := x+1 ; y:= y+1

T2: z := 1 / (y - x)

Infinite loop

T1: x := 3; y:= 6

T2: a := y; b:= x; repeat b:= b + 1 until a = b

Opacity

Trade-off

The read is either visible or careful

Read invisibility

NB. Another way out is the use of multiversions: T2 would not have written “on” T1

Aborting is a fatality

Conditional progress (obstruction-freedom)

DSTM

Progress

Contention managers

Greedy contention manager

Concluding remarks

The garbage-collection analogy

Reference

Principles of Transactional Memory by Rachid Guerraoui, Michal Kapalka

Register implementations out of faulty base registers

Failure modes

t denotes the number of base objects that can fail

NB. In the asynchronous model, it is impossible to distinguish a non-responsive from a slow object

Algorithms

  1. Implements a SWMR register out of t+1 SWMR base responsive failure-prone registers
  2. Implements a SWSR register out of 2t+1 SWSR base non-responsive failure-prone registers

Responsive model

Write(v):
  for j = 1 to (t+1) do
    Reg[j].write(v);
   return(ok)
   
Read():
  for j = t+1 downto 1 do
    v := Reg[j].read();
    if v   then return(v)

Non-responsive model

Init: seq := 1

Write(v):
  w_seq := w_seq + 1;
  for j = 1 to (2t+1) do
    Reg[j].write(w_seq, v);
  // wait until a majority of oks are returned
  return(ok)
Init: (sn,val) := (-1, );

Read():
  for j = 1 to (2t+1) do
    (s,v) := Reg[j].read();
  (sn,val) := (s,v) with the highest s from majority, including (sn,val)
  return (val)