This is the quick summary from course material Concurrent Computing.
A register has two operations: read()
and write()
read():
return(x)
write(v):
x <- v; return(ok)
We assume that registers contain only integers
Unless explicitly stated otherwise, registers are initially supposed to contain 0
Dimension 3: safe – regular – atomic
if read() after write(v): return(v)
else return(random anything)
if read() after write(v): return(v)
else return(last written v’) or return(this written v)
if read() after write(v): return(v)
else return(last written v’)
Theorem: A multivalued MRMW atomic register can be implemented with binary SRSW safe registers
The process executing the code is implicitly assumed to be pi
We assume a system of n processes
NB. We distinguish base and high-level registers
The operations to be implemented are denoted Read()
and Write()
Those of the base registers are denoted read()
and write()
We omit the return(ok)
instruction at the end of Write()
implementations
We use an array of SRSW registers Reg[1,…,N]
Read():
return(Reg[i].read());
Write(v):
for j = 1 to N
Reg[j].write(v);
The transformation works also for multi-valued registers and regular ones
It does not however work for atomic registers
SRSW safe -> MRSW safe ✅
SRSW regular -> MRSW regular ✅
SRSW atomic -> MRSW atomic ❌
We use one MRSW safe register
Read():
return(Reg.read());
Write(v):
if old ≠ v then
Reg.write(v);
old := v;
The transformation works for single reader registers
It does not work for multi-valued registers
It does not work for atomic registers
We use an array of MRSW registers Reg[0,1,…,M]
init to [1,0,…,0]
(Unary Coding)
Read():
for j = 0 to M
if Reg[j].read() = 1 then return(j)
Write(v):
Reg[v].write(1);
for j = v-1 downto 0
Reg[j].write(0);
The transformation would not work if the Write()
would first write 0s and then 1
The transformation works for regular but NOT for atomic registers
We use one SRSW register Reg
and two local variables t
and x
Read():
(t’,x’) := Reg.read();
if t’ > t then t := t’; x := x’;
return(x)
Write(v):
t := t+1;
Reg.write(v,t);
The transformation would not work for multiple readers
The transformation would not work without timestamps
(variable t
represents logical time, i.e., timestamp)
We use N*N SRSW atomic registers RReg[(1,1),(1,2),…,(k,j),…,(N,N)]
to communicate among the readers
RReg[(k,j)]
the reader is pk and the writer is pjWReg[1,…,N]
to store new valuesWReg[k]
is pkWrite(v):
t1 := t1+1;
for j = 1 to N
WReg[j].write(v,t1);
Read():
for j = 1 to N do
(t[j],x[j]) := RReg[i,j].read();
(t[0],x[0]) := WReg[i].read();
(t,x) := highest(t[..],x[..]); // Value with highest timestamp
for j = 1 to N do
RReg[j,i].write(t,x);
return(x)
The transformation would not work for multiple writers
The transformation would not work if the readers do not communicate (i.e., if a reader does not write)
We use N MRSW atomic registers Reg[1,…,N]
; the writer of Reg[j]
is pj
Write(v):
for j = 1 to N do
(t[j],x[j]) := Reg[j].read();
(t,x) := highest(t[..],x[..]);
t := t+1;
Reg[i].write(t,v);
Read():
for j = 1 to N do
(t[j],x[j]) := Reg[j].read();
(t,x) := highest(t[..],x[..]);
return(x)
The entry in the table shows the base component and the method to form the corresponding register.
(1) Dimension 1 - Binary
Binary | SRSW | MRSW | MRMW |
---|---|---|---|
Safe | (1) Safe SRSW | ||
Regular | (1) Regular SRSW (2) Safe MRSW |
||
Atomic | (4) Regular SRSW | (5) Atomic SRSW | (6) Atomic MRSW |
(2) Dimension 1 - M-valued
M-valued | SRSW | MRSW | MRMW |
---|---|---|---|
Safe | (1) Safe SRSW | ||
Regular | (1) Regular SRSW (3) Binary Regular MRSW |
||
Atomic | (4) Regular SRSW | (5) Atomic SRSW | (6) Atomic MRSW |
Question 1: what objects can we implement with registers?
Question 2: what objects we cannot implement?
A counter has two operations inc()
and read()
and maintains an integer x
init to 0
read():
return(x)
inc():
x := x+1; return(ok)
The processes share one register Reg
read():
return(Reg.read())
inc():
temp := Reg.read()+1;
Reg.write(temp);
return(ok)
The processes share an array of registers Reg[1,…,N]
inc():
Reg[i].write(Reg[i].read()+1);
return(ok)
read():
sum := 0;
for j = 1 to n do
sum := sum + Reg[j].read();
return(sum)
A snapshot has operations update()
and scan()
and maintains an array x
of size N
scan():
return(x)
update(i,v):
x[i] := v;
return(ok)
The processes share one array of N registers Reg[1,…,N]
scan():
for j = 1 to N do
x[j] := Reg[j].read();
return(x)
update(i,v):
Reg[i].write(v);
return(ok)
What we implement here is some kind of regular snapshot:
A scan returns, for every index of the snapshot, the last written values or the value of any concurrent update
We call it collect
To scan, a process keeps reading the entire snapshot (i.e., it collect), until two results are the same
This means that the snapshot did not change, and it is safe to return without violating atomicity
Same value ↛ Same timestamp
Same value ← Same timestamp
The processes share one array of N registers Reg[1,…,N]
; each contains a value and a timestamp
We use the following operation for modularity
collect():
for j = 1 to N do
x[j] := Reg[j].read();
return(x)
scan():
temp1 := self.collect();
while(true) do
temp2 := self.collect();
if (temp1 = temp2) then
return(temp1.val)
temp1 := temp2;
update(i,v):
ts := ts + 1;
Reg[i].write(v,ts);
return(ok)
Key idea for atomicity & wait-freedom
The processes share an array of registers Reg[1,…,N]
that contains each:
v
),ts
), andv1',…,vN'
)To scan, a process keeps collecting and returns a collect if it did not change, or some collect returned by a concurrent scan
Timestamps are used to check if the collect changes or if a scan has been taken in the meantime
To update, a process scans and writes the value, the new timestamp and the result of the scan
Every process keeps a local timestamp ts
update(i,v):
ts := ts+1;
Reg[i].write(v,ts,self.scan());
return(ok)
scan():
t1 := self.collect();
t2 := t1
while(true) do
t3 := self.collect();
if (t3 = t2) then return (t3[..].value); // Return value for 1 to N in t3
for j = 1 to N do
if (t3[j].timestamp ≥ t1[j].timestamp+2) then // Have update after scan() begin
return (t3[j].scancopy)
t2 := t3
Possible execution?
We have seen how to:
Transform Safe SRSW to Atomic MRMW registers:
Can we do better?
Theorem 1. There is no wait-free algorithm that:
Build a “general” counter-example (adversarial model)
For any algorithm (given the hypotheses), you (the adversary) can always create a scenario/example that violates (one of) the properties it is meant to achieve
Good strategies/steps:
Let us assume (without loss of generality) that (Hypothesis):
reg
)We are the adversary crafting a general counter-example. Consider any algorithm that implements the atomic SWSR register.
reg
can only take a finite (limited) number of different valuesreg.read()
can return its previous value when if there is a concurrent write to reg
Read()
operation does not change reg
→ the state of reg
is controlled entirely by the writerThe writer alternates between writing 0 and 1 on the atomic register, forever.
What is the state of reg after each Write(0)
?
In Hypothesis 2: Finite number of bounded registers → the underlying register reg
can only take a finite (limited) number of different values
There is a value v0 that appears an infinite number of times after Write(0)
There is an infinite number of Write(1)
operations, starting after reg
in state v0, in which reg
ends in state vn after the Write(1)
.
A Write(1)
might perform multiple writes to reg
, so reg
might not change from v0 to vn immediately.
Claim: there are values v0, v1, … vn, such that reg
changes directly from v(i) to v(i+1) infinitely often.
Theorem 2. There is no wait-free algorithm that:
Question 1: what objects can we implement with registers?
Question 2: what objects we cannot implement?
fetch&inc()
increments the counter and returns the new valueOne operation propose()
which returns a value. When a propose operation returns, we say that the process decides
Every decided value is a proposed value
R0
and R1
and a Fetch&Inc C
.Uses two registers R0
and R1
, and a Fetch&Inc object C
(with one fetch&inc()
operation that returns its value)
(NB. The value in C
is initialized to 0)
Process pI:
propose(vI):
R[I].write(vI)
val := C.fetch&inc()
if(val = 1) then return(vI) // Implying self gets initial increment (winner), decides on own proposed value
else return(R[1-I].read()) // Implying other gets initial increment (loser), decides on other's proposed value
Uses two registers R0
and R1
, and a queue Q
; Q
is initialized to {winner, loser}
Process pI:
propose(vI):
R[I].write(vI)
item := Q.dequeue()
if item = winner return(vI)
else return(R[1-I].read())
Proof (algorithm):
x
, init to 0, and y
; it provides one operation: test&set()
Sequential spec:
test&set():
y := x; x: = 1; return(y)
x
, init to ⊥, and provides one operation: compare&swap(v,w)
Sequential spec:
compare&swap(old,new):
if x = old then x := new; return(x)
Uses two registers R0
and R1
, and a Test&Set object T
Process pI:
propose(vI):
R[I].write(vI)
val := T.test&set()
if(val = 0) then return(vI)
else return(R[1-I].read())
Uses a Compare&Swap object C
Process pI:
propose(vI):
val := C.compare&swap(⊥, vI)
if(val = ⊥) then return(vI)
else return(val)
R1 … Rk …
A
for p0 and p1 exists.R1 … Rk …
;The initial configuration C(0,1) is bivalent.
Proof:
Consider C(0,0) and p1 not taking any step:
Hence the bivalency.
Given any bivalent configuration C, there is an arbitrarily long schedule S such that S(C) is bivalent.
Proof (by contradiction):
Let S be the schedule with the maximal length such as D = S(C) is bivalent;
p0(D) and p1(D) are both univalent:
one of them is 0-valent (say p0(D)) and the other is 1-valent (say p1(D))
To go from D to p0(D) (vs p1(D)) p0 (vs p1) accesses a register;
the register must be the same in both cases;
otherwise p1(p0(D)) is the same as p0(p1(D)):
in contradiction with the very fact that p0(D) is 0-valent whereas p1(D) is 1-valent
To go from D to p0(D), p0 cannot read R;
otherwise R has the same state in D and in p0(D);
in this case, the registers and p1 have the same state in p1(p0(D)) and p1(D);
if p1 is the only one executing steps, then p1 eventually decides 1 in both cases:
a contradiction with the fact that p0(D) is 0-valent;
the same argument applies to show that p1 cannot read R to go from D to p1(D)
Thus both p0 and p1 write in R to go from D to p0(D) (resp., p1(D)). But then p0(p1(D)) = p0(D) (resp. p1(p0(D)) = p1(D)) — a contradiction.
Encyclopedia of Algorithms, Second Edition, Page 152, Asynchronous Consensus Impossibility, 1985; Fischer, Lynch, Paterson
Theorem 1: Consensus is universal [Her91]
Shared memory model = Registers (read-write) + Consensus objects
One operation propose()
which returns a value. When a propose returns, the process decides.
Example (FIFO Queue) Non-linearizable execution
Lreq
(theoretically of infinite size)
Lcons
(also of infinite size)
We use an ordered list of consensus objects:
The algorithm combines the shared registers Lreq[I]
and the consensus object list Lcons
to ensure that:
Linearization (FIFO Queue)
Every process also uses two local data structures:
lPerf
lInv
Every request is uniquely identified
Every process pI executes three // tasks:
Lreq[J]
into lInv
lInv – lPerf
) is not empty, pI performs requests using Lcons
While (lInv – lPerf
) is not empty
lInv – lPerf
for a new consensus in Lcons
(increasing the consensus integer)lperf
) on the local copyLreq[I]
pI puts the request in lPerf
Example (FIFO Queue)
Proof (sketch):
Assume by contradiction that pI does not return from that invocation;
pI puts req into Lreq
(Task 1);
eventually, every proposed lInv - lPerf
contains req (Task 2);
and the consensus decision contains req (Task 3);
the result is then eventually returned (Task 3)
Proof (sketch): the processes agree on the same total order for sets of requests and then use the same order within every set of requests (the linearization order is determined by the integers associated with the consensus)
Proof (sketch): it directly follows from the algorithm that the result of req2 is based on the state of req1
Every process pI executes three // tasks:
linv
lInv – lPerf
) is not empty, pI performs requests using Lcons
Universality (1 + 2)
The type defined by $\delta’$ is deterministic
It is sufficient to implement a type defined by $\delta’$ !
Task 3 (Preserving non-determinism)
While lInv–lPerf
is not empty
update
) from request by performing: (reply,update) := object.exec(request)
(request,reply,update)
to a new consensus in Lcons
(increasing the consensus integer) producing (request',reply',update')
object.update(update')
Lreq[I]
(request',reply',update')
in lPerf
We implement Wait-free Consensus (Consensus) through:
Lock-free Consensus (L-Consensus)
and
_Registers
We implement L-Consensus through:
Obstruction-free Consensus (O-Consensus)
and
<>Leader (encapsulating timing assumptions and
sometimes denoted by \(\Omega\))
ts
, initialized to i
and incremented by n
Reg[1,…,n]
; each element of the array contains two registers:
Reg[i].T
contains a timestamp (init to 0)Reg[i].V
contains a pair (value,timestamp) (init to (⊥,0))To simplify the presentation, we assume two functions applied to Reg[1,…,N]
highestTsp()
returns the highest timestamp among all elements Reg[1].T
, Reg[2].T
, …, Reg[N].T
highestTspValue()
returns the value with the highest timestamp among all elements Reg[1].V
, Reg[2].V
, …, Reg[N].V
propose(v):
while(true) do
Reg[i].T.write(ts); // (1)
val := Reg[1,…,n].highestTspValue(); // (2)
if val = ⊥ then val := v;
Reg[i].V.write(val,ts); // (3)
if ts = Reg[1,…,n].highestTsp() then return(val) // (4)
ts := ts + n
We implement L-Consensus using <>leader (leader()
) and the O-Consensus algorithm
The idea is to use <>leader to make sure that, eventually, one process keeps executing steps alone, until that process decides
leader()
which does not take any input parameter and returns a booleanleader()
is trueProperty: If a correct process invokes leader, then the invocation returns and eventually, some correct process is permanently the only leader
propose(v):
while(true) do
if leader() then
Reg[i].T.write(ts);
val := Reg[1,…,n].highestTspValue();
if val = ⊥ then val := v;
Reg[i].V.write(val,ts);
if ts = Reg[1,…,n].highestTsp() then return(val)
ts := ts + n
Dec
(init to ⊥)Dec
propose(v):
while (Dec.read() = ⊥) do
if leader() then
Reg[i].T.write(ts);
val := Reg[1,…,n].highestTspValue();
if val = ⊥ then val := v;
Reg[i].V.write(val,ts);
if ts = Reg[1,…,n].highestTsp() then Dec.write(val)
ts := ts + n;
return(Dec.read())
leader()
which does not take any input parameter and returns a booleanProperty: If a correct process invokes leader, then the invocation returns and eventually, some correct process is permanently the only leader
Reg[i]
, i.e., claiming leadershipReg[i]
clock
and check
, as well as a local variable delay whenever its leader changeslast[j]
to record the last value of Reg[j]
pi has read (pi can hence know whether pj has progressed)noLeader
is true)check
, and delay
are initialized to 1last[j]
and Reg[j]
are initialized to 0leader():
clock := 0;
while(true) do
if (leader=self) then
Reg[i].write(Reg[i].read()+1);
clock := clock+1;
if (clock = check) then elect();
elect():
noLeader := true;
for j = 1 to (i-1) do
if (Reg[j].read() > last[j]) then
last[j] := Reg[j].read();
if (leader ≠ pj) then delay := delay*2;
leader:= pj;
noLeader := false;
break (for);
check := check + delay
if (noLeader) then leader := self;
<>Leader encapsulates the following synchrony assumption:
There is a time after which a lower and an upper bound hold on the time it takes for every process to execute a step (eventual synchrony)
A counter has two operations inc()
and read()
and maintains an integer x
init to 0
read():
return(x)
inc():
x := x+1;
return(ok)
The processes share an array of SWMR registers
Reg[1,…,n]
; the writer of register Reg[i]
is pi
inc():
temp := Reg[i].read()+1;
Reg[i].write(temp);
return(ok)
read():
sum := 0;
for j = 1 to n do
sum := sum + Reg[j].read();
return(sum)
A weak counter has one operation wInc()
wInc():
x := x+1;
return(x)
Correctness:
NB. Resembles a regular Fetch&Inc object
The processes share an (infinite) array of MWMR registers Reg[1,…,n,…]
, init to 0
wInc():
i := 0;
while (Reg[i].read() ≠ 0) do
i := i+1;
Reg[i].write(1);
return(i);
The processes also use a MWMR register L
wInc():
i := 0;
while (Reg[i].read() ≠ 0) do
// if L has been updated n times then return the largest value seen in L
i := i+1;
L.write(i);
Reg[i].write(1);
return(i);
Organized Vers:
wInc():
t := l := L.read();
i := k := 0;
while (Reg[i].read() ≠ 0) do
i : = i+1;
if L.read() ≠ l then
l := L.read();
t := max(t,l);
k := k+1;
if k = n then return(t);
L.write(i);
Reg[i].write(1);
return(i);
A snapshot has operations update()
and
scan()
and maintains an array x
of size n
scan():
return(x)
NB. No component is devoted to a process
update(i,v):
x[i] := v;
return(ok)
Wcounter
, init to 0;Reg[1,…,N]
that contains each:
Every process keeps a local timestamp ts
update(i,v):
ts := Wcounter.wInc();
Reg[i].write(v,ts,self.scan());
return(ok)
scan():
ts := Wcounter.wInc();
while(true) do
// If some Reg[j] contains a collect with a higher timestamp than ts, then return that collect
// If n+1 sets of reads return identical results then return that one
We consider binary consensus
The processes share two infinite arrays of registers: Reg0[i]
and Reg1[i]
Every process holds an integer i
init to 1
Idea: to impose a value v
, a process needs to be fast enough to fill in registers Reg{v}[i]
propose(v):
while(true) do
if Reg{1-v}[i] = 0 then
Reg{v}[i] := 1;
if i > 1 and Reg{1-v}[i-1] = 0 then return(v);
else v := 1-v;
i := i+1;
propose(v):
while(true) do
if Reg{1-v}[i] = 0 then
Reg{v}[i] := 1;
if i > 1 and Reg{1-v}[i-1] = 0 then return(v);
else if Reg{v}[i] = 0 then v := 1-v;
if v = 1 then wait(2*i)
i := i+1;
Locking is “history”
Lock-freedom is “difficult”
Wanted: A synchronisation abstraction that is simple, robust and efficient
Back to the sequential level
atomic {
accessing object 1;
accessing object 2;
}
Every transaction appears to execute at an indivisible point in time (linearizability of transactions)
The TM Topic has been a VERY HOT topic
(a simple view)
begin()
returns ok
read()
returns a value or abort
write()
returns ok
or abort
commit()
returns ok
or abort
abort()
returns ok
To write
or read
O, T requires a lock
on O;
T waits
if some T’ acquired a lock
on O;
At the end, T releases
all its locks.
Every object O, with state s(O) (a register), is protected by a lock l(O) (a c&s
)
Every transaction has local variables wSet
and wLog
Initially: l(O) = unlocked, wSet
= wLog
= ∅
Upon op = read() or write(v) on object O
if O ∉ wSet then
wait until unlocked = l(O).c&s(unlocked,locked)
wSet := wSet U O
wLog := wLog U S(O).read()
if op = read() then return S(O).read()
S(O).write(v)
return ok
Upon commit()
cleanup()
return ok
Upon abort()
rollback()
cleanup()
return ok
Upon rollback()
for all O ∈ wSet do S(O).write(wLog(O))
wLog := ∅
Upon cleanup()
for all O ∈ wSet do l(O).write(unlocked)
wSet := ∅
write
or read
O, T requires a lock
on O; T waits
if some T’ acquired a lock
on Oreleases
the lock on O when T is done with O(1) Wait
write
O, T requires a write-lock
on O; T waits
if some T’ acquired a lock
on Oread
O, T requires a read-lock
on O; T waits
if some T’ acquired a write-lock
on Oreleases
all its locks(2) Better dead than wait
write
O, T requires a write-lock
on O; T aborts
if some T’ acquired a lock
on Oread
O, T requires a read-lock
on O; T aborts
if some T’ acquired a write-lock
on Oreleases
all its locks(3) Better kill than wait
write
O, T requires a write-lock
on O; T aborts
T’ if some T’ acquired a lock
on Oread
O, T requires a read-lock
on O; T aborts
T’ if some T’ acquired a write-lock
on Oreleases
all its locks(4) Better kill than wait, revised
write
O, T requires a write-lock
on O; T aborts
T’ if some T’ acquired a lock
on Oread
O, T requires a read-lock
on O; T waits
if some T’ acquired a write-lock
on Oreleases
all its locks(SXM, RSTM, TLRW)
Visible but not so careful read: when a transaction reads an object, it says so
write
O, T requires a write-lock
on O; T waits
if some T’ acquired a write-lock
on Oread
O, T checks if all objects read remain valid - else T aborts
c&s
)wSet
and wLog
rset(O)
for every objectUpon write(v) on object O
if O ∉ wSet then
wait until unlocked = l(O).c&s(unlocked,locked)
wSet := wSet U O
wLog := wLog U S(O).read()
(*,ts) := S(O).read()
S(O).write(v,ts)
return ok
Upon read() on object O
(v,ts) := S(O).read()
if O ∈ wSet then return v
if l(O) = locked or not validate() then abort()
if rset(O) = 0 then rset(O) := ts
return v
Upon validate()
for all O s.t rset(O) > 0 do
(v,ts) := S(O).read()
if ts ≠ rset(O) or (O ∉ wset and l(O) = locked) then
return false
else return true
Upon commit()
if not validate() then abort()
for all O ∈ wset do
(v,ts) := S(O).read()
S(O).write(v,ts+1)
cleanup()
Upon rollback()
for all O ∈ wSet do S(O).write(wLog(O))
wLog := ∅
Upon cleanup()
for all O ∈ wset do l(O).write(unlocked)
wset := ∅
rset(O) := 0 for all O
write
O, T requires a write-lock
on O; T aborts
T’ if some T’ acquired a write-lock
on Oread
O, T checks if all objects read remain valid – else T aborts
Before committing, T releases all its locks
Apologizing versus asking permission
Invariant: 0 < x < y
Initially: x := 1; y := 2
T1: x := x+1 ; y:= y+1
T2: z := 1 / (y - x)
T1: x := 3; y:= 6
T2: a := y; b:= x; repeat b:= b + 1 until a = b
The read is either visible or careful
NB. Another way out is the use of multiversions: T2 would not have written “on” T1
write
O, T requires a write-lock
onO (use C&S
); T aborts
T’ if some T’ acquired a write-lock
on O (use C&S
)read
O, T checks if all objects read remain valid - else abort (use C&S
)C&S
)Principles of Transactional Memory by Rachid Guerraoui, Michal Kapalka
t
denotes the number of base objects that can fail
NB. In the asynchronous model, it is impossible to distinguish a non-responsive from a slow object
t+1
SWMR base responsive failure-prone registers2t+1
SWSR base non-responsive failure-prone registersWrite(v):
for j = 1 to (t+1) do
Reg[j].write(v);
return(ok)
Read():
for j = t+1 downto 1 do
v := Reg[j].read();
if v ≠ ⟂ then return(v)
Init: seq := 1
Write(v):
w_seq := w_seq + 1;
for j = 1 to (2t+1) do
Reg[j].write(w_seq, v);
// wait until a majority of oks are returned
return(ok)
Init: (sn,val) := (-1, ⟂);
Read():
for j = 1 to (2t+1) do
(s,v) := Reg[j].read();
(sn,val) := (s,v) with the highest s from majority, including (sn,val)
return (val)