#### Weak Memory Concurrency in C/C++11

#### Ori Lahav

http://www.cs.tau.ac.il/~orilahav/



Hydra Distributed Computing Conference

July 11, 2019

```
Initially, x = y = 0.

x := 1; y := 1; b := x; if (a = 0) then /* critical section */ /* critical section */
```

```
Initially, x = y = 0.

x := 1; y := 1; b := x; if (a = 0) then /* critical section */ /* critical section */
```

Is it safe?

```
Initially, x = y = 0.

x := 1; y := 1; b := x; \# 0

if (a = 0) then f(b = 0) then
```

#### Is it safe?

Yes, if we assume sequential consistency (SC):



```
Initially, x = y = 0.

x := 1; y := 1; b := x; \# 0

if (a = 0) then f(b = 0) then
```

#### Is it safe?

Yes, if we assume sequential consistency (SC):



No existing hardware implements SC!

- $\triangleright$  SC is very expensive (memory  $\sim$ 100 times slower than CPU).
- SC does not scale to many processors.

#### Example: Shared-memory concurrency in C++

```
int X, Y, a, b;

void thread1() {
    X = 1;
    a = Y;
}

void thread2() {
    Y = 1;
    b = X;
}
```

```
int main () {
    int cnt = 0;
    do {
        X = 0: Y = 0:
        thread first(thread1);
        thread second(thread2):
        first.join();
        second.join();
        cnt++;
    } while (a != 0 || b != 0);
    printf("%d\n",cnt);
    return 0;
```

## Example: Shared-memory concurrency in C++

```
int X, Y, a, b;

void thread1() {
    X = 1;
    a = Y;
}

void thread2() {
    Y = 1;
    b = X;
}
int main () {
    int cnt = 0;

do {
    X = 0; Y
    thread f
    thread s
    first.jc
    second.j
    cnt++;
```

If Dekker's mutual exclusion is safe, this program will not terminate

```
int cnt = 0;
     X = 0: Y = 0:
     thread first(thread1);
     thread second(thread2):
     first.join();
     second.join();
→} while (a != 0 || b != 0);
 printf("%d\n",cnt);
 return 0:
```

#### Weak memory models



We look for a substitute for SC:

#### **Unambiguous specification**

What are the possible outcomes of a multithreaded program?

#### Amenable to formal reasoning

Can prove theorems about the model.

#### Typically called a weak memory model (WMM)

Allows more behaviors than SC.

4

#### Weak memory models



We look for a substitute for SC:

#### **Unambiguous specification**

What are the possible outcomes of a multithreaded program?

#### Amenable to formal reasoning

Can prove theorems about the model.

#### Typically called a weak memory model (WMM)

Allows more behaviors than SC.

#### But it is not easy to get right

- ► The Java memory model is flawed.
- ▶ The C/C++11 model is also flawed.

4

#### The Problem of Programming Language Concurrency Semantics

Mark Batty, Kayvan Memarian, Kyndylan Nienhuis, Jean Pichon-Pharabod, and Peter Sewell

University of Cambridge

"Disturbingly, 40+ years after the first relaxed-memory hardware was introduced (the IBM 370/158MP), the field still does not have a credible proposal for the concurrency semantics of any general-purpose high-level language that includes high performance shared-memory concurrency primitives. This is a major open problem for programming language semantics."

European Symposium on Programming (ESOP) 2015

#### Plan for rest of the talk

- 1. Challenges for memory models
- 2. The C/C++11 memory model
- 3. The "out-of-thin-air" problem
- 4. A solution: a promising semantics

#### Plan for rest of the talk

- 1. Challenges for memory models
- 2. The C/C++11 memory model
- 3. The "out-of-thin-air" problem
- 4. A solution: a promising semantics

## Challenge 1: Various hardware models

















$$a := y; \ /\!\!/ 0$$

$$y := 1;$$

$$b := x; // 0$$







$$\triangleright x := 1;$$

$$a := y; // 0$$

▶ 
$$y := 1$$
;

$$b := x; // 0$$









▶ a := y; // 0

$$\triangleright$$
  $y := 1;$ 

b := x; // 0









► 
$$a := y$$
; // 0

$$y := 1;$$

▶ b := x; // 0





```
Initially, x=y=0.  \begin{aligned} x &:= 1; & & y &:= 1; \\ & \textbf{fence}; & & \textbf{fence}; \\ & a &:= y; \ \# \ 0 \end{aligned} \qquad \begin{vmatrix} & & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & & \\ & & &
```





```
Initially, x = y = 0.
```

$$a := x; // 1$$

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := 1;$   $x := b;$ 





```
Initially, x = y = 0.
```

$$a := x; // 1$$

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := 1;$   $x := b;$ 





```
Initially, x = y = 0.
```

$$a := x; // 1$$

$$a := x; //1$$
  $b := y; //1$   $y := 1;$   $x := b;$ 





```
Initially, x = y = 0.
```

$$a := x; // 1$$

$$a := x; //1$$
  $b := y; //1$   $y := 1;$   $x := b;$ 





Initially, 
$$x = y = 0$$
.

$$a := x; \ /\!\!/ 1$$
  $b := y; \ /\!\!/ 1$   $y := 1;$   $x := b;$ 

$$b := y; // 1$$
  
  $x := b;$ 



## Challenge 2: Compilers stir the pot

```
Initially, x = y = 0.
```

$$x := 1;$$
  $b := x;$   $b := y;$   $1$   $c := x;$   $0$ 

X forbidden under SC

#### Challenge 2: Compilers stir the pot

Initially, 
$$x = y = 0$$
.

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ y := 1; \end{vmatrix} \begin{array}{c} a := x; \\ compiler \\ c := x; // 0 \end{array}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{array}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{array}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{aligned}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{aligned}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{aligned}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{aligned}$$

$$x := 1; \begin{vmatrix} a := x; \\ b := y; // 1 \\ c := a; // 0 \end{aligned}$$

#### Challenge 3: Transformations do not suffice

Program transformations fail short to explain some weak behaviors.

- ► In C/C++:
  - Release stores cannot be reordered.
  - Acquire loads cannot be reordered.

```
Message passing (MP)
x :=_{rel} 1; \quad || \quad a := y_{acq}; \quad /\!\!/ 1
y :=_{rel} 1; \quad || \quad b := x_{acq}; \quad /\!\!/ 0
```

#### Challenge 3: Transformations do not suffice

Program transformations fail short to explain some weak behaviors.

- ► In C/C++:
  - Release stores cannot be reordered.
  - Acquire loads cannot be reordered.

► And yet, since C/C++ is intended to be compiled to a *non-multi-copy-atomic* architectures:

# Independent reads of independent writes (IRIW) $a := x_{\text{acq}}; \ \ /\!\!/ \ 1 \\ b := y_{\text{acq}}; \ \ /\!\!/ \ 0 \ \ x :=_{\text{rel}} 1; \ \ y :=_{\text{rel}} 1; \ \ \ d := x_{\text{acq}}; \ \ /\!\!/ \ 0$

#### Overview



#### WMM desiderata

- 1. Formal and comprehensive
- Not too weak (good for programmers)
- 3. Not too strong (good for hardware)
- 4. Admits optimizations (good for compilers)

Implementability vs.

**Programmability** 

# The C11 memory model

- ▶ Introduced by the ISO C/C++ 2011 standards.
- ▶ Defines the semantics of concurrent memory accesses.

## The C11 memory model: Atomics

Two types of accesses

Ordinary (Non-Atomic)

Races are errors

**Atomic** 

Welcome to the expert mode

## The C11 memory model: Atomics

Two types of accesses

Ordinary (Non-Atomic)

Races are errors

**Atomic** 

Welcome to the expert mode

DRF (data race freedom) guarantee

 $\begin{array}{c} \text{no data races} \\ \text{under SC} \end{array} \Longrightarrow \begin{array}{c} \text{only} \\ \text{SC behaviors} \end{array}$ 

#### A spectrum of access modes

```
non-atomic
              \sqsubset relaxed \sqsubset
                                        release/acquire
                                                                    SC
                      memory_order_seq_cst (sc)
                  full fence (x86,PPC); stlr&ldar (ARM)
memory_order_release write (rel)
                                      memory_order_acquire read (acq)
  no fence (x86); lwsync (PPC);
                                         no fence (x86); isync (PPC);
           stlr (ARM)
                                              Idar/Idapr (ARM)
                     memory_order_relaxed (rlx)
                               no fence
                            Non-atomic (na)
                        no fence, races are errors
```

+ Explicit primitives for fences

## C11: a declarative memory model

Declarative semantics abstracts away from implementation details.

- 1. a program  $\sim$  a set of directed graphs.
- 2. The model defines what executions are *consistent*.
- 3. C/C++11 also has *catch-fire* semantics (forbidden data races).

#### Execution graphs

#### Store buffering (SB)

$$x = y = 0$$
  
 $x :=_{rlx} 1; || y :=_{rlx} 1;$   
 $a := y_{rlx}; || b := x_{rlx};$ 





#### Relations

- Program order, po
- ▶ Reads-from, rf

17

#### C/C++11 formal model

#### [Vafeiadis & Narayan OOPSLA'13]

```
[-]: CExp \rightarrow \mathbb{P}(\langle res : Val \cup \{\bot\}, A : \mathbb{P}(AName), lab : A \rightarrow Act, sb : \mathbb{P}(A \times A), fst : A, lst : A\rangle)
                                   [v] \stackrel{\text{def}}{=} \{ \langle v, \{a\}, \mathsf{lab}, \emptyset, a, a \rangle \mid a \in \mathsf{AName} \land \mathsf{lab}(a) = \mathsf{skip} \}
                       [alloc()] \stackrel{\text{def}}{=} \{ \langle \ell, \{a\}, lab, \emptyset, a, a \rangle \mid a \in AName \land \ell \in Loc \land lab(a) = A(\ell) \}
                  [\![v]_Z := v']\!] \stackrel{\text{def}}{=} \{\langle v', \{a\}, \mathsf{lab}, \emptyset, a, a \rangle \mid a \in \mathsf{AName} \land \mathsf{lab}(a) = W_Z(v, v')\}
                             ||[v]||_{\mathbb{Z}} \stackrel{\text{def}}{=} \{\langle v', \{a\}, | \mathsf{lab}, \emptyset, a, a \rangle \mid a \in \mathsf{AName} \land v' \in \mathsf{Val} \land |\mathsf{lab}(a) = \mathsf{R}_{\mathcal{Z}}(v, v')\}
 \begin{aligned} & \| \mathbf{CAS}_{X,Y}(v, v_o, v_n) \| \overset{\text{def}}{=} \left\{ \langle v', \{a\}, \mathsf{lab}, \emptyset, a, a \rangle \mid a \in \mathsf{AName} \land v' \in \mathsf{Val} \land v' \neq v_o \land \mathsf{lab}(a) = \mathrm{R}_Y(v, v') \right\} \\ & \cup \left\{ \langle v_o, \{a\}, \mathsf{lab}, \emptyset, a, a \rangle \mid a \in \mathsf{AName} \land \mathsf{lab}(a) = \mathrm{RMW}_X(v, v_o, v_n) \right\} \end{aligned} 
    \| \mathbf{let} \ x = E_1 \ \mathbf{in} \ E_2 \| \stackrel{\text{def}}{=} \{ \langle \bot, A_1, | \mathbf{ab_1}, \mathbf{sb_1}, fst_1, lst_1 \rangle \mid \langle \bot, A_1, | \mathbf{ab_1}, \mathbf{sb_1}, fst_1, lst_1 \rangle \in \| E_1 \| \}
                                           \cup {\langle res_2, A_1 \uplus A_2, lab_1 \cup lab_2, sb_1 \cup sb_2 \cup \{(lst_1, fst_2)\}, fst_1, lst_2 \rangle
                                                   (v_1, A_1, lab_1, sb_1, fst_1, lst_1) \in [E_1] \land (res_2, A_2, lab_2, sb_2, fst_2, lst_2) \in [E_2[v_1/x]] \}
        \llbracket \mathbf{repeat} \ E \ \mathbf{end} \rrbracket \stackrel{\mathrm{def}}{=} \{\langle res_N, \biguplus_{i \in [1..N]} \mathcal{A}_i, \bigcup_{i \in [1..N]} \mathsf{lab}_i, \bigcup_{i \in [1..N]} \mathsf{sb}_i \cup \{(lst_1, fst_2), \dots, (lst_{N-1}, fst_N)\}, fst_1, lst_N \rangle \mid \mathsf{frace} \rbrace \}
                                                     \forall i. \langle res_i, A_i, lab_i, sb_i, fst_i, lst_i \rangle \in |E| \land (i \neq N \implies res_i = 0) \land res_N \neq 0 
                        [\![E_1]\!] \stackrel{\text{def}}{=} \{\langle \mathsf{combine}(\mathit{res}_1, \mathit{res}_2), \mathcal{A}_1 \uplus \mathcal{A}_2 \uplus \{\mathit{a}_\mathsf{fork}, \mathit{a}_\mathsf{join}\}, \mathsf{lab}_1 \cup \mathsf{lab}_2 \cup \{\mathit{a}_\mathsf{fork} \mapsto \mathsf{skip}, \mathit{a}_\mathsf{join} \mapsto \mathsf{skip}\},
                                                      sb_1 \cup sb_2 \cup \{(a_{fork}, fst_1), (a_{fork}, fst_2), (lst_1, a_{join}), (lst_2, a_{join})\}, a_{fork}, a_{join}\}
                                                    (res_1, A_1, sb_1, fst_1, lst_1) \in [E_1] \land (res_2, A_2, sb_2, fst_2, lst_2) \in [E_2] \land a_{fork}, a_{loin} \in AName
                                                                     Figure 2. Semantics of closed program expressions.
                                                                                                          \exists x \ \mathsf{hb}(x \ x)
                                                                                                                                                                                                                (IrreflexiveHB)
                                                                  \forall \ell. totalorder(\{a \in A \mid iswrite_{\ell}(a)\}, mo) \land hb_{\ell} \subseteq mo
                                                                                                                                                                                                              (ConsistentMO)
                                     totalorder(\{a \in A \mid isSeqCst(a)\}, sc) \land hb_{SeqCst} \subseteq sc \land mo_{SeqCst} \subseteq sc
                                                                                                                                                                                                                (ConsistentSC)
                                        \forall b. \ rf(b) \neq \bot \iff \exists \ell, a. \ iswrite_{\ell}(a) \land isread_{\ell}(b) \land hb(a, b)
                                                                                                                                                                                                       (Consistent REdom)
                                       \forall a, b. \ rf(b) = a \implies \exists \ell, v. \ iswrite_{\ell,v}(a) \land isread_{\ell,v}(b) \land \neg hb(b, a)
                                                                                                                                                                                                                (ConsistentRE)
                                     \forall a, b, \text{ rf}(b) = a \land (\text{mode}(a) = \text{na} \lor \text{mode}(b) = \text{na}) \implies \text{hb}(a, b)
                                                                                                                                                                                                           (ConsistentRFna)
               \forall a, b. \ \mathsf{rf}(b) = a \land \mathsf{isSeqCst}(b) \implies \mathsf{isc}(a, b) \lor \neg \mathsf{isSeqCst}(a) \land (\forall x. \ \mathsf{isc}(x, b) \Rightarrow \neg \mathsf{hb}(a, x))
                                                                                                                                                                                                              (RestrSCReads)
                                                                   \nexists a, b. \ hb(a, b) \land mo(rf(b), rf(a)) \land locs(a) = locs(b)
                                                                                                                                                                                                                  (CoherentRR)
                                                           \exists a, b, hb(a, b) \land mo(rf(b), a) \land iswrite(a) \land locs(a) = locs(b)
                                                                                                                                                                                                                (CoherentWR)
                                                           \nexists a, b. \ hb(a, b) \land mo(b, rf(a)) \land iswrite(b) \land locs(a) = locs(b)
                                                                                                                                                                                                                (CoherentRW)
                               \forall a. \text{ isrmw}(a) \land \text{rf}(a) \neq \bot \implies \text{mo}(\text{rf}(a), a) \land \nexists c. \text{mo}(\text{rf}(a), c) \land \text{mo}(c, a)
                                                                                                                                                                                                                (AtomicRMW)
                                                                           \forall a, b, \ell, \ \mathsf{lab}(a) = \mathsf{lab}(b) = \mathsf{A}(\ell) \implies a = b
                                                                                                                                                                                                           (ConsistentAlloc)
where \mathsf{iswrite}_{\ell,v}(a) \stackrel{\text{def}}{=} \exists X, v_{\text{old}}, \mathsf{lab}(a) \in \{W_X(\ell, v), RMW_X(\ell, v_{\text{old}}, v)\}
                                                                                                                                                     iswrite_{\ell}(a) \stackrel{\text{def}}{=} \exists v. iswrite_{\ell,v}(a)
              isread_{\ell,v}(a) \stackrel{\text{def}}{=} \exists X, v_{\text{new}}. lab(a) \in \{R_X(\ell, v), RMW_X(\ell, v, v_{\text{new}})\}
             rsElem(a, b) \stackrel{\text{def}}{=} sameThread(a, b) \lor isrmw(b)
                      rseq(a) \stackrel{\text{def}}{=} \{a\} \cup \{b \mid rsElem(a,b) \land mo(a,b) \land (\forall c. mo(a,c) \land mo(c,b) \Rightarrow rsElem(a,c))\}
                               \mathsf{sw} \stackrel{\mathrm{def}}{=} \{(a,b) \mid \mathsf{mode}(a) \in \{\mathsf{rel}, \mathsf{rel\_acq}, \mathsf{sc}\} \land \mathsf{mode}(b) \in \{\mathsf{acq}, \mathsf{rel\_acq}, \mathsf{sc}\} \land \mathsf{rf}(b) \in \mathsf{rseq}(a)\}
                               hb \stackrel{\text{def}}{=} (sb \cup sw)^+
                             \mathsf{hb}_{\ell} \stackrel{\mathrm{def}}{=} \{(a,b) \in \mathsf{hb} \mid \mathsf{iswrite}_{\ell}(a) \land \mathsf{iswrite}_{\ell}(b)\}
                      X_{SeqCst} \stackrel{\text{def}}{=} \{(a, b) \in X \mid isSeqCst(a) \land isSeqCst(b)\}
                     isc(a, b) \stackrel{\text{def}}{=} iswrite_{locs(b)}(a) \land sc(a, b) \land \nexists c. sc(a, c) \land sc(c, b) \land iswrite_{locs(b)}(c)
                                Figure 3. Axioms satisfied by consistent C11 executions. Consistent (A, lab. sb. rf. mo. sc).
       c: W(\ell, 1) \longrightarrow a: R(\ell, 1) \mid c: W(\ell, 2) \xrightarrow{ma} a: W(\ell, 1) \mid c: W(\ell, 1) \xrightarrow{d} a: R(\ell, 1)
                1 mo
                                         ыЫ
       d: W(\ell, 2) \longrightarrow b: R(\ell, 2)
            violates Coherent RR
                                                                  violates CoherentWR
```

## Basic ingredients of execution graph consistency

- 1. SC-per-location (a.k.a. coherence)
- 2. Release/acquire synchronization
- 3. Global conditions on SC accesses

# Basic ingredients of execution graph consistency

- 1. SC-per-location (a.k.a. coherence)
- 2. Release/acquire synchronization
- 3. Global conditions on SC accesses

# Sequential Consistency (SC)

#### Definition (Declarative definition of SC, Lamport '79)

G is SC-consistent if there exists a relation S s.t. the following hold:

- S is a total order on the events of G.
- $\triangleright$  (po  $\cup$  rf);  $S = \emptyset$ .
- ▶ If  $\langle a, b \rangle \in \text{rf}$  then there does not exist  $c \in \mathbb{W}_{\text{loc}(a)}$  such that  $\langle a, c \rangle \in \mathbb{S}$  and  $\langle c, b \rangle \in \mathbb{S}$ .

Namely, the following is disallowed:

$$\begin{array}{c} \text{rf} \\ \mathbb{W} \times \xrightarrow{-} \mathbb{W} \times \xrightarrow{-} \mathbb{R} \times \end{array}$$

# SC: Example

```
x = y = 0

x :=_{rlx} 1; y :=_{rlx} 1;

a := y_{rlx}; // 0 || b := x_{rlx}; // 0
```



not SC-consistent!

### SC-per-location

#### Definition (SC-per-location)

*G* is satisfies *SC-per-location* if for every location x, there exists a relation  $S_x$  s.t. the following hold:

- $\triangleright$  S<sub>x</sub> is a total order on the events of G that access x.
- ightharpoonup (po  $\cup$  rf);  $S_x = \emptyset$ .
- ▶ If  $\langle a,b\rangle \in \mathtt{rf}$  then there does not exist  $c \in \mathtt{W}_x$  such that  $\langle a,c\rangle \in \mathtt{S}_x$  and  $\langle c,b\rangle \in \mathtt{S}_x$ .

Namely, the following is disallowed:

$$\begin{array}{c} \text{rf} \\ \mathbb{W} \times \begin{array}{c} -- \mathbb{W} \times -- \mathbb{R} \times \\ \mathbb{S}_{\times} \end{array} \end{array}$$

# SC-per-location: Example 1

$$x = y = 0$$
  
 $x :=_{rlx} 1;$   $y :=_{rlx} 1;$   
 $a := y_{rlx}; // 0 || b := x_{rlx}; // 0$ 



satisfies SC-per-location!

# SC-per-location: Example 2

$$x = 0$$
  
 $x :=_{rlx} 1;$   $\| x :=_{rlx} 2;$   
 $a := x_{rlx}; // 2 \| b := x_{rlx}; // 1$ 



does not satisfy SC-per-location!

# Release/Acquire synchronization

#### SC-per-location is often *too weak*:

▶ It does not support the message passing idiom:





Also: we cannot implement locks.

```
int y = 0;
int x = 0;
y = 42; || if(x == 1){
x = 1; || print(y);
```

```
int y = 0;
int x = 0;
y = 42; | if(x == 1){
x = 1; | race print(y);
}
```

```
int y = 0;
 int x = 0;
```

```
int y = 0;
                                               atomic < int > x = 0;
y = 42; | if(x == 1){

x = 1; | race | print(y); | x =<sub>rlx</sub> 1; | print(y); | }
```

```
int y = 0;
int x = 0;
```

```
int y = 0;
                                                     atomic < int > x = 0;
y = 42; if (x == 1) { y = 42; if (x_{rlx} == 1) { x = 1; race print(y); } x =_{rlx} 1; race print(y);
```

```
int y = 0;
 int x = 0;
```

```
int y = 0;
                                                   atomic<int> x = 0;
y = 42; if (x == 1) { y = 42; if (x_{rlx} == 1) { x = 1; race print(y); x =_{rlx} 1; race print(y);
```

```
int y = 0;
      atomic<int> x = 0;
y = 42;
x =<sub>rel</sub> 1; || if(x<sub>acq</sub> == 1){
    print(y);
}
```

```
int y = 0;
 int x = 0;
```

```
int y = 0;
                                                   atomic<int> x = 0;
y = 42; if (x == 1) { y = 42; if (x_{rlx} == 1) { x = 1; race print(y); x =_{rlx} 1; race print(y);
```

```
int y = 0;
 atomic<int> x = 0;
```

```
int y = 0;
 int x = 0;
```

```
int y = 0;
                                                   atomic<int> x = 0;
y = 42; if (x == 1) { y = 42; if (x_{rlx} == 1) { x = 1; race print(y); x =_{rlx} 1; race print(y);
```

```
int y = 0;
     atomic<int> x = 0;
y = 42; | if(x<sub>acq</sub> == 1){
x = rel 1 | print(y);
```

```
int y = 0;
 int x = 0;
```

```
int y = 0;
                                              atomic<int> x = 0;
y = 42; if (x == 1) { y = 42; if (x_{rlx} == 1) { x = 1; race print(y); x =_{rlx} 1; race print(y);
```

```
int y = 0;
atomic<int> x = 0;
```

```
int y = 0;
                                            atomic<int> x = 0;
y = 42; if (x_{acq} == 1) { y = 42; if (x_{rlx} == 1) { fence_{rel}; } x =_{rlx} 1; print(y);
```

```
int y = 0;
 int x = 0;
```

```
int y = 0;
                                             atomic<int> x = 0;
y = 42; if (x == 1) { y = 42; if (x_{rlx} == 1) { x = 1; race print(y); x =_{rlx} 1; race print(y);
```

```
int y = 0;
atomic<int> x = 0;
```

```
int y = 0;
                                           atomic<int> x = 0;
y = 42; if (x_{acq} == 1) { y = 42; if (x_{rlx} == 1) { fence_{rel}; } x =_{rlx} 1; print(y);
```

```
int y = 0;
        int x = 0;
y = 42; if(x == 1){
x = 1; race print(y);
}
```

```
int y = 0;
            atomic<int> x = 0;
y = 42; if (x_{rlx} == 1) {
 x =_{rlx} 1; race print(y);
```

```
int y = 0;
atomic<int> x = 0;
```

```
int y = 0;
                                                atomic<int> x = 0;
y = 42; if (x_{acq} == 1) { y = 42; if (x_{rlx} == 1) { fencerel; sw print(y); x =_{rlx} 1; print(y);
```

# The "synchronizes-with" relation



# The "synchronizes-with" relation: Release sequences





▶ Note: the latter case will be deprecated in C++20.

## The "happens-before" relation

# Definition (happens-before) $\frac{a \xrightarrow{po} b}{a \xrightarrow{hb} b} \qquad \frac{a \xrightarrow{sw} b}{a \xrightarrow{hb} b} \qquad \frac{a \xrightarrow{hb} b}{a \xrightarrow{hb} c}$

- hb should be acyclic.
- ► The SC-per-location orders should never contradict hb.



#### SC accesses and fences

#### Store buffer

```
x :=_{rel} 1; y :=_{rel} 1; a := y_{acq}; // 0 b := x_{acq}; // 0
```



How to guarantee only SC behaviors (i.e.,  $a = 1 \lor b = 1$ )?

#### SC accesses and fences

#### Store buffer

$$x :=_{rel} 1;$$
  $y :=_{rel} 1;$   $a := y_{acq}; // 0$   $b := x_{acq}; // 0$ 



How to guarantee only SC behaviors (i.e.,  $a = 1 \lor b = 1$ )?

► The semantics of SC atomics is the *most complicated* part of the model.

- The semantics of SC atomics is the most complicated part of the model.
- ► C/C++11 provides too strong semantics (a correctness problem!)

```
a := x_{acq}; //1 \ b := y_{sc}; //0 \ x :=_{sc} 1; \ y :=_{sc} 1; \ c := y_{acq}; //1 \ d := x_{sc}; //0
```

- The semantics of SC atomics is the most complicated part of the model.
- ► C/C++11 provides too strong semantics (a correctness problem!)

$$a := x_{acq}; //1 \ b := y_{sc}; //0 \ x :=_{sc} 1; \ y :=_{sc} 1; \ c := y_{acq}; //1 \ d := x_{sc}; //0$$

▶ In addition, its semantics for SC fences is too weak.

$$a := x_{acq}; \ /\!\!/ 1$$
 $fence_{sc};$ 
 $b := y_{acq}; \ /\!\!/ 0$ 
 $x :=_{rel} 1;$ 
 $y :=_{rel} 1;$ 
 $c := y_{acq}; \ /\!\!/ 1$ 
 $fence_{sc};$ 
 $d := x_{acq}; \ /\!\!/ 0$ 

- ► The semantics of SC atomics is the most complicated part of the model.
- ► C/C++11 provides too strong semantics (a correctness problem!)

$$a := x_{acq}; //1 \ b := y_{sc}; //0 \ x :=_{sc} 1; \ y :=_{sc} 1; \ c := y_{acq}; //1 \ d := x_{sc}; //0$$

In addition, its semantics for SC fences is too weak.

```
a := x_{acq}; \ /\!\!/ 1
fence<sub>sc</sub>;
b := y_{acq}; \ /\!\!/ 0
x :=_{rel} 1;
y :=_{rel} 1;
c := y_{acq}; \ /\!\!/ 1
fence<sub>sc</sub>;
d := x_{acq}; \ /\!\!/ 0
```

Recently, the standard committee fixed the specification following: [Repairing Sequential Consistency in C/C++11. L, Vafeiadis, Kang, Hur, Dreyer. PLDI'17] The "out-of-thin-air" problem

# Load-buffering $a := x; \ /\!\!/ 1 \qquad \qquad |\!\!| \quad b := y; \ /\!\!/ 1$ $y := 1; \qquad \qquad |\!\!| \quad x := b;$

 $\verb|non-atomic| \qquad \boxed{\texttt{relaxed}} \quad \boxed{} \quad \verb|release/acquire| \quad \boxed{} \quad \verb|sc||}$ 

#### Load-buffering

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := 1;$   $x := b;$ 



 $\verb|non-atomic| \qquad \boxed{\texttt{relaxed}} \quad \boxed{} \quad \verb|release/acquire| \quad \boxed{} \quad \verb|sc||}$ 

#### Load-buffering

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := 1;$   $x := b;$ 



#### Load-buffering

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := 1;$   $x := b;$ 



#### Load-buffering

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := 1;$   $x := b;$ 



 $\verb|non-atomic| \qquad \boxed{\texttt{relaxed}} \qquad \boxed{} \qquad \verb|release/acquire| \qquad \boxed{} \qquad \verb|sc||$ 

#### Load-buffering

$$a := x; //1$$
  $b := y; //1$   $y := 1;$   $x := b;$ 

C/C++11 allows this behavior because **POWER & ARM allow it!** 

#### Load-buffering + data dependency

$$a := x; //1$$
  $b := y; //1$   $y := a;$   $x := b;$ 



relaxed □ release/acquire non-atomic sc

#### Load-buffering

$$a := x$$
; // 1

$$a := x; //1$$
  $b := y; //1$   $y := 1;$   $x := b;$ 

$$y := 1;$$

C/C++11 allows this behavior because POWER & ARM allow it!

#### Load-buffering + data dependency

$$a := x; // 1$$
  
 $y := a;$ 

$$b := y; // 1$$

$$y := a$$

$$x := b$$
;

$$C/C++11$$
 allows this behavior



relaxed non-atomic □ release/acquire sc

#### Load-buffering

$$a := x$$
; // 1

$$a := x; //1$$
  
 $y := 1;$   $b := y; //1$   
 $x := b;$ 

$$y:=1$$
;

C/C++11 allows this behavior because POWER & ARM allow it!

#### Load-buffering + data dependency

$$a := x; // 1$$
  
 $y := a;$ 

$$b := y; // 1$$

$$y := a$$

$$x := b$$
;

C/C++11 allows this behavior Values appear out-of-thin-air! (no hardware/compiler exhibit this behavior)



non-atomic

□ relaxed

□ release/acquire

SC

#### Load-buffering + control dependency

$$a := x; // 1$$
  
**if**  $(a = 1)$   
 $y := 1;$ 

$$b := y; // 1$$
  
**if**  $(b = 1)$   
 $x := 1;$ 



 $\verb|non-atomic| \qquad \boxed{\texttt{relaxed}} \quad \boxed{} \quad \verb|release/acquire| \quad \boxed{} \quad \verb|sc||}$ 

#### Load-buffering + control dependency

$$a := x; // 1$$
  
if  $(a = 1)$   
 $y := 1;$   
 $b := y; // 1$   
if  $(b = 1)$   
 $x := 1;$ 

C/C++11 allows this behavior



#### Load-buffering + control dependency

$$a := x; \ /\!\!/ 1$$
  
**if**  $(a = 1)$   
 $y := 1;$   
 $b := y; \ /\!\!/ 1$   
**if**  $(b = 1)$   
 $x := 1;$ 

C/C++11 allows this behavior **The DRF guarantee is broken!** 



□ SC

$$a := x; //1$$
  $b := y; //1$  if  $(a = 1)$ 

$$b := y; // 1$$
  
if  $(b = 1)$ 

$$\begin{bmatrix} x = y = 0 \end{bmatrix}$$

### The three examples have the same execution graph!

The DRF guarantee is broken!



#### The hardware solution

Keep track of syntactic dependencies and forbid dependency cycles.

# Load-buffering $a := x; \ // 1 \qquad || \quad b := y; \ // 1 \qquad || \quad y := 1; \qquad || \quad x := b;$

Load-buffering 
$$+$$
 data dependency  $a:=x; \ \# 1 \ y:=a; \ \# x:=b;$ 



#### The hardware solution

Keep track of syntactic dependencies and forbid dependency cycles.

#### Load-buffering

$$a := x; // 1$$

$$b := y; // 1$$
  
  $x := b;$ 

$$y := 1;$$

#### Load-buffering + data dependency

$$a := x; \ /\!\!/ 1$$

$$b := y; // 1$$

$$y := a;$$

$$b := y; // 3$$

$$a := x; //1$$
  
 $y := a + 1 - a;$   $b := y; //1$   
 $x := b;$ 

$$x := b$$
;



#### The hardware solution

Keep track of syntactic dependencies and forbid dependency cycles.

#### Load-buffering

$$a := x; //1$$
  
 $y := 1;$ 

$$a := x; //1$$
  $b := y; //1$   $y := 1;$   $x := b;$ 

$$a := x \cdot // 1$$

$$a := x; \ // 1$$
  $b := y; \ // 1$   $y := a;$   $x := b;$ 

$$y := a;$$

$$a := x; //1$$
  
 $y := a + 1 - a;$   $b := y; //1$   
 $x := b;$ 

$$b := y$$
; //



syntactic dependency

This approach is not suitable for a programming language: Compilers do not preserve syntactic dependencies.

### The "out-of-thin-air" problem

#### C/C++11 is too weak

- ► Values might appear out-of-thin-air.
- ► The *DRF guarantee* is broken.

#### The C++14 standard states:

"Implementations should ensure that no "out-of-thin-air" values are computed that circularly depend on their own computation."

"Defined" by examples.

#### Solution

#### A straightforward solution

- ▶ Disallow po ∪ rf cycles!
- On weak hardware it carries a certain implementation cost.

[Ou and Demsky. Towards understanding the costs of avoiding out-of-thin-air results. OOPSLA'18]

#### Solution

#### A straightforward solution

- Disallow po ∪ rf cycles!
- On weak hardware it carries a certain implementation cost.

[Ou and Demsky. Towards understanding the costs of avoiding out-of-thin-air results. OOPSLA'18]

#### RC11 (Repaired C11) model

[L, Vafeiadis, Kang, Hur, Dreyer. PLDI'17]

- (Modified) compilation schemes are correct.
- DRF holds and no OOTA-values.
- ► Model checking [Kokologiannakis, L, Sagonas, Vafeiadis. POPL'18] http://plv.mpi-sws.org/rcmc/

#### Solution

#### A straightforward solution

- Disallow po ∪ rf cycles!
- On weak hardware it carries a certain implementation cost.

[Ou and Demsky. Towards understanding the costs of avoiding out-of-thin-air results. OOPSLA'18]

#### RC11 (Repaired C11) model

[L, Vafeiadis, Kang, Hur, Dreyer. PLDI'17]

- (Modified) compilation schemes are correct.
- DRF holds and no OOTA-values.
- Model checking [Kokologiannakis, L, Sagonas, Vafeiadis. POPL'18] http://plv.mpi-sws.org/rcmc/
- Solving the problem without changing the compilation schemes will require a major revision of the standard.

# A 'promising' solution to OOTA

[Kang, Hur, L, Vafeiadis, Dreyer. POPL'17]

# A 'promising' solution to OOTA

[Kang, Hur, L, Vafeiadis, Dreyer. POPL'17]

**Key idea:** Start with an operational interleaving semantics, but allow threads to **promise** to write in the future.

# Store-buffering $\begin{aligned} x &= y = 0 \\ x &= 1; \\ a &= y; \ \# 0 \end{aligned} \quad \begin{aligned} y &= 1; \\ b &= x; \ \# 0 \end{aligned}$

# 

# Memory $\langle x:0@0\rangle$ $\langle y:0@0\rangle$

$$T_1$$
's view  $\frac{x}{0} = \frac{y}{0}$ 

$$\begin{array}{ccc} T_2 \text{'s view} \\ \hline x & y \\ \hline 0 & 0 \end{array}$$

▶ Global memory is a pool of messages of the form

⟨location : value @ timestamp⟩

# 

# Memory $\langle x:0@0\rangle$ $\langle y:0@0\rangle$ $\langle x:1@5\rangle$

$$\begin{array}{c|c}
T_1' \text{s view} & T_2' \\
\hline
x & y \\
\hline
0 & 0
\end{array}$$

 $\begin{array}{ccc}
T_2's & view \\
x & y \\
\hline
0 & 0
\end{array}$ 

Global memory is a pool of messages of the form

⟨location : value @ timestamp⟩

# Store-buffering x = y = 0 x = 1; a = y; # 0 b = x; # 0

# Memory $\langle x:0@0\rangle$ $\langle y:0@0\rangle$ $\langle x:1@5\rangle$ $\langle y:1@5\rangle$

$$\begin{array}{c|c}
T_1' \text{s view} & T_2' \text{s view} \\
\hline
x & y \\
\hline
0 & \chi \\
5 & 5
\end{array}$$

Global memory is a pool of messages of the form

⟨location : value @ timestamp⟩



# Memory $\langle x:0@0\rangle$ $\langle y:0@0\rangle$ $\langle x:1@5\rangle$ $\langle y:1@5\rangle$

$$\begin{array}{c|c}
T_1 \text{'s view} & T \\
\hline
x & y \\
\hline
0 & 5
\end{array}$$

$$\begin{array}{c|c}
T_2's \text{ view} \\
\hline
x & y \\
\hline
0 & X \\
\hline
5
\end{array}$$

Global memory is a pool of messages of the form

⟨location : value @ timestamp⟩

# Store-buffering $\begin{array}{c|c} x=y=0 \\ x=1; & y=1; \\ a=y; \ \#0 \end{array}$

# Memory $\langle x:0@0\rangle$ $\langle y:0@0\rangle$ $\langle x:1@5\rangle$ $\langle y:1@5\rangle$

$$\begin{array}{c|c}
T_1's \text{ view} \\
\hline
x & y \\
\hline
0 & 0 \\
5
\end{array}$$

Global memory is a pool of messages of the form

⟨location : value @ timestamp⟩

#### Store-buffering

$$x = y = 0$$
  
 $x = 1;$   $y = 1;$   $b = x; // 0$ 

#### Memory

\(\langle x : 0@0\)
\(\langle y : 0@0\)
\(\langle x : 1@5\)
\(\langle y : 1@5\)

 $T_1$ 's view x y

 $T_2$ 's view  $\begin{array}{ccc} x & y \\ \hline 0 & \chi \\ \hline & 5 \end{array}$ 

#### Coherence Test

$$x = 0$$
  
 $x := 1;$   $x := 2;$   
 $a = x;$   $/\!\!/ 2$   $b = x;$   $/\!\!/ 1$ 

#### Store-buffering

$$x = y = 0$$
  
 $x = 1;$   $y = 1;$   
 $a = y; \# 0$   $b = x; \# 0$ 

# Memory

$$\langle x : 0@0 \rangle$$
  
 $\langle y : 0@0 \rangle$   
 $\langle x : 1@5 \rangle$   
 $\langle y : 1@5 \rangle$ 

$$T_1$$
's view

$$\begin{array}{c|c} T_1 \text{'s view} & T_2 \text{'s view} \\ \hline x & y \\ \hline x & 0 \\ \hline 5 & 5 \end{array}$$

#### Coherence Test

$$x = 0$$
 $x = 1;$ 
 $a = x; // 2$ 
 $x = 0;$ 
 $b = x; // 1$ 

# Memory

Memory 
$$\langle x:0@0\rangle$$

$$T_1$$
's view

0

$$T_2$$
's view  $\frac{x}{0}$ 

#### Store-buffering

$$x = y = 0$$
  
 $x = 1;$   $y = 1;$   
 $a = y; \# 0$   $b = x; \# 0$ 

# Memory

 $\langle x:0@0\rangle$  $\langle y:0@0\rangle$  $\langle x:105\rangle$  $\langle y: 1@5 \rangle$ 

$$T_1$$
's view

 $\begin{array}{c|c} x & y \\ \hline & x & y \\ \hline & 0 & x \\ \hline & 0 & x \\ \hline \end{array}$ 

$$T_2$$
's view  $x$   $y$ 

#### Coherence Test

$$x = 0$$
 $x := 1;$ 
 $\Rightarrow a = x; // 2$ 
 $x := 2;$ 
 $b = x; // 1$ 

### Memory

 $\langle x:0@0\rangle$  $\langle x:105\rangle$ 

# $T_1$ 's view

$$T_2$$
's view

#### Store-buffering

$$x = y = 0$$
  
 $x = 1;$   $y = 1;$   $b = x; \# 0$ 

# Memory

$$\langle x : 0@0 \rangle$$
  
 $\langle y : 0@0 \rangle$   
 $\langle x : 1@5 \rangle$   
 $\langle y : 1@5 \rangle$ 

$$\mathcal{T}_1$$
's view

$$\begin{array}{c|c} T_1\text{'s view} & T_2\text{'s view} \\ \hline x & y & \hline & 0 & \hline & x & y \\ \hline 5 & & 5 & \\ \end{array}$$

#### Coherence Test

$$x = 0$$
  
 $x := 1;$   $x := 2;$   
 $a = x; //2$   $b = x; //1$ 

### Memory

 $\langle x:0@0\rangle$  $\langle x:105\rangle$ 

 $\langle x:2@7\rangle$ 

 $T_1$ 's view

 $T_2$ 's view

#### Store-buffering

$$x = y = 0$$
  
 $x = 1;$   $y = 1;$   $b = x;$  #0

### Memory

 $\langle x : 0@0 \rangle$   $\langle y : 0@0 \rangle$   $\langle x : 1@5 \rangle$  $\langle y : 1@5 \rangle$ 

$$T_1$$
's view

x y X 0 5

$$T_2$$
's view  $\frac{x}{0}$ 

#### Coherence Test

$$x = 0$$
  
 $x := 1;$   $x := 2;$   
 $a = x; // 2$   $\Rightarrow b = x; // 1$ 

#### Memory

 $\langle x:0@0\rangle$  $\langle x:1@5\rangle$  $\langle x:2@7\rangle$ 

X X 7

 $T_1$ 's view

$$T_2$$
's view

X X 7

#### Store-buffering

$$x = y = 0$$
  
 $x = 1;$   $y = 1;$   $b = x;$  # 0

### Memory

$$\langle x : 0@0 \rangle$$
  
 $\langle y : 0@0 \rangle$   
 $\langle x : 1@5 \rangle$   
 $\langle y : 1@5 \rangle$ 

$$\mathcal{T}_1$$
's view

$$\begin{array}{c|c} T_1\text{'s view} & T_2\text{'s view} \\ \hline x & y \\ \hline x & 0 \\ \hline 5 & 5 \end{array}$$

#### Coherence Test

$$x = 0$$
  
 $x := 1;$   $x := 2;$   $x :=$ 

#### Memory

 $\langle x:0@0\rangle$  $\langle x:105\rangle$  $\langle x:2@7\rangle$ 

# $T_1$ 's view

# $T_2$ 's view

# Load-buffering $a := x; \ /\!\!/ 1 \ y := 1; \ x := y;$

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.

#### Load-buffering

► 
$$a := x; // 1$$
  
 $y := 1;$  ►  $x := y;$ 

$$\langle x:0@0\rangle$$
  
 $\langle y:0@0\rangle$ 

$$T_1$$
's view  $\frac{x}{0} = \frac{y}{0}$ 

$$T_2$$
's view  $\frac{x}{0} = \frac{y}{0}$ 

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.

#### Load-buffering

$$lackbr{\triangleright} a := x; // 1 \ y := 1;$$
  $\blacktriangleright x := y;$ 

$$\langle x:0@0\rangle$$
$$\langle y:0@0\rangle$$
$$\langle y:1@5\rangle$$

$$T_1$$
's view  $X$   $Y$ 

$$T_2$$
's view  $\frac{x}{0} = \frac{y}{0}$ 

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.

#### Load-buffering

$$lackbr{\triangleright} a := x; // 1 \ y := 1;$$
  $\blacktriangleright x := y;$ 

$$\langle x:0@0\rangle$$
$$\langle y:0@0\rangle$$
$$\langle y:1@5\rangle$$

$$T_1$$
's view  $\frac{x}{0}$ 

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.

#### Load-buffering

$$\langle x:0@0\rangle$$
  
 $\langle y:0@0\rangle$   
 $\langle y:1@5\rangle$   
 $\langle x:1@5\rangle$ 

$$T_1$$
's view

$$T_2$$
's view
$$\begin{array}{c|c} x & y \\ \hline & & \\ \hline & & \\ \hline & 5 & 5 \end{array}$$

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.



Memory 
$$\langle x:0@0\rangle$$
  $\langle y:0@0\rangle$   $\langle y:1@5\rangle$   $\langle x:1@5\rangle$ 

$$T_1$$
's view
$$\begin{array}{c|c} x & y \\ \hline & 0 \\ \hline & 5 \end{array}$$

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.





$$T_1$$
's view
$$\begin{array}{c|c}
x & y \\
\hline
 & X \\
\hline
 & 5 \\
\end{array}$$

- ► To model load-store reordering, we allow "promises".
- At any point, a thread may promise to write a message in the future, allowing other threads to read from the promised message.

#### Load-buffering

#### Memory

$$\langle x:0@0\rangle$$
  
 $\langle y:0@0\rangle$   
 $\langle y:1@5\rangle$   
 $\langle x:1@5\rangle$ 

#### Load-buffering + dependency

$$a := x; //1 y := a;$$
  $x := y;$ 

Must not admit the same execution!

#### Load-buffering

#### Load-buffering + dependency

$$a := x; //1 \ y := a;$$
  $x := y;$ 

#### Key Idea

A thread can only promise if it can perform the write anyway (even without having made the promise)

# Certified promises

#### Thread-local certification

A thread can promise to write a message, if it can *thread-locally certify* that its promise will be fulfilled.

# Certified promises

#### Thread-local certification

A thread can promise to write a message, if it can *thread-locally certify* that its promise will be fulfilled.

#### Load-buffering

$$a := x; //1 y := 1;$$
  $x := y;$ 

### Load buffering + fake dependency

$$a := x; //1$$
  
 $y := a + 1 - a;$   $x := y;$ 

 $T_1$  may promise y := 1, since it is able to write y := 1 by itself.

### Load buffering + dependency

$$a := x; //1 y := a; || x := y;$$

 $T_1$  may **NOT** promise y := 1, since it is not able to write y := 1 by itself.

## Is this behavior possible?

$$a := x$$
; // 1  $x := 1$ ;

### Is this behavior possible?

```
a := x; // 1
x := 1;
```

#### No.

Suppose the thread promises x := 1. Then, once a := x reads 1, the thread view is increased and so the promise cannot be fulfilled.

### Is this behavior possible?

$$a := x; \ // 1 \ || \ y := x; \ || \ x := y;$$

### Is this behavior possible?

$$a := x; //1 \ x := 1;$$
  $y := x;$   $x := y;$ 

Yes. And the ARMv7 model allows it!

### Is this behavior possible?

$$a := x; \ // 1 \ | \ y := x; \ | \ x := y;$$

#### Yes. And the ARMv7 model allows it!

This behavior can be also explained by sequentialization:

42

### Is this behavior possible?

$$a := x; //1 \ x := 1;$$
  $y := x;$   $x := y;$ 

#### Yes. And the ARMv7 model allows it!

This behavior can be also explained by sequentialization:

$$a := x; \ // 1$$
 $\sim x := 1;$ 
 $y := x;$ 
 $x := y;$ 
 $x := x; \ // 1$ 
 $x := y;$ 
 $y := 1;$ 

42

### Is this behavior possible?

```
\begin{array}{c} b := y_{rlx}; \\ \textbf{if } (b = 42) \\ c := "if"; \\ \textbf{else} \\ y :=_{rlx} a; \\ y :=_{rlx} b; \\ \textbf{print } (c); \ /\!\!/ \ \text{prints "} if" \end{array}
```

Is this behavior possible?

```
\begin{array}{c} b := y_{\mathtt{rlx}}; \\ \textbf{if } (b = 42) \\ c := "if"; \\ \textbf{else} \\ y :=_{\mathtt{rlx}} a; \\ y :=_{\mathtt{rlx}} b; \\ \textbf{print } (c); \ /\!\!/ \ \text{prints } "if" \end{array}
```

Yes. And it can obtained by compiler optimizations!

The full model (POPL'17)

We have extended this basic idea to handle:

- ► Atomic updates (e.g., CAS, fetch-and-add)
- ► Release/acquire fences and accesses
- Release sequences
- SC fences
- ▶ Plain accesses (C11's non-atomics & Java's normal accesses)

#### Results

- ► No "out-of-thin-air" values
- DRF guarantees
- Compiler optimizations (incl. reorderings, eliminations)
- ► Efficient h/w mappings (x86-TSO, Power, ARM)

The full model (POPL'17)

We have extended this basic idea to handle:

- Atomic updates (e.g., CAS, fetch-and-add)
- ► Release/acquire fences and accesses
- Release sequences
- SC fences
- ► Plain accesses (C11's non-atomics & Java's normal accesses)



The **Coq** proof assistant



#### Results

- ► No "out-of-thin-air" values
- DRF guarantees
- Compiler optimizations (incl. reorderings, eliminations)
- ► Efficient h/w mappings (x86-TSO, Power, ARM)

The full model (POPL'17)

We have extended this basic idea to handle:

- ► Atomic updates (e.g., CAS, fetch-and-add)
- ► Release/acquire fences and accesses
- Release sequences
- SC fences
- Plain accesses
  (C11's non-atomics & Java's normal accesses)



The **Coq** proof assistant



#### Results

- ► No "out-of-thin-air" values
- DRF guarantees
- ► Compiler optimizations (incl. reorderings, eliminations)
- ► Efficient h/w mappings (x86-TSO, Power, ARM)

# An intermediate memory model [Podkopaev, L, Vafeiadis POPL'18]



- ► A *common denominator* of existing models
- Formulated in the *declarative style*
- Simplifies compilation correctness proofs

## Summary



- ► The challenges in designing a WMM.
- ► The C/C++11 model.
- ightharpoonup C/C++11 is broken:
  - Most problems are locally fixable.
  - But ruling out OOTA requires an entirely different approach.
- The promising model may be the solution.

# Summary



- The challenges in designing a WMM.
- ► The C/C++11 model.
- ightharpoonup C/C++11 is **broken**:
  - Most problems are locally fixable.
  - But ruling out OOTA requires an entirely different approach.
- The promising model may be the solution.

## Thank you!

http://www.cs.tau.ac.il/~orilahav/