



# Scalable Crash Consistency for Secure Persistent Memory

Ming Zhang, Yu Hua, Xuan Li, Hao Xu

Huazhong University of Science and Technology, China

# Persistent Memory (PM)










































<sup>[1]</sup> Persist instruction sequence, e.g., clwb + sfence






| Design            | Confidentiality | Integrity | Atomicity for a group of updates | Atomicity of data and its security metadata |
|-------------------|-----------------|-----------|----------------------------------|---------------------------------------------|
| SCA@HPCA'18       | $\checkmark$    | ×         | $\checkmark$                     | Data + Counter                              |
| SuperMem@MICRO'19 | $\checkmark$    | ×         | $\checkmark$                     | Data + Counter                              |

SCA@HPCA'18

- Write-back counter cache
- New primitives required
  - CounterAtomicity
  - counter\_cache\_writeback()
- → Limited portability

SuperMem@MICRO'19

- Write-through counter cache
- A register appends <data+counter> to write queue
  - Application transparent → Good portability
- → Limited scalability





- Security and crash consistency for PM
- Goal

| Design            | Confidentiality | Integrity | Atomicity for a group of updates | Atomicity of data and its security metadata |
|-------------------|-----------------|-----------|----------------------------------|---------------------------------------------|
| SCA@HPCA'18       | ✓               | ×         | ✓                                | Data + Counter                              |
| SuperMem@MICRO'19 | ✓               | ×         | ✓                                | Data + Counter                              |
| Our Secon         | ✓               | ✓         | ✓                                | Data + Counter + CMAC                       |





 Scalable write-through security metadata cache

- Move BMT update to the background

Transaction-specific epoch persistency model

- Minimize ordering constraints between logs and data

Security metadata write-reduction schemes

- Mitigate the writes caused by counters and CMACs







- **Observation:** With logging, PM always holds a consistent copy of the data
  - Either in the log region or in the data region
- Persist the tuple of <data, counter, CMAC> in advance
  - Release the register early to process the next independent write request
  - Move BMT update to the background




- Guarantee the consistency between on-chip BMT root and off-chip counters after a crash
  - Pending BMT update queue (in the MC): records which CMAC is updated
  - Counter track bitmap (in the ADR<sup>[1]</sup> of the MC): records which counter is updated

[Example] The counter and CMAC of the current write request are mc<sub>1</sub> and CMAC<sub>1</sub>, respectively
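The bookkeeping above can be illustrated with a toy model. This is only a sketch under assumptions that the slides do not state: the class `ToyController`, the SHA-256 hash tree, the power-of-two counter array, and the recovery procedure (fold the counters flagged in the bitmap back into the root) are all hypothetical, and for simplicity the pending queue holds counter indices rather than CMACs. It is meant to show why a small crash-surviving bitmap is enough to bring a lagging root back in sync once the volatile queue is lost, not how Secon implements it.

```python
import hashlib
from collections import deque


def _h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def tree_root(counters):
    """Root of a binary hash tree over the counters (length must be a power of two)."""
    level = [_h(c.to_bytes(8, "little")) for c in counters]
    while len(level) > 1:
        level = [_h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class ToyController:
    """Hypothetical memory controller that defers tree updates to the background."""

    def __init__(self, n):
        self.pm_counters = [0] * n           # off-chip counters, persisted with the data
        self.covered = [0] * n               # model-only view: values the root reflects
        self.root = tree_root(self.covered)  # on-chip root
        self.pending = deque()               # volatile queue of deferred tree updates
        self.bitmap = [False] * n            # assumed to survive a crash (ADR domain)

    def write(self, idx):
        self.pm_counters[idx] += 1           # counter is persisted right away
        self.bitmap[idx] = True              # root is now stale for this counter
        self.pending.append(idx)             # tree update happens later

    def background_step(self):
        idx = self.pending.popleft()
        self.covered[idx] = self.pm_counters[idx]
        self.root = tree_root(self.covered)
        if idx not in self.pending:
            self.bitmap[idx] = False         # root has caught up for this counter

    def crash_and_recover(self):
        self.pending.clear()                 # the volatile queue is lost
        for idx, stale in enumerate(self.bitmap):
            if stale:                        # fold in only the flagged counters
                self.covered[idx] = self.pm_counters[idx]
                self.bitmap[idx] = False
        self.root = tree_root(self.covered)


mc = ToyController(4)
mc.write(1)
mc.write(3)
mc.background_step()                         # only the update for counter 1 completes
root_before_crash = mc.root
mc.crash_and_recover()
```

In this model the write path touches only the counter, the bitmap bit, and the queue, so it never waits for a tree update. The root is stale between `write` and `background_step`, and the bitmap is what records that gap.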




#### **Unnecessary ordering constraints**



[Figure: a dynamic transaction<sup>[1]</sup> issuing Log (A), Write (A), Log (B), Write (B)]

[1] A transaction without pre-defined write set

- Log (A) and Log (B) are independent, but ordered
- Write (A) and Write (B) are independent, but ordered
- → Log (B) (or Write (B)) waits for the BMT updates of Log (A) (or Write (A))




#### Epoch Persistency Model<sup>[1]</sup>



[Figure: a dynamic transaction]

[1] Memory persistency@ISCA'14

[2] A transaction with pre-defined write set

- A program is divided into epochs by memory barriers (e.g., sfence)
  - All writes in one epoch are persisted w/o order
  - Different epochs are persisted in order
- → Efficient in static transactions<sup>[2]</sup> since only one barrier is needed
- → Inefficient in dynamic transactions due to many barriers
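The barrier counts behind this comparison can be made concrete with a small sketch. It assumes a simplified undo-logging pattern in which each update needs a log write followed by an in-place data write. The function names `static_epochs`, `dynamic_epochs`, and `barriers` are hypothetical, and the commit step is left out.

```python
def static_epochs(keys):
    """Write set known up front: log every key, one barrier, then update in place."""
    return [[("log", k) for k in keys], [("write", k) for k in keys]]


def dynamic_epochs(keys):
    """Write set discovered on the fly: each log and each write is its own epoch."""
    epochs = []
    for k in keys:
        epochs.append([("log", k)])
        epochs.append([("write", k)])
    return epochs


def barriers(epochs):
    """One barrier separates each pair of consecutive epochs."""
    return max(len(epochs) - 1, 0)


static_count = barriers(static_epochs(["A", "B"]))
dynamic_count = barriers(dynamic_epochs(["A", "B"]))
```

Under these assumptions the static transaction needs a single barrier regardless of the number of keys, while the dynamic transaction needs `2n - 1` barriers for `n` keys.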









**Our Transaction-specific Epoch Persistency Model**

- **Paired epoch:** Two adjacent epochs are paired
  - Writes in one pair are persisted in epoch order
  - Different pairs are persisted w/o order
- → Efficient in both static and dynamic transactions
- → Minimize ordering constraints
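The difference between the two models can be sketched as a set of ordering constraints over epochs. This is an illustration only: it assumes single-write epochs and that epochs are paired as (0, 1), (2, 3), and so on. The helper names `epoch_constraints`, `paired_constraints`, and `allowed` are hypothetical.

```python
def epoch_constraints(epochs):
    """Epoch persistency: every epoch must persist before the next one."""
    return [(epochs[i], epochs[i + 1]) for i in range(len(epochs) - 1)]


def paired_constraints(epochs):
    """Paired epochs: order is enforced only inside each pair of adjacent epochs."""
    return [(epochs[i], epochs[i + 1]) for i in range(0, len(epochs) - 1, 2)]


def allowed(persist_order, constraints):
    """True if persist_order respects every (earlier, later) constraint."""
    pos = {e: i for i, e in enumerate(persist_order)}
    return all(pos[a] < pos[b] for a, b in constraints)


epochs = ["Log(A)", "Write(A)", "Log(B)", "Write(B)"]
# B's log persists first, which is harmless because A and B are independent.
reordered = ["Log(B)", "Log(A)", "Write(B)", "Write(A)"]

ok_paired = allowed(reordered, paired_constraints(epochs))
ok_epoch = allowed(reordered, epoch_constraints(epochs))
```

In this sketch the reordered schedule is accepted by the paired model and rejected by plain epoch persistency, while the log-before-write order inside each pair is still enforced.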

#### **Implementations**





**Co-locate log and counter**

When writing data to PM

[Figure: Write 1 persists the undo log entry (8-bit TID, 16-bit TxID, 48-bit Addr, 1-word Data); Write 2 persists the 7-bit minor counter (mc)]

Write a minor-counter together with a log entry
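As a rough illustration of co-location, the sketch below packs the log fields and the minor counter into one byte string so that a single write carries both. The field widths for TID, TxID, Addr, and the minor counter come from the slide. The 64-bit word size, the field order, the big-endian packing, and the names `pack_entry` and `unpack_entry` are assumptions made for this example, not the layout Secon uses.

```python
# (name, width in bits); the 64-bit data word is an assumption
FIELDS = [("tid", 8), ("txid", 16), ("addr", 48), ("data", 64), ("mc", 7)]
TOTAL_BITS = sum(bits for _, bits in FIELDS)   # 143 bits
ENTRY_BYTES = (TOTAL_BITS + 7) // 8            # rounded up to 18 bytes


def pack_entry(tid, txid, addr, data, mc):
    """Pack an undo log entry and its minor counter into one byte string."""
    value = 0
    for (name, bits), field in zip(FIELDS, (tid, txid, addr, data, mc)):
        if not 0 <= field < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
        value = (value << bits) | field
    return value.to_bytes(ENTRY_BYTES, "big")


def unpack_entry(raw):
    """Inverse of pack_entry: returns (tid, txid, addr, data, mc)."""
    value = int.from_bytes(raw, "big")
    out = []
    for _, bits in reversed(FIELDS):
        out.append(value & ((1 << bits) - 1))
        value >>= bits
    return tuple(reversed(out))


raw = pack_entry(tid=3, txid=513, addr=0x7FFF00001000, data=0xDEADBEEF, mc=5)
fields = unpack_entry(raw)
```

Because the counter travels inside the same entry, the separate counter write is no longer needed for logged updates, which is the write reduction the slide refers to.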





#### **Coalesce BMT blocks**


















# **Performance Evaluation**

Model Secon using Gem5 and NVMain

| Design              | Description                                                     |
|---------------------|-----------------------------------------------------------------|
| WB                  | An ideal write-back scheme                                      |
| WT                  | A standard write-through scheme                                 |
| SuperMem [MICRO'19] | A write-optimized write-through scheme using our BMT coalescing |
| Secon               | Our proposed schemes                                            |

| Benchmark | Description                                    |
|-----------|------------------------------------------------|
| Array     | Swap two random entries in an array            |
| Queue     | Enqueue/dequeue random entries in a queue      |
| Btree     | Insert/delete random nodes in a B-tree         |
| Hash      | Insert/delete random items in a hash table     |
| RBtree    | Insert/delete random nodes in a red-black tree |
| YCSB      | Cloud benchmark. 100% update                   |
| TPCC      | OLTP benchmark. Use the New-Order transaction  |



# **Transaction Throughput**



- Move BMT update to the background
- Eliminate unnecessary ordering constraints



# **Write Traffic**



- Log and counter co-locating
- BMT block coalescing



# Conclusion

- Security and crash consistency are important for persistent memory
- Existing approaches suffer from low scalability
- Our solution: Secon
  - Scalable write-through security metadata cache
    - Move BMT update to the background
  - Transaction-specific epoch persistency model
    - Minimize ordering constraints
  - Security metadata write-reduction schemes
    - Enhance endurance





# Thanks! Q&A