

# An Integrated Simulation Tool for Computer Architecture and Cyber-Physical Systems

Hokeun Kim<sup>1,2</sup>, Armin Wasicek<sup>3</sup>, and Edward A. Lee<sup>1</sup>

<sup>1</sup>University of California, Berkeley <sup>2</sup>LinkedIn Corp. <sup>3</sup>Technical University Vienna

CyPhy'17, Seoul, Korea



Sponsored by the TerraSwarm Research Center, one of six centers administered by the STARnet phase of the Focus Center Research Program (FCRP), a Semiconductor Research Corporation program sponsored by MARCO and DARPA.



- Many tools used for CPS modeling and simulation employ a simplified timing model for the "cyber" part of CPS
  - Example tools: OpenModelica, Ptolemy II
  - E.g., computation time, communication delay
- These tools are useful
  - Faster than simulating or emulating cyber part
  - Enough for CPS simulation in many cases
- But, sometimes we need more than just simplified computation & communication models



# Motivation 1 – Side Channels

- Side channel attacks
  - Gaining information by leveraging physical implementation of computer systems
  - E.g., power analysis



Brier, Clavier, and Olivier, "Correlation power analysis with a leakage model", CHES 2004.



# Motivation 1 – Side Channels

## Cold boot attack on DRAMs

Freeze the DRAM memory of the running system to prevent the data from decaying



Shamir and van Someren, "Playing Hide and Seek with Stored Keys", FC '99 (Conference on Financial Cryptography)

Read out data and look for high entropy in data (cryptographic key)



Halderman, J.A., et al., "Lest we remember: Cold-boot attacks on encryption keys." Communications of the ACM, 2009



- CPS classes
  - Involve a lot of hands-on experiments



E.g., EECS149.1x, Cyber-Physical Systems at UC Berkeley: <u>https://www.edx.org/course/cyber-physical-systems-uc-berkeleyx-eecs149-1x</u>


- MOOC for CPS classes?
  - Not like other CS classes
  - Accurate model for CPS would help



- Building a CPS simulator that supports an accurate computer architecture model
- Demonstration of an open-source integrated simulation tool for CPS and computer architecture
- Case study using DRAM power and thermal modeling



- The gem5 architecture simulator (from UMich)
  - Open-source, powerful, modular, and flexible; widely used in both academia and industry
- Characteristics
  - Object-oriented, discrete-event
  - Modular components (CPUs, memories, buses, interconnects), easily interchangeable
  - Simulated system = collection of objects
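The "discrete-event, collection of objects" idea can be illustrated with a minimal sketch. This is not gem5 code: `EventQueue`, `Memory`, and `Cpu` are hypothetical stand-ins, and the latency value is arbitrary. It only shows how objects interact by scheduling events on a shared, time-ordered queue.

```python
import heapq
from itertools import count


class EventQueue:
    """Time-ordered queue of (tick, callback) events (hypothetical, not gem5's)."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: same-tick events run in FIFO order
        self.now = 0

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback))

    def run(self, until):
        """Process every event with tick <= until; later events stay queued."""
        while self._heap and self._heap[0][0] <= until:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()
        self.now = until


class Memory:
    """Hypothetical memory object with a fixed access latency (in ticks)."""

    def __init__(self, queue, latency):
        self.queue, self.latency, self.accesses = queue, latency, 0

    def read(self, on_done):
        self.accesses += 1
        self.queue.schedule(self.latency, on_done)


class Cpu:
    """Hypothetical CPU that issues reads one at a time, stalling on each."""

    def __init__(self, queue, memory, n_reads):
        self.queue, self.memory, self.remaining = queue, memory, n_reads
        self.finished_at = None

    def start(self):
        self.queue.schedule(0, self._issue)

    def _issue(self):
        if self.remaining == 0:
            self.finished_at = self.queue.now
            return
        self.remaining -= 1
        self.memory.read(self._issue)  # resume when the read completes


# The simulated system is just a collection of objects sharing one queue.
queue = EventQueue()
memory = Memory(queue, latency=10)
cpu = Cpu(queue, memory, n_reads=3)
cpu.start()
queue.run(until=100)
print(cpu.finished_at, memory.accesses)
```

Swapping `Memory` for another object with the same `read` method leaves `Cpu` unchanged, which is the sense in which such components are interchangeable.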



- Ptolemy II
  - Open-source software for research on cyber-physical systems
  - Developed at UC Berkeley since 1996
  - Supports modeling of both the cyber part (computation, communication) and physical processes (continuous dynamics)
  - Quite stable, easy to learn and use (its GUI lets one build a model by drawing components)
  - Based on actor-oriented design
  - More information at http://ptolemy.org



## Actor-Oriented Design in Ptolemy II

- Actors
  - Concurrently executed components
  - Interact with other actors through input/output ports
  - Model computation, communication, physical processes, etc.
- Directors
  - Implement Models of Computation (MoCs)
  - Orchestrate behavior of actors, for example, when each actor should be executed (=fired)
- Actor hierarchy
  - An actor can have sub-actors
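The roles of actors, directors, and actor hierarchy can be sketched in a few lines. This is an illustration only, not Ptolemy II's Java API: `Actor`, `Ramp`, `Scale`, `CompositeActor`, and `SequentialDirector` here are hypothetical names, each actor has a single input and output port, and the director simply fires a chain of actors in order.

```python
class Actor:
    """Hypothetical base class with one input port and one output port."""

    def __init__(self, name):
        self.name = name
        self.input = None
        self.output = None

    def fire(self):
        raise NotImplementedError


class Ramp(Actor):
    """Source actor: emits 0, 1, 2, ... on successive firings."""

    def __init__(self, name):
        super().__init__(name)
        self._next = 0

    def fire(self):
        self.output = self._next
        self._next += 1


class Scale(Actor):
    """Multiplies its input by a constant factor."""

    def __init__(self, name, factor):
        super().__init__(name)
        self.factor = factor

    def fire(self):
        self.output = self.input * self.factor


def fire_chain(actors, value=None):
    """Fire actors in order, passing each output to the next actor's input."""
    for actor in actors:
        actor.input = value
        actor.fire()
        value = actor.output
    return value


class CompositeActor(Actor):
    """Actor hierarchy: an actor whose behavior is defined by sub-actors."""

    def __init__(self, name, sub_actors):
        super().__init__(name)
        self.sub_actors = sub_actors

    def fire(self):
        self.output = fire_chain(self.sub_actors, self.input)


class SequentialDirector:
    """Decides when each actor fires: here, once per iteration, in order."""

    def __init__(self, actors):
        self.actors = actors

    def iterate(self):
        return fire_chain(self.actors)


gain = CompositeActor("gain", [Scale("x2", 2), Scale("x5", 5)])
director = SequentialDirector([Ramp("ramp"), gain])
outputs = [director.iterate() for _ in range(3)]
print(outputs)
```

The actors contain no scheduling logic of their own; replacing the director would change when they fire without touching the actor classes.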

[Figure: actor hierarchy with an opaque CompositeActor and a transparent CompositeActor]

Claudius Ptolemaeus, Editor, System Design, Modeling, and Simulation Using Ptolemy II, Ptolemy.org, 2014.



## Model of Computation (MoC)

- A set of rules that orchestrates the behavior of actors (e.g., when actors execute and how they react to inputs)





# Background – DRAM Model

- DRAM thermal model by Lin et al. (ISCA '07)
  - Power is proportional to throughput (GB/s)
  - Factors that affect DRAM temperature



Lin et al., "Thermal modeling and management of DRAM memory systems", ISCA '07









# Approach – Configuring gem5 Simulator

- Implementation of gem5
  - Python: high-level object configuration & simulation
  - C++: low-level object implementation (for performance)
- gem5 Python scripts
  - Execution scripts are modified for periodic execution
  - gem5 runs for a given number of cycles, then stops
  - gem5 resumes after the Ptolemy II model runs
- DRAM component
  - DPRINTF calls are added to the DRAM component
  - They print out command and cycle information
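The run, stop, and resume pattern described above can be sketched as a simple loop. The classes below are hypothetical stubs rather than the authors' scripts; in an actual gem5 script, the role of `run_for` would presumably be played by `m5.simulate`, which accepts a maximum number of ticks and then returns control to the Python script. The per-period access counts are made-up inputs.

```python
class StubArchSimulator:
    """Stand-in for gem5 (hypothetical). Each call advances simulated time
    by a fixed period and reports the memory accesses seen in that period."""

    def __init__(self, accesses_per_period):
        self.tick = 0
        self._accesses = iter(accesses_per_period)

    def run_for(self, ticks):
        self.tick += ticks
        return next(self._accesses, 0)


class StubCpsModel:
    """Stand-in for the Ptolemy II model (hypothetical). It records what it
    receives; a real model would update power and temperature here."""

    def __init__(self):
        self.history = []

    def step(self, tick, accesses):
        self.history.append((tick, accesses))


def co_simulate(arch, cps, period_ticks, n_periods):
    """Alternate between the two simulators, one period at a time."""
    for _ in range(n_periods):
        accesses = arch.run_for(period_ticks)  # architecture simulator runs, then stops
        cps.step(arch.tick, accesses)          # CPS model consumes that period's data
        # next loop iteration: the architecture simulator resumes


arch = StubArchSimulator([5, 7, 3])
cps = StubCpsModel()
co_simulate(arch, cps, period_ticks=1000, n_periods=3)
print(cps.history)
```

The period length sets the trade-off: shorter periods give the CPS model finer-grained data at the cost of more hand-offs between the two simulators.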











## Approach – Communication between gem5 & Ptolemy II












# Approach – A DRAM Behavioral Model in Ptolemy II





# Approach – DRAM Power & Thermal Model in Ptolemy II





- CMOS device power = static power + dynamic power: $P_{device} = P_{DRAM\_static} + P_{DRAM\_dynamic}$
- DRAM dynamic power $\propto$ throughput



Equations & coefficients are from Lin et al., ISCA '07
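As a rough sketch of the power relation above, the functions below compute device power from a throughput estimate. The function names, the bytes-per-access parameter, and all numeric coefficients are placeholders chosen for illustration; they are not the coefficients from Lin et al.

```python
def throughput_gbps(accesses, bytes_per_access, interval_s):
    """Average memory throughput in GB/s over one sampling interval."""
    return accesses * bytes_per_access / interval_s / 1e9


def device_power_w(throughput, static_power_w, watts_per_gbps):
    """P_device = P_static + P_dynamic, with P_dynamic proportional to throughput."""
    dynamic_power_w = watts_per_gbps * throughput
    return static_power_w + dynamic_power_w


# Placeholder inputs: 1,000,000 accesses of 64 bytes in 64 ms is 1 GB/s.
tp = throughput_gbps(accesses=1_000_000, bytes_per_access=64, interval_s=0.064)
# Placeholder coefficients, NOT taken from Lin et al.
power = device_power_w(tp, static_power_w=0.5, watts_per_gbps=0.25)
print(tp, power)
```

Because the dynamic term is linear in throughput, the only workload-dependent input the power model needs from the architecture simulator is the number of memory accesses per interval.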



- DRAM stable temperature from DRAM power

$$T_{AMB} = T_A + P_{AMB} \times \Psi_{AMB} + P_{DRAM} \times \Psi_{DRAM}$$

$$T_{DRAM} = T_A + P_{AMB} \times \Psi_{AMB}$$

  - $T_{AMB}$, $T_{DRAM}$: stable temperatures
  - $T_A$: ambient temperature
  - $\Psi$: thermal resistance (temperature / power), a constant, measured value

- Current DRAM temperature

$$T(t + \Delta t) - T(t) = (T_{stable} - T(t))(1 - e^{-\Delta t})$$

  - $T(t)$: current temperature
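The temperature update can be sketched directly from the formula as written on the slide, $T(t + \Delta t) = T(t) + (T_{stable} - T(t))(1 - e^{-\Delta t})$. The function names and the temperatures used below are illustrative assumptions, and $\Delta t$ is taken in whatever time unit the formula assumes.

```python
import math


def step_temperature(t_now, t_stable, dt):
    """One update: move from t_now toward t_stable by a factor (1 - exp(-dt))."""
    return t_now + (t_stable - t_now) * (1.0 - math.exp(-dt))


def simulate(t_start, t_stable, dt, n_steps):
    """Apply the update repeatedly with a constant stable temperature."""
    temps = [t_start]
    for _ in range(n_steps):
        temps.append(step_temperature(temps[-1], t_stable, dt))
    return temps


# Illustrative values only: with dt = ln 2, each step halves the remaining gap.
trace = simulate(t_start=40.0, t_stable=50.0, dt=math.log(2.0), n_steps=3)
print(trace)
```

A useful property of this form is that two consecutive steps of lengths $\Delta t_1$ and $\Delta t_2$ give the same result as one step of length $\Delta t_1 + \Delta t_2$ when the stable temperature does not change, so the result does not depend on how a constant-power interval is subdivided.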














# Experiments and Results – Experimental Setup

- Experimented on
  - Different cache configurations
  - Different software workloads
- To measure
  - Average DRAM/AMB power
  - Peak DRAM/AMB temperature reached during simulation (0.1 s of simulated time)



# Experiments and Results – Experimental Setup

- gem5 configurations (except caches)
  - ISA: ARM
  - CPU type: TimingSimpleCPU (stalls on every memory load access)
  - Clock rate: CPU 1 GHz / system 1 GHz
  - Off-chip DRAM memory: DDR3 SDRAM with a data rate of 1600 MHz and a bus width of 16 bits
  - Cache block size: 64 bytes



# Experiments and Results – Power and Temperature Results

- Benchmark
  - Top 5 memory-intensive programs from MiBench
    - Memory intensity is defined as the number of memory accesses (reads + writes) per instruction

| MiBench program | Writes | Reads  | Total instructions executed | Memory intensity (%) |
|-----------------|--------|--------|-----------------------------|----------------------|
| cjpeg_large     | 6,183  | 74,966 | 1,000,000                   | 8.11                 |
| rijndael_large  | 2,558  | 68,458 | 1,000,000                   | 7.10                 |
| typeset_small   | 12,843 | 55,963 | 1,000,000                   | 6.88                 |
| dijkstra_large  | 4,942  | 59,198 | 1,000,000                   | 6.41                 |
| patricia_large  | 4,255  | 49,198 | 1,000,000                   | 5.35                 |
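The memory-intensity column follows from the read and write counts in the table. A short sketch of that arithmetic, using the table's values (the function and variable names are only for illustration):

```python
# (writes, reads) per program, copied from the table above.
ACCESS_COUNTS = {
    "cjpeg_large": (6_183, 74_966),
    "rijndael_large": (2_558, 68_458),
    "typeset_small": (12_843, 55_963),
    "dijkstra_large": (4_942, 59_198),
    "patricia_large": (4_255, 49_198),
}
INSTRUCTIONS = 1_000_000  # instructions executed per program


def memory_intensity_percent(writes, reads, instructions):
    """Memory accesses (reads + writes) per instruction, as a percentage."""
    return 100.0 * (writes + reads) / instructions


intensity = {
    name: round(memory_intensity_percent(w, r, INSTRUCTIONS), 2)
    for name, (w, r) in ACCESS_COUNTS.items()
}
for name, value in intensity.items():
    print(f"{name}: {value:.2f}%")
```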



# Experiments and Results – Power and Temperature Results

- Power and temperature results for different cache configurations
  - Workload: cjpeg_large

| L1 (I/D) cache size (KB) | L2 cache size (KB) | Average DRAM power (mW) | Average AMB power (mW) | Maximum DRAM temperature increase (10<sup>-6</sup> °C) | Maximum AMB temperature increase (10<sup>-6</sup> °C) |
|--------------------------|--------------------|-------------------------|------------------------|--------------------------------------------------------|-------------------------------------------------------|
| 16                       | N/A                | 1,057                   | 4,027                  | 2.67                                                   | 6.05                                                  |
| 32                       | N/A                | 1,023                   | 4,011                  | 2.63                                                   | 5.93                                                  |
| 64                       | N/A                | 1,000                   | 4,008                  | 2.46                                                   | 5.51                                                  |
| 32                       | 128                | 996                     | 4,006                  | 2.17                                                   | 4.86                                                  |
| 32                       | 256                | 995                     | 4,006                  | 1.99                                                   | 4.47                                                  |



- Temperature results for different workloads
  - Cache configuration: L1: 16 KB, L2: N/A






- gem5 tuned for tool integration
  - <u>https://github.com/gem5-ptolemy/</u>
- Ptolemy II
  - <u>http://ptolemy.org</u>
    - Version 11.0 (development version)
  - Case study example model:
    - ptolemy/actor/lib/gem5/demo/DramThermalModel.xml



- Summary
  - The gem5 architecture simulator is integrated into Ptolemy II to model computer-architectural aspects with higher accuracy
  - Experiments show usefulness of the approach
- Future work
  - More architectural information from gem5
  - More applications for the proposed approach
- For more information
  - <u>https://github.com/gem5-ptolemy/</u>
  - <u>http://ptolemy.org</u>