

# **HERO: Heterogeneous Research Platform**

Open-Source HW/SW Platform for R&D of Heterogeneous SoCs

21.01.2019

Andreas Kurth and the PULP Team led by Prof. Luca Benini



<sup>1</sup>Department of Electrical, Electronic and Information Engineering

<sup>2</sup>Integrated Systems Laboratory, ETH Zürich

### Heterogeneous Systems on Chip (HeSoCs)





*Figures: Nvidia Tegra X1 (source: Nvidia); Apple A12 (source: TechInsights).*



Andreas Kurth | 20.01.2019 | 2

There are **many open questions** in various areas of computer engineering:

- programming models, task distribution and scheduling,
- memory organization, communication, synchronization,
- accelerator architectures and granularity, ...

But there was no **research platform for heterogeneous SoCs**!



### **HERO: Heterogeneous Research Platform**

#### **Heterogeneous Hardware Architecture**



#### **Heterogeneous Software Stack**

- single-source, single-binary cross-compilation toolchain
- OpenMP 4.5
- shared virtual memory (SVM) for the host and the programmable many-core accelerator (PMCA)
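As a minimal sketch of single-source offloading in plain OpenMP 4.5, the function below keeps host code and the offloaded region in one C file. An offloading toolchain compiles the annotated region for both the host and the PMCA and links everything into one binary. The function `dot` and its data are illustrative assumptions, not part of HERO. Built without offloading support, the `target` region simply executes on the host.

```c
#include <assert.h>

/* Illustrative single-source program: the host calls dot(), and the
   annotated region is offloaded to the accelerator. If no offload device
   is available, the region falls back to the host. */
long dot(const int *a, const int *b, int n) {
    long sum = 0;
    assert(n >= 0);
    /* Copy a and b to the device; copy sum to the device and back.
       The explicit map of sum matters: in OpenMP 4.5, scalars are
       firstprivate on target by default and the result would be lost. */
    #pragma omp target map(to: a[0:n], b[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += (long)a[i] * b[i];
    return sum;
}
```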





### **HERO: Hardware Architecture**





### **bigPULP on FPGA: Configurable, Modifiable and Expandable**

**Configurable:** 



**Modifiable and expandable:**

- All components are open-source and written in industry-standard SystemVerilog.
- Interfaces are either standard (mostly AXI) or simple (e.g., stream-payload).
- New components can be easily added to the memory map.
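Software typically reaches a newly added memory-mapped component through volatile loads and stores at the component's base address. The following is a minimal sketch: the register names and offsets (`MYDEV_*`) are hypothetical, and on bigPULP the base address would be whatever region the component is assigned in the memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout of a new component, given as byte offsets
   from its base address in the memory map. */
enum { MYDEV_CTRL = 0x0, MYDEV_STATUS = 0x4, MYDEV_DATA = 0x8 };

/* volatile keeps the compiler from caching, merging, or eliding
   accesses that have side effects in the hardware. */
static inline void mmio_write32(uintptr_t base, uint32_t off, uint32_t val) {
    assert((off & 0x3u) == 0); /* registers are 32-bit aligned */
    *(volatile uint32_t *)(base + off) = val;
}

static inline uint32_t mmio_read32(uintptr_t base, uint32_t off) {
    assert((off & 0x3u) == 0);
    return *(volatile uint32_t *)(base + off);
}
```

On a host without the component, the same accessors can be exercised against an ordinary array standing in for the register file.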



### **bigPULP: Distinguishing Features**

Scalable and efficient multi-cluster atomic transactions (RISC-V 'A' extension) to shared L2 memory
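A minimal sketch of how software might use these atomics from several clusters, assuming a C11 compiler targeting RISC-V with the 'A' extension enabled and a counter that resides in shared L2 memory. With 'A', a compiler can lower `atomic_fetch_add_explicit` on a 32-bit value to a single `amoadd.w` instruction, so no lock is needed. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared work counter, assumed to reside in L2 so that cores in all
   clusters update the same location. */
typedef struct {
    _Atomic uint32_t next; /* index of the next unclaimed work item */
} work_counter_t;

/* Claim `count` consecutive work items and return the first index.
   Relaxed ordering suffices because callers only need unique indices;
   stronger ordering would be needed to publish data alongside. */
uint32_t claim_work(work_counter_t *c, uint32_t count) {
    assert(count > 0);
    return atomic_fetch_add_explicit(&c->next, count, memory_order_relaxed);
}
```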



- Atomic transactions: RI5CY cores with an 'A' decoder, additional signals through the cluster and SoC buses, and transactions executed atomically at the L2 scratchpad memory (SPM)
- Scalable SVM: two-level software-managed TLB (the Remapping Address Block, "RAB"); TLB misses are signaled back to the RI5CY cores and the DMA engine and handled in SW with lightweight HW support



### **HERO: Software Architecture**

Lets developers write programs that start on the host and seamlessly integrate the PMCAs.



- Offloads with OpenMP 4.5 target semantics, zero-copy (pointer passing) or copy-based
- Integrated cross-compilation and single-binary linkage
- PMCA-specific runtime environment and hardware abstraction libraries (HAL)
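The two offload styles can be contrasted in standard OpenMP 4.5 as follows. This is a sketch under assumptions: the function names are hypothetical. Expressing zero-copy by passing the raw pointer as `firstprivate` is one standard way to do it when host and accelerator share virtual memory, and HERO's own annotations may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Copy-based: the runtime copies buf[0:n] to the device before the
   region and back afterwards. Works on any OpenMP device. */
void inc_copy(int32_t *buf, int n) {
    assert(n >= 0);
    #pragma omp target map(tofrom: buf[0:n])
    for (int i = 0; i < n; ++i)
        buf[i] += 1;
}

/* Zero-copy (pointer passing): only the pointer value is sent. This is
   valid only if host and accelerator share virtual memory, so that the
   accelerator can dereference the host pointer directly. */
void inc_zero_copy(int32_t *buf, int n) {
    assert(n >= 0);
    #pragma omp target firstprivate(buf, n)
    for (int i = 0; i < n; ++i)
        buf[i] += 1;
}
```

Zero-copy avoids moving data that the kernel touches only sparsely. Copy-based offloading can pay off when the same data is reused many times from accelerator-local memory.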



### **HERO: Heterogeneous Cross-Compilation Toolchain**

- OpenMP offloading with the GCC toolchain requires a host compiler plus one target compiler for each PMCA ISA in the system.
- Each target compiler requires both compiler and runtime extensions.
- HERO includes the first non-commercial heterogeneous cross-compilation toolchain.
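One reason each PMCA ISA needs its own target compiler is that every function called from inside a `target` region must also be compiled for the accelerator. In OpenMP 4.5 this is requested with `declare target`, and the host compiler and each target compiler then emit their own version of the function. The names below are hypothetical.

```c
#include <assert.h>

/* Compiled for the host and for every offload target ISA. */
#pragma omp declare target
static int clamp(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}
#pragma omp end declare target

/* Host entry point; the loop body runs on the device (or falls back to
   the host) and calls the device version of clamp(). */
void clamp_all(int *x, int n, int lo, int hi) {
    assert(lo <= hi && n >= 0);
    #pragma omp target map(tofrom: x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] = clamp(x[i], lo, hi);
}
```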



### **HERO: FPGA Platforms**

| Property                      | ARM Juno (with a Xilinx Virtex-7 2000T)                         | Xilinx Zynq UltraScale+ ZU9EG | Xilinx Zynq Z-7045        |
|-------------------------------|-----------------------------------------------------------------|-------------------------------|---------------------------|
| Host CPU                      | 64-bit ARMv8 big.LITTLE                                         | 64-bit ARMv8 quad-core A53    | 32-bit ARMv7 dual-core A9 |
| Shared main memory            | 8 GiB DDR3L                                                     | 2 GiB DDR4                    | 1 GiB DDR3                |
| PMCA clock frequency          | 30 MHz                                                          | 150 MHz                       | 50 MHz                    |
| # of RISC-V PEs               | 64 in 8 clusters                                                | 16 in 2 clusters              | 8 in 1 cluster            |
| Integer DSP unit              | private per PE                                                  |                               |                           |
| L1 SPM                        | 256 KiB in 16 banks                                             |                               |                           |
| Instruction cache             | 8 KiB in 8 single-ported banks or 4 KiB in 4 multi-ported banks |                               |                           |
| Slices used by clusters       | 80%                                                             | 63%                           | 65%                       |
| Slices used by infrastructure | 7%                                                              | 15%                           | 12%                       |
| BRAMs used by clusters        | 89%                                                             | 55%                           | 70%                       |
| BRAMs used by infrastructure  | 6%                                                              | 12%                           | 13%                       |
| Price                         | 25 000 \$                                                       | 2 500 \$                      | 2 500 \$                  |



### **HERO: Roadmap**



```
git clone --recursive \
    https://github.com/pulp-platform/hero-sdk
cd hero-sdk
git checkout v1.1.0
git submodule update --init --recursive
```

Check README.md for the prerequisites and install them. Then build the SDK for the Zynq Z-7045 platform:

```
./hero-z-7045-builder -A
```



# **Questions?**

www.pulp-platform.org

@pulp\_platform

PULP: Parallel Ultra Low Power

Florian Zaruba<sup>2</sup>, Davide Rossi<sup>1</sup>, Antonio Pullini<sup>2</sup>, Francesco Conti<sup>1</sup>, Michael Gautschi<sup>2</sup>, Frank K. Gürkaynak<sup>2</sup>, Florian Glaser<sup>2</sup>, Stefan Mach<sup>2</sup>, Giovanni Rovere<sup>2</sup>, Igor Loi<sup>1</sup>, Davide Schiavone<sup>2</sup>, Germain Haugou<sup>2</sup>, Manuele Rusci<sup>1</sup>, Alessandro Capotondi<sup>1</sup>, Giuseppe Tagliavini<sup>1</sup>, Daniele Palossi<sup>2</sup>, Andrea Marongiu<sup>1,2</sup>, Fabio Montagna<sup>1</sup>, Simone Benatti<sup>1</sup>, Eric Flamand<sup>2</sup>, Fabian Schuiki<sup>2</sup>, Andreas Kurth<sup>2</sup>, Luca Benini<sup>1,2</sup>


