

## Working with RISC-V

Part 1 of 4: Introduction to RISC-V ISA

Frank K. Gürkaynak <kgf@ee.ethz.ch>

Luca Benini <lbenini@iis.ee.ethz.ch>













### **Summary**

- Part 1 Introduction to RISC-V ISA
  - What is RISC-V about
  - Description of ISA, and basic principles
  - Simple 32b implementation (Ibex by lowRISC)
  - How to extend the ISA (CV32E40P by OpenHW group)
- Part 2 Advanced RISC-V Architectures
- Part 3 PULP concepts
- Part 4 PULP based chips







### A few words about myself









- Started by UC-Berkeley in 2010
- Contract between SW and HW
  - Partitioned into user and privileged spec
  - External Debug
- Standard governed by RISC-V foundation
  - **ETHZ** is a founding member of the foundation
  - Necessary for the continuity
- Defines 32, 64 and 128 bit ISA
  - No implementation, just the ISA
  - Different implementations (both open and closed source)
- At ETH Zurich we specialize in efficient implementations of RISC-V cores










## RISC-V maintains basically a PDF document





## The ISA defines the instructions that the processor uses



https://godbolt.org/



## **RISC-V Ecosystem**

- Binutils upstream
- GCC upstream
- LLVM upstream
- Simulator:
  - "Spike" reference
  - QEMU, Gem5
- OpenOCD

- OS
  - Linux, seL4, FreeRTOS, Zephyr
- Runtimes
  - Jikes, OCaml, Go
- SW maintained by different parties
  - Binutils and GCC by SiFive, a Berkeley start-up







### RISC-V ISA is divided into extensions

- I Integer instructions (frozen)
- E Reduced number of registers
- M Multiplication and Division (frozen)
- A Atomic instructions (frozen)
- F Single-Precision Floating-Point (frozen)
- D Double-Precision Floating-Point (frozen)
- C Compressed Instructions (frozen)
- X Non Standard Extensions

- Kept very simple and extendable
  - Wide range of applications from IoT to HPC
- RV + word-width + extensions
  - RV32IMC: 32bit, integer, multiplication, compressed
- User specification:
  - Separated into extensions, only I is mandatory
- Privileged Specification (WIP):
  - Governs OS functionality: Exceptions, Interrupts
  - Virtual Addressing
  - Privilege Levels
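
The naming scheme "RV + word-width + extensions" can be illustrated with a small parser. This is only a sketch: the function name `parse_isa_string` and the `EXTENSIONS` table are made up for this example, and it recognises only the single-letter extensions listed on this slide (real toolchains also accept forms such as `G`, `X...` and multi-letter extensions, which are ignored here).

```python
import re

# Single-letter extensions named on the slide (assumption: nothing else is accepted).
EXTENSIONS = {
    "I": "Integer instructions",
    "E": "Reduced number of registers",
    "M": "Multiplication and Division",
    "A": "Atomic instructions",
    "F": "Single-Precision Floating-Point",
    "D": "Double-Precision Floating-Point",
    "C": "Compressed Instructions",
}


def parse_isa_string(name):
    """Split a name such as 'RV32IMC' into (word_width, [extension letters])."""
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognised ISA string: {name!r}")
    width = int(match.group(1))
    letters = list(match.group(2))
    # The base integer set comes first: I, or E for the reduced register file.
    if letters[0] not in ("I", "E"):
        raise ValueError("the base integer ISA (I or E) must come first")
    unknown = [letter for letter in letters if letter not in EXTENSIONS]
    if unknown:
        raise ValueError(f"extensions not handled by this sketch: {unknown}")
    return width, letters


width, letters = parse_isa_string("RV32IMC")
print(width, [EXTENSIONS[letter] for letter in letters])
```

For `RV32IMC` this yields a width of 32 and the extensions I, M and C, matching the "32bit, integer, multiplication, compressed" reading above.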







- Foundation members work in task-groups
- Dedicated task-groups
  - Formal specification
  - Memory Model
  - Marketing
  - External Debug Specification
- ETH Zurich also contributes
  - Bit manipulation
  - Packed SIMD

- Q Quad-precision Floating-Point
- L Decimal Floating Point
- **B** Bit Manipulation
- T Transactional Memory
- P Packed SIMD
- J Dynamically Translated Languages
- V Vector Operations
- N User-Level Interrupts



## What is so special about RISC-V

RISC-V base ISAs have either little-endian or big-endian memory systems, with the privileged architecture further defining bi-endian operation. Instructions are stored in memory as a sequence of 16-bit little-endian parcels, regardless of memory system endianness. Parcels forming one instruction are stored at increasing halfword addresses, with the lowest-addressed parcel holding the lowest-numbered bits in the instruction specification.
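
The parcel rule in the paragraph above can be made concrete with a short sketch. The helper names `to_parcels` and `to_memory_bytes` are invented for this illustration, and the example word `0x002081B3` is assumed to be the RV32I encoding of `add x3, x1, x2`.

```python
import struct


def to_parcels(instruction):
    """Split a 32-bit instruction into 16-bit parcels, lowest-numbered bits first."""
    low = instruction & 0xFFFF            # bits 15..0, stored at the lower address
    high = (instruction >> 16) & 0xFFFF   # bits 31..16, stored at the next halfword
    return [low, high]


def to_memory_bytes(instruction):
    """Return the bytes as laid out in memory: each parcel is little-endian."""
    return b"".join(struct.pack("<H", parcel) for parcel in to_parcels(instruction))


word = 0x002081B3  # assumed example: add x3, x1, x2
print([hex(parcel) for parcel in to_parcels(word)])  # ['0x81b3', '0x20']
print(to_memory_bytes(word).hex())                   # b3812000
```

The lowest-addressed parcel carries bits 15..0 of the instruction, which is why the opcode bits are the first ones a fetch unit sees.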

We originally chose little-endian byte ordering for the RISC-V memory system because little-endian systems are currently dominant commercially (all x86 systems; iOS, Android, and Windows for ARM). A minor point is that we have also found little-endian memory systems to be more natural for hardware designers. However, certain application areas, such as IP networking,

- Major design decisions have been properly motivated and explained
- Reserved space for extensions, modular
- Open standard, you can help decide how it is developed





## The FREEDOM in RISC-V is implementation

- You can access all ISAs without (many) restrictions
  - SW tools need to be developed so that they can generate code for that ISA
- Most ISAs are closed. Only specific vendors can implement them
  - To use a core that implements an ISA, you have to license/buy it from vendor
  - Open source SW (for the ISA) is possible but building HW is not allowed

RISC-V

Integer Register-Register Operations

RV32I defines several arithmetic R-type operations. All operations read the rs1 and rs2 registers as source operands and write the result into register rd. The funct7 and funct3 fields select the type of operation.

| 31:25   | 24:20 | 19:15 | 14:12        | 11:7 | 6:0    |
|---------|-------|-------|--------------|------|--------|
| funct7  | rs2   | rs1   | funct3       | rd   | opcode |
| 7       | 5     | 5     | 3            | 5    | 7      |
| 0000000 | src2  | src1  | ADD/SLT/SLTU | dest | OP     |
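
As a rough illustration of the R-type layout above, the fields can be packed into a 32-bit word as follows. The function name `encode_r_type` is made up for this sketch. The numeric values for the `OP` opcode (`0110011`) and the `funct3` codes for ADD and SLT are taken from the RV32I specification, not from the slide.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-type fields into one 32-bit instruction word."""
    # (value, width in bits, position of the field's least significant bit)
    fields = [
        (funct7, 7, 25),
        (rs2, 5, 20),
        (rs1, 5, 15),
        (funct3, 3, 12),
        (rd, 5, 7),
        (opcode, 7, 0),
    ]
    word = 0
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << shift
    return word


OP = 0b0110011          # opcode of register-register operations (from the RV32I spec)
FUNCT3_ADD = 0b000      # from the RV32I spec
FUNCT3_SLT = 0b010      # from the RV32I spec

add_x3_x1_x2 = encode_r_type(0b0000000, 2, 1, FUNCT3_ADD, 3, OP)
slt_x3_x1_x2 = encode_r_type(0b0000000, 2, 1, FUNCT3_SLT, 3, OP)
print(hex(add_x3_x1_x2))  # 0x2081b3
print(hex(slt_x3_x1_x2))  # 0x20a1b3
```

Only `funct3` differs between the two words, which shows how `funct7` and `funct3` select the operation while the register fields stay in fixed positions.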

C2.9 ADD

Add without Carry.

Syntax

`ADD{S}{cond} {Rd}, Rn, Operand2`

`ADD{cond} {Rd}, Rn, #imm12 ; T32, 32-bit encoding only`





### Are RISC-V processors better than XYZ?

- Actual performance depends on the implementation
  - RISC-V does not specify implementation details (on purpose)
- Modern design, should deliver comparable performance
  - If implemented well, it should perform as well as other modern ISA implementations
  - In our experiments, we see no weaknesses when compared to other ISAs
  - It also is not magically 2x better
- High-end processor performance is not so much about ISA
  - Implementation details like technology capabilities, memory hierarchy, pipelining, and power management are more important.





## What is not so good about RISC-V?

#### Still in development

- Some standards (privilege, vector, debug etc.) are still being refined and adjusted.
- Tools and the development environment need to catch up.

#### No canonical implementation (the RISC-V core)

- It is free to implement, so many people did so, resulting in many cores

#### Higher end (out of order, superscalar) cores not yet mature

- In theory there is nothing to prevent a RISC-V based Linux laptop.
- It will take some more time until RISC-V implementations can compete with other commercial processors (which needed hundreds of man months of work).






## Reduced Instruction Set: all in one page

Free & Open Reference Card

**Base Integer Instructions: RV32I, RV64I, and RV128I**

| Category | Name | Fmt | RV32I Base | +RV{64,128} |
|----------|------|-----|------------|-------------|
| Loads | Load Byte | I | LB rd,rs1,imm | |
| | Load Halfword | I | LH rd,rs1,imm | |
| | Load Word | I | LW rd,rs1,imm | L{D\|Q} rd,rs1,imm |
| | Load Byte Unsigned | I | LBU rd,rs1,imm | |
| | Load Half Unsigned | I | LHU rd,rs1,imm | L{W\|D}U rd,rs1,imm |
| Stores | Store Byte | S | SB rs1,rs2,imm | |
| | Store Halfword | S | SH rs1,rs2,imm | |
| | Store Word | S | SW rs1,rs2,imm | S{D\|Q} rs1,rs2,imm |
| Shifts | Shift Left | R | SLL rd,rs1,rs2 | SLL{W\|D} rd,rs1,rs2 |
| | Shift Left Immediate | I | SLLI rd,rs1,shamt | SLLI{W\|D} rd,rs1,shamt |
| | Shift Right | R | SRL rd,rs1,rs2 | SRL{W\|D} rd,rs1,rs2 |
| | Shift Right Immediate | I | SRLI rd,rs1,shamt | SRLI{W\|D} rd,rs1,shamt |
| | Shift Right Arithmetic | R | SRA rd,rs1,rs2 | SRA{W\|D} rd,rs1,rs2 |
| | Shift Right Arith Imm | I | SRAI rd,rs1,shamt | SRAI{W\|D} rd,rs1,shamt |
| Arithmetic | ADD | R | ADD rd,rs1,rs2 | ADD{W\|D} rd,rs1,rs2 |
| | ADD Immediate | I | ADDI rd,rs1,imm | ADDI{W\|D} rd,rs1,imm |
| | SUBtract | R | SUB rd,rs1,rs2 | |
| | Load Upper Imm | U | LUI rd,imm | |
| | Add Upper Imm to PC | U | AUIPC rd,imm | |
| Logical | XOR | R | XOR rd,rs1,rs2 | |
| | XOR Immediate | I | XORI rd,rs1,imm | |
| | OR | R | OR rd,rs1,rs2 | |
| | OR Immediate | I | ORI rd,rs1,imm | |
| | AND | R | AND rd,rs1,rs2 | |
| | AND Immediate | I | ANDI rd,rs1,imm | |
| Compare | Set < | R | SLT rd,rs1,rs2 | |
| | Set < Immediate | I | SLTI rd,rs1,imm | |
| | Set < Unsigned | R | SLTU rd,rs1,rs2 | |
| | Set < Imm Unsigned | I | SLTIU rd,rs1,imm | |
| Branches | Branch = | SB | BEQ rs1,rs2,imm | |
| | Branch ≠ | SB | BNE rs1,rs2,imm | |
| | Branch < | SB | BLT rs1,rs2,imm | |
| | Branch ≥ | SB | BGE rs1,rs2,imm | |
| | Branch < Unsigned | SB | BLTU rs1,rs2,imm | |
| | Branch ≥ Unsigned | SB | BGEU rs1,rs2,imm | |
| Jump & Link | J&L | UJ | JAL rd,imm | |
| | Jump & Link Register | I | JALR rd,rs1,imm | |
| Synch | Synch thread | I | FENCE | |
| | Synch Instr & Data | I | FENCE.I | |
| System | System CALL | I | SCALL | |
| | System BREAK | I | SBREAK | |
| Counters | ReaD CYCLE | I | RDCYCLE rd | |
| | ReaD CYCLE upper Half | I | RDCYCLEH rd | |
| | ReaD TIME | I | RDTIME rd | |
| | ReaD TIME upper Half | I | RDTIMEH rd | |
| | ReaD INSTR RETired | I | RDINSTRET rd | |
| | ReaD INSTR upper Half | I | RDINSTRETH rd | |

**RV Privileged Instructions**

| Category | Name | RV mnemonic |
|----------|------|-------------|
| CSR Access | Atomic R/W | CSRRW rd,csr,rs1 |
| | Atomic Read & Set Bit | CSRRS rd,csr,rs1 |
| | Atomic Read & Clear Bit | CSRRC rd,csr,rs1 |
| | Atomic R/W Imm | CSRRWI rd,csr,imm |
| | Atomic Read & Set Bit Imm | CSRRSI rd,csr,imm |
| | Atomic Read & Clear Bit Imm | CSRRCI rd,csr,imm |
| Change Level | Env. Call | ECALL |
| | Environment Breakpoint | EBREAK |
| | Environment Return | ERET |
| Trap Redirect | Redirect to Supervisor | MRTS |
| | Redirect Trap to Hypervisor | MRTH |
| | Hypervisor Trap to Supervisor | HRTS |
| Interrupt | Wait for Interrupt | WFI |
| MMU | Supervisor FENCE | SFENCE.VM rs1 |

**Optional Compressed (16-bit) Instruction Extension: RVC**

| Category | Name | Fmt | RVC | RVI equivalent |
|----------|------|-----|-----|----------------|
| Loads | Load Word | CL | C.LW rd',rs1',imm | LW rd',rs1',imm*4 |
| | Load Word SP | CI | C.LWSP rd,imm | LW rd,sp,imm*4 |
| | Load Double | CL | C.LD rd',rs1',imm | LD rd',rs1',imm*8 |
| | Load Double SP | CI | C.LDSP rd,imm | LD rd,sp,imm*8 |
| | Load Quad | CL | C.LQ rd',rs1',imm | LQ rd',rs1',imm*16 |
| | Load Quad SP | CI | C.LQSP rd,imm | LQ rd,sp,imm*16 |
| Stores | Store Word | CS | C.SW rs1',rs2',imm | SW rs1',rs2',imm*4 |
| | Store Word SP | CSS | C.SWSP rs2,imm | SW rs2,sp,imm*4 |
| | Store Double | CS | C.SD rs1',rs2',imm | SD rs1',rs2',imm*8 |
| | Store Double SP | CSS | C.SDSP rs2,imm | SD rs2,sp,imm*8 |
| | Store Quad | CS | C.SQ rs1',rs2',imm | SQ rs1',rs2',imm*16 |
| | Store Quad SP | CSS | C.SQSP rs2,imm | SQ rs2,sp,imm*16 |
| Arithmetic | ADD | CR | C.ADD rd,rs1 | ADD rd,rd,rs1 |
| | ADD Word | CR | C.ADDW rd,rs1 | ADDW rd,rd,rs1 |
| | ADD Immediate | CI | C.ADDI rd,imm | ADDI rd,rd,imm |
| | ADD Word Imm | CI | C.ADDIW rd,imm | ADDIW rd,rd,imm |
| | ADD SP Imm * 16 | CI | C.ADDI16SP x0,imm | ADDI sp,sp,imm*16 |
| | ADD SP Imm * 4 | CIW | C.ADDI4SPN rd',imm | ADDI rd',sp,imm*4 |
| | Load Immediate | CI | C.LI rd,imm | ADDI rd,x0,imm |
| | Load Upper Imm | CI | C.LUI rd,imm | LUI rd,imm |
| | MoVe | CR | C.MV rd,rs1 | ADD rd,rs1,x0 |
| | SUB | CR | C.SUB rd,rs1 | SUB rd,rd,rs1 |
| Shifts | Shift Left Imm | CI | C.SLLI rd,imm | SLLI rd,rd,imm |
| Branches | Branch=0 | CB | C.BEQZ rs1',imm | BEQ rs1',x0,imm |
| | Branch≠0 | CB | C.BNEZ rs1',imm | BNE rs1',x0,imm |
| Jump | Jump | CJ | C.J imm | JAL x0,imm |
| | Jump Register | CR | C.JR rd,rs1 | JALR x0,rs1,0 |
| Jump & Link | J&L | CJ | C.JAL imm | JAL ra,imm |
| | Jump & Link Register | CR | C.JALR rs1 | JALR ra,rs1,0 |
| System | Env. BREAK | CI | C.EBREAK | EBREAK |

**32-bit Instruction Formats**

| Format | Fields, from bit 31 down to bit 0 |
|--------|-----------------------------------|
| R | funct7 [31:25], rs2 [24:20], rs1 [19:15], funct3 [14:12], rd [11:7], opcode [6:0] |
| I | imm[11:0] [31:20], rs1 [19:15], funct3 [14:12], rd [11:7], opcode [6:0] |
| S | imm[11:5] [31:25], rs2 [24:20], rs1 [19:15], funct3 [14:12], imm[4:0] [11:7], opcode [6:0] |
| SB | imm[12] [31], imm[10:5] [30:25], rs2 [24:20], rs1 [19:15], funct3 [14:12], imm[4:1] [11:8], imm[11] [7], opcode [6:0] |
| U | imm[31:12] [31:12], rd [11:7], opcode [6:0] |
| UJ | imm[20] [31], imm[10:1] [30:21], imm[11] [20], imm[19:12] [19:12], rd [11:7], opcode [6:0] |

**16-bit (RVC) Instruction Formats**

| Format | Fields, from bit 15 down to bit 0 |
|--------|-----------------------------------|
| CR | funct4 [15:12], rd/rs1 [11:7], rs2 [6:2], op [1:0] |
| CI | funct3 [15:13], imm [12], rd/rs1 [11:7], imm [6:2], op [1:0] |
| CSS | funct3 [15:13], imm [12:7], rs2 [6:2], op [1:0] |
| CIW | funct3 [15:13], imm [12:5], rd' [4:2], op [1:0] |
| CL | funct3 [15:13], imm [12:10], rs1' [9:7], imm [6:5], rd' [4:2], op [1:0] |
| CS | funct3 [15:13], imm [12:10], rs1' [9:7], imm [6:5], rs2' [4:2], op [1:0] |
| CB | funct3 [15:13], offset [12:10], rs1' [9:7], offset [6:2], op [1:0] |
| CJ | funct3 [15:13], jump target [12:2], op [1:0] |

RISC-V Integer Base (RV32I/64I/128I), privileged, and optional compressed extension (RVC). Registers x1-x31 and the pc are 32 bits wide in RV32I, 64 in RV64I, and 128 in RV128I (x0=0). RV64I/128I add 10 instructions for the wider formats. The RVI base of <50 classic integer RISC instructions is required. Every 16-bit RVC instruction matches an existing 32-bit RVI instruction. See riscv.org.
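
The statement that every 16-bit RVC instruction matches an existing 32-bit RVI instruction can be sketched as a text-level rewrite. The table `RVC_EQUIVALENTS` and the function `expand_rvc` are hypothetical names for this example. It covers only a handful of mnemonics and follows the operand notation of the reference card rather than real assembler syntax.

```python
# A few RVC mnemonics and their RVI equivalents, in the reference card's notation.
# This is an illustrative subset, not a complete or authoritative mapping.
RVC_EQUIVALENTS = {
    "C.LI": lambda rd, imm: f"ADDI {rd},x0,{imm}",
    "C.ADDI": lambda rd, imm: f"ADDI {rd},{rd},{imm}",
    "C.MV": lambda rd, rs1: f"ADD {rd},{rs1},x0",
    "C.ADD": lambda rd, rs1: f"ADD {rd},{rd},{rs1}",
    "C.J": lambda imm: f"JAL x0,{imm}",
    "C.JALR": lambda rs1: f"JALR ra,{rs1},0",
}


def expand_rvc(mnemonic, *operands):
    """Return the RVI instruction text that a compressed instruction stands for."""
    try:
        rule = RVC_EQUIVALENTS[mnemonic.upper()]
    except KeyError:
        raise ValueError(f"mnemonic not covered by this sketch: {mnemonic}") from None
    return rule(*operands)


print(expand_rvc("C.LI", "a0", 5))      # ADDI a0,x0,5
print(expand_rvc("C.MV", "a1", "a0"))   # ADD a1,a0,x0
print(expand_rvc("C.J", 16))            # JAL x0,16
```

Because each compressed form is just a shorthand, a decoder can expand it to the 32-bit equivalent early and reuse the rest of the pipeline unchanged.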

**Optional Multiply-Divide Instruction Extension: RVM**

| Category | Name | Fmt | RV32M (Multiply-Divide) | +RV{64,128} |
|----------|------|-----|-------------------------|-------------|
| Multiply | MULtiply | R | MUL rd,rs1,rs2 | MUL{W\|D} rd,rs1,rs2 |
| | MULtiply upper Half | R | MULH rd,rs1,rs2 | |
| | MULtiply Half Sign/Uns | R | MULHSU rd,rs1,rs2 | |
| | MULtiply upper Half Uns | R | MULHU rd,rs1,rs2 | |
| Divide | DIVide | R | DIV rd,rs1,rs2 | DIV{W\|D} rd,rs1,rs2 |
| | DIVide Unsigned | R | DIVU rd,rs1,rs2 | |
| Remainder | REMainder | R | REM rd,rs1,rs2 | REM{W\|D} rd,rs1,rs2 |
| | REMainder Unsigned | R | REMU rd,rs1,rs2 | REMU{W\|D} rd,rs1,rs2 |

**Optional Atomic Instruction Extension: RVA**

| Category | Name | Fmt | RV32A (Atomic) | +RV{64,128} |
|----------|------|-----|----------------|-------------|
| Load | Load Reserved | R | LR.W rd,rs1 | LR.{D\|Q} rd,rs1 |
| Store | Store Conditional | R | SC.W rd,rs1,rs2 | SC.{D\|Q} rd,rs1,rs2 |
| Swap | SWAP | R | AMOSWAP.W rd,rs1,rs2 | AMOSWAP.{D\|Q} rd,rs1,rs2 |
| Add | ADD | R | AMOADD.W rd,rs1,rs2 | AMOADD.{D\|Q} rd,rs1,rs2 |
| Logical | XOR | R | AMOXOR.W rd,rs1,rs2 | AMOXOR.{D\|Q} rd,rs1,rs2 |
| | AND | R | AMOAND.W rd,rs1,rs2 | AMOAND.{D\|Q} rd,rs1,rs2 |
| | OR | R | AMOOR.W rd,rs1,rs2 | AMOOR.{D\|Q} rd,rs1,rs2 |
| Min/Max | MINimum | R | AMOMIN.W rd,rs1,rs2 | AMOMIN.{D\|Q} rd,rs1,rs2 |
| | MAXimum | R | AMOMAX.W rd,rs1,rs2 | AMOMAX.{D\|Q} rd,rs1,rs2 |
| | MINimum Unsigned | R | AMOMINU.W rd,rs1,rs2 | AMOMINU.{D\|Q} rd,rs1,rs2 |
| | MAXimum Unsigned | R | AMOMAXU.W rd,rs1,rs2 | AMOMAXU.{D\|Q} rd,rs1,rs2 |

**Three Optional Floating-Point Instruction Extensions: RVF, RVD, & RVQ**

| Category | Name | Fmt | RV32{F\|D\|Q} (HP/SP, DP, QP Fl Pt) | +RV{64,128} |
|----------|------|-----|--------------------------------------|-------------|
| Move | Move from Integer | R | FMV.{H\|S}.X rd,rs1 | FMV.{D\|Q}.X rd,rs1 |
| | Move to Integer | R | FMV.X.{H\|S} rd,rs1 | FMV.X.{D\|Q} rd,rs1 |
| Convert | Convert from Int | R | FCVT.{H\|S\|D\|Q}.W rd,rs1 | FCVT.{H\|S\|D\|Q}.{L\|T} rd,rs1 |
| | Convert from Int Unsigned | R | FCVT.{H\|S\|D\|Q}.WU rd,rs1 | FCVT.{H\|S\|D\|Q}.{L\|T}U rd,rs1 |
| | Convert to Int | R | FCVT.W.{H\|S\|D\|Q} rd,rs1 | FCVT.{L\|T}.{H\|S\|D\|Q} rd,rs1 |
| | Convert to Int Unsigned | R | FCVT.WU.{H\|S\|D\|Q} rd,rs1 | FCVT.{L\|T}U.{H\|S\|D\|Q} rd,rs1 |
| Load | Load | I | FL{W,D,Q} rd,rs1,imm | |
| Store | Store | S | FS{W,D,Q} rs1,rs2,imm | |
| Arithmetic | ADD | R | FADD.{S\|D\|Q} rd,rs1,rs2 | |
| | SUBtract | R | FSUB.{S\|D\|Q} rd,rs1,rs2 | |
| | MULtiply | R | FMUL.{S\|D\|Q} rd,rs1,rs2 | |
| | DIVide | R | FDIV.{S\|D\|Q} rd,rs1,rs2 | |
| | SQuare RooT | R | FSQRT.{S\|D\|Q} rd,rs1 | |
| Mul-Add | Multiply-ADD | R | FMADD.{S\|D\|Q} rd,rs1,rs2,rs3 | |
| | Multiply-SUBtract | R | FMSUB.{S\|D\|Q} rd,rs1,rs2,rs3 | |
| | Negative Multiply-SUBtract | R | FNMSUB.{S\|D\|Q} rd,rs1,rs2,rs3 | |
| | Negative Multiply-ADD | R | FNMADD.{S\|D\|Q} rd,rs1,rs2,rs3 | |
| Sign Inject | SiGN source | R | FSGNJ.{S\|D\|Q} rd,rs1,rs2 | |
| | Negative SiGN source | R | FSGNJN.{S\|D\|Q} rd,rs1,rs2 | |
| | Xor SiGN source | R | FSGNJX.{S\|D\|Q} rd,rs1,rs2 | |
| Min/Max | MINimum | R | FMIN.{S\|D\|Q} rd,rs1,rs2 | |
| | MAXimum | R | FMAX.{S\|D\|Q} rd,rs1,rs2 | |
| Compare | Compare Float = | R | FEQ.{S\|D\|Q} rd,rs1,rs2 | |
| | Compare Float < | R | FLT.{S\|D\|Q} rd,rs1,rs2 | |
| | Compare Float ≤ | R | FLE.{S\|D\|Q} rd,rs1,rs2 | |
| Categorization | Classify Type | R | FCLASS.{S\|D\|Q} rd,rs1 | |
| Configuration | Read Status | R | FRCSR rd | |
| | Read Rounding Mode | R | FRRM rd | |
| | Read Flags | R | FRFLAGS rd | |
| | Swap Status Reg | R | FSCSR rd,rs1 | |
| | Swap Rounding Mode | R | FSRM rd,rs1 | |
| | Swap Flags | R | FSFLAGS rd,rs1 | |
| | Swap Rounding Mode Imm | I | FSRMI rd,imm | |
| | Swap Flags Imm | I | FSFLAGSI rd,imm | |

**RISC-V Calling Convention**

| Register | ABI Name | Saver | Description |
|----------|----------|-------|-------------|
| x0 | zero | | Hard-wired zero |
| x1 | ra | Caller | Return address |
| x2 | sp | Callee | Stack pointer |
| x3 | gp | | Global pointer |
| x4 | tp | | Thread pointer |
| x5-7 | t0-2 | Caller | Temporaries |
| x8 | s0/fp | Callee | Saved register/frame pointer |
| x9 | s1 | Callee | Saved register |
| x10-11 | a0-1 | Caller | Function arguments/return values |
| x12-17 | a2-7 | Caller | Function arguments |
| x18-27 | s2-11 | Callee | Saved registers |
| x28-31 | t3-t6 | Caller | Temporaries |
| f0-7 | ft0-7 | Caller | FP temporaries |
| f8-9 | fs0-1 | Callee | FP saved registers |
| f10-11 | fa0-1 | Caller | FP arguments/return values |
| f12-17 | fa2-7 | Caller | FP arguments |
| f18-27 | fs2-11 | Callee | FP saved registers |
| f28-31 | ft8-11 | Caller | FP temporaries |

RISC-V calling convention and five optional extensions: 10 multiply-divide instructions (RV32M); 11 optional atomic instructions (RV32A); and 25 floating-point instructions each for single-, double-, and quadruple-precision (RV32F, RV32D, RV32Q). The latter add registers f0-f31, whose width matches the widest precision, and a floating-point control and status register fcsr. Each larger address adds some instructions: 4 for RVM, 11 for RVA, and 6 each for RVF/D/Q. Using regex notation, {} means set, so L{D|Q} is both LD and LQ. See riscv.org. (8/21/15 revision)





### **RISC-V Architectural State**

- There are 32 registers, each 32 / 64 / 128 bits long
  - Named x0 to x31
  - x0 is hard wired to zero
  - There is a standard 'E' extension that uses only 16 registers (RV32E)
- In addition, one program counter (PC)
  - Byte-based addressing; the PC advances by the length of the instruction (4 bytes for a standard 32-bit instruction, 2 bytes for a compressed one)
- For floating-point operations, 32 additional FP registers (f0 to f31)
- Additional Control and Status Registers (CSRs)
  - Encodings for up to 4096 CSRs are reserved. Not all are used.
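The integer part of this state can be sketched in a few lines of C. This is a minimal illustration for RV32 only; the names `rv32_state`, `rv32_read_reg` and `rv32_write_reg` are hypothetical and not taken from any real simulator. It shows the one special case from the list above: x0 always reads as zero and ignores writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the RV32 integer architectural state. */
typedef struct {
    uint32_t x[32]; /* x0..x31; the slot for x0 is never written */
    uint32_t pc;    /* program counter, a byte address */
} rv32_state;

/* Read a register: x0 is hard-wired to zero. */
uint32_t rv32_read_reg(const rv32_state *s, unsigned idx) {
    assert(idx < 32);
    return idx == 0 ? 0u : s->x[idx];
}

/* Write a register: writes to x0 are silently discarded. */
void rv32_write_reg(rv32_state *s, unsigned idx, uint32_t value) {
    assert(idx < 32);
    if (idx != 0) {
        s->x[idx] = value;
    }
}
```

An RV32E variant of the same sketch would only shrink the array to 16 entries.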







## RISC-V instructions: four basic types

- R: register-to-register operations
- I: operations with immediate/constant values
- S / SB: operations with two source registers
- U / UJ: operations with a large immediate/constant value

| 31–25     | 24–20 | 19–15 | 14–12  | 11–7     | 6–0    | Type   |
|-----------|-------|-------|--------|----------|--------|--------|
| funct7    | rs2   | rs1   | funct3 | rd       | opcode | R-type |
| imm[11:0] |       | rs1   | funct3 | rd       | opcode | I-type |
| imm[11:5] | rs2   | rs1   | funct3 | imm[4:0] | opcode | S-type |
| imm[31:12] |      |       |        | rd       | opcode | U-type |

In the I-type and U-type rows the immediate also covers the empty cells to its right.
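Because the register and opcode fields sit at the same bit positions in every format, a decoder can slice them out before it knows the instruction type. The following is a minimal sketch of that slicing; `rv_fields` and `rv_split_fields` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Raw fields of a 32-bit instruction, sliced at the fixed R-type positions. */
typedef struct {
    uint32_t opcode; /* bits  6..0  */
    uint32_t rd;     /* bits 11..7  */
    uint32_t funct3; /* bits 14..12 */
    uint32_t rs1;    /* bits 19..15 */
    uint32_t rs2;    /* bits 24..20 */
    uint32_t funct7; /* bits 31..25 */
} rv_fields;

rv_fields rv_split_fields(uint32_t inst) {
    rv_fields f;
    f.opcode = inst & 0x7fu;
    f.rd     = (inst >> 7) & 0x1fu;
    f.funct3 = (inst >> 12) & 0x7u;
    f.rs1    = (inst >> 15) & 0x1fu;
    f.rs2    = (inst >> 20) & 0x1fu;
    f.funct7 = (inst >> 25) & 0x7fu;
    return f;
}
```

For example, `add x2, x3, x2` assembles to `0x00218133`, which splits into opcode `0x33`, rd 2, rs1 3 and rs2 2.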








## **Encoding of the instructions, main groups**

- Reserved opcodes for standard extensions
- Rest of opcodes free for custom implementations
- Standard extensions are frozen and will not change in the future

| inst[6:5] \ inst[4:2] | 000    | 001      | 010      | 011      | 100    | 101      | 110            | 111 (> 32b) |
|-----------------------|--------|----------|----------|----------|--------|----------|----------------|-------------|
| 00                    | LOAD   | LOAD-FP  | custom-0 | MISC-MEM | OP-IMM | AUIPC    | OP-IMM-32      | 48b         |
| 01                    | STORE  | STORE-FP | custom-1 | AMO      | OP     | LUI      | OP-32          | 64b         |
| 10                    | MADD   | MSUB     | NMSUB    | NMADD    | OP-FP  | reserved | custom-2/rv128 | 48b         |
| 11                    | BRANCH | JALR     | reserved | JAL      | SYSTEM | reserved | custom-3/rv128 | ≥ 80b       |
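The table is a two-dimensional lookup on inst[6:5] and inst[4:2], which the sketch below reproduces directly. The function name `rv_major_opcode` is hypothetical, and the strings are copied from the table; a real decoder would use an enum rather than strings.

```c
#include <stdint.h>

/* Look up the major opcode group of an instruction word.
 * Rows are indexed by inst[6:5], columns by inst[4:2], as in the table. */
const char *rv_major_opcode(uint32_t inst) {
    static const char *const names[4][8] = {
        {"LOAD", "LOAD-FP", "custom-0", "MISC-MEM", "OP-IMM", "AUIPC", "OP-IMM-32", "48b"},
        {"STORE", "STORE-FP", "custom-1", "AMO", "OP", "LUI", "OP-32", "64b"},
        {"MADD", "MSUB", "NMSUB", "NMADD", "OP-FP", "reserved", "custom-2/rv128", "48b"},
        {"BRANCH", "JALR", "reserved", "JAL", "SYSTEM", "reserved", "custom-3/rv128", ">=80b"},
    };
    /* inst[1:0] != 11 marks a 16-bit compressed instruction, which the table does not cover. */
    if ((inst & 0x3u) != 0x3u) {
        return "16-bit (compressed)";
    }
    return names[(inst >> 5) & 0x3u][(inst >> 2) & 0x7u];
}
```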









### RISC-V is a load/store architecture

- All operations are on internal registers
  - Cannot manipulate data in memory directly
- Load instructions to copy from memory to registers
- R-type or I-type instructions to operate on them
- Store instructions to copy from registers back to memory
- Branch and Jump instructions





## Constants (Immediates) in Instructions

- A 32-bit instruction cannot hold a full 32-bit constant
  - Constants are distributed over the instruction fields, and then sign-extended
  - The Load Upper Immediate (lui) instruction is used to assemble larger constants
- Instruction types are named according to their immediate encoding

| 31–25                                        | 24–20 | 19–15 | 14–12  | 11–7              | 6–0    | Type   |
|----------------------------------------------|-------|-------|--------|-------------------|--------|--------|
| funct7                                       | rs2   | rs1   | funct3 | rd                | opcode | R-type |
| imm[11:0] (bits 31–20)                       |       | rs1   | funct3 | rd                | opcode | I-type |
| imm[11:5]                                    | rs2   | rs1   | funct3 | imm[4:0]          | opcode | S-type |
| imm[12], imm[10:5]                           | rs2   | rs1   | funct3 | imm[4:1], imm[11] | opcode | B-type |
| imm[31:12] (bits 31–12)                      |       |       |        | rd                | opcode | U-type |
| imm[20], imm[10:1], imm[11], imm[19:12] (bits 31–12) |  |   |        | rd                | opcode | J-type |
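Reassembling the scattered immediate bits is mostly shifting and masking followed by sign extension. The sketch below follows the field positions in the table; the helper names (`rv_sext`, `rv_imm_i`, and so on) are hypothetical and chosen for this example only.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v (v must have no higher bits set). */
int32_t rv_sext(uint32_t v, unsigned bits) {
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((v ^ sign) - sign);
}

/* I-type: imm[11:0] in bits 31..20. */
int32_t rv_imm_i(uint32_t inst) {
    return rv_sext(inst >> 20, 12);
}

/* S-type: imm[11:5] in bits 31..25, imm[4:0] in bits 11..7. */
int32_t rv_imm_s(uint32_t inst) {
    return rv_sext(((inst >> 25) << 5) | ((inst >> 7) & 0x1fu), 12);
}

/* B-type: imm[12|10:5] in bits 31..25, imm[4:1|11] in bits 11..7; bit 0 is always 0. */
int32_t rv_imm_b(uint32_t inst) {
    uint32_t imm = ((inst >> 31) << 12) | (((inst >> 7) & 0x1u) << 11) |
                   (((inst >> 25) & 0x3fu) << 5) | (((inst >> 8) & 0xfu) << 1);
    return rv_sext(imm, 13);
}

/* U-type: imm[31:12] in bits 31..12, low 12 bits are zero. */
int32_t rv_imm_u(uint32_t inst) {
    return (int32_t)(inst & 0xfffff000u);
}

/* J-type: imm[20|10:1|11|19:12] in bits 31..12; bit 0 is always 0. */
int32_t rv_imm_j(uint32_t inst) {
    uint32_t imm = ((inst >> 31) << 20) | (((inst >> 12) & 0xffu) << 12) |
                   (((inst >> 20) & 0x1u) << 11) | (((inst >> 21) & 0x3ffu) << 1);
    return rv_sext(imm, 21);
}
```

Note that the sign bit of every immediate is instruction bit 31, which keeps sign extension cheap in hardware.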







## Load from memory (ld), how immediates work

`ld x9, 64(x22)`



- A 32-bit address does not fit directly into a 32-bit instruction encoding
  - Take the content of the source register (rs1) and add the immediate (imm) to it. This is the address
  - Read from this **address** in the memory and load into the destination (**rd**) register
- RISC-V tries to minimize the number of instructions
  - The ld instruction seems overly complicated, but this one addressing mode covers every load
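As a worked example, the load `ld x9, 64(x22)` can be encoded and its address computed by hand. The sketch below assumes the standard I-type layout with opcode `0x03` (LOAD) and funct3 `3` (doubleword); the function names `rv_encode_i` and `rv_load_address` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an I-type instruction from its fields (12-bit signed immediate). */
uint32_t rv_encode_i(int32_t imm, unsigned rs1, unsigned funct3, unsigned rd, unsigned opcode) {
    assert(imm >= -2048 && imm <= 2047);
    assert(rs1 < 32 && rd < 32 && funct3 < 8 && opcode < 128);
    return (((uint32_t)imm & 0xfffu) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode;
}

/* Effective address of a load: value of rs1 plus the sign-extended immediate. */
uint64_t rv_load_address(uint64_t rs1_value, int32_t imm) {
    return rs1_value + (uint64_t)(int64_t)imm;
}
```

With x22 holding `0x1000`, the load above reads from address `0x1040`; a negative immediate reaches below the base in the same way.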




## Branching, how addresses come together

bne x10, x11, 2000 // if x10 != x11, jump 2000 ahead

Fields, from bit 31 down to bit 0: imm[12], imm[10:5], rs2, rs1, funct3, imm[4:1], imm[11], opcode

- Similar problem: how to encode the jump address in branches
  - Branch on Equal (beq) and Branch on Not Equal (bne)
  - They use B-type operations, which need two source registers
- Jumps are relative to the Program Counter (PC)
  - The immediate (constant) shows how far we have to jump (PC-relative addressing)
  - Works for targets within ±4 KiB of the PC (the 13-bit signed offset covers −4096 to +4094 bytes). To branch further, we need several instructions.
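The bit shuffling of the branch offset can be checked with a small encoder. This is a sketch under the standard B-type layout with opcode `0x63` (BRANCH) and funct3 `1` for bne; `rv_encode_b` and `rv_branch_target` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a B-type branch. The byte offset must be even and fit in 13 signed bits. */
uint32_t rv_encode_b(int32_t offset, unsigned rs2, unsigned rs1, unsigned funct3) {
    assert(offset >= -4096 && offset <= 4094 && (offset & 1) == 0);
    assert(rs1 < 32 && rs2 < 32 && funct3 < 8);
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 12) & 0x1u) << 31) |  /* imm[12]   */
           (((imm >> 5) & 0x3fu) << 25) |  /* imm[10:5] */
           (rs2 << 20) | (rs1 << 15) | (funct3 << 12) |
           (((imm >> 1) & 0xfu) << 8) |    /* imm[4:1]  */
           (((imm >> 11) & 0x1u) << 7) |   /* imm[11]   */
           0x63u;                          /* BRANCH opcode */
}

/* Branch target: PC of the branch plus the sign-extended offset. */
uint32_t rv_branch_target(uint32_t pc, int32_t offset) {
    return pc + (uint32_t)offset;
}
```

Bit 0 of the offset is not stored, since instructions are at least 16-bit aligned; this is what doubles the reach of the 12 stored bits.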



## **RISC-V Instruction Length is Encoded**

- The least significant bits of an instruction encode its length
- Supports instruction lengths of 16, 32, 48 and 64 bits, and longer ones in 16-bit steps (encodings for 320 bits and above are reserved)
  - Allows RISC-V to have Compressed instructions

```
                              xxxxxxxxxxxxxxaa   16-bit (aa ≠ 11)
            xxxxxxxxxxxxxxxx  xxxxxxxxxxxbbb11   32-bit (bbb ≠ 111)
  ...xxxx   xxxxxxxxxxxxxxxx  xxxxxxxxxx011111   48-bit
  ...xxxx   xxxxxxxxxxxxxxxx  xxxxxxxxx0111111   64-bit
  ...xxxx   xxxxxxxxxxxxxxxx  xxxxxnnnn1111111   (80+16*nnnn)-bit, nnnn ≠ 1111
  ...xxxx   xxxxxxxxxxxxxxxx  xxxxx11111111111   Reserved for ≥320 bits

Byte address:
  base+4    base+2            base
```
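A fetch unit only needs the lowest 16 bits at `base` to know how many more bytes belong to the instruction. The sketch below implements the length rule exactly as drawn in the figure above; `rv_inst_length_bits` is a hypothetical name. The encodings beyond 32 bits have been revised between specification versions, so treat those branches as tied to this figure rather than to the current specification.

```c
#include <stdint.h>

/* Instruction length in bits, derived from the lowest 16-bit parcel.
 * Returns 0 for the reserved (>= 320-bit) encoding. */
unsigned rv_inst_length_bits(uint16_t low) {
    if ((low & 0x03u) != 0x03u) return 16;  /* aa  != 11  */
    if ((low & 0x1cu) != 0x1cu) return 32;  /* bbb != 111 */
    if ((low & 0x3fu) == 0x1fu) return 48;  /* ends in 011111  */
    if ((low & 0x7fu) == 0x3fu) return 64;  /* ends in 0111111 */
    /* Here the low seven bits are 1111111; nnnn sits in bits 10..7. */
    unsigned nnnn = (low >> 7) & 0xfu;
    if (nnnn != 0xfu) return 80 + 16 * nnnn;
    return 0;
}
```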



## Compressed Instruction extension 'C'

- Use 16-bit instructions for common operations
  - Code size reduction by 34 %
  - Compressed instructions increase the effective fetch bandwidth (more instructions per fetched word)
  - Allow for macro-op fusion of common patterns









### So how to build RISC-V cores

- RISC-V ISA tells you the architecture
  - You know which instructions are supported
  - How they are encoded
  - What they are supposed to do
- It does not tell you any implementation details
  - Pipeline stages, memory hierarchy, computation units, in-order or out-of-order
  - Everyone is free to figure out how to best implement these
- Need to come up with a micro-architecture to implement it
  - Determine which standard extensions are supported, and how
  - Choose a micro-architecture that fits performance requirements







#### What are the Performance Metrics

#### Area:

- Measured in kGE (kilo gate equivalents, the number of simple logic gates) or mm<sup>2</sup> (technology dependent)

#### Frequency:

- Depends on the number of gates on the longest path

#### Power:

- Strongly depends on the above metrics
- Leakage: dissipated even when not working (Area)
- Dynamic Power: dissipated on logic transitions (frequency and area)
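
These dependencies can be made explicit with the usual first-order CMOS power model. It is a textbook approximation and not taken from the slides; the symbols $\alpha$ (switching activity), $C$ (switched capacitance, which grows with area), $V_{DD}$ (supply voltage), $f$ (clock frequency) and $I_{leak}$ (leakage current, which also grows with area) are introduced here for illustration.

$$P_{total} = P_{dyn} + P_{leak}, \qquad P_{dyn} \approx \alpha \, C \, V_{DD}^2 \, f, \qquad P_{leak} \approx V_{DD} \, I_{leak}$$

Under this model, halving the frequency at a fixed voltage halves dynamic power but leaves the energy per operation unchanged, which is why energy efficiency is reported separately from power.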

#### CPU Design:

- IPC (Instructions per cycle)
  - IPC implicitly measured in commonly used benchmarks (Coremark, Dhrystone, SpecInt)
- Energy Efficiency: OPs/Joule

#### Hardware Designer

- Tries to find a good balance
- Application dependent
  - IoT and HPC have different requirements
- One size does not fit all








### RISC-V cores developed at ETH Zurich

32 bit

**Low Cost** Core

- Zero-riscy
- Micro-riscy

**DSP Enhanced** Core (OpenHW)

- **RI5CY**
- HW loops, bit manipulation, fixed point

**Streaming** Compute Core

- **Snitch**
- RV32-**ICMDFX**

64 bit

**Linux capable** Core (OpenHW)

- **Ariane**
- RV64



### Zero-riscy / Ibex, small core for control applications

- 2-stage pipeline
- Optimized for area
  - Area:
    - 19 kGE (Zero-riscy)
    - 12 kGE (Micro-riscy)
  - Critical path:
    - ~ 30 logic levels
- New name: Ibex
  - lowRISC took over Zero-riscy / Micro-riscy in 2019



- Two Configurations:
  - Zero-riscy: RV32IMC (2.44 Coremark/MHz)
    - 32 registers, hardware multiplier
  - Micro-riscy: RV32EC (0.91 Coremark/MHz)
    - 16 registers (E), software-emulated multiplier



## **Ibex continues to grow with lowRISC**

- 40+ contributors
- 680 pull requests
- 314 GitHub issues





Ibex is a small and efficient, 32-bit, in-order RISC-V core with a 2-stage (or optionally 3-stage) pipeline that implements the RV32IMCB instruction set architecture.

Since being contributed to lowRISC by ETH Zürich, it has seen a substantial investment of development effort.





### Roadmap of Ibex



#### Stabilisation 19Q3-19Q4

- RISC-V specification conformance
- Code clean-up and refactoring (~50% LoC changed)
- CI & DV (riscv-dv, Google)

#### Perf phase 1 20Q1

- Branch target ALU
- Third pipeline stage
- Single-cycle MUL
- I\$ prototype

#### Perf phase 2 20Q2

- Finalise I\$
- Static branch predictor
- Bitmanip ISA extension

#### Security hardening phase 1 20Q2

- Randomised execution time
- Non-data-dependent fixed execution time
- Parity checks

#### Security hardening phase 2 20Q3

- Bus scrambling
- CFI (TBD)
- Shadow PMP regs
- Conformance with the OpenTitan (OT) secure coding guidelines





### **Growth of Ibex measured with Coremark/MHz**









### RI5CY / CV32E40P our main 32bit RISC-V core

- Zero-riscy / Ibex is suitable for simple applications
  - Control applications, book-keeping
- For our research we need more capable cores
  - Mainly used in clusters for signal processing / machine learning applications
- Tuned for energy efficiency
  - Not necessarily low power
- Make use of custom extensions
  - The Xpulp extensions enhance the capabilities
  - Several Xpulp extensions in discussions for ratification





### Simplified pipeline for RI5CY / CV32E40P





### RI5CY: Our 32-bit workhorse

- 4-stage pipeline
  - **41** kGE
  - Coremark/MHz 3.19
- Includes Xpulp extensions
  - SIMD
  - Fixed point
  - Bit manipulations
  - HW loops



- Different Options:
  - FPU: IEEE 754 single precision
    - Including hardware support for FDIV, FSQRT, FMAC, FMUL
  - Privilege support:
    - Supports privilege mode M and U







- There is a reserved decoding space for custom instructions
  - Allows everyone to add new instructions to the core
  - This encoding space is **reserved**: it will not be used by future standard extensions
  - Implementations supporting custom instructions will be compatible with standard ISA
    - Code compiled for standard RISC-V will run without issues
  - The user has to provide support to take advantage of the additional instructions
    - Compiler that generates code for the custom instructions
- ETH Zurich regularly uses these instructions
  - Great tool for exploring
  - The goal is to help ratify these extensions as standards through working groups





## Our extensions to RI5CY (with additions to GCC)

- Post-incrementing load/store instructions
- Hardware Loops (lp.start, lp.end, lp.count)
- ALU instructions
  - Bit manipulation (count, set, clear, leading bit detection)
  - Fused operations: (add/sub-shift)
  - Immediate branch instructions
- Multiply Accumulate (32x32 bit and 16x16 bit)
- SIMD instructions (2x16 bit or 4x8 bit) with scalar replication option
  - add, min/max, dotproduct, shuffle, pack (copy), vector comparison

For 8-bit values the following can be executed in a single cycle (pv.dotup.b)

$$Z = D_1 \times K_1 + D_2 \times K_2 + D_3 \times K_3 + D_4 \times K_4$$
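The arithmetic of this operation can be emulated in plain C to check results against the hardware. The sketch below assumes that both operands hold four unsigned 8-bit lanes packed into one 32-bit register, as the `u` in the mnemonic suggests; the function name `dotup_b` is hypothetical and only models the result, not the single-cycle timing.

```c
#include <stdint.h>

/* Unsigned dot product of four packed 8-bit lanes:
 * Z = D1*K1 + D2*K2 + D3*K3 + D4*K4 */
uint32_t dotup_b(uint32_t d, uint32_t k) {
    uint32_t z = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t di = (d >> (8 * lane)) & 0xffu;
        uint32_t ki = (k >> (8 * lane)) & 0xffu;
        z += di * ki;
    }
    return z;
}
```

A scalar core needs four multiplies and three adds, plus the unpacking, for the same result, which is where the gain for 8-bit convolutions comes from.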







### RI5CY ISA extensions improve performance

```
for (i = 0; i < 100; i++)
    d[i] = a[i] + b[i];
```

#### Baseline

```
  li   x5, 0              # loop end value
  li   x4, 100            # loop counter
Lstart:
  lb   x2, 0(x10)         # load a[i]
  lb   x3, 0(x11)         # load b[i]
  addi x10, x10, 1
  addi x11, x11, 1
  add  x2, x3, x2
  sb   x2, 0(x12)         # store d[i]
  addi x4, x4, -1
  addi x12, x12, 1
  bne  x4, x5, Lstart
```

11 cycles/output

#### Auto-incr load/store

```
  li   x5, 0
  li   x4, 100
Lstart:
  lb   x2, 0(x10!)        # load, then increment the pointer
  lb   x3, 0(x11!)
  addi x4, x4, -1
  add  x2, x3, x2
  sb   x2, 0(x12!)
  bne  x4, x5, Lstart
```

8 cycles/output

#### HW Loop

```
  lp.setupi 100, Lend     # hardware loop: no counter or branch instructions
  lb   x2, 0(x10!)
  lb   x3, 0(x11!)
  add  x2, x3, x2
Lend: sb x2, 0(x12!)
```

5 cycles/output

#### Packed-SIMD

```
  lp.setupi 25, Lend      # 25 iterations, four 8-bit elements each
  lw   x2, 0(x10!)
  lw   x3, 0(x11!)
  pv.add.b x2, x3, x2     # four 8-bit additions at once
Lend: sw x2, 0(x12!)
```

1.25 cycles/output
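
The packed-SIMD variant processes four bytes per loop iteration, so 5 cycles per iteration become 5 / 4 = 1.25 cycles per output. What the lane-wise byte addition computes can be emulated in portable C. This is a sketch with hypothetical names (`add_b`, `add_bytes_packed`) that assumes wraparound addition in each 8-bit lane; it models the result only, not the timing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lane-wise 8-bit addition of two packed words, with no carry between lanes. */
uint32_t add_b(uint32_t a, uint32_t b) {
    /* Add the low 7 bits of each lane, then fix up the top bit of each lane. */
    uint32_t low7 = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* d[i] = a[i] + b[i] for n bytes, four at a time (n must be a multiple of 4). */
void add_bytes_packed(uint8_t *d, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint32_t wa, wb, wd;
        memcpy(&wa, a + i, 4); /* one word load instead of four byte loads */
        memcpy(&wb, b + i, 4);
        wd = add_b(wa, wb);
        memcpy(d + i, &wd, 4);
    }
}
```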



## Runtime for three different applications





## Different cores for different area budgets





**RV32IMC** 

RV32EC





## Different cores for different power budgets







### Energy Efficiency: 2D-Convolution @55MHz, 0.8V





#### This was a short overview of basics of RISC-V

- After the break, more advanced cores
  - 64bit RISC-V core
  - Discussion on performance
  - Vector processing
- Tomorrow, we learn about PULP systems
  - Cores alone cannot do much, they need a system around them
  - Many core systems
  - Managing Data
  - Acceleration
  - Actual Integrated Circuits from the PULP group







Luca Benini, Davide Rossi, Andrea Borghesi, Michele Magno, Simone Benatti, Francesco Conti, Francesco Beneventi, Daniele Palossi, Giuseppe Tagliavini, Antonio Pullini, Germain Haugou, Manuele Rusci, Florian Glaser, Fabio Montagna, Bjoern Forsberg, Pasquale Davide Schiavone, Alfio Di Mauro, Victor Javier Kartsch Morinigo, Tommaso Polonelli, Fabian Schuiki, Stefan Mach, Andreas Kurth, Florian Zaruba, Manuel Eggimann, Philipp Mayer, Marco Guermandi, Xiaying Wang, Michael Hersche, Robert Balas, Antonio Mastrandrea, Matheus Cavalcante, Angelo Garofalo, Alessio Burrello, Gianna Paulin, Georg Rutishauser, Andrea Cossettini, Luca Bertaccini, Maxim Mattheeuws, Samuel Riedel, Sergei Vostrikov, Vlad Niculescu, Hanna Mueller, Matteo Perotti, Nils Wistoff, Luca Bertaccini, Thorir Ingulfsson, Thomas Benz, Paul Scheffler, Alessio Burello, Moritz Scherer, Matteo Spallanzani, Andrea Bartolini, Frank K. Gurkaynak,

and many more that we forgot to mention http://pulp-platform.org



@pulp\_platform

### The extensions translate to real speed-ups

- 8-bit convolution
  - Open source DNN library
- 10x through xPULP
  - Extensions bring real speedup
- Near-linear speedup
  - Scales well for regular workloads.
- 75x overall gain





