We present PULP: The Open-Source IoT Processor
Finally, PULP goes multicore! We are happy to launch our flagship RISC-V-based parallel ultra-low-power open-source system. Simply put, OPENPULP, today’s new kid in town, is the most advanced open-source release we have done so far, and a quantum leap ahead in terms of performance, efficiency and completeness.
OPENPULP is an MCU on steroids: an ultra-low-power “host” core coupled with a powerful compute engine based on a tightly-coupled cluster of eight cores. To spice things up, we also added an extensive set of IO peripherals and an ultra-efficient IO-DMA which can move data from peripherals to on-chip memory while the cores sleep, and trigger them into action when a data frame is ready. This heterogeneous architecture enables flexible and energy-efficient processing of data streams coming from multiple high-bandwidth sensors such as imagers, microphone arrays, and multi-electrode ExG biosignal arrays.
The brand new 8-core cluster is a true parallel-processing engine, featuring:
- A new low-latency memory interconnect enabling energy-efficient data sharing and atomic accesses on a multi-banked scratchpad memory
- An advanced DMA-engine, capable of 2D data transfers for multi-dimensional double-buffering
- A new event unit for hardware-optimized synchronization and implementation of primitives typical of parallel programming models such as OpenMP.
- A new energy-efficient shared instruction cache optimized for a tightly coupled cluster of processors.
- Support for shared-memory hardware accelerators. We provide examples of how to add your own Hardware Processing Engines (HWPEs) to the cluster.
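To make the 2D transfers and double-buffering mentioned in the list more concrete, here is a minimal host-side sketch in C. It is not the PULP DMA API: `dma_copy_2d`, `sum_image_double_buffered` and the tile size are hypothetical names and values chosen for illustration, and `memcpy` stands in for a DMA engine that would run asynchronously on real hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_W 4   /* hypothetical tile width in pixels  */
#define TILE_H 4   /* hypothetical tile height in pixels */

/* Software model of a 2D transfer: copy `rows` rows of `row_bytes` bytes
 * from a strided source into a contiguous destination buffer.
 * Assumption: a real DMA engine would do this in the background. */
void dma_copy_2d(uint8_t *dst, const uint8_t *src,
                 size_t row_bytes, size_t rows, size_t src_stride)
{
    for (size_t r = 0; r < rows; r++)
        memcpy(dst + r * row_bytes, src + r * src_stride, row_bytes);
}

/* Sum all pixels of an image tile by tile with two buffers: while tile t
 * is processed from one buffer, tile t+1 is fetched into the other.
 * Assumption: width and height are multiples of the tile size. */
uint32_t sum_image_double_buffered(const uint8_t *img,
                                   size_t width, size_t height)
{
    static uint8_t buf[2][TILE_W * TILE_H];  /* stand-in for scratchpad */
    size_t tiles_x = width / TILE_W;
    size_t n_tiles = tiles_x * (height / TILE_H);
    uint32_t sum = 0;

    if (n_tiles == 0)
        return 0;

    /* Prefetch the first tile before entering the loop. */
    dma_copy_2d(buf[0], img, TILE_W, TILE_H, width);

    for (size_t t = 0; t < n_tiles; t++) {
        size_t cur = t & 1;
        size_t next = t + 1;

        if (next < n_tiles) {
            /* Start fetching the next tile into the other buffer. */
            size_t tx = next % tiles_x;
            size_t ty = next / tiles_x;
            const uint8_t *src = img + ty * TILE_H * width + tx * TILE_W;
            dma_copy_2d(buf[cur ^ 1], src, TILE_W, TILE_H, width);
        }

        /* Process the current tile. */
        for (size_t i = 0; i < TILE_W * TILE_H; i++)
            sum += buf[cur][i];
    }
    return sum;
}
```

In this model the copy and the computation run one after the other. On a cluster with an asynchronous DMA, the fetch of tile t+1 would overlap with the processing of tile t, which is the point of keeping two buffers.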
… and there is even more to come… Stay tuned!!!
Your PULP team
3.. 2.. 1.. Lift-off: Presenting Ariane
This year ETH Zurich and University of Bologna are celebrating 5 years of collaboration on the PULP project, and we are proud to present the newest member of the PULP family. Ariane is a Linux-ready, application-class, 64-bit RISC-V core (RV64-IMC) written completely in SystemVerilog, and it is available to download from our GitHub page immediately.
Ariane is a 6-stage, single-issue, in-order CPU which fully implements the I, M and C extensions as specified in Volume I: User-Level ISA V 2.1, as well as the draft privilege extension 1.10. It implements three privilege levels (M, S and U) to fully support a Unix-like operating system (Linux, BSD, etc.). It has separate TLBs of configurable size, a hardware page table walker (PTW) and branch prediction (branch target buffer, branch history table and a return address stack). The primary design goal was to reduce the critical path length to about 20 gate delays.
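As an illustration of what a branch history table does, the following C sketch models one common textbook scheme: a table of 2-bit saturating counters indexed by the program counter. It is not taken from the Ariane RTL; the table size, the indexing and all names (`bht_t`, `bht_predict`, `bht_update`) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHT_ENTRIES 64u  /* hypothetical table size, must be a power of two */

/* Each entry is a 2-bit saturating counter:
 * 0 and 1 predict "not taken", 2 and 3 predict "taken". */
typedef struct {
    uint8_t counter[BHT_ENTRIES];
} bht_t;

/* Start every entry in the "weakly not taken" state. */
void bht_init(bht_t *b)
{
    for (unsigned i = 0; i < BHT_ENTRIES; i++)
        b->counter[i] = 1;
}

/* Assumed indexing: drop bit 0 (instructions are at least 2-byte aligned
 * when compressed instructions are supported), then take the low bits. */
unsigned bht_index(uint64_t pc)
{
    return (unsigned)((pc >> 1) & (BHT_ENTRIES - 1u));
}

/* Predict taken when the counter is in its upper half. */
bool bht_predict(const bht_t *b, uint64_t pc)
{
    return b->counter[bht_index(pc)] >= 2;
}

/* Move the counter one step towards the actual outcome, saturating at 0 and 3. */
void bht_update(bht_t *b, uint64_t pc, bool taken)
{
    uint8_t *c = &b->counter[bht_index(pc)];
    if (taken) {
        if (*c < 3)
            (*c)++;
    } else {
        if (*c > 0)
            (*c)--;
    }
}
```

The two-bit hysteresis means a single mispredicted iteration, such as a loop exit, does not immediately flip the prediction for a branch that is otherwise strongly biased.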
Following the feedback we will get from our users, we will continue the development of Ariane on the public repositories, and we have many features that we are working on for this core such as:
- IPC improvements
- Double precision floating point unit
- Full support for Atomics
The PULP team presents PULPissimo
PULPissimo is a new single-core RISC-V based open-source microcontroller system which is a significant step ahead in terms of efficiency and completeness with respect to the more basic PULPino, offering a number of new features, such as:
- Autonomous Input/Output subsystem (uDMA) that allows data to be directly copied from peripherals to memory, with much improved energy efficiency.
- New memory subsystem for improved performance and power management
- Support for hardware accelerators that access memories directly. We provide examples of how to include your own so-called Hardware Processing Engines (HWPEs) in PULPissimo
- A brand new interrupt controller
- Additional peripherals such as the flexible Camera Parallel Interface (CPI) for low-power image sensors such as those from OMNIVISION, and the I2S peripheral to support microphones such as the ST MP34DT01-M
- New SDK with a custom operating-system optimized for uDMA and makefile-based application build process.
PULPissimo, like its smaller brother PULPino, is a single-core platform and supports all our 32-bit RISC-V cores: RI5CY, as well as Zero-riscy and Micro-riscy. You can access PULPissimo directly from our GitHub page. We will be continuously updating PULPissimo with code and application examples.
What is PULPino and PULPissimo?
PULPino and PULPissimo are both competitive, state-of-the-art 32-bit processors based on the RISC-V architecture, with a rich set of peripherals and full debug support. The difference between the two is that PULPissimo has a more advanced architecture than its more basic brother PULPino. At ETH Zurich and Università di Bologna we have put many of the ideas that we have developed through our research on ultra-low-power parallel processing (PULP project) into PULPino and PULPissimo.
You can download the entire source code, test programs, programming environment and even the bitstream for the popular ZEDboard, completely for free under the Solderpad license.
State-of-the-Art Microcontroller Core
PULP systems are based on RI5CY (and Zero-riscy), optimised 32-bit RISC-V cores developed at ETH Zurich and Università di Bologna. The RI5CY core has an IPC close to 1, full support for the base integer instruction set (RV32I), compressed instructions (RV32C) and full support for the multiplication instruction set extension (RV32M). It implements several ISA extensions, such as hardware loops, post-incrementing load and store instructions, and ALU and MAC operations, which increase the efficiency of the core in signal processing applications. Now we also support floating point (RV32F) and 16-register (RVE) configurations.
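The kind of inner loop these extensions target can be sketched in plain C. The function below is a generic 16-bit dot product, not code from the PULP SDK, and the name `dot_product_i16` is hypothetical. The comments mark which source-level step corresponds to which extension class; whether a given compiler actually emits those instructions depends on the toolchain and its flags, which is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two int16_t vectors with a 32-bit accumulator. */
int32_t dot_product_i16(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;

    /* Counted loop with a known trip count: a candidate for a hardware
     * loop, which removes the per-iteration branch and counter update. */
    for (size_t i = 0; i < n; i++) {
        /* Load, then advance the pointer: the pattern a
         * post-incrementing load performs in one instruction. */
        int16_t x = *a++;
        int16_t y = *b++;

        /* Multiply and add into the accumulator: a MAC operation. */
        acc += (int32_t)x * y;
    }
    return acc;
}
```

Without such extensions, each iteration needs separate instructions for the two loads, the two pointer increments, the multiply, the add and the loop branch, so folding them together shortens the loop body considerably.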
The PULP project supports four different 32-bit RISC-V configurations
Since our last update we support a total of four different configurations:
- RI5CY: Our standard RV32-ICM core with custom PULP extensions for DSP applications
- RI5CY+FPU: The RI5CY core enhanced with an IEEE-754 single precision FPU
- Zero-riscy: The area-efficient core that implements RV32-ICM.
- Micro-riscy: The even smaller core implementing RV32-EC, with 16 registers and no hardware multiplication support
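Leaving out hardware multiplication, as Micro-riscy does, means that multiplications have to be carried out in software. A minimal sketch of the classic shift-and-add method is shown below. The name `soft_mul32` is hypothetical and this is not the routine shipped with any particular toolchain; it only illustrates the trade-off between core area and cycles per multiplication.

```c
#include <stdint.h>

/* Shift-and-add multiplication using only shifts, adds and tests.
 * The result is the product modulo 2^32, like a 32-bit hardware multiply. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0) {
        if (b & 1u)       /* lowest bit of b set: add the shifted a */
            result += a;
        a <<= 1;          /* next bit of b weighs twice as much */
        b >>= 1;
    }
    return result;
}
```

The loop runs once per significant bit of `b`, so a multiplication can take up to 32 iterations here. For control code with few multiplications that cost is usually acceptable, whereas a DSP workload benefits from a core with a hardware multiplier.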
A core for every area budget
While RI5CY is a very efficient core for DSP applications, we have received many requests for a smaller core that can be used for control applications. Zero-riscy and its even smaller brother Micro-riscy are area-optimized implementations. The synthesis results for an ASIC run on the left show an area breakdown of three configurations. Micro-riscy is 3.5x smaller than RI5CY. You can select which core configuration to use within PULPino.
Choose the core for your application
The plot compares the energy consumption of the cores at low operating frequencies for three application benchmarks: a DSP workload (2D-Conv), Coremark, and a control-intensive application with hardly any ALU operations (Runtime). As you can see, which core performs best (lower is better) depends on the application.
A Rich Set of I/O Peripherals
For communication with the outside world, PULP systems contain a broad set of peripherals, including I2S, I2C, SPI and UART. The platform's internal devices can be accessed from outside via JTAG and SPI, which allows pre-loading RAMs with executable code. In standalone mode, the platform boots from an internal boot ROM and loads its program from an external SPI flash.
Low-Power, but Powerful
To allow embedded operating systems such as FreeRTOS to run, a subset of the privileged specification is supported. Moreover, PULP systems come with many of the low-power features we developed in the PULP Project: when the core is idle, the platform can be put into a low power mode, where only a simple event unit is active and everything else is clock-gated and consumes minimal power (leakage). A specialized event unit wakes up the core in case an event/interrupt arrives.
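The wake-up behaviour described above can be modelled in a few lines of C. This is a host-side sketch and not the PULP event unit's register interface: the `event_unit_t` structure, the enable mask and the function names are assumptions made for illustration. Returning -1 stands for the situation in which the real core would remain clock-gated.

```c
#include <stdint.h>

/* Hypothetical model of an event unit: one bit per event line. */
typedef struct {
    uint32_t pending;  /* events that have arrived        */
    uint32_t mask;     /* events allowed to wake the core */
} event_unit_t;

/* Record an incoming event. Assumption: id is in the range 0..31. */
void event_raise(event_unit_t *eu, unsigned id)
{
    eu->pending |= 1u << id;
}

/* Return the lowest-numbered pending and enabled event and clear it.
 * Returns -1 if there is none, i.e. the core would stay asleep. */
int event_take(event_unit_t *eu)
{
    uint32_t ready = eu->pending & eu->mask;
    int id = 0;

    if (ready == 0)
        return -1;

    while ((ready & 1u) == 0) {
        ready >>= 1;
        id++;
    }
    eu->pending &= ~(1u << id);
    return id;
}
```

An application main loop built on this model would call `event_take` and handle the returned event, going back to sleep whenever it gets -1. Events that are pending but masked stay recorded and do not wake the core.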
Not a Toy Design
PULPino is a mature design: it was taped out as an ASIC in UMC 65nm in January 2016. The PULPino platform is available for RTL simulation as well as for FPGA mapping. It has full debug support on all targets. In addition, we support extended profiling with source-code-annotated execution times through KCacheGrind in RTL simulations, and debugging via GDB.
And it is free, no registration, no strings attached, you can use it, change it, adapt it, add it to your own chip, use it for classes, research, projects, products… We just ask you to acknowledge the source, and if possible, let us know what you like and what you don’t like.