This presentation focuses on the architectural exploration of a RISC-V ISA-based processor for networking applications such as a router, using the trade-off between power consumption and performance. The optimized architecture is compared against commercially available RISC processors from ARM. A model of a RISC-V based Solid-State Drive is also proposed.
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
1. Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex A53
Karthikeyan S
Architecture Modeling Intern
Mirabilis Design
karthi.sugumar@gmail.com
Tom Jose
Application Specialist
Mirabilis Design
tjose@mirabilisdesign.com
2. Problem
Limited devices for understanding RISC-V
No software platform for observation
Solution
Ease of understanding using System-Level Modeling Technique
VisualSim Architect provides a platform for observation and analysis
3. Features of VisualSim
Develop and Test Processor designs using Processor Generator Package
Establish and Observe current and upcoming real-world applications
Compare and benchmark a variety of hardware and software implementations
5. RISC-V ISA
Built using VisualSim’s Processor Generator Technology
Specifications of SiFive’s E31 Core were used as a reference
RV32I is the ISA used
Instruction    Cycles
ADD            1
MUL            2
DIV            2 (min) to 33 (max)
LW, SW         2
6. RISC-V Processor Specs
Processor Bits – 32
ISA – RV32I
Clock Speed – 500 MHz
Pipeline Type – In-Order
Pipeline Stages – 5
Cache – 32 KBytes of I-Cache and D-Cache
      – 64 KBytes of L2 Cache
13. A53 Specs
Processor Bits – 64
ISA – ARM v8
Clock Speed – 500 MHz
Pipeline Type – In-Order
Pipeline Stages – 8
Cache – 64 KBytes of I-Cache and D-Cache
      – 512 KBytes of L2 Cache
The same modeling technique was used as for RISC-V
The same task profile was used for simulation
19. Read/Write Latencies and Average Power
Results obtained from a system-level model of a RISC-V based Solid-State Drive
Plots: Traffic Rate 30 us and Traffic Rate 10 us
20. VisualSim Explorer
Try these links to get a feel for VisualSim
Requirements: just a browser and a Java Runtime Environment
Links
RISC-V Processor System
http://www.mirabilisdesign.com/launchdemo/demo/HAL/RISC_V/RISCV_InOrder/
ARM Cortex A53 Processor System
http://www.mirabilisdesign.com/launchdemo/demo/HAL/A_Cortex/ARM_Cortex_A53/
RISC-V based Solid-State Drive
http://www.mirabilisdesign.com/launchdemo/demo/system_architecture/SSD/SSD_RISC_V/
21. Future Development
A 64-bit RISC-V System-Level Model
Machine Learning Applications
Multi-Core SoC Design
22. Conclusion
Successfully simulated the RISC-V ISA as a Processor Core
Compared the RISC-V Core with ARM Cortex A53 using a network application
Showcased a Solid-State Drive design using the RISC-V Processor Core
23. Thank You
Get back to us at www.mirabilisdesign.com
karthi.sugumar@gmail.com | tjose@mirabilisdesign.com
Editor's Notes
The problem we face today with any new architectural innovation, RISC-V for instance, is that few or no devices actually use it. Companies like SiFive are starting to develop RISC-V cores, but RISC-V still has a long way to go. The main problem right now is the limited number of devices available for understanding RISC-V. Another problem is the absence of a reliable platform for observing new architectures. By this, I mean that there is no simulation tool capable of showing how a device works with RISC-V.
At Mirabilis Design, we have built a user-friendly simulation environment that uses system-level modeling to make current and upcoming technology easier to understand. System-level modeling is a software approach that approximates the behavior of real-time hardware implementations through computer simulation. VisualSim Architect, a product of Mirabilis Design, is a simulation platform based on system-level modeling. The software allows users to design, observe, and analyse models built on real-world implementations and proposals.
VisualSim Architect consists of several design packages, of which the Processor Generator Package was used for this project. It provides the user with blocks that define the ISA, the processor, the caches used with the processor, and the architecture setup that collects the processor's statistics.
It can be used to establish and observe current and upcoming real-world applications.
It also plays an important role in comparing and benchmarking various hardware and software implementations.
Now that you have a small idea of what VisualSim is, let’s start with how we modeled RISC-V using it.
We’ll start with how the RISC-V ISA was modeled in VisualSim. As discussed before, the Processor Generator package of VisualSim was used to model the basic RISC-V core. We consulted the documentation of SiFive’s E31 Core to get a fundamental idea of the number of clock cycles consumed by a variety of instructions. For this system-level implementation, we are using the 32-bit ISA of RISC-V. The table shows some of the basic instructions used and the number of clock cycles they take to execute.
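As a rough illustration of how the cycle counts in the table could feed a latency estimate, the sketch below sums cycles over an instruction mix and converts the total to time at 500 MHz. The instruction counts are hypothetical, DIV is bounded by the minimum and maximum from the table, and pipeline overlap and cache misses are ignored, so this is not how VisualSim itself computes latency.

```python
# Cycle counts from the slide table, stored as (min, max) per instruction.
CYCLES = {
    "ADD": (1, 1),
    "MUL": (2, 2),
    "DIV": (2, 33),
    "LW": (2, 2),
    "SW": (2, 2),
}

CLOCK_HZ = 500e6  # clock rate used in the model


def cycle_bounds(mix):
    """Return (min_cycles, max_cycles) for a dict of instruction counts."""
    lo = sum(CYCLES[op][0] * n for op, n in mix.items())
    hi = sum(CYCLES[op][1] * n for op, n in mix.items())
    return lo, hi


def cycles_to_us(cycles):
    """Convert a cycle count to microseconds at CLOCK_HZ."""
    return cycles / CLOCK_HZ * 1e6


# Hypothetical instruction mix, for illustration only.
example_mix = {"ADD": 600, "MUL": 100, "DIV": 10, "LW": 200, "SW": 90}
lo, hi = cycle_bounds(example_mix)
print(f"{lo}-{hi} cycles, {cycles_to_us(lo):.2f}-{cycles_to_us(hi):.2f} us")
```

The gap between the two bounds comes entirely from DIV, which is why a task with many divisions is harder to predict than one dominated by ADD and load/store instructions.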
Continuing with the specifications of the processor, we have:
[1] The processor is a 32-bit processor.
[2] The ISA used is RV32I.
[3] The clock rate is set at 500 MHz.
[4] The execution pipeline is in-order.
[5] The RISC-V Core has a 5-stage pipeline (Fetch, Decode, Execute, Memory Access, Write Back).
[6] The Instruction and Data Cache sizes have been set to 32 KBytes and the L2 Cache has been set to 64 KBytes.
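The specifications above can be captured in a small configuration record. This is only a sketch: the class and field names are hypothetical and are not VisualSim parameters, and reading the 32 KBytes as the size of each L1 cache is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessorConfig:
    """Hypothetical container for the specs listed in the notes."""
    name: str
    bits: int
    isa: str
    clock_mhz: float
    pipeline: str
    stages: tuple
    l1_kbytes: int  # assumed to be the size of each of I-Cache and D-Cache
    l2_kbytes: int

    @property
    def clock_period_ns(self):
        """Length of one clock cycle in nanoseconds."""
        return 1e3 / self.clock_mhz


RISCV = ProcessorConfig(
    name="RISC-V",
    bits=32,
    isa="RV32I",
    clock_mhz=500,
    pipeline="In-Order",
    stages=("Fetch", "Decode", "Execute", "Memory Access", "Write Back"),
    l1_kbytes=32,
    l2_kbytes=64,
)

print(RISCV.clock_period_ns)  # 2.0
print(len(RISCV.stages))      # 5
```

At 500 MHz each cycle lasts 2 ns, so a 2-cycle load or store takes 4 ns when it hits in the cache.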
This diagram shows what a model looks like in VisualSim, and more particularly what a processor system looks like. The Task Generator block contains a traffic source that triggers data structures to be sent to the processor. The data structures contain the instructions that the processor has to execute. The processor is also connected to a bus, which in turn connects to the L2 Cache and the DRAM. Together these form the memory hierarchy that the processor uses when it encounters a miss in its internal cache. The DMA block gives the processor access to the DRAM when required. This is another architectural option that the designer may or may not use. The Power Table block specifies the power consumed by the processor in different states such as Active, Sleep, Standby, and Wait. The Plots block is a hierarchical block that contains plots such as Task Latency, Task Set Latency, MIPS, and Cycles per Instruction. The Architecture block at the top links several blocks together and collects the statistics of each component.
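To make the memory hierarchy concrete, here is a minimal sketch of how an access falls through from the internal cache to the L2 Cache and then to the DRAM. The latency values are hypothetical, since the notes do not give them, and the caches are modeled as unbounded sets with no eviction.

```python
# Hypothetical latencies in clock cycles; the model's real values are not
# given in the notes.
LATENCY = {"l1": 1, "l2": 10, "dram": 100}


def access_cycles(addr, l1, l2, latency=LATENCY):
    """Cycles to serve one access, filling the caches on a miss.

    l1 and l2 are sets of cached addresses with no capacity limit or
    eviction, which keeps the sketch short but is unrealistic.
    """
    if addr in l1:
        return latency["l1"]
    l1.add(addr)
    if addr in l2:
        return latency["l1"] + latency["l2"]
    l2.add(addr)
    return latency["l1"] + latency["l2"] + latency["dram"]


l1, l2 = set(), set()
print(access_cycles(0x1000, l1, l2))  # 111: misses in both caches
print(access_cycles(0x1000, l1, l2))  # 1: hit in the internal cache
```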
To get a feel for the online version of VisualSim Architect and to take a look at this model, go to this link.
http://www.mirabilisdesign.com/launchdemo/demo/HAL/RISC_V/RISCV_InOrder/
We have seen how the processor model was built using VisualSim. To simulate the model, we need a predefined task set. For this purpose, we used the task profile provided by NpBench (a network processor benchmarking tool). The table lists 10 tasks, each with a different number of instructions and a different percentage of each instruction type. The tasks include RED (Random Early Detection), SSLD (SSL Dispatcher), MPLS (Multi Protocol Label Switching), AES (Advanced Encryption Standard), MD5 (Message Digest), FRAG (Packet Fragmentation), CRC (Cyclic Redundancy Check), MTC (Media TransCoding), and WFQ (Weighted Fair Queuing). This task profile was fed to the processor for simulation and analysis.
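A task profile of this kind can be sketched as a list of records, each with an instruction count and a percentage mix, plus a source that sends the tasks in order. The task names below are the ones given in the notes; the instruction counts, the instruction classes, and the percentages are placeholders and are not taken from NpBench.

```python
from itertools import cycle, islice

# Task names as listed in the notes.
TASK_NAMES = ["RED", "SSLD", "MPLS", "AES", "MD5", "FRAG", "CRC", "MTC", "WFQ"]


def make_task(name, n_instructions, mix_percent):
    """Build one task record; mix_percent maps instruction class to percent."""
    if abs(sum(mix_percent.values()) - 100.0) > 1e-6:
        raise ValueError(f"{name}: instruction mix must sum to 100%")
    return {"name": name, "n": n_instructions, "mix": mix_percent}


def instruction_counts(task):
    """Split a task's instruction count across classes by percentage."""
    return {cls: round(task["n"] * pct / 100) for cls, pct in task["mix"].items()}


# Placeholder numbers, not taken from NpBench.
placeholder_mix = {"alu": 60.0, "load_store": 30.0, "branch": 10.0}
profile = [make_task(name, 1000, placeholder_mix) for name in TASK_NAMES]

# A traffic source that sends the tasks in order, repeatedly.
stream = islice(cycle(profile), 12)
print([task["name"] for task in stream])
```

Because the source cycles through the profile in a fixed order, the latency plot shows a repeating pattern with one data point per task.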
After running the model, we can look at the plots we included in it. This picture shows the Task Latency of the processor. Each data point shows how long the processor took to complete one task. Since we send the tasks in order, we can see a pattern of data points, each corresponding to a particular task from the NpBench table. The latencies in this plot range from 1 us to 23 us.
This plot shows the Task Set Latencies of the processor. Since we send 10 tasks in order, it can be helpful to know how long it takes to complete the entire set once. On average, the task set latency is about 130 us.
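The relation between task latency and task set latency can be sketched as a sum over each group of 10 consecutive tasks. The per-task latencies below are hypothetical values chosen within the 1 us to 23 us range seen in the plot so that one set adds up to 130 us; they are not read from the simulation, and the sketch assumes tasks run back to back.

```python
def task_set_latencies(task_latencies_us, set_size=10):
    """Sum consecutive task latencies in groups of set_size.

    A trailing partial group is ignored because that set did not complete.
    """
    full = len(task_latencies_us) - len(task_latencies_us) % set_size
    return [
        sum(task_latencies_us[i:i + set_size])
        for i in range(0, full, set_size)
    ]


# Hypothetical per-task latencies in microseconds.
one_set = [1, 5, 9, 12, 14, 16, 18, 20, 12, 23]
print(task_set_latencies(one_set * 2 + [1, 5]))  # [130, 130]
```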
Another important result from this model is the power consumption. This plot shows the average power consumed by the processor over the simulation time of 1 ms. The average is based on the parameters entered in the Power Table of the model, which defines the states Active (26.28 uW/MHz), Wait (95% of Active), Idle (25% of Active), Standby (10% of Active), and Sleep (2% of Active). The Power Table determines which state the processor is currently in and plots the average power consumption.
For this model, we took the power consumption figure from a research paper that presents a RISC-V core with DSP extensions for scalable IoT devices. The power parameters are entirely up to the user; this is simply the reference we used in this presentation.
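The Power Table arithmetic can be sketched as a time-weighted average over the processor states. The Active figure and the state percentages come from the notes above; the residency fractions are hypothetical, and treating the uW/MHz figure as scaling linearly with the 500 MHz clock is an assumption about how the table is applied.

```python
ACTIVE_UW_PER_MHZ = 26.28  # Active power from the Power Table
CLOCK_MHZ = 500            # clock rate used in the model

# Power of each state as a fraction of Active, as listed in the notes.
STATE_FRACTION = {
    "Active": 1.00,
    "Wait": 0.95,
    "Idle": 0.25,
    "Standby": 0.10,
    "Sleep": 0.02,
}


def state_power_mw(state):
    """Power drawn in a given state, in milliwatts."""
    return ACTIVE_UW_PER_MHZ * CLOCK_MHZ * STATE_FRACTION[state] / 1000.0


def average_power_mw(residency):
    """Time-weighted average power; residency maps state to time share."""
    total = sum(residency.values())
    return sum(state_power_mw(s) * t for s, t in residency.items()) / total


print(round(state_power_mw("Active"), 2))  # 13.14

# Hypothetical residency: 80% of the time Active, 20% in Wait.
print(round(average_power_mw({"Active": 0.8, "Wait": 0.2}), 2))
```

Under these assumptions the Active state draws 13.14 mW at 500 MHz, so any average the plot reports has to lie at or below that figure.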
Now that you have heard about the implementation of RISC-V, let's make a system-level comparison between RISC-V and an ARM processor, specifically the Cortex A53.
Starting with the A53 specifications that we used to model:
[1] We used a 64-bit variant of the Cortex A53.
[2] The instruction set architecture we used is ARM v8-A. Armv8-A is the latest generation Arm architecture.
[3] The clock speed in our model is 500 MHz.
[4] The ARM Cortex A53 is an in-order pipeline processor.
[5] There are 8 pipeline stages in the A53.
[6] In the model we designed, the Instruction Cache and Data Cache have a size of 64 KBytes, along with an L2 Cache of 512 KBytes.
These plots are similar to the ones we saw for RISC-V. Each dot denotes a single task. Here we compare the latency of completing each task on the A53 with that on RISC-V. We can see a slight variation in the latencies: the A53 seems to take a little more time to complete the tasks.
To see the results yourself through the online version of VisualSim, go to this link:
http://www.mirabilisdesign.com/launchdemo/demo/HAL/A_Cortex/ARM_Cortex_A53/
In the previous slide we saw the latency of individual tasks. What you see here is the latency of a set of 10 tasks. This plot shows the task set latencies of both the RISC-V model and the ARM model. RISC-V was able to complete 7 task sets, while the ARM A53 completed only 6. Moreover, the difference in task set latency between the A53 and RISC-V is around 15-20 microseconds.
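The counts of 7 and 6 completed task sets are consistent with a simple back-of-the-envelope check. The sketch below assumes that the 1 ms simulation time mentioned for the power plot also applies here, that task sets run back to back with no idle gaps, and that the RISC-V task set latency is the roughly 130 us average quoted earlier. It is a sanity check, not a description of the simulator.

```python
def completed_sets(sim_time_us, set_latency_us):
    """Whole task sets that finish within the simulation time."""
    return int(sim_time_us // set_latency_us)


SIM_TIME_US = 1000   # assumed: 1 ms simulation time
RISCV_SET_US = 130   # average RISC-V task set latency from the notes

print(completed_sets(SIM_TIME_US, RISCV_SET_US))  # 7

# The A53 is reported to be 15-20 us slower per task set.
for extra_us in (15, 20):
    print(completed_sets(SIM_TIME_US, RISCV_SET_US + extra_us))  # 6 in both cases
```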
What we see here are the average power consumption plots of the A53 and RISC-V. We used the Samsung Exynos 5433 as the reference for power consumption in the A53, and a research paper for RISC-V. As you can see from the plots, the average power consumption of the ARM A53 is around 36.8 milliwatts, while the average power consumed by RISC-V is around 12.75 milliwatts. So by using VisualSim, we were able to make a realistic system-level comparison between two processors running different ISAs but executing the same tasks defined by NpBench. We can also see a significant difference in the average power consumption.
This slide shows a column graph that illustrates the difference in power consumption between the two models. We can see that RISC-V consumes roughly one third of the power of the A53 (about 12.75 mW against 36.8 mW). We arrived at this conclusion using a specific task profile from NpBench and a specific configuration for each processor model. The result changes as we change the parameters of the models, but comparisons like this can give an estimate of how the designs may behave after hardware implementation.
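The ratio can be checked directly from the two reported averages. The sketch below also converts each average power to energy over the run; using the 1 ms simulation time for both models is an assumption, since the notes state it only for the RISC-V power plot.

```python
A53_MW = 36.8      # average power reported for the A53 model
RISCV_MW = 12.75   # average power reported for the RISC-V model
SIM_TIME_MS = 1.0  # assumed simulation time for both models


def energy_uj(power_mw, time_ms):
    """Energy in microjoules: milliwatts multiplied by milliseconds."""
    return power_mw * time_ms


ratio = A53_MW / RISCV_MW
saving = 1 - RISCV_MW / A53_MW

print(f"A53 / RISC-V power ratio: {ratio:.2f}")  # 2.89
print(f"RISC-V power saving: {saving:.1%}")      # 65.4%
print(energy_uj(A53_MW, SIM_TIME_MS), energy_uj(RISCV_MW, SIM_TIME_MS))
```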
Another device where we felt RISC-V processors could be used is the Solid-State Drive. We have built a simple SSD model with 1 flash module using the processor system discussed before. Instead of executing network tasks, the processor here executes tasks related to Wear Levelling, Encryption and Decryption, Error Correction, and Address Translation. A Gen 3 PCI Express bus acts as the interface between the host and the flash. The NVMe controller stores the incoming requests in a queue and pops each request after it completes.
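The queueing behaviour described for the NVMe controller can be sketched as a first-in, first-out queue in which a request stays queued until its processing completes. The class and method names are hypothetical, and the sketch collapses the paired submission and completion queues of real NVMe into a single queue.

```python
from collections import deque


class RequestQueue:
    """FIFO queue: the head request stays queued until it completes."""

    def __init__(self):
        self._pending = deque()

    def submit(self, request):
        """Store an incoming host request at the tail of the queue."""
        self._pending.append(request)

    def head(self):
        """Return the request currently being served, or None if idle."""
        return self._pending[0] if self._pending else None

    def complete_head(self):
        """Pop the oldest request once its processing has finished."""
        return self._pending.popleft()

    def __len__(self):
        return len(self._pending)


queue = RequestQueue()
for request in ("read A", "write B", "read C"):
    queue.submit(request)

print(queue.head())           # read A
print(queue.complete_head())  # read A
print(queue.head(), len(queue))  # write B 2
```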
Use this link to see how the SSD model runs in the online version of VisualSim Architect:
http://www.mirabilisdesign.com/launchdemo/demo/system_architecture/SSD/SSD_RISC_V/
These plots show the read/write latencies of the SSD. The first image was captured with the traffic rate set at 30 microseconds, and the second with the traffic rate at 10 microseconds. The second plot shows that the SSD undergoes buffering at some stage of the model. VisualSim lets us change the parameters and configuration to get the right latencies when the traffic rate is increased. Potential hardware faults and bottlenecks such as this can be caught through system-level modeling and simulation, even before the design stage.
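The buffering effect can be reproduced with a very small single-server queue simulation. This sketch reads the traffic rate as the interval between incoming requests, which is an assumption, and uses a hypothetical fixed service time of 20 us that is not taken from the SSD model. When requests arrive faster than they can be served, each one waits longer than the previous one.

```python
def simulate_latencies(interval_us, service_us, n_requests):
    """Latency of each request in a single-server FIFO queue.

    Request k arrives at k * interval_us and needs service_us of work.
    """
    latencies = []
    server_free_at = 0.0
    for k in range(n_requests):
        arrival = float(k * interval_us)
        start = max(arrival, server_free_at)  # wait if the server is busy
        server_free_at = start + service_us
        latencies.append(server_free_at - arrival)
    return latencies


SERVICE_US = 20  # hypothetical service time, not taken from the model

print(simulate_latencies(30, SERVICE_US, 5))  # [20.0, 20.0, 20.0, 20.0, 20.0]
print(simulate_latencies(10, SERVICE_US, 5))  # [20.0, 30.0, 40.0, 50.0, 60.0]
```

With a 30 us interval the latency stays flat, while with a 10 us interval it grows with every request, which is the kind of pattern a buffering bottleneck produces.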
For this presentation, we have used RISC-V as the processor model for a network application as well as for an SSD. The benefit of using VisualSim is that we can see results such as power consumption, task latencies, and read/write latencies, and also experiment with the configuration and parameters to get a good simulation even before implementing the model in hardware. These simulations aim to provide close approximations of what can be expected from the hardware implementation. From comparative studies such as the one between the RISC-V and ARM models, we can find the ratios of performance and power consumption, so that we can expect similar ratios in the hardware implementation.