CSC3050 Project 3: RISC-V Simulator

1 Background

Efficient execution of instructions in a RISC-V pipeline relies on avoiding data hazards, where an instruction

depends on the result of a previous instruction that has not yet completed. Data hazards can cause stalls,

reducing the efficiency of the processor. To mitigate these hazards, instruction reordering and specialized fused

operations like fmadd (fused multiply-add) can be utilized.

This assignment has two parts:

• Implementing the fmadd Instruction In this part, you will implement the fused multiply-add (fmadd)

instruction, which performs a multiplication followed by an addition in a single step. This reduces

the number of instructions executed and can eliminate certain data hazards, leading to more efficient

computation.

• Reordering Instructions to Avoid Data Hazards You will be given a sequence of RISC-V instructions (add/mul) that suffer from data hazards. Your task will be to rearrange them while maintaining

correctness. This exercise will help you understand the importance of instruction scheduling in hazard

mitigation and performance optimization.

By completing this assignment, you will gain some basic hands-on experience with hazard avoidance

strategies in RISC-V, while learning how fmadd can be used to optimize multiplication-addition sequences and

how instruction reordering can improve pipeline execution efficiency.

2 RISC-V GNU Toolchain

The RISC-V GNU Toolchain is already installed in your Docker, so you do not need to download it from the

official link. We still strongly suggest that you open the official link and read the README.

If you need to set up the RISC-V development environment yourself, compile and install the RISC-V GNU toolchain.

This toolchain supports the RV32I instruction set with the M extension (integer multiplication and division),

based on the RISC-V Specification 2.2. To build it, create a build directory, configure the toolchain, and

compile it with the following commands:

mkdir build; cd build

../configure --with-arch=rv32im --enable-multilib --prefix=/path/to/riscv32i

make -j$(nproc)

3 A Simple RISC-V64I Simulator

We use a modified version of Hao He's simulator. You can find the modified repository:

It is a simple RISC-V emulator supporting the user-mode RV64I instruction set, from the PKU Computer Architecture

Labs, Spring 2019.

3.1 Compile

mkdir build

cd build

cmake ..

make

3.2 Usage

./Simulator riscv-elf-file-name -v -s -d -b strategy

3.3 Parameters

• -v for verbose output; the output can be redirected to a file for further analysis.

• -s for single-step execution, often used in combination with -v.

• -d for creating a memory and register history dump in dump.txt.

• -b for branch prediction strategy (default BTFNT), accepted parameters are AT, NT, BTFNT, and BPB.

-- AT: Always Taken

-- NT: Always Not Taken

-- BTFNT: Backward Taken, Forward Not Taken

-- BPB: Branch Prediction Buffer (2-bit history information)
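The four strategies above can be sketched as follows. This is only an illustration, not the simulator's actual code: the names Strategy, TwoBitCounter and predictStatic are hypothetical, and the initial counter state is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the simulator's own types and functions may differ.
enum class Strategy { AT, NT, BTFNT, BPB };

// 2-bit saturating counter used by a branch prediction buffer entry.
// States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
struct TwoBitCounter {
    std::uint8_t state = 1;  // assumed initial state: weakly not taken
    bool predictTaken() const { return state >= 2; }
    void update(bool taken) {
        if (taken && state < 3) ++state;
        if (!taken && state > 0) --state;
    }
};

// The static strategies need only the branch PC and its target address.
inline bool predictStatic(Strategy s, std::uint32_t pc, std::uint32_t target) {
    switch (s) {
        case Strategy::AT:    return true;
        case Strategy::NT:    return false;
        case Strategy::BTFNT: return target < pc;  // backward branch: taken
        default:
            assert(false && "BPB needs a per-branch counter, not a static rule");
            return false;
    }
}
```

With a 2-bit counter, a single mispredicted iteration (for example a loop exit) does not immediately flip a strongly established prediction, which is the usual motivation for keeping two bits of history instead of one.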

4 Part I: RISC-V32I Simulator

The first task in this assignment is to change the RISC-V64I simulator into a RISC-V32I simulator. This is an

easy job, but we suggest that you read the code carefully and understand the logical structure of the simulator.

You can re-compile the sample test cases to test your RISC-V32I simulator. Take quicksort as an example:

riscv32-unknown-elf-gcc -march=rv32i \

test/quicksort.c test/lib.c -o riscv-elf/quicksort.riscv

You can change -march=rv32i to -march=rv32imf for the remaining parts of the assignment.

5 Part I: Fused Instructions

Fused multiply-add instructions are part of the RISC-V ISA's F (single-precision floating-point) and D (double-precision

floating-point) extensions. These extensions provide support for floating-point arithmetic operations. In this

project, you only need to implement an integer version.

Take the fmadd.s instruction as an example.

This instruction performs a fused multiply-add operation for floating-point numbers, which means it computes

the product of two floating-point numbers and then adds a third floating-point number to the result, all in a

single instruction. This operation is beneficial for both performance and precision, because the result is

rounded once instead of after the multiplication and again after the addition.

In this assignment, you are required to implement the fused instructions for the integer type. We use the same

format as the standard RISC-V R4 instructions, with the reserved custom opcode 0x0B as our opcode.

Inst       Name                      funct2   funct3   Description
fmadd.i    Fused Mul-Add             0x0      0x0      rd = rs1 * rs2 + rs3
fmadd.u    Unsigned Fused Mul-Add    0x1      0x0      rd = rs1 * rs2 + rs3
fmsub.i    Fused Mul-Sub             0x2      0x0      rd = rs1 * rs2 - rs3
fmsub.u    Unsigned Fused Mul-Sub    0x3      0x0      rd = rs1 * rs2 - rs3
fmnadd.i   Fused Neg Mul-Add         0x0      0x1      rd = -rs1 * rs2 + rs3
fnmsub.i   Fused Neg Mul-Sub         0x1      0x1      rd = -rs1 * rs2 - rs3
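The Description column above can be written out as a small sketch on 32-bit register values. The function names are hypothetical, and it is an assumption here that only the low 32 bits of the result are written to rd. Under that assumption the signed and unsigned variants produce the same bit pattern, so one function per operation is enough for this illustration; how your simulator distinguishes the .i and .u forms is up to the project code and README.

```cpp
#include <cassert>
#include <cstdint>

// Register values are kept as uint32_t so that overflow wraps modulo 2^32,
// which is well defined in C++ (signed overflow would not be).
using Reg = std::uint32_t;

// rd = rs1 * rs2 + rs3
inline Reg fusedMulAdd(Reg rs1, Reg rs2, Reg rs3) { return rs1 * rs2 + rs3; }

// rd = rs1 * rs2 - rs3
inline Reg fusedMulSub(Reg rs1, Reg rs2, Reg rs3) { return rs1 * rs2 - rs3; }

// rd = -rs1 * rs2 + rs3 (negation done as 0 - rs1 in modular arithmetic)
inline Reg fusedNegMulAdd(Reg rs1, Reg rs2, Reg rs3) { return (0u - rs1) * rs2 + rs3; }

// rd = -rs1 * rs2 - rs3
inline Reg fusedNegMulSub(Reg rs1, Reg rs2, Reg rs3) { return (0u - rs1) * rs2 - rs3; }

// Small usage example: 3 * 4 combined with 5 in each of the four ways.
inline void demoFused() {
    assert(fusedMulAdd(3, 4, 5) == 17u);
    assert(fusedMulSub(3, 4, 5) == 7u);
    assert(fusedNegMulAdd(3, 4, 5) == static_cast<Reg>(-7));
    assert(fusedNegMulSub(3, 4, 5) == static_cast<Reg>(-17));
}
```

A fused result computed this way should match a separate mul followed by an add or sub on the same operands, which gives a simple cross-check for your test cases.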

5.1 R4 Instruction

R4 instructions, as in Figure 1, involve four registers (rs1, rs2, rs3, rd), unlike the formats you are

familiar with. To use the standard R4 format, you need to add the F extension when compiling. In other words, you

should use -march=rv32if, but do NOT change the build commands of the RISC-V GNU Toolchain. (Again,

we are not using floating-point numbers.)

Figure 1: R4 format
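As a reference for the field positions, the standard R4 layout places, from the most significant bit down, rs3 in bits 31:27, funct2 in bits 26:25, rs2 in bits 24:20, rs1 in bits 19:15, funct3 in bits 14:12, rd in bits 11:7 and the opcode in bits 6:0. A minimal decoding sketch follows; the names R4Fields, decodeR4 and encodeR4 are hypothetical and not taken from the simulator.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an R4-type instruction word.
struct R4Fields {
    std::uint32_t opcode, rd, funct3, rs1, rs2, funct2, rs3;
};

// Extract each field by shifting it down to bit 0 and masking its width.
inline R4Fields decodeR4(std::uint32_t inst) {
    R4Fields f;
    f.opcode = inst & 0x7F;          // bits  6:0
    f.rd     = (inst >> 7)  & 0x1F;  // bits 11:7
    f.funct3 = (inst >> 12) & 0x7;   // bits 14:12
    f.rs1    = (inst >> 15) & 0x1F;  // bits 19:15
    f.rs2    = (inst >> 20) & 0x1F;  // bits 24:20
    f.funct2 = (inst >> 25) & 0x3;   // bits 26:25
    f.rs3    = (inst >> 27) & 0x1F;  // bits 31:27
    return f;
}

// Inverse of decodeR4, handy for building test words by hand.
inline std::uint32_t encodeR4(const R4Fields& f) {
    assert(f.opcode < 128 && f.rd < 32 && f.funct3 < 8 && f.rs1 < 32);
    assert(f.rs2 < 32 && f.funct2 < 4 && f.rs3 < 32);
    return (f.rs3 << 27) | (f.funct2 << 25) | (f.rs2 << 20) | (f.rs1 << 15) |
           (f.funct3 << 12) | (f.rd << 7) | f.opcode;
}
```

In the simulator, a decoder for this assignment would first match opcode 0x0B and then select the operation from funct2 and funct3.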

5.2 Cycle counts

A fused instruction takes longer to process, so we define that it needs 3 more cycles to execute than a

standard instruction. The mul instruction also incurs an additional 3 cycles, so a single fmadd is more

efficient in terms of cycle count than a separate mul followed by an add.

Suppose an add instruction takes 5 cycles to complete. Then we have:

add(1) + mul(3) = 4 > fmadd(3)

5.3 Other Important Information

We also provide some basic test cases for reference. Please refer to the README and /test-fused under

the root of the project.

You can also compare the number of cycles between the fused instructions and the basic mul and add instructions.

6 Part I: Disable Data Forwarding

Add an option -x to disable data forwarding.

You can modify the argument-parsing logic in the method parseParmeters in MainCache.cpp.
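A minimal sketch of how a -x flag could sit next to the existing options is shown below. It is not the simulator's code: the Options struct, its field names and the parseOptions function are hypothetical, and the real parser may store its settings differently.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical option holder; the simulator keeps its own variables.
struct Options {
    bool verbose = false;         // -v
    bool singleStep = false;      // -s
    bool dumpHistory = false;     // -d
    bool dataForwarding = true;   // cleared by -x
    const char* branchStrategy = "BTFNT";  // -b <strategy>
    const char* elfFile = nullptr;
};

// Returns false on an unknown flag, a missing -b value, or a missing ELF file.
inline bool parseOptions(int argc, const char* const argv[], Options& opt) {
    for (int i = 1; i < argc; ++i) {
        const char* a = argv[i];
        if (std::strcmp(a, "-v") == 0) {
            opt.verbose = true;
        } else if (std::strcmp(a, "-s") == 0) {
            opt.singleStep = true;
        } else if (std::strcmp(a, "-d") == 0) {
            opt.dumpHistory = true;
        } else if (std::strcmp(a, "-x") == 0) {
            opt.dataForwarding = false;  // new option: disable forwarding
        } else if (std::strcmp(a, "-b") == 0) {
            if (i + 1 >= argc) return false;
            opt.branchStrategy = argv[++i];
        } else if (a[0] == '-') {
            return false;                // unknown flag
        } else if (opt.elfFile == nullptr) {
            opt.elfFile = a;
        } else {
            return false;                // more than one ELF file given
        }
    }
    return opt.elfFile != nullptr;
}
```

Keeping forwarding enabled by default means existing command lines behave as before, and only runs that pass -x take the no-forwarding path.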

7 Part II: Introduction

In this part of the assignment, you will analyze a given sequence of RISC-V instructions that suffer from data

hazards (with forwarding turned off). Your task is to rearrange these instructions while maintaining correctness, ensuring that the processor pipeline executes efficiently. Then, you should be able to further optimize it

by substituting add/mul operations with fmadd operations. By strategically reordering instructions, you will

learn how to reduce stalls, improve instruction throughput, and optimize execution flow in a pipelined RISC-V

architecture.

8 Part II: Rearrange

In the part2.s file, you will find a RISC-V program that contains several data hazards affecting pipeline

efficiency. Your task is to rearrange the instructions to minimize stalls while ensuring the program produces

the same output as the original. You should start by reviewing and running part2.s in the simulator to

understand its functionality and identify potential improvements. Your optimized version should preserve

correctness while reducing the number of stalled cycles. Grading will be based on both correctness and

execution efficiency (fewer cycles due to reduced hazards). Name your result part2 p2.s.
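To see why reordering helps, the sketch below counts stall cycles for a straight-line sequence under a simplified model. The model is an assumption and is not taken from the simulator: a five-stage pipeline without forwarding in which a register written in WB can be read by an instruction in ID during the same cycle, so a consumer's ID stage must come at least 3 cycles after its producer's ID stage. The extra cycles of mul and the fused instructions are ignored, and the names Inst, countStalls and demoReorder are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One instruction: destination register and up to three source registers.
// Register 0 (x0) never creates a dependency; -1 marks an unused slot.
struct Inst {
    int rd;
    int src[3];
};

// Counts stall cycles caused by read-after-write hazards.
inline int countStalls(const std::vector<Inst>& prog) {
    const int kGap = 3;  // assumed minimum ID-to-ID distance, producer to consumer
    std::vector<int> writerId(32, -1000);  // ID cycle of the latest writer per register
    int prevId = -1;
    int stalls = 0;
    for (const Inst& in : prog) {
        int id = prevId + 1;  // earliest ID cycle if there is no hazard
        for (int s : in.src) {
            if (s > 0) id = std::max(id, writerId[s] + kGap);
        }
        stalls += id - (prevId + 1);
        if (in.rd > 0) writerId[in.rd] = id;
        prevId = id;
    }
    return stalls;
}

// Usage example: the same four instructions in two different orders.
inline void demoReorder() {
    // mul x5,x1,x2 ; add x6,x5,x3 ; mul x7,x1,x4 ; add x8,x7,x3
    std::vector<Inst> original = {
        {5, {1, 2, -1}}, {6, {5, 3, -1}}, {7, {1, 4, -1}}, {8, {7, 3, -1}}};
    // mul x5,x1,x2 ; mul x7,x1,x4 ; add x6,x5,x3 ; add x8,x7,x3
    std::vector<Inst> reordered = {
        {5, {1, 2, -1}}, {7, {1, 4, -1}}, {6, {5, 3, -1}}, {8, {7, 3, -1}}};
    assert(countStalls(original) == 4);
    assert(countStalls(reordered) == 1);
}
```

Both orders compute the same values because the two mul/add pairs are independent of each other. Moving the second mul between the first mul and its dependent add fills cycles that would otherwise be stalls. The exact numbers in the real simulator depend on its pipeline and on the extra cycles of mul, so use its own cycle report for the final comparison.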

9 Part II: Using fmadd.i

After optimizing part2.s, you may identify opportunities to replace certain instruction sequences with the more

efficient fmadd.i instruction (based on either part2.s or part2 p2.s). The final optimized file, part2 p3.s,

should produce the same output as both part2.s and part2 p2.s while improving execution efficiency. Since

fmadd.i combines multiplication and addition into a single operation, the total cycle count should be further

reduced. Grading will be based on both correctness and execution efficiency. Name your optimized file

part2 p3.s before submission.

10 Grading Criteria

The maximum score you can get for this lab is 100 points, and it is composed of the following components:

• Part 1 correctness of implementation 55 pts

• Part 2 correctness of part2 p2.s 20 pts

• Part 2 efficiency of part2 p2.s 20 pts

• A short report about anything you have learned in this project 5 pts

• Part 2 correctness of part2 p3.s Extra Credit 1 pt

• Part 2 efficiency of part2 p3.s Extra Credit 1 pt

11 Submission

You should make sure your code compiles and runs. Then compress it into a .zip file and

submit it to BlackBoard. Any instructions needed to compile and run your code should also be documented and included. Finally, you are also required to include a report containing the results of your test

case execution.
