This is a guest post by Reshabh Sharma, who worked this summer on a Google Summer of Code (GSoC) project under the umbrella of the FOSSi Foundation.
RISC-V will change the world. Prof. Taylor’s Bespoke Silicon Group is contributing by developing a GP-GPU based on the 32-bit RISC-V ISA (RV32). This follows the huge success of their open-source RISC-V Tiered Accelerator Fabric SoC, Celerity, which holds the world record for RISC-V performance: 500B RISC-V instructions per second, beating prior records by 100X.
For compute, 32-bit is common because of requirements like very high energy efficiency and density. However, since GP-GPUs often require 4GB+ of memory, we need 64-bit addresses to access DRAM. This summer I worked, following Prof. Taylor’s vision, to start adding support for custom instructions designed specifically for a RISC-V based GP-GPU. We started by supporting 64-bit pointers in address space 1, using custom load and store instructions in the RISC-V LLVM backend.
The following new instructions have been added:
- LDW rd, rs1, rs2: Loads the value from a 64-bit address formed by concatenating the i32 values in two registers.
- SDW rd, rs1, rs2: Stores the value to a 64-bit address formed by concatenating the i32 values in two registers.
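To make the address concatenation concrete, here is a minimal Python sketch of what the two instructions do conceptually. It is a toy model, not the backend code. Which register carries the low half and which the high half is an assumption made here for illustration (the post does not say), as is the 32-bit width of the loaded or stored value. The names `make_addr64`, `ldw` and `sdw` are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within 32 bits


def make_addr64(lo, hi):
    """Concatenate two 32-bit register values into one 64-bit address.

    Assumption: `lo` holds bits 0..31 and `hi` holds bits 32..63.
    """
    return ((hi & MASK32) << 32) | (lo & MASK32)


def sdw(memory, value, lo, hi):
    """Toy model of SDW: store a 32-bit value at the concatenated address."""
    memory[make_addr64(lo, hi)] = value & MASK32


def ldw(memory, lo, hi):
    """Toy model of LDW: load the value at the concatenated address.

    Unwritten locations read as 0 in this model.
    """
    return memory.get(make_addr64(lo, hi), 0)


# Memory is modelled as a sparse dict keyed by 64-bit address.
memory = {}
sdw(memory, 0xDEADBEEF, lo=0x00001000, hi=0x00000001)
loaded = ldw(memory, lo=0x00001000, hi=0x00000001)
```

The point of the sketch is that a register pair can name an address above the 4GB limit of a single 32-bit register, here `0x100001000`.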
Phase 1: Define new instructions
This phase was fairly simple: it dealt with adding the new instructions to the RISC-V LLVM backend.
Phase 2: Update data layout string
The data layout string tells the front end about the sizes and alignments of different entities such as integers and pointers. We updated the data layout string to support 64-bit pointers in address space 1.
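As an illustration of how pointer sizes appear in a data layout string, the sketch below parses the `p[n]:<size>:<abi>` fields that LLVM uses for pointers. The string `EXAMPLE_LAYOUT` is an assumption for illustration only: it adds a `p1:64:64` field to an RV32-style layout and is not necessarily the exact string used in this project. The helper `pointer_specs` is hypothetical and ignores the optional preferred-alignment and index-size fields.

```python
def pointer_specs(layout):
    """Map address space -> (pointer size in bits, ABI alignment in bits)."""
    specs = {}
    for field in layout.split("-"):
        # Lowercase "p" fields describe pointers; "p" alone means address space 0.
        if not field.startswith("p"):
            continue
        parts = field.split(":")
        addrspace = int(parts[0][1:] or 0)
        size_bits = int(parts[1])
        # If the ABI alignment is omitted, fall back to the pointer size.
        abi_bits = int(parts[2]) if len(parts) > 2 else size_bits
        specs[addrspace] = (size_bits, abi_bits)
    return specs


# Illustrative layout: 32-bit pointers by default, 64-bit in address space 1.
EXAMPLE_LAYOUT = "e-m:e-p:32:32-p1:64:64-i64:64-n32-S128"
specs = pointer_specs(EXAMPLE_LAYOUT)
```

With this example string, address space 0 keeps 32-bit pointers while address space 1 gets 64-bit pointers with 64-bit alignment.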
Phase 3: Lowering to custom load and store
Lowering to the custom store was a huge challenge, and I’m glad that we could complete it. More information about the challenges we faced during lowering can be found in this blog post.
All the code is hosted here. List of commits:
- Make pointer size 64 bit in RV32
- Update data layout string in backend to support 64 bit pointers
- Add custom load/store instructions
- Break 64 bit GlobalAddress in address space 1 into a pair of i32 values
- Partially support MO_LOLO and MO_LOHI target flags
- Custom lower store to custom store node
- Custom lower load to custom load node
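The idea behind the commits above, breaking a 64-bit address in address space 1 into a pair of i32 values and lowering the store to a custom node, can be pictured with a toy model. This is a sketch under stated assumptions, not LLVM's SelectionDAG API: nodes are plain tuples, `split_addr64` and `lower_store` are hypothetical names, the (low, high) operand order is assumed, and `SW` stands in for the ordinary RV32 store used for other address spaces.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within 32 bits


def split_addr64(addr):
    """Split a 64-bit address into (lo, hi) 32-bit halves."""
    return addr & MASK32, (addr >> 32) & MASK32


def lower_store(value, addr, addrspace):
    """Pick a toy store node based on the pointer's address space."""
    if addrspace == 1:
        # 64-bit pointer: pass the address as two i32 operands to the custom store.
        lo, hi = split_addr64(addr)
        return ("SDW", value, lo, hi)
    # Any other address space: ordinary 32-bit store in this model.
    return ("SW", value, addr & MASK32)


custom = lower_store(7, 0x0000000123456780, addrspace=1)
plain = lower_store(7, 0x00001000, addrspace=0)
```

A load would follow the same pattern, with the custom node producing a value instead of consuming one.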
It was a wonderful experience working under the mentorship of Prof. Taylor. I also appreciate all the efforts of Neil Ryan and the awesome collaboration with the Bespoke Silicon Group. The task was complex and looked hard to complete in the given time frame; I’m glad we did it. I got a lot of help from the llvm-dev mailing list and the riscv-llvm group, especially Alex Bradbury and Luís Marques, who are still helping us run perf benchmarks on Spike. A million thanks to everyone who supported us. Feel free to reach out at email@example.com with any feedback or suggestions.