Myriad sequences of RISC-V code

Author: Wojciech Muła
Added on: 2024-11-11

Myriad sequences

The RISC-V assembler defines the pseudo-instruction li, which loads an immediate into a register. Unlike other pseudo-instructions, which have one or a few expansions, li explodes into, as the spec says, myriad sequences.

RISC-V instructions are 32 bits wide, so a 64-bit immediate cannot be encoded in a single instruction. A 32-bit immediate cannot be encoded either, because some bits are needed for the opcode and the destination register.

Assemblers have quite a complex job to do, as they can use only a single register, the li argument; compilers have more freedom.

RISC-V comes with two instructions that are used to fill registers with the given value:

For easy cases we can pick a single instruction from the above list. But when a constant falls outside those ranges, more instructions have to be used.
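
The single-instruction cases can be sketched as a range check. This is only an illustration: it assumes the two instructions in question are ADDI (signed 12-bit immediate) and LUI (20-bit immediate placed in bits 31:12 and sign-extended on RV64), and the helper name single_instruction is hypothetical.

```python
def single_instruction(value, rd="a1"):
    """Return a one-instruction RV64 expansion of `li rd, value`, or None.

    Hypothetical helper: assumes ADDI and LUI are the two candidates.
    """
    if value >= 1 << 63:          # reinterpret an unsigned 64-bit input as signed
        value -= 1 << 64
    if -2048 <= value <= 2047:    # fits the signed 12-bit immediate of ADDI
        return f"addi {rd}, zero, {value}"
    if value & 0xfff == 0 and -(1 << 31) <= value < (1 << 31):
        # low 12 bits are zero and the rest fits the 20-bit immediate of LUI
        return f"lui {rd}, {hex((value >> 12) & 0xfffff)}"
    return None                   # needs more than one instruction


print(single_instruction(42))
print(single_instruction(0x12345000))
print(single_instruction(0x12345678))
```

Anything that returns None here is where the longer sequences start.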

32-bit unsigned numbers

Example 1

When an immediate fits in 31 bits, a pair of LUI & ADDIW is enough.

li      a1, 0x12345678

GNU as output:

lui     a1, 0x12345     ; a1 = 0x0000000012345000
addiw   a1, a1, 1656    ; a1 = 0x0000000012345678
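
The split into the LUI and ADDIW immediates can be sketched as follows. Because ADDIW sign-extends its 12-bit immediate, the upper part has to be rounded up whenever bit 11 of the constant is set. The function name split_hi_lo is an assumption made for this illustration.

```python
def split_hi_lo(value):
    """Split a 32-bit value into (hi20, lo12) with value == (hi20 << 12) + lo12.

    lo12 is a signed 12-bit number, as required by ADDI/ADDIW.
    """
    lo = value & 0xfff
    if lo >= 0x800:               # bit 11 set: the immediate is negative
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xfffff   # compensate for the borrowed 0x1000
    return hi, lo


hi, lo = split_hi_lo(0x12345678)
print(hex(hi), lo)                # immediates for lui and addiw
```

For 0x12345fff the same function yields lui 0x12346 followed by addiw -1, which shows the rounding at work.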

Example 2

However, when a number has bit 31 set, LUI and ADDIW would sign-extend it to 64 bits, so the assembler emits more instructions. For example:

li      a1, 0xfedcba53

GNU as output:

lui     a1, 0x40        ; a1 = 0x0000000000040000
addiw   a1, a1, -1165   ; a1 = 0x000000000003fb73
slli    a1, a1, 14      ; a1 = 0x00000000fedcc000
addi    a1, a1, -1453   ; a1 = 0x00000000fedcba53

Both GCC and Clang synthesize a slightly better sequence:

lui     a1, 0x3fb73     ; a1 = 0x000000003fb73000
slli    a1, a1, 2       ; a1 = 0x00000000fedcc000
addi    a1, a1, -1453   ; a1 = 0x00000000fedcba53
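
To see why the plain LUI & ADDIW pair is not enough for this constant, the instructions can be modelled in a few lines. This is a minimal sketch of RV64 semantics, not a full emulator; the function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def lui(imm20):
    return sext(imm20 << 12, 32) & MASK64     # result is sign-extended on RV64

def addiw(reg, imm12):
    return sext(reg + imm12, 32) & MASK64     # 32-bit add, then sign-extend

def addi(reg, imm12):
    return (reg + imm12) & MASK64

def slli(reg, shamt):
    return (reg << shamt) & MASK64

# Naive pair: the upper 32 bits get filled with ones.
naive = addiw(lui(0xfedcc), -1453)
# Sequence emitted by GCC and Clang, as listed above.
good = addi(slli(lui(0x3fb73), 2), -1453)

print(hex(naive))
print(hex(good))
```

The naive pair yields 0xfffffffffedcba53, while loading a value without bit 31 set and shifting it afterwards keeps the upper half clear.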

Example 3

li      a1, 0xcccccccc

GNU as output:

lui     a1, 0xcd        ; a1 = 0x00000000000cd000
addiw   a1, a1, -819    ; a1 = 0x00000000000ccccd
slli    a1, a1, 12      ; a1 = 0x00000000ccccd000
addi    a1, a1, -820    ; a1 = 0x00000000cccccccc

Again, GCC and Clang yield a shorter sequence:

lui     a1, 209715      ; a1 = 0x0000000033333000
addiw   a1, a1, 819     ; a1 = 0x0000000033333333
slli    a1, a1, 2       ; a1 = 0x00000000cccccccc
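
One way to arrive at such a sequence is to strip the trailing zero bits of the constant, load the smaller value, and shift it back. The sketch below only illustrates that idea; it is not claimed to be what GCC or Clang do internally, and shifted_form is a hypothetical name.

```python
def shifted_form(value):
    """Return (small, shift) such that value == small << shift and small is odd.

    Assumes value is a non-zero unsigned integer.
    """
    shift = (value & -value).bit_length() - 1   # number of trailing zero bits
    return value >> shift, shift


small, shift = shifted_form(0xcccccccc)
print(hex(small), shift)
```

Here the smaller value is 0x33333333, which fits in 31 bits, so LUI & ADDIW followed by a single SLLI is enough.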

64-bit unsigned numbers

Example 1

64-bit immediates are more expensive. For this sample constant:

li      a1, 0x123456789abcdeff

The GNU assembler produces eight instructions:

lui     a1, 0x247       ; a1 = 0x0000000000247000
addiw   a1, a1, -1875   ; a1 = 0x00000000002468ad
slli    a1, a1, 14      ; a1 = 0x000000091a2b4000
addi    a1, a1, -947    ; a1 = 0x000000091a2b3c4d
slli    a1, a1, 12      ; a1 = 0x000091a2b3c4d000
addi    a1, a1, 1511    ; a1 = 0x000091a2b3c4d5e7
slli    a1, a1, 13      ; a1 = 0x123456789abce000
addi    a1, a1, -257    ; a1 = 0x123456789abcdeff
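
The shape of this output (LUI & ADDIW first, then alternating SLLI and ADDI) suggests a recursive scheme: peel off the low 12 bits, drop the trailing zeros of what remains, and repeat until the rest fits in 32 bits. The sketch below is a reconstruction under that assumption, not the actual GNU as code; materialize and run are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def sext(value, bits):
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def materialize(value):
    """Return (mnemonic, immediate) pairs building `value` in one register."""
    value = sext(value, 64)
    lo = sext(value, 12)                      # signed low 12 bits
    if -(1 << 31) <= value < (1 << 31):       # base case: LUI and/or ADDI(W)
        hi = ((value - lo) >> 12) & 0xfffff
        if hi == 0:
            return [("addi", lo)]             # addi rd, zero, lo
        return [("lui", hi)] + ([("addiw", lo)] if lo else [])
    rest = (value - lo) >> 12                 # arithmetic shift keeps the sign
    shift = 12
    while rest & 1 == 0:                      # fold trailing zeros into the shift
        rest >>= 1
        shift += 1
    seq = materialize(rest) + [("slli", shift)]
    if lo:
        seq.append(("addi", lo))
    return seq

def run(seq):
    """Execute a sequence on a single register that starts at zero."""
    reg = 0
    for op, imm in seq:
        if op == "lui":
            reg = sext(imm << 12, 32)
        elif op == "addiw":
            reg = sext(reg + imm, 32)
        elif op == "addi":
            reg = sext(reg + imm, 64)
        elif op == "slli":
            reg = sext(reg << imm, 64)
    return reg & MASK64


for op, imm in materialize(0x123456789abcdeff):
    print(op, imm)
```

For the constants used in this article the sketch yields the same immediates as the GNU as listings, which at least makes the assumption plausible.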

GCC takes advantage of multiple registers, but the outcome is not much shorter; it saves only one instruction:

lui     a5, 0x4d5e7     ; a5 = 0x000000004d5e7000
lui     a1, 0x2469      ; a1 = 0x0000000002469000
slli    a5, a5, 1       ; a5 = 0x000000009abce000
addi    a1, a1, -1329   ; a1 = 0x0000000002468acf
addi    a5, a5, -257    ; a5 = 0x000000009abcdeff
slli    a1, a1, 35      ; a1 = 0x1234567800000000
add     a1, a1, a5      ; a1 = 0x123456789abcdeff

Clang gives up and loads the constant from memory.

Example 2

li      a1, 0xcccccccccccccccc

GNU as output:

lui     a1, 0xfcccd     ; a1 = 0xfffffffffcccd000
addiw   a1, a1, -819    ; a1 = 0xfffffffffccccccd
slli    a1, a1, 12      ; a1 = 0xffffffccccccd000
addi    a1, a1, -819    ; a1 = 0xffffffcccccccccd
slli    a1, a1, 12      ; a1 = 0xfffcccccccccd000
addi    a1, a1, -819    ; a1 = 0xfffccccccccccccd
slli    a1, a1, 12      ; a1 = 0xccccccccccccd000
addi    a1, a1, -820    ; a1 = 0xcccccccccccccccc

GCC output (Clang also decided to load the value from memory):

lui     a5, 0x33333     ; a5 = 0x0000000033333000
addi    a5, a5, 819     ; a5 = 0x0000000033333333
slli    a1, a5, 32      ; a1 = 0x3333333300000000
add     a1, a1, a5      ; a1 = 0x3333333333333333
not     a1, a1          ; a1 = 0xcccccccccccccccc
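
The GCC sequence relies on two observations: the constant is the bitwise complement of 0x3333333333333333, and that value is a 32-bit word repeated twice. A small check of the arithmetic, with replicate_and_invert being a name made up for this sketch:

```python
MASK64 = (1 << 64) - 1

def replicate_and_invert(word32):
    """Model slli/add/not: copy a 32-bit word into both halves, then invert."""
    doubled = ((word32 << 32) + word32) & MASK64   # slli a1, a5, 32 ; add a1, a1, a5
    return ~doubled & MASK64                       # not a1, a1


print(hex(replicate_and_invert(0x33333333)))
```

The complement is easier to build because 0x33333333 has bit 31 clear, so no sign extension gets in the way.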

Example 3

li      a1, 0xcccccccccccccccd

GNU as output:

lui     a1, 0xfcccd     ; a1 = 0xfffffffffcccd000
addiw   a1, a1, -819    ; a1 = 0xfffffffffccccccd
slli    a1, a1, 12      ; a1 = 0xffffffccccccd000
addi    a1, a1, -819    ; a1 = 0xffffffcccccccccd
slli    a1, a1, 12      ; a1 = 0xfffcccccccccd000
addi    a1, a1, -819    ; a1 = 0xfffccccccccccccd
slli    a1, a1, 12      ; a1 = 0xccccccccccccd000
addi    a1, a1, -819    ; a1 = 0xcccccccccccccccd

GCC output:

lui     a1, 0x33333     ; a1 = 0x0000000033333000
lui     a5, 0x33333     ; a5 = 0x0000000033333000
addi    a1, a1, 819     ; a1 = 0x0000000033333333
addi    a5, a5, 818     ; a5 = 0x0000000033333332
slli    a1, a1, 32      ; a1 = 0x3333333300000000
add     a1, a1, a5      ; a1 = 0x3333333333333332
not     a1, a1          ; a1 = 0xcccccccccccccccd

Clang's output is better (note that the result ends up in a0):

lui     a0, 0xccccd     ; a0 = 0xffffffffccccd000
addiw   a0, a0, -819    ; a0 = 0xffffffffcccccccd
slli    a1, a0, 32      ; a1 = 0xcccccccd00000000
add     a0, a0, a1      ; a0 = 0xcccccccccccccccd
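
The Clang sequence works only for constants of a particular shape: the sign-extended low word, added to a copy of itself shifted left by 32 bits, must give the constant back. The check below is a sketch of that condition; whether Clang tests exactly this is an assumption, and fits_add_shifted_copy is a hypothetical name.

```python
MASK64 = (1 << 64) - 1

def sext32(value):
    """Sign-extend the low 32 bits of value."""
    value &= 0xffffffff
    return value - (1 << 32) if value >> 31 else value

def fits_add_shifted_copy(value):
    """True if value == X + (X << 32) mod 2**64, where X = sext32(low word)."""
    low = sext32(value)
    return ((low + (low << 32)) & MASK64) == value


print(fits_add_shifted_copy(0xcccccccccccccccd))
print(fits_add_shifted_copy(0xcccccccccccccccc))
```

The condition holds for 0xcccccccccccccccd but not for 0xcccccccccccccccc from Example 2, which is consistent with Clang falling back to a memory load there.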

Tool versions