Author: Wojciech Muła
Added on: 2024-11-11
The RISC-V assembler defines the pseudo-instruction li, which loads an immediate into a register. Unlike other pseudo-instructions, which have one or a few expansions, li explodes into (as the spec says) myriad sequences.
RISC-V opcodes are 32 bits wide, so it's impossible to encode a 64-bit immediate in a single instruction. It's impossible to encode a 32-bit immediate too, as some bits have to be spared for the opcode itself (instruction + destination).
Assemblers have quite a complex job to do, as they can only use a single register (the li argument); compilers have more freedom.
RISC-V comes with two instructions that are used to fill registers with the given value:
For easy cases we can pick a single instruction from the above list. But when a constant falls outside these ranges, more instructions have to be used.
When an immediate fits in 31 bits, a pair of LUI & ADDIW is enough.
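To make the "easy cases" concrete, here is a small Python sketch that tests whether a constant can be produced by one instruction. It assumes the two instructions in question are ADDI (12-bit signed immediate, added to the zero register) and LUI (20-bit immediate placed in bits 31:12, sign-extended on RV64); the helper names are made up for this illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def fits_addi(value):
    """True if `addi rd, zero, imm` alone can produce the 64-bit constant."""
    return -2048 <= sign_extend(value, 64) <= 2047


def fits_lui(value):
    """True if a single `lui rd, imm` can produce the 64-bit constant (RV64)."""
    v = sign_extend(value, 64)
    # The low 12 bits must be zero and the value must be a sign-extended
    # 32-bit word, because LUI sign-extends its 32-bit result on RV64.
    return v & 0xFFF == 0 and -(1 << 31) <= v < (1 << 31)
```

For instance, 0x12345000 passes `fits_lui`, while 0x12345678 passes neither check and needs at least two instructions.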
li a1, 0x12345678
GNU as output:
lui   a1, 0x12345     ; a1 = 0x0000000012345000
addiw a1, a1, 1656    ; a1 = 0x0000000012345678
However, when a number has 32 bits, the assembler emits more instructions. For example:
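The split into the LUI and ADDIW operands can be sketched in a few lines of Python. The only subtlety is that ADDIW sign-extends its 12-bit immediate, so when bit 11 of the constant is set the lower part becomes negative and the upper part has to be bumped by one to compensate. The function name `split_hi_lo` is hypothetical, and the sketch assumes the input fits in a signed 32-bit word.

```python
def split_hi_lo(value):
    """Split a constant that fits in signed 32 bits into (hi20, lo12).

    hi20 is the LUI operand, lo12 is the signed ADDIW operand.
    """
    lo12 = ((value & 0xFFF) ^ 0x800) - 0x800   # sign-extended low 12 bits
    hi20 = ((value - lo12) >> 12) & 0xFFFFF    # compensates for a negative lo12
    return hi20, lo12


hi20, lo12 = split_hi_lo(0x12345678)
print(f"lui a1, {hi20:#x}")        # lui a1, 0x12345
print(f"addiw a1, a1, {lo12}")     # addiw a1, a1, 1656
```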
li a1, 0xfedcba53
GNU as output:
lui   a1, 0x40        ; a1 = 0x0000000000040000
addiw a1, a1, -1165   ; a1 = 0x000000000003fb73
slli  a1, a1, 14      ; a1 = 0x00000000fedcc000
addi  a1, a1, -1453   ; a1 = 0x00000000fedcba53
Both GCC and Clang synthesize a slightly better sequence:
lui   a1, 0x3fb73     ; a1 = 0x000000003fb73000
slli  a1, a1, 2       ; a1 = 0x00000000fedcc000
addi  a1, a1, -1453   ; a1 = 0x00000000fedcba53
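The register values in the comments can be double-checked with a tiny interpreter. The Python sketch below models only the handful of RV64 instructions that appear in this article (lui, addi, addiw, slli, add, not); the tuple-based program format and the `run` function are inventions for this example, not any real tool's interface.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def run(program):
    """Execute (mnemonic, rd, operands...) tuples; return the register file
    as a dict of unsigned 64-bit values."""
    regs = {"zero": 0}
    for op, rd, *args in program:
        if op == "lui":
            result = sext(args[0] << 12, 32)            # sign-extended on RV64
        elif op == "addi":
            result = regs[args[0]] + args[1]
        elif op == "addiw":
            result = sext(regs[args[0]] + args[1], 32)  # 32-bit add, then sign-extend
        elif op == "slli":
            result = regs[args[0]] << args[1]
        elif op == "add":
            result = regs[args[0]] + regs[args[1]]
        elif op == "not":
            result = ~regs[args[0]]
        else:
            raise ValueError(f"unsupported instruction: {op}")
        regs[rd] = result & MASK64                      # wrap to 64 bits
    return regs


gnu_as = [
    ("lui", "a1", 0x40),
    ("addiw", "a1", "a1", -1165),
    ("slli", "a1", "a1", 14),
    ("addi", "a1", "a1", -1453),
]
print(hex(run(gnu_as)["a1"]))  # 0xfedcba53
```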
li a1, 0xcccccccc
GNU as output:
lui   a1, 0xcd        ; a1 = 0x00000000000cd000
addiw a1, a1, -819    ; a1 = 0x00000000000ccccd
slli  a1, a1, 12      ; a1 = 0x00000000ccccd000
addi  a1, a1, -820    ; a1 = 0x00000000cccccccc
Again, GCC and Clang yield a shorter sequence:
lui   a1, 209715      ; a1 = 0x0000000033333000
addiw a1, a1, 819     ; a1 = 0x0000000033333333
slli  a1, a1, 2       ; a1 = 0x00000000cccccccc
64-bit immediates are more expensive. For this sample constant:
li a1, 0x123456789abcdeff
The GNU assembler produces eight instructions:
lui   a1, 0x247       ; a1 = 0x0000000000247000
addiw a1, a1, -1875   ; a1 = 0x00000000002468ad
slli  a1, a1, 14      ; a1 = 0x000000091a2b4000
addi  a1, a1, -947    ; a1 = 0x000000091a2b3c4d
slli  a1, a1, 12      ; a1 = 0x000091a2b3c4d000
addi  a1, a1, 1511    ; a1 = 0x000091a2b3c4d5e7
slli  a1, a1, 13      ; a1 = 0x123456789abce000
addi  a1, a1, -257    ; a1 = 0x123456789abcdeff
Although GCC takes advantage of multiple registers, the outcome is not much shorter; it saves only one instruction:
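A single-register sequence of this shape can be generated recursively: peel off the sign-extended low 12 bits, drop the trailing zeros of what remains (the shift restores them for free), materialize the shorter number, then shift and add. The Python sketch below follows that idea. It is not taken from the binutils sources, so treat it as an assumption about the strategy rather than a description of the actual implementation; the function names are hypothetical. For the constants used in this article it does print the same sequences as the GNU as listings.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def materialize(value, rd="a1"):
    """Return assembly lines that load the 64-bit constant `value` into rd,
    using rd as the only register."""
    value = sign_extend(value, 64)
    lo12 = sign_extend(value, 12)

    if -(1 << 31) <= value < (1 << 31):
        # Base case: the constant is a sign-extended 32-bit word.
        hi20 = ((value - lo12) >> 12) & 0xFFFFF
        if hi20 == 0:
            return [f"addi {rd}, zero, {lo12}"]
        lines = [f"lui {rd}, {hi20:#x}"]
        if lo12:
            lines.append(f"addiw {rd}, {rd}, {lo12}")
        return lines

    # Recursive case: remove the low 12 bits, then strip trailing zeros
    # from the upper part and fold them into the shift amount.
    hi = (value - lo12) >> 12
    shift = 12
    while hi % 2 == 0:
        hi >>= 1
        shift += 1
    lines = materialize(hi, rd)
    lines.append(f"slli {rd}, {rd}, {shift}")
    if lo12:
        lines.append(f"addi {rd}, {rd}, {lo12}")
    return lines


for line in materialize(0x123456789abcdeff):
    print(line)
```

Each recursion level costs at most two instructions (SLLI and ADDI) and consumes at least 12 bits, which is why a full 64-bit constant ends up with up to eight instructions.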
lui   a5, 0x4d5e7     ; a5 = 0x000000004d5e7000
lui   a1, 0x2469      ; a1 = 0x0000000002469000
slli  a5, a5, 1       ; a5 = 0x000000009abce000
addi  a1, a1, -1329   ; a1 = 0x0000000002468acf
addi  a5, a5, -257    ; a5 = 0x000000009abcdeff
slli  a1, a1, 35      ; a1 = 0x1234567800000000
add   a1, a1, a5      ; a1 = 0x123456789abcdeff
Clang gives up and loads the constant from memory.
li a1, 0xcccccccccccccccc
GNU as output:
lui   a1, 0xfcccd     ; a1 = 0xfffffffffcccd000
addiw a1, a1, -819    ; a1 = 0xfffffffffccccccd
slli  a1, a1, 12      ; a1 = 0xffffffccccccd000
addi  a1, a1, -819    ; a1 = 0xffffffcccccccccd
slli  a1, a1, 12      ; a1 = 0xfffcccccccccd000
addi  a1, a1, -819    ; a1 = 0xfffccccccccccccd
slli  a1, a1, 12      ; a1 = 0xccccccccccccd000
addi  a1, a1, -820    ; a1 = 0xcccccccccccccccc
GCC output (Clang also decided to load the value from memory):
lui   a5, 0x33333     ; a5 = 0x0000000033333000
addi  a5, a5, 819     ; a5 = 0x0000000033333333
slli  a1, a5, 32      ; a1 = 0x3333333300000000
add   a1, a1, a5      ; a1 = 0x3333333333333333
not   a1, a1          ; a1 = 0xcccccccccccccccc
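The GCC sequence relies on two observations: the bitwise negation of the constant is cheaper to build than the constant itself, and that negation consists of the same 32-bit half repeated twice. A minimal Python sketch of such a check follows; it only illustrates the arithmetic and makes no claim about how GCC actually selects its sequences. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def inverted_repeated_half(value):
    """Return the 32-bit half h if ~value equals h repeated twice, else None."""
    inverted = ~value & MASK64
    lo = inverted & 0xFFFFFFFF
    hi = inverted >> 32
    return lo if lo == hi else None


half = inverted_repeated_half(0xcccccccccccccccc)
print(hex(half))                                   # 0x33333333
# Rebuild the constant the way the listing does: shift, add, negate.
print(hex(~((half << 32) + half) & MASK64))        # 0xcccccccccccccccc
```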
li a1, 0xcccccccccccccccd
GNU as output:
lui   a1, 0xfcccd     ; a1 = 0xfffffffffcccd000
addiw a1, a1, -819    ; a1 = 0xfffffffffccccccd
slli  a1, a1, 12      ; a1 = 0xffffffccccccd000
addi  a1, a1, -819    ; a1 = 0xffffffcccccccccd
slli  a1, a1, 12      ; a1 = 0xfffcccccccccd000
addi  a1, a1, -819    ; a1 = 0xfffccccccccccccd
slli  a1, a1, 12      ; a1 = 0xccccccccccccd000
addi  a1, a1, -819    ; a1 = 0xcccccccccccccccd
GCC output:
lui   a1, 0x33333     ; a1 = 0x0000000033333000
lui   a5, 0x33333     ; a5 = 0x0000000033333000
addi  a1, a1, 819     ; a1 = 0x0000000033333333
addi  a5, a5, 818     ; a5 = 0x0000000033333332
slli  a1, a1, 32      ; a1 = 0x3333333300000000
add   a1, a1, a5      ; a1 = 0x3333333333333332
not   a1, a1          ; a1 = 0xcccccccccccccccd
Clang output is better:
lui   a0, 0xccccd     ; a0 = 0xffffffffccccd000
addiw a0, a0, -819    ; a0 = 0xffffffffcccccccd
slli  a1, a0, 32      ; a1 = 0xcccccccd00000000
add   a0, a0, a1      ; a0 = 0xcccccccccccccccd
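The Clang sequence works because the sign-extended low half x, added to x shifted left by 32 bits, wraps around to exactly the requested constant. Whether a given constant has this property is easy to test; the Python sketch below does just that. It shows the arithmetic only, and the function name is made up for this example.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Sign-extend the low 32 bits of value."""
    return ((value & 0xFFFFFFFF) ^ 0x80000000) - 0x80000000


def is_low_half_plus_shifted(value):
    """True if value == x + (x << 32) modulo 2**64, where x is the
    sign-extended low 32-bit half of value."""
    x = sext32(value)
    return ((x + (x << 32)) & MASK64) == (value & MASK64)


print(is_low_half_plus_shifted(0xcccccccccccccccd))  # True
print(is_low_half_plus_shifted(0xcccccccccccccccc))  # False
```

The second result is worth noting: for 0xcccccccccccccccc the same four-instruction shape would produce 0xcccccccbcccccccc, so this particular shortcut does not apply to that constant.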