Author: | Wojciech Muła |
---|---|

Added on: | 2022-01-31 |

The other day I came across the following line:

len * ("11124811248484"[type < 14 ? type:0]-'0') > 4;

My first reaction was "WTF", but later I realized that such a hackish code has to be a response to poor compiler optimizations. And taking into account that the code might be quite old, this is a perfect solution. Only a bit unreadable.

In fact we have something like `len * coefficient(type) > 4`, where
`coefficient` is a value from the set {1, 2, 4, 8}.

I want to show that this problem can be solved not only cleaner but also faster.

First, let's examine the compiler output for the original expression. We assume both variables are unsigned integers.

bool fun1(unsigned int len, unsigned int type) { return len * ("11124811248484"[type < 14 ? type:0]-'0') > 4; }

And the output from `gcc -O3 -march=skylake -s`; GCC version is 10.2.1.:

cmpl $13, %esi ja .L2 movl %esi, %esi leaq .LC0(%rip), %rax # coef = "11124811248484"[type] if type < 14 movsbl (%rax,%rsi), %eax subl $48, %eax # coef -= '0' imull %eax, %edi # edi = len * coef .L2: cmpl $4, %edi # len * coef > 4 seta %al ret

It's worth to note that the compiler knows that for `type >= 14`, we
always fetch the value `"11124811248484"[0] - '0'` that equals one. Thus,
jumps instantly to the comparison.

**Optimization #1**

We may omit subtraction of constant `'0'`, if we don't use ASCII
digits, but hex strings. Maybe for older compiler we need to use
octal digits. Either way, we avoid a subtraction.

bool fun2(unsigned int len, unsigned int type) { if (type >= 14) { type = 0; } const unsigned coef = "\x01\x01\x01\x02\x04\x08\x01\x01\x02\x04\x08\x04\x08\x04"[type]; return (len * coef) > 4; }

The assembly code:

cmpl $13, %esi ja .L5 movl %esi, %esi leaq .LC1(%rip), %rax movsbl (%rax,%rsi), %eax imull %eax, %edi .L5: cmpl $4, %edi seta %al ret

**Optimization #2**

We removed one instruction. Can we do it better? The comparison `type < 14`
cannot be avoided,b ut the constants are {1, 2, 4, 8}. All are powers of two,
thus we can replace the multiplication with a binary shift left just by
adjusting the constants.

bool fun3(unsigned int len, unsigned int type) { if (type >= 14) { type = 0; } const unsigned shift = "\x00\x00\x00\x01\x02\x03\x00\x00\x00\x01\x02\x03\x02\x03\x02"[type]; return (len << shift) > 4; }

The assembly code:

cmpl $13, %esi ja .L7 movl %esi, %esi leaq .LC2(%rip), %rax movsbl (%rax,%rsi), %eax shlx %eax, %edi, %edi .L7: cmpl $4, %edi seta %al ret

Since we are targeting the Skylake, it was possible to use the instruction
`SHLX` from the BMI extensions. The instruction performs shift left, but
does not alter the CPU flags register. By not doing this, it does not create
any indirect dependencies between instructions.

**Optimization #3**

Seems we reached the end? Not really. We are calculating four cases:

len * 1 > 4 len * 2 > 4 len * 4 > 4 len * 8 > 4

Remembering that `len` is an unsigned integer, we may rewrite the
expressions:

len > 5 # 5 * 1 > 4 len > 3 # 3 * 2 > 4 len > 2 # 2 * 4 > 4 len > 1 # 1 * 8 > 4

We just figured out the minimum value of `len` for which the expression `len
* coefficient > 4` is true. As a result, we may get rid of
multiplication/shift.

bool fun4(unsigned int len, unsigned int type) { if (type >= 14) { type = 0; } const unsigned bound = "\x05\x05\x05\x03\x02\x01\x05\x05\x03\x02\x01\x02\x01\x02"[type]; return len > bound; }

The compiler output:

movl $5, %eax # bound = lookup[0] cmpl $13, %esi # type >= 14 ja .L9 movl %esi, %esi leaq .LC3(%rip), %rax movsbl (%rax,%rsi), %eax # bound = lookup[type] .L9: cmpl %eax, %edi # len > cond seta %al ret

Conclusions:

- All in all, the trick with subscripting a string is neat. This is
what a compiler
**may**do underneath for`switch`statements, but it is not guaranteed. - Clang uses conditional moves instead of jumps.