# Faster fractional exponents

Author: Wojciech Muła 2023-02-05

# Introduction

A well known method of calculating powers of integers is based on the binary representation of an exponent. Let's consider a simple example. Exponent equals y = 9 = 0b1001; its value can be expressed as 1 ⋅ 20 + 0 ⋅ 21 + 0 ⋅ 22 + 1 ⋅ 23; after constant folding it simplifies to 20 + 23 = 1 + 8 = 9. Thus x9 can be expanded into x20 + 23 = x20x23 = x1x8.

The main observation is that the product contains x2i only if the i-th bit of exponent is 1.

An algorithm utilizing this observation is quite simple:

```func powint(x float64, y uint) float64 {
result := 1.0   // x^0
tmp := x        // tmp is x^{2^i}

for y > 0 {
if y & 1 != 0 {             // i-th bit set?
result = result * tmp   // result times x^{2^i}
}

// square in each iteration
tmp = tmp * tmp

// scan bits starting from the least significant one
y >>= 1
}

return result
}
```

Exactly the same schema can be used if an exponent is fractional; for simplicity let's assume the exponent is positive and less than 1.

We also use binary representation of fraction, we just need to remember that weights of bits are fractions: x2i. They equals 1/2, 1/4, 1/8, 1/16, 1/32, and so on, so forth.

The algorithm is almost identical: we scan bits starting from the most significant one — bit weights are decreasing by factor 1/2. Value x½ is a square root.

Let's assume that we have a fraction expressed as a uint64, where the decimal dot is before the most significant bit. For instance fraction 0.10101112 would have the following representation as uint64:

```decimal dot
|
[10101110|00000000|00000000|00000000|00000000|00000000|00000000|00000000]
|||                                                                   |
||+- bit 61, weight 1/8                                           bit 0
|+-- bit 62, weight 1/4
+--- bit 63, weight 1/2
```

In the next section we will see how to convert a normalized float into that representation.

The algorithm is:

```func powfracaux(x float64, fraction uint64) float64 {
res := 1.0              // res = 2^0
sq := x
for fraction != 0 {
sq = math.Sqrt(sq)       // sq = x^(1/2^i)

if int64(fraction) < 0 { // i-th bit set (MSB)
res *= sq            // update result
}

fraction <<= 1
}

return res
}
```

# Float to fraction

A normalized 64-bit floating point value has the bit following layout:

```[S|eeeeeeeeeee|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm]
|  exponent  |               significand (mantissa)
sign bit     |
+- decimal dot is here
```

Biased exponent spans 11 bits and significand spans 52 bits. The significand has implicit 1 at 53rd position.

We assume only positive floats less than 1. The algorithm to convert such number into a fraction goes as follows.

1. Get the float64 as raw bits.

```u64 := math.Float64bits(x)
```
2. Then extract the significand bits.

```const mask = (1 << 52) - 1
```
```  [0|eeeeeeeeeee|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] u64
[0|00000000000|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] fraction
```
3. Complete the significand with the implicit 1. The integer value can be interpreted as (1 + significand) ⋅ 252.

```fraction |= 1 << 52
```
```  [S|00000000000|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] fraction
| [0|00000000001|0000000000000000000000000000000000000000000000000000] one
[0|00000000001|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] fraction
```
4. Extract the exponent value; the bias equals 1023. The exponent will always be negative.

```const bias = 1023
const mask = (1 << 11) - 1
exp := ((u64 >> 52) & mask) - bias
```
```[S|eeeeeeeeeee|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] u64
[0|00000000000|00000000000000000000000000000000000000000eeeeeeeeeee] exp
```
5. Adjust the fraction. We first shift left fraction by 11 bits left; we do it to place the most significant bit at bit #63. Then we shift right by -exp - 1. Why : − 1? For example if x = 0.5, than the exponent equals − 1. The completed significand equals 1000…0002, and after shifting 11 bits left, its value as a fraction is valid. Thus shifting right is not needed; only for values less than 0.5 this shift is required.

```fraction <<= 11
fraction >>= -exp - 1   // for example exp = -5
```
```[0|00000000001|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm] fraction
[1|mmmmmmmmmmm|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm00000000000] <<= 1
[0|0001mmmmmmm|mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm0000000] >>= -exp - 1 = 4
```

# Evaluation

The square root function is a native instruction SQRT on x86 CPUs, and compilers emit this instruction. Here is a part of objdump of powfracaux function:

```...
main.go:39 4885c0         TESTQ AX, AX
main.go:39 740c           JE 0x4f53b6
main.go:40 f20f51d0       SQRTSD X0, X2   // <--- here
main.go:41 7def           JGE 0x4f539f
main.go:42 f20f59ca       MULSD X2, X1
main.go:42 ebe9           JMP 0x4f539f
main.go:12 0f10c1         MOVUPS X1, X0
...
```

The problem with SQRT instruction is its hight latency. And the latency is the major issue here, as we have dependencies between iterations. It's not possible to calculate SQRT independently, even the hardware is capable of doing this. Latency for the instruction varies from 13 to 19 cycles; while latency for multiplication is 4 cycles. The timings are the same for Skylake-X, Cannon Lake, Ice Lake and Adler Lake-P.

The number of square root calculation equals the number of significant bits in the fraction — in the case of 64-bit floats, this is 53 max.

Below are benchmark results, comparing the default Go math.Pow with implementation of the algorithm. The benchmarks were run on an Ice Lake machine with Go 1.20.

```benchmark               old ns/op     new ns/op     delta
Benchmark/2^{-1}-8      4.20          5.45          +29.80%
Benchmark/2^{-2}-8      46.0          11.3          -75.46%
Benchmark/2^{-3}-8      46.0          16.4          -64.31%
Benchmark/2^{-4}-8      46.0          21.9          -52.45%
Benchmark/2^{-5}-8      46.0          28.3          -38.57%
Benchmark/2^{-6}-8      46.0          33.2          -27.79%
Benchmark/2^{-7}-8      46.0          38.7          -15.90%
Benchmark/2^{-8}-8      46.0          45.1          -1.98%
Benchmark/2^{-9}-8      46.0          50.0          +8.77%
Benchmark/2^{-10}-8     46.0          55.5          +20.69%
Benchmark/2^{-11}-8     46.0          61.9          +34.67%
Benchmark/2^{-12}-8     46.0          66.8          +45.33%
Benchmark/2^{-13}-8     46.0          72.3          +57.32%
```

As we can observe, the complexity of Go algorithm is not value-sensitive. The linear algorithm wins for a quite small set of numbers. Only approx 27 exponents can be evaluated faster than a generic algorithm.

However, detecting if the linear algorithm can be used is cheap. It costs counting of trailing zero bits and a single comparison.

# Sample code

Sample source code is available on Github.