Author: | Wojciech Muła |
---|---|
Added on: | 2013-12-27 |
This short article shows how a normalized floating point value can be safely converted to an integer value without the assistance of the FPU or SSE. Only basic bitwise and arithmetic operations are used. In the worst case the following operations are performed:
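The floating point value is calculated as $(-1)^{\text{sign}} \cdot (1 + \text{fraction}) \cdot 2^{\text{stored} - \text{bias}}$, where stored is the unsigned number kept in the exponent field. The fraction part is in the range [0, 1). For 32-bit values the sign has 1 bit, the exponent field has 8 bits, the fraction has 23 bits, and the bias is 127. The true exponent is exponent = stored − bias, so the field holds exponent + bias as an unsigned number.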
The layout of the binary word:

    +-+--------+-----------------------+
    |S|exp+bias|       fraction        |
    +-+--------+-----------------------+
    31 30    23 22                    0

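The layout can be illustrated with a small C sketch that splits a 32-bit float into its three fields. The type `float_fields` and the function `split_float` are hypothetical names introduced here for illustration, and the code assumes that `float` is an IEEE 754 single-precision value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 32-bit float. */
typedef struct {
    uint32_t sign;      /* bit 31                      */
    uint32_t biased;    /* bits 30..23: exponent + 127 */
    uint32_t fraction;  /* bits 22..0                  */
} float_fields;

/* Hypothetical helper: assumes float is IEEE 754 single precision. */
static float_fields split_float(float x) {
    uint32_t bits;
    float_fields f;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);      /* copy the bit pattern, no FPU conversion */

    f.sign     = bits >> 31;
    f.biased   = (bits >> 23) & 0xFF;
    f.fraction = bits & 0x007FFFFF;
    return f;
}
```

For example, 1.0f is stored as 0x3F800000, so this sketch would report sign 0, a biased exponent of 127 (true exponent 0), and a fraction of 0.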
Clear the sign and exponent + bias fields, then restore the implicit integer 1 at bit 23 (the 24th bit):

    +-+--------+-----------------------+
    |0|00000001|xxxxxxxxxxxxxxxxxxxxxxx|
    +-+--------+-----------------------+
    31 30    23 22                    0

The value of such a 32-bit word, treated as an unsigned integer, is $(1 + \text{fraction}) \cdot 2^{23}$. To calculate the result, this word has to be shifted left or right depending on the value and sign of shift := exponent − 23; only a few cases have to be considered:
A sample program is available.
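The whole procedure can be sketched in C as follows. This is an illustration written under stated assumptions, not the sample program mentioned above. The function name `float_to_int32` is hypothetical, the result is truncated toward zero, and values that do not fit in a signed 32-bit integer are saturated. The saturation policy is a choice made for this sketch only; NaN and infinity lie outside the "normalized value" precondition and simply fall into the saturating branch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function: truncates toward zero and saturates on overflow.
 * Assumes float is IEEE 754 single precision and x is a normalized value. */
static int32_t float_to_int32(float x) {
    uint32_t bits;
    uint32_t sign, mantissa, magnitude;
    int exponent, shift;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);          /* copy the bit pattern only */

    sign     = bits >> 31;
    exponent = (int)((bits >> 23) & 0xFF) - 127;     /* remove the bias */
    /* clear sign and exponent fields, restore the implicit 1 at bit 23 */
    mantissa = (bits & 0x007FFFFF) | 0x00800000;
    shift    = exponent - 23;

    if (exponent < 0)
        return 0;                            /* |x| < 1 */

    if (shift > 7)                           /* |x| >= 2^31: does not fit */
        return sign ? INT32_MIN : INT32_MAX; /* assumption: saturate */

    if (shift < 0)
        magnitude = mantissa >> -shift;      /* drop the fractional bits */
    else
        magnitude = mantissa << shift;       /* at most bit 30 gets set */

    return sign ? -(int32_t)magnitude : (int32_t)magnitude;
}
```

With this sketch, 1.5f has the word 0x00C00000 after masking and a shift of −23, which yields 1; −2.5f yields −2; and 16777216.0f (2^24) has a shift of 1 and yields 16777216.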