FBSTP — the most complex instruction in x86 ISA

Author: Wojciech Muła
Added on: 2013-11-07

As the documentation says, FBSTP means "Store BCD Integer and Pop".

FBSTP converts a floating-point value to an integer and saves it as a packed BCD number. The instruction expects a 10-byte buffer: nine bytes hold 18 BCD digits (two per byte, least significant byte first), and the top bit of the last byte holds the sign.
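A minimal sketch of how the instruction can be reached from C, assuming GCC or Clang extended inline assembly on an x86 target. The names `packed_bcd` and `store_bcd` are made up for this illustration and are not taken from the sample sources. On other targets a plain C stand-in fills the same 10-byte layout; it rounds halves away from zero, whereas the real instruction follows the current x87 rounding mode.

```c
#include <assert.h>
#include <string.h>

/* 80-bit packed BCD value as written by FBSTP (hypothetical type name). */
typedef struct { unsigned char b[10]; } packed_bcd;

/* Store x, converted to an integer, as packed BCD (hypothetical helper). */
static void store_bcd(double x, packed_bcd *dst)
{
    assert(dst != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "t" places x in st(0); FBSTP pops it, hence the "st" clobber. */
    __asm__ volatile ("fbstp %0" : "=m" (*dst) : "t" (x) : "st");
#else
    /* Stand-in for non-x86 builds (assumption: same layout, simpler rounding). */
    int i;
    unsigned long long n;

    memset(dst->b, 0, sizeof dst->b);
    if (!(x > -1e18 && x < 1e18)) {       /* out of range or NaN */
        dst->b[9] = 0xFF;                 /* "indefinite" pattern */
        dst->b[8] = 0xFF;
        dst->b[7] = 0xC0;
        return;
    }
    n = (unsigned long long)((x < 0 ? -x : x) + 0.5);
    for (i = 0; i < 9; i++) {             /* two digits per byte, low byte first */
        dst->b[i] = (unsigned char)((n % 10) | ((n / 10 % 10) << 4));
        n /= 100;
    }
    if (x < 0)
        dst->b[9] = 0x80;                 /* sign bit */
#endif
}
```

Calling `store_bcd(-143433.0, &v)` should leave the bytes 0x33, 0x34, 0x14 at the start of the buffer and set the sign bit in the last byte.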

Disadvantages of this instruction:

  1. BCD output — conversion to ASCII is needed.
  2. All digits are saved, including leading zeros.
  3. Limited range of numbers: not all valid double values can be converted. The limit is 18 decimal digits.
  4. Most important: FBSTP is very, very slow.
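The first three points translate directly into post-processing code. A rough sketch in C, assuming the 10-byte packed BCD layout described in the Intel manuals (two digits per byte, least significant byte first, sign in the top bit of the last byte) and assuming that an out-of-range or NaN input leaves the "indefinite" pattern with 0xFF in the two top bytes. The function name `bcd_to_ascii` is hypothetical and not taken from the sample sources.

```c
#include <assert.h>
#include <string.h>

/* Convert 10 bytes of packed BCD to a NUL-terminated decimal string.
 * out needs room for 20 chars: sign + 18 digits + NUL.
 * Returns 0 on success, -1 for the indefinite (error) pattern. */
static int bcd_to_ascii(const unsigned char bcd[10], char *out)
{
    char digits[18];
    int i, first = 0;
    char *p = out;

    assert(bcd != NULL && out != NULL);

    /* Point 3: values that do not fit are reported, not converted. */
    if (bcd[9] == 0xFF && bcd[8] == 0xFF) {
        strcpy(out, "NaN/overflow");
        return -1;
    }

    /* Point 1: unpack nibbles to ASCII; byte 8 holds the top two digits. */
    for (i = 0; i < 9; i++) {
        digits[2 * i]     = (char)('0' + (bcd[8 - i] >> 4));
        digits[2 * i + 1] = (char)('0' + (bcd[8 - i] & 0x0F));
    }

    /* Point 2: skip leading zeros, but always keep the last digit. */
    while (first < 17 && digits[first] == '0')
        first++;

    if (bcd[9] & 0x80)
        *p++ = '-';
    memcpy(p, digits + first, (size_t)(18 - first));
    p[18 - first] = '\0';
    return 0;
}
```

The extra pass over all 18 digits is needed even for small numbers, which is part of why the BCD route is unattractive for plain integer formatting.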

Sample sources

Sources are available on GitHub.

Program fbst_tests.c converts a number to a string, removes leading zeros, and detects errors:

$ ./test 0 12 5671245 -143433 334535 4543985349054 999999999999999999999
printf => 0.000000
FBSTP  => 0
printf => 12.000000
FBSTP  => 12
printf => 5671245.000000
FBSTP  => 5671245
printf => -143433.000000
FBSTP  => -143433
printf => 334535.000000
FBSTP  => 334535
printf => 4543985349054.000000
FBSTP  => 4543985349054
printf => 10000000000000000000000.000000
FBSTP  => NaN/overflow

Program fbst_speed.c compares the FBSTP instruction with a simple implementation of itoa. There is no formatting, no BCD to ASCII conversion, etc. Numbers from 1 to 10,000,000 are converted.
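For orientation, a simple itoa of the kind the benchmark refers to might look like the sketch below. This is an assumption about the general technique (repeated division by 10, then reversing the digits), not a copy of the code in fbst_speed.c; the name `simple_itoa` is made up, and `unsigned int` is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <string.h>

/* Write the decimal form of n into out (at least 11 bytes) and
 * return the number of characters written, excluding the NUL. */
static int simple_itoa(unsigned int n, char *out)
{
    char tmp[10];          /* a 32-bit value has at most 10 digits */
    int len = 0, i;

    assert(out != NULL);

    /* Produce digits from least to most significant. */
    do {
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    /* Reverse them into the output buffer. */
    for (i = 0; i < len; i++)
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
    return len;
}
```

Unlike FBSTP, the loop runs only once per significant digit, so small numbers cost proportionally less.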

Results from a quite old Pentium M:

FBSTP...
... 2.285 s
simple itoa...
... 0.589 s

and from a recent Core i7:

FBSTP...
... 2.165 s
simple itoa...
... 0.419 s

There is almost no difference between the two CPUs! FBSTP is just 5% faster on the Core i7, and it is still roughly 4 to 5 times slower than the simple itoa.