Author: | Wojciech Muła |
---|---|
Added on: | 2018-11-18 |
Current Intel CPUs translate instructions into so-called uops (micro-ops), which form a kind of internal ISA. For simple operations, like addition or bitwise operations, the translation is one-to-one, i.e. there is exactly one uop for a given instruction. When an instruction takes a memory argument, we usually get two uops: one for the load, another for the actual operation. Please note that most instructions have several forms, usually reg, reg and reg, mem.
I was curious how this looks for SIMD instructions. I used data from uops.info and picked the recent SkylakeX architecture; the results are from IACA 3.0.
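A minimal sketch of how such a grouping could be computed is given here; it is not the script used for this article. It assumes the uops.info data is an XML file in which each `instruction` element carries an `asm` attribute and contains `architecture` elements (with a `name` such as `SKX`) holding `IACA` elements with `version` and `uops` attributes. That layout, the `SAMPLE_XML` stand-in and the function name `uops_histogram` are assumptions made for illustration and should be checked against the real file.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the uops.info XML dump. The element and attribute names
# are an ASSUMPTION about the file layout, not taken from the article.
# The SKX uop counts match rows of the table in this article; the "OTHER"
# entry is a placeholder that only exists to exercise the filter.
SAMPLE_XML = """
<root>
  <extension name="SSE4">
    <instruction asm="DPPS">
      <architecture name="SKX"><IACA version="3.0" uops="5"/></architecture>
      <architecture name="OTHER"><IACA version="3.0" uops="99"/></architecture>
    </instruction>
  </extension>
  <extension name="AVX">
    <instruction asm="VDPPS">
      <architecture name="SKX"><IACA version="3.0" uops="5"/></architecture>
    </instruction>
    <instruction asm="VZEROALL">
      <architecture name="SKX"><IACA version="3.0" uops="16"/></architecture>
    </instruction>
  </extension>
</root>
"""


def uops_histogram(xml_text, arch="SKX", iaca_version="3.0"):
    """Group instruction mnemonics by the uop count reported by IACA."""
    root = ET.fromstring(xml_text)
    by_uops = {}
    for instr in root.iter("instruction"):
        for arch_node in instr.findall("architecture"):
            if arch_node.get("name") != arch:
                continue  # skip other microarchitectures
            for iaca in arch_node.findall("IACA"):
                if iaca.get("version") != iaca_version:
                    continue  # skip other IACA versions
                uops = iaca.get("uops")
                if uops is None:
                    continue  # no uop count recorded for this form
                name = instr.get("asm", "?").lower()
                by_uops.setdefault(int(uops), []).append(name)
    return by_uops


if __name__ == "__main__":
    for uops, names in sorted(uops_histogram(SAMPLE_XML).items()):
        print(uops, len(names), ", ".join(sorted(names)))
```

Each form of an instruction (reg, reg or reg, mem, different vector widths) is a separate `instruction` element under this assumption, which would explain why the same mnemonic can appear several times in one row.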
Observations:
uops | number of CPU instructions | % | CPU instructions |
---|---|---|---|
0 | 8 | 0.17 | vgatherdps, vgatherdps, vgatherqps, vpgatherdd, vpgatherdd, vpgatherqq, vpscatterqd, vscatterqps |
1 | 1752 | 36.17 | too many, omitted |
2 | 2616 | 54.00 | too many, omitted |
3 | 234 | 4.83 | too many, omitted |
4 | 140 | 2.89 | too many, omitted |
5 | 38 | 0.78 | dpps, vdpps, vdpps, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdps, vgatherdps, vgatherdps, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqps, vgatherqps, vgatherqps, vgatherqps, vmovdqu8, vpgatherdd, vpgatherdd, vpgatherdd, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqq, vpgatherqq, vpgatherqq, vpgatherqq |
7 | 4 | 0.08 | vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
8 | 10 | 0.21 | pcmpestri, rex.w pcmpestri, rex.w vpcmpestri, vpcmpestri, vpconflictd, vpconflictd, vpscatterqd, vpscatterqd, vscatterqps, vscatterqps |
9 | 8 | 0.17 | pcmpestri, pcmpestrm, rex.w pcmpestri, rex.w pcmpestrm, rex.w vpcmpestri, rex.w vpcmpestrm, vpcmpestri, vpcmpestrm |
10 | 4 | 0.08 | pcmpestrm, rex.w pcmpestrm, rex.w vpcmpestrm, vpcmpestrm |
11 | 6 | 0.12 | vaeskeygenassist, vaeskeygenassist, vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
12 | 2 | 0.04 | vpscatterdd, vscatterdps |
14 | 2 | 0.04 | vpconflictd, vpconflictq |
15 | 2 | 0.04 | vpconflictq, vpconflictq |
16 | 1 | 0.02 | vzeroall |
19 | 4 | 0.08 | vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
20 | 2 | 0.04 | vpscatterdd, vscatterdps |
21 | 2 | 0.04 | vpconflictd, vpconflictq |
22 | 4 | 0.08 | vpconflictd, vpconflictd, vpconflictq, vpconflictq |
35 | 1 | 0.02 | vpconflictd |
36 | 4 | 0.08 | vpconflictd, vpconflictd, vpscatterdd, vscatterdps |
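The percentage column can be re-derived from the instruction counts alone. The short check below copies the counts from the table; the variable names are made up for this example. It also shows that forms with one or two uops together account for roughly 90% of all listed forms.

```python
# Instruction counts per uop count, copied from the table above.
counts = {
    0: 8, 1: 1752, 2: 2616, 3: 234, 4: 140, 5: 38, 7: 4, 8: 10, 9: 8,
    10: 4, 11: 6, 12: 2, 14: 2, 15: 2, 16: 1, 19: 4, 20: 2, 21: 2,
    22: 4, 35: 1, 36: 4,
}

total = sum(counts.values())

# Share of each row, in percent, rounded like the table.
share = {uops: round(100 * n / total, 2) for uops, n in counts.items()}

# Combined share of instruction forms that need one or two uops.
one_or_two = round(100 * (counts[1] + counts[2]) / total, 2)

if __name__ == "__main__":
    print("total forms:", total)
    print("1 uop: %.2f%%, 2 uops: %.2f%%" % (share[1], share[2]))
    print("1 or 2 uops: %.2f%%" % one_or_two)
```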
Scripts used to collect the data are available.