How many uops are there?

Author: Wojciech Muła
Added on: 2018-11-18

Current Intel CPUs translate instructions into so-called uops (micro-ops), which form a kind of internal ISA. For simple operations, like addition or bit operations, the translation is one-to-one, i.e. there is exactly one uop for a given instruction. When an instruction takes a memory argument, we usually get two uops: one for the load and another for the actual operation. Please note that most instructions have several forms, usually reg, reg and reg, mem.

I was curious how this looks for SIMD instructions. I used data from uops.info and picked the recent SkylakeX architecture; the results are from IACA 3.0.

Observations:

| uops | number of CPU instructions | % of CPU instructions | instructions |
|------|----------------------------|-----------------------|--------------|
| 0 | 8 | 0.17 | vgatherdps, vgatherdps, vgatherqps, vpgatherdd, vpgatherdd, vpgatherqq, vpscatterqd, vscatterqps |
| 1 | 1752 | 36.17 | too many, omitted |
| 2 | 2616 | 54.00 | too many, omitted |
| 3 | 234 | 4.83 | too many, omitted |
| 4 | 140 | 2.89 | too many, omitted |
| 5 | 38 | 0.78 | dpps, vdpps, vdpps, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdpd, vgatherdps, vgatherdps, vgatherdps, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqpd, vgatherqps, vgatherqps, vgatherqps, vgatherqps, vmovdqu8, vpgatherdd, vpgatherdd, vpgatherdd, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherdq, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqd, vpgatherqq, vpgatherqq, vpgatherqq, vpgatherqq |
| 7 | 4 | 0.08 | vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
| 8 | 10 | 0.21 | pcmpestri, rex.w pcmpestri, rex.w vpcmpestri, vpcmpestri, vpconflictd, vpconflictd, vpscatterqd, vpscatterqd, vscatterqps, vscatterqps |
| 9 | 8 | 0.17 | pcmpestri, pcmpestrm, rex.w pcmpestri, rex.w pcmpestrm, rex.w vpcmpestri, rex.w vpcmpestrm, vpcmpestri, vpcmpestrm |
| 10 | 4 | 0.08 | pcmpestrm, rex.w pcmpestrm, rex.w vpcmpestrm, vpcmpestrm |
| 11 | 6 | 0.12 | vaeskeygenassist, vaeskeygenassist, vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
| 12 | 2 | 0.04 | vpscatterdd, vscatterdps |
| 14 | 2 | 0.04 | vpconflictd, vpconflictq |
| 15 | 2 | 0.04 | vpconflictq, vpconflictq |
| 16 | 1 | 0.02 | vzeroall |
| 19 | 4 | 0.08 | vpscatterdq, vpscatterqq, vscatterdpd, vscatterqpd |
| 20 | 2 | 0.04 | vpscatterdd, vscatterdps |
| 21 | 2 | 0.04 | vpconflictd, vpconflictq |
| 22 | 4 | 0.08 | vpconflictd, vpconflictd, vpconflictq, vpconflictq |
| 35 | 1 | 0.02 | vpconflictd |
| 36 | 4 | 0.08 | vpconflictd, vpconflictd, vpscatterdd, vscatterdps |

Scripts used to collect the data are available.
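
The original scripts are not reproduced here. As a rough illustration of how a table like the one above could be tallied, here is a minimal Python sketch. It assumes the per-instruction uop counts have already been extracted into (mnemonic, uops) pairs; the function name `uops_histogram` and the input shape are hypothetical and do not come from the author's scripts or from uops.info. The five sample entries are copied from the table, so the percentages printed are relative to this small sample, not to the full instruction set.

```python
from collections import defaultdict


def uops_histogram(records):
    """Group (mnemonic, uops) pairs by uop count.

    Returns a list of (uops, count, percent, mnemonics) rows ordered by
    uop count. Repeated mnemonics are kept, because one mnemonic may
    appear in several forms (for example reg, reg and reg, mem).
    """
    groups = defaultdict(list)
    for mnemonic, uops in records:
        groups[uops].append(mnemonic)

    total = sum(len(names) for names in groups.values())
    rows = []
    for uops in sorted(groups):
        names = sorted(groups[uops])
        rows.append((uops, len(names), 100.0 * len(names) / total, names))
    return rows


# A tiny sample copied from the table (hypothetical input format).
sample = [
    ("vpscatterdd", 12),
    ("vscatterdps", 12),
    ("vpconflictd", 14),
    ("vpconflictq", 14),
    ("vzeroall", 16),
]

for uops, count, percent, names in uops_histogram(sample):
    print(f"{uops:>4} {count:>6} {percent:6.2f}  {', '.join(names)}")
```

Counting forms rather than unique mnemonics is a deliberate choice in this sketch: it matches the table, where names such as vpconflictd appear more than once in a single row.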