Author: | Wojciech Muła |
---|---|
Added on: | 2021-02-17 |
Updated on: | 2021-02-18 (MSVC 19.6 didn't autovectorize accumulate_custom_epi8, my mistake; noticed by Harold Aptroot) |
This year I re-checked the status of autovectorization in the latest GCC and Clang. MSVC was omitted because I didn't see any new version of this compiler on godbolt. More precisely, I didn't believe that there was any difference between version 19.28 and version 19.16 (which was tested two years ago).
Harold Aptroot pointed out that there are some differences in code generated for the AVX2 target. Additionally, in 2020 MSVC started to support AVX512. These two reasons forced me to recheck MSVC too.
In this comparison we consider two targets: AVX2 and AVX512.
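The two targets, AVX2 and AVX512, are selected with compiler options. The exact options used for the tests are not stated here, so the ones named in the comments below (`/arch:AVX2` and `/arch:AVX512` for MSVC, `-mavx2` and `-mavx512f` for GCC and Clang) are only an assumption. The helper `simd_target` is a hypothetical name. It is a minimal sketch that reports which target a translation unit was built for, based on predefined macros.

```cpp
#include <string>

// Hypothetical helper, not part of the tested code.
// Assumed build options (not confirmed by the article):
//   MSVC:      /arch:AVX2   or  /arch:AVX512
//   GCC/Clang: -mavx2       or  -mavx512f
// __AVX512F__ and __AVX2__ are predefined by MSVC, GCC and Clang
// when the corresponding instruction set is enabled.
std::string simd_target() {
#if defined(__AVX512F__)
    return "AVX512";
#elif defined(__AVX2__)
    return "AVX2";
#else
    return "baseline";
#endif
}
```

Calling `simd_target()` in a test driver is a cheap way to confirm that the binary being inspected was really built for the intended target.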
A few basic algorithms available in the C++ algorithm library were picked.
For the sake of completeness, GCC and Clang results are also included. Please refer to the article dedicated to these compilers.
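To give an idea of what is being compiled, here is a minimal sketch of a few such procedures. Only the procedure names come from the results table. The signatures, the lambdas and the constant `42` are assumptions made for illustration and may differ from the actual test code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical signatures; only the names appear in the article.

// std::accumulate with the default operation (operator+).
int32_t accumulate_epi32(const std::vector<int32_t>& v) {
    return std::accumulate(v.begin(), v.end(), int32_t(0));
}

// std::accumulate with a user-supplied binary operation.
// The lambda used here is an assumption.
int32_t accumulate_custom_epi32(const std::vector<int32_t>& v) {
    return std::accumulate(v.begin(), v.end(), int32_t(0),
                           [](int32_t acc, int32_t x) { return acc + x; });
}

// std::count_if with a simple predicate (the constant is arbitrary).
std::size_t count_if_epi8(const std::vector<int8_t>& v) {
    return static_cast<std::size_t>(
        std::count_if(v.begin(), v.end(), [](int8_t x) { return x == 42; }));
}

// std::transform computing the absolute value in place.
void transform_abs_epi32(std::vector<int32_t>& v) {
    std::transform(v.begin(), v.end(), v.begin(),
                   [](int32_t x) { return x < 0 ? -x : x; });
}
```

Each procedure is a thin wrapper around a single standard algorithm, so the generated assembly shows directly whether the compiler vectorized the algorithm's loop.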
algorithm | procedure | MSVC 19.28.29333 AVX2 | MSVC 19.28.29333 AVX512 | MSVC 19.16.27023.1 AVX2 | MSVC 19.16.27023.1 AVX512 | GCC 10 AVX2 | GCC 10 AVX512 | Clang 11 AVX2 | Clang 11 AVX512 |
---|---|---|---|---|---|---|---|---|---|
accumulate (custom) | accumulate_custom_epi8 | no | no | no | --- | no | no | no | no |
accumulate (custom) | accumulate_custom_epi32 | yes | yes | yes | --- | yes | yes | yes | yes |
accumulate (default) | accumulate_epi8 | yes | yes | yes | --- | yes | yes | yes | yes |
accumulate (default) | accumulate_epi32 | yes | yes | yes | --- | yes | yes | yes | yes |
all_of | all_of_epi8 | no | no | no | --- | no | no | no | no |
all_of | all_of_epi32 | no | no | no | --- | no | no | no | no |
any_of | any_of_epi8 | no | no | no | --- | no | no | no | no |
any_of | any_of_epi32 | no | no | no | --- | no | no | no | no |
copy | copy_epi8 | no | no | no | --- | no | no[1] | no | no |
copy | copy_epi32 | no | no | no | --- | no | no[1] | no | no |
copy_if | copy_if_epi8 | no | no | no | --- | no | no[1] | no | no |
copy_if | copy_if_epi32 | no | no | no | --- | no | no[1] | no | no |
count | count_epi8 | yes | yes | yes | --- | yes | yes | yes | yes |
count | count_epi32 | yes | yes | yes | --- | yes | yes | yes | yes |
count_if | count_if_epi8 | no | no | no | --- | yes | yes | yes | yes |
count_if | count_if_epi32 | no | no | no | --- | yes | yes | yes | yes |
fill | fill_epi8 | no[2] | no[2] | no | --- | no[2] | no[2] | no[2] | no[2] |
fill | fill_epi32 | no[3] | no[3] | no | --- | yes | yes | yes | yes |
find | find_epi8 | no[4] | no[4] | no[4] | --- | no | no | no | no |
find | find_epi32 | no | no | no | --- | no | no | no | no |
find_if | find_if_epi8 | no | no | no | --- | no | no | no | no |
find_if | find_if_epi32 | no | no | no | --- | no | no | no | no |
is_sorted | is_sorted_epi8 | no | no | no | --- | no | no | no | no |
is_sorted | is_sorted_epi32 | no | no | no | --- | no | no | no | no |
none_of | none_of_epi8 | no | no | no | --- | no | no | no | no |
none_of | none_of_epi32 | no | no | no | --- | no | no | no | no |
remove | remove_epi8 | no | no | no | --- | no | no | no | no |
remove | remove_epi32 | no | no | no | --- | no | no | no | no |
remove_if | remove_if_epi8 | no | no | no | --- | no | no | no | no |
remove_if | remove_if_epi32 | no | no | no | --- | no | no | no | no |
replace | replace_epi8 | no | no | no | --- | no | yes | yes | yes |
replace | replace_epi32 | no | no | no | --- | yes | yes | yes | yes |
replace_if | replace_if_epi8 | no | no | no | --- | no | yes | no | no |
replace_if | replace_if_epi32 | no | no | no | --- | yes | yes | no | no |
reverse | reverse_epi8 | no[5] | no[5] | no[5] | --- | yes | yes | no | no |
reverse | reverse_epi32 | no[6] | no[6] | no[6] | --- | yes | yes | no | no |
transform (abs) | transform_abs_epi8 | yes | yes | yes | --- | yes | yes | yes | yes |
transform (abs) | transform_abs_epi32 | yes | yes | yes | --- | yes | yes | yes | yes |
transform (increment) | transform_inc_epi8 | yes | yes | yes | --- | yes | yes | yes | yes |
transform (increment) | transform_inc_epi32 | yes | yes | yes | --- | yes | yes | yes | yes |
transform (negation) | transform_neg_epi8 | yes | yes | no | --- | yes | yes | yes | yes |
transform (negation) | transform_neg_epi32 | yes | yes | no | --- | yes | yes | yes | yes |
unique | unique_epi8 | no | no | no | --- | no | no | no | no |
unique | unique_epi32 | no | no | no | --- | no | no | no | no |
[1] | SIMD instructions present, but not in the main loop |
[2] | calls memset |
[3] | emits rep stosd |
[4] | calls memchr |
[5] | calls ___std_reverse_trivially_swappable_1 |
[6] | calls ___std_reverse_trivially_swappable_4 |
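The footnoted "no" entries differ from the plain ones: the procedure body contains no vectorized loop because the work is handed off to a library routine or a string instruction. A minimal sketch of procedures of this kind follows. The signatures are assumptions, and whether the called routine itself uses SIMD is outside what the table records.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signatures; only the names appear in the table.

// Filling bytes: reported as a call to memset (footnote 2).
void fill_epi8(std::vector<int8_t>& v, int8_t value) {
    std::fill(v.begin(), v.end(), value);
}

// Searching bytes: reported as a call to memchr on MSVC (footnote 4).
// Returns the index of the first match, or v.size() if there is none.
std::size_t find_epi8(const std::vector<int8_t>& v, int8_t value) {
    return static_cast<std::size_t>(
        std::find(v.begin(), v.end(), value) - v.begin());
}

// Reversing 32-bit items: reported as a call to a library helper
// on MSVC (footnote 6).
void reverse_epi32(std::vector<int32_t>& v) {
    std::reverse(v.begin(), v.end());
}
```

For such cases the assembly of the procedure alone is not enough to judge performance; the called routine would have to be inspected separately.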
All implementations are available on GitHub.