- 2018-12-08:
- Base64 encoding & decoding using AVX512BW instructions — performance results from Cannon Lake

- 2018-11-26:
- Penalties of errors in SSE floating point calculations — added timings from Skylake and SkylakeX; there is some improvement

- 2018-11-18:
- How many uops are there? — are SIMD instructions mapped onto simple uops?
- A short report from code::dive 2018 — I was there again
- SIMDized sum of all bytes in the array — part 2: signed bytes — `std::accumulate(array, array + size, int32_t(0))` can be 2.5 times faster

- 2018-11-14:
- Speeding up multiple vector operations using SIMD — rewriting a nested loop can help

- 2018-11-05:
- Base64 encoding & decoding using AVX512BW instructions — finally performance results for the AVX512BW implementation are available

- 2018-11-04:
- AVX512 implementation of JPEG zigzag transformation — updated code with description of shuffling array of `uint16_t`
- AVX512: ternary functions evaluation — fixed a few mistakes spotted by **Mark Parker**, added more real-world examples (hash algorithms MD5, SHA-1, SHA-2)


- 2018-10-28:
- SIMD — why you shouldn't use static vector constants — ... in a C++ code
- SIMDized check which bytes are in a set — a few updates

- 2018-10-24:
- SIMDized sum of all bytes in the array — `std::accumulate(array, array + size, 0)` can be six times faster


- 2018-10-18:
- SIMDized check which bytes are in a set — functions like `isspace` with SSE/AVX2/AVX512 instructions


- 2018-10-03:
- Finding index of the minimum value using SIMD instructions — compilers can't do this (yet)

- 2018-05-18:
- AVX512 mask registers support in compilers — there is still room for improvement

- 2018-05-13:
- AVX512 implementation of JPEG zigzag transformation — AVX512 code is 14 times faster than scalar transformation and almost 2 times faster than SSE

- 2018-04-28:
- Be careful with directory_iterator — C++17 `std::filesystem::directory_iterator` is weird


- 2018-04-19:
- Parsing series of integers with SIMD — parsing multiple decimal integers separated by an arbitrary number of delimiters can be really fast with SSE.

- 2018-04-18:
- Is sorted using SIMD instructions — added unrolled versions of SSE and AVX2 code; there are also performance tests of AVX512 procedures.
- Base64 encoding & decoding using AVX512BW instructions — I confused AVX512VBMI with AVX512BW... now the article properly describes AVX512BW, AVX512VBMI and AVX512VL implementation details.

- 2018-04-11:

- 2018-03-26:

- 2018-03-25:
- SSE: conversion integers to decimal representation — recently I translated inline assembly into intrinsics code and thanks to that was able to benchmark the algorithms on new CPUs.

- 2018-03-25:
- SWAR check if all chars are digits — a shorter scalar version

- 2018-03-22:
- Parsing decimal numbers — part 2: SSE — added SSSE3 version and performance results

- 2018-03-16:

- 2018-03-14:
- Intersection of ordered sets — a study of a special case (SIMD approach included)

- 2018-03-11:
- A set of short notes, usually one-liners; it will grow

- 2018-03-11:

Archive: 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003