Sections
You will find a lot of SIMD material here (SSE, SSE2, SSE3, SSE4, AVX, AVX2, AVX512).
Some of the Polish texts also contain, in my opinion, interesting material; you can use Google Translate or at least take a look at the source code.
AVX512 implementation of JPEG zigzag transformation [2018-11-04]
SSSE3/SSE4: alpha blending — operator over [2016-03-03]
16bpp/15bpp to 32bpp pixel conversions — different methods [2016-03-06]
SSSE3: PMADDUBSW and image crossfading [2016-05-03]
SSE: modify 32bpp images with lookup tables [2016-03-04]
See the papers I co-authored:
AVX512 8-bit positional population count procedure [2019-12-31]
AVX512VBMI — remove spaces from text [2019-01-05]
SSSE3: printing hex values [2016-03-07]
Use AVX512 to calculate binomial coefficient [2020-03-21]
AVX512 — first bit set in a large array [2016-12-21]
Software emulation of PDEP instruction [2014-09-23]
Speedup reversing table of bytes [2010-05-01]
STL: map with string as key — access speedup [2010-04-03]
See the papers by Daniel Lemire and me:
ARM Neon and Base64 encoding & decoding [2017-01-07]
Interesting resources: Bit Twiddling Hacks (Sean Eron Anderson), Low Level Bit Hacks You Absolutely Must Know (Peteris Krumins), Bit twiddling (Jari Komppa).
Conditionally fill word (for limited set of input values) [2014-10-01]
Average of two unsigned integers [2012-07-02]
Branchless set mask if value greater or how to print hex values [2010-06-09]
Determining if an integer is a power of 2 [2010-04-11]
Branchless signum [2010-04-01]
Fill word with selected bit [2010-04-01]
Transpose bits in byte using SIMD instructions [2010-03-31]
Keywords: FPU, MMX, SSE, SSE2, SSE3, SSSE3, SSE4, AVX, AVX2, AVX512
SSE4: PTEST & strlen [2007-09-08]
SSE4: greater/less or equal relations for unsigned bytes/words [2015-04-03]
Integer log 10 of an unsigned integer — SIMD version [2014-03-09]
SSE: conversion uint32 to float [2008-06-18]
PABSQ — absolute value of two signed 64-bit numbers [2008-06-08]
SSE/AVX: absolute value of difference of unsigned integers [2018-03-11]
Convert float to int without FPU/SSE [2013-12-27]
Calculate floor value without FPU/SSE instruction [2013-12-29]
Floating point tricks [2008-06-15]
Speeding up LIKE '%text%' queries (at least in PostgreSQL) [2013-11-03]
PostgreSQL — faster reads from static tables [2013-11-03]