Modern perfect hashing for strings

Author: Wojciech Muła
Added on: 2023-04-30
Updated on: 2023-04-02 (grammar corrections)


Introduction

Looking up a string in a static set is a common problem we encounter when parsing textual formats. Such sets are often the keywords of a programming language or protocol.

Parsing HTTP verbs appeared to be fastest when using a compile-time trie: a series of nested switch statements. I could not believe that a perfect hash function would not be better, and that led to a novel hashing approach based on the instruction PEXT (Parallel Bits Extract).

Briefly, when constructing a perfect hash function, we look for the smallest set of input bytes that then becomes the input to some function that combines them into a single value. The instruction PEXT quickly constructs any n-bit subword from a 64-bit word; its latency is 3 CPU cycles on current processors. This allows us to extend the scheme to look for the smallest subset of bits. The n-bit word is then the input to a function that translates it into the desired value.

Instead of something like:

func hash(s []byte) uint64 {
    // read bytes at indices a, b, c
    // and push forward
    return hash_bytes(s[a], s[b], s[c])
}

We have:

func hash(s []byte) uint64 {
    // read bytes at indices d and e, and form a temp value
    tmp := (uint64(s[d]) << 8) | uint64(s[e])

    // from the temp value (bytes d and e) extract the crucial bits
    val := PEXT(tmp, precomputed_mask)

    return hash_uint64(val)
}

Please note that, depending on the string set, the number of bytes read in both schemes can vary. It is not a rule that a bit-level hash function touches fewer bytes than a byte-level hash function.

Apart from the above hashing scheme, this text also describes constructing a compile-time hash table and a compile-time switch.

All source code is available.

PEXT recap

The instruction PEXT takes two arguments: an input word and a mask. Bits of the input word at positions where the mask is 1 are copied, in order, to the least significant bits of the output. For example:

word:   0010101011010111
mask:   0011100100100010
masked: __101__0__0___1_
PEXT:   __________101001
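The same example expressed in C++ (a minimal sketch; _pext_u64 comes from <immintrin.h> and requires the BMI2 extension, so compile with -mbmi2):

#include <cstdint>
#include <cstdio>
#include <immintrin.h>

int main() {
    const uint64_t word = 0b0010101011010111;
    const uint64_t mask = 0b0011100100100010;
    // extracts the six masked bits and packs them into the low bits
    const uint64_t result = _pext_u64(word, mask);
    printf("%#llx\n", (unsigned long long)result); // prints 0x29, i.e. 0b101001
    return 0;
}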

Constructing a perfect hash function

Short recap: a perfect hash function maps every key from a static set to a distinct index; a minimal perfect hash function additionally maps n keys onto exactly the indices 0..n-1.

While minimal perfect hash functions are hard to find, perfect hash functions are quite easy to compute. We will describe two methods:

Open addressing

When we build a generic hash table implementation, collisions are unavoidable. The most common solution to this problem is using some kind of container (a linked list or balanced tree) to keep all objects sharing the same hash value.

A less common strategy is open addressing. We store values directly in the table. If there is a collision — that is, the given index is already allocated — we try another index. There are several approaches to pick a new index; it simply may be the next index (called linear probing), but there are of course more sophisticated procedures (quadratic probing, double hashing, etc.).
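As an illustration, here is a minimal sketch of computing the probe sequence for linear probing (the helper name probe is made up for this example):

#include <cstddef>
#include <cstdint>

// linear probing: on the i-th collision we simply try the next slot
size_t probe(uint64_t hash, size_t attempt, size_t N) {
    return (hash + attempt) % N;
}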

In our case, where we have a static set of strings, we use the simplest solution. Our goal is to find the minimum table size N at which the number of collisions k is as small as possible, preferably zero.

The input parameters are: 1) a set of strings and 2) a hash function; to avoid calling the hash function over and over, we cache the hash value of each string.

The main procedure is responsible for setting the maximum number of collisions; it starts from 1 and increases it until we succeed. The required number depends on the quality of the hash function, but often it is just 2 or 3 collisions. The Go implementation of the procedure is shown below.

Then, for some range of table sizes, we check the actual number of collisions, and we report the minimum table size.

func computerHashParameters(keywords []Keyword, hash func([]byte) uint64) (size int, collisions int) {
    hashes := make([]uint64, len(keywords))
    for i := range keywords {
        hashes[i] = hash(keywords[i].word)
    }
    collisions = 1
    for {
        size = hashtableaux(hashes, collisions)
        if size > 0 {
            fmt.Printf("size = %5d, maxcollisions = %d\n", size, collisions)
            return
        }
        collisions += 1
    }
}

Below is the implementation of the helper function. It checks table sizes starting from the size of the input set up to ten times that size; the multiplier of 10 appeared to be a good upper bound.

func hashtableaux(hashes []uint64, maxcollisions int) int {
    n := len(hashes)
    const sizeOverhead = 10

outer:
    for N := n; N < sizeOverhead*n; N++ {
        table := make(map[uint64]int)
        for _, h := range hashes {
            idx := h % uint64(N)
            table[idx] += 1
            if table[idx] > maxcollisions {
                continue outer
            }
        }
        return N
    }

    return -1
}

Now, having the parameters N and k, we can finally write the lookup procedure. The following C++ procedure looks up the Go keywords.

int lookup_go_hash1(std::string_view s) {
    const uint64_t idx = (hash1(s) % 38) * 2;
    static std::string_view lookup[76] = {
        "chan", // 836 (0x344)
        "", "", "",
        "case", // 800 (0x320)
        "for", // 648 (0x288)
        "", "",
        "continue", // 1600 (0x640)
        "", "", "",
        "defer", // 1070 (0x42e)
        "func", // 804 (0x324)
        "", "", "", "",
        "package", // 1491 (0x5d3)
        "",
        "else", // 808 (0x328)
        "go", // 428 (0x1ac)
        "const", // 1075 (0x433)
        "range", // 1075 (0x433)
        "var", // 696 (0x2b8)
        "", "", "",
        "return", // 1344 (0x540)
        "", "", "", "", "",
        "map", // 663 (0x297)
        "",
        "select", // 1386 (0x56a)
        "struct", // 1386 (0x56a)
        "", "",
        "goto", // 856 (0x358)
        "", "", "",
        "switch", // 1314 (0x522)
        "", "", "",
        "fallthrough", // 2266 (0x8da)
        "", "", "", "", "", "", "", "", "", "", "",
        "default", // 1512 (0x5e8)
        "interface", // 1854 (0x73e)
        "", "",
        "type", // 868 (0x364)
        "", "", "",
        "if", // 414 (0x19e)
        "import", // 1326 (0x52e)
        "", "", "", "",
        "break", // 1025 (0x401)
        "", };
    static int values[76] = {
        2, // 836 (0x344)
        -1, -1, -1,
        1, // 800 (0x320)
        9, // 648 (0x288)
        -1, -1,
        4, // 1600 (0x640)
        -1, -1, -1,
        6, // 1070 (0x42e)
        10, // 804 (0x324)
        -1, -1, -1, -1,
        17, // 1491 (0x5d3)
        -1,
        7, // 808 (0x328)
        11, // 428 (0x1ac)
        3, // 1075 (0x433)
        18, // 1075 (0x433)
        24, // 696 (0x2b8)
        -1, -1, -1,
        19, // 1344 (0x540)
        -1, -1, -1, -1, -1,
        16, // 663 (0x297)
        -1,
        20, // 1386 (0x56a)
        21, // 1386 (0x56a)
        -1, -1,
        12, // 856 (0x358)
        -1, -1, -1,
        22, // 1314 (0x522)
        -1, -1, -1,
        8, // 2266 (0x8da)
        -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
        5, // 1512 (0x5e8)
        15, // 1854 (0x73e)
        -1, -1,
        23, // 868 (0x364)
        -1, -1, -1,
        13, // 414 (0x19e)
        14, // 1326 (0x52e)
        -1, -1, -1, -1,
        0, // 1025 (0x401)
        -1,
    };
    for (int i=0; i < 2; i++) {
        if (lookup[idx + i] == s) {
            return values[idx + i];
        }
    }
    return -1;
}

First we have a call to hash1 on the input string. Its result is taken modulo 38 — that is the value N found by the above algorithm. Likewise, the magic constant 2 is the parameter k, the number of collisions we have to handle.

The lookup procedure contains two auxiliary tables of size N * k = 76: one for strings and another for values; the not-found value is -1. (Note that we could use a single array of structures; however, it's hard to tell what's better for typically small arrays.) After computing the index idx, we simply run the following check twice:

if (lookup[idx + i] == s) {
    return values[idx + i];
}

Please note that all parameters of the hash table are known at compile time, thus a compiler can get rid of the division and apply additional optimizations.

Below is the assembly code of the procedure, compiled for Skylake architecture with GCC 12.2.0.

<lookup_go_hash1(std::basic_string_view<char, std::char_traits<char> >)>:
    push   %r14
    push   %r13
    mov    %rsi,%r13
    push   %r12
    push   %rbp
    mov    %rdi,%rbp
    push   %rbx

    movsbq (%rsi),%rax                  // f := first char
    movsbq -0x1(%rdi,%rsi,1),%rcx       // l := last char
    add    %rax,%rcx                    // their sum
    imul   %rdi,%rcx                    // (f + l) * s.size()

    movabs $0xd79435e50d79435f,%rax     //
    mul    %rcx                         //
    shr    $0x5,%rdx                    //
    imul   $0x26,%rdx,%rax              //
    sub    %rax,%rcx                    //
    mov    %rcx,%rdx                    //
    lea    (%rcx,%rcx,1),%r12           //
    shl    $0x5,%rdx                    // rdx = ((f + l) * s.size()) % 38

    // The for loop. GCC split the loop -- it quickly compares the length
    // of lookup[idx + i] with s.size() (rbp) and jumps to the byte
    // comparison only if the lengths are equal.
    lea    0x267b47(%rip),%rax          // rax = lookup address
    lea    (%rdx,%rax,1),%rbx           // rbx = &lookup[idx]
    lea    0x2(%r12),%r14
    mov    0x8(%rbx),%rdi
    cmp    (%rbx),%rbp                  // jump to full compare only if lengths are equal
    je     <lookup_go_hash1(std::basic_string_view<char, std::char_traits<char> >)+0x78>
    inc    %r12
    add    $0x10,%rbx
    cmp    %r14,%r12
    jne    <lookup_go_hash1(std::basic_string_view<char, std::char_traits<char> >)+0x52>
    pop    %rbx
    pop    %rbp
    pop    %r12
    pop    %r13
    mov    $0xffffffff,%eax
    pop    %r14
    ret
    nopl   (%rax)

    // check if bytes are equal
    test   %rbp,%rbp
    je     <lookup_go_hash1(std::basic_string_view<char, std::char_traits<char> >)+0x8c>
    mov    %rbp,%rdx
    mov    %r13,%rsi
    call   <memcmp@plt>
    test   %eax,%eax
    jne    <lookup_go_hash1(std::basic_string_view<char, std::char_traits<char> >)+0x5b>
    pop    %rbx
    pop    %rbp
    lea    0x1f56bb(%rip),%rax        # <lookup_go_hash1(std::basic_string_view<char, std::char_traits<char> >)::values>
    mov    (%rax,%r12,4),%eax
    pop    %r12
    pop    %r13
    pop    %r14
    ret

Hash functions

The following hash functions were tested:

  • hash1 — sum of the first and last char multiplied by the string length;
  • hash2 — sum of the first and last char bit-xor'ed with the string length;
  • hash3 — first, second and last characters combined with the string length: ((f << 8) | s) + l + (n * 0x111);
  • hash_sum0 — sum of all bytes;
  • hash_sumN — sum of all bytes plus the string length;
  • hash_djb2 — Bernstein's hash;
  • hash_sdb — sdbm hash.

Splitting by input length

I observed that the string length is a decent discriminator; see my earlier text STL: map with string as key — access speedup. In most cases we obtain the length of a string in constant time; for instance, this C++ function simplifies to a few CPU instructions.

#include <cstdint>
#include <string_view>

uint64_t len(std::string_view s) {
   return s.size();
}
$ g++ -O2 -S test.cpp
$ cat test.s
    ...
    movq    %rdi, %rax
    ret
    ...

Taking this into account, we first split the set of strings into subsets of strings having the same length. These subsets are usually small, and we can fine-tune looking up in each subset independently. One common optimization present in the sample code is using a plain if-ladder of comparisons when a subset is small (has at most two or three elements).
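For instance, a lookup among the two-letter Go keywords reduces to a plain if-ladder; this is a sketch (the function name is made up, and the returned values match the earlier hash-table example):

#include <string_view>

int lookup_go_len2(std::string_view s) {
    // with only two candidates, direct comparisons beat any hashing
    if (s == "go") return 11;
    if (s == "if") return 13;
    return -1;
}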

Additionally, working with same-length strings enables more low-level optimizations:

  • The length of the string is usually known at compile time.
  • Thus, bounds checks can be eliminated.
  • Testing the equality of two strings from a subset is a simple memcmp, and this procedure may be further inlined and optimized by a compiler.

Using the minimum number of bits

Note: this method was inspired by the hashing procedure found in GNU gperf.

The presented method requires input strings to have the same length. We guarantee that property by pre-classifying the input string by its length, as described in the previous section.

gperf uses the following observation: not all input characters contribute to the uniqueness of the set. In most cases we may discard many input characters, and the modified set still contains unique items.

For instance, the five-letter Go keywords are break, const, defer, and range. The last character alone is sufficient to distinguish them:

break => ____k
const => ____t
defer => ____r
range => ____e

In the case of six-letter keywords, we need two characters, as shown below. The letter 't' repeats at the last position, but the fourth character makes this set unique.

import => ___o_t
return => ___u_n
select => ___e_t
struct => ___u_t
switch => ___t_h

The major problem is that we somehow need to combine these unique characters into a unique and relatively small value. gperf adds the characters after passing them through an auxiliary lookup table, like: lookup[s[0]] + lookup[s[5]].
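For illustration, a gperf-style hash for the five-letter Go keywords could look like the sketch below; the assoc values here are hand-picked, whereas gperf derives such a table automatically:

#include <array>
#include <cstdint>
#include <string_view>

int hash_gperf_like(std::string_view s) {
    // association table: maps the distinguishing (last) character
    // to a small, unique index: k -> 0, t -> 1, r -> 2, e -> 3
    static const auto assoc = [] {
        std::array<uint8_t, 256> t{};
        t['t'] = 1; t['r'] = 2; t['e'] = 3; // 'k' stays 0
        return t;
    }();
    return assoc[static_cast<uint8_t>(s[4])];
}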

Similarly to discarding individual bytes, we may discard individual bits, ending up with a bitmask that can be used to extract the significant bits. To do this quickly, we use the instruction PEXT.

Before we dig into details, let us re-examine the examples. For five-letter Go keywords we have:

01100010.01110010.01100101.01100001.01101011 => ________.________.________.________.______11
01100011.01101111.01101110.01110011.01110100 => ________.________.________.________.______00
01100100.01100101.01100110.01100101.01110010 => ________.________.________.________.______10
01110010.01100001.01101110.01100111.01100101 => ________.________.________.________.______01

And for six-letter ones:

01101001.01101101.01110000.01101111.01110010.01110100 => ________.________.________.___0____._______0._____1__
01110010.01100101.01110100.01110101.01110010.01101110 => ________.________.________.___1____._______0._____1__
01110011.01100101.01101100.01100101.01100011.01110100 => ________.________.________.___0____._______1._____1__
01110011.01110100.01110010.01110101.01100011.01110100 => ________.________.________.___1____._______1._____1__
01110011.01110111.01101001.01110100.01100011.01101000 => ________.________.________.___1____._______1._____0__

Thus, in the first case we need just 2 bits, and in the second case only 3 bits. The 2-bit subset contains all possible values (00, 01, 11, 10), while the 3-bit subset is not full (001, 101, 011, 111, 110).

A skeleton of C++ procedure that uses this approach is:

...
/* 1 */ const uint64_t word = load_bytes(s);
/* 2 */ const uint64_t idx  = _pext_u64(word, mask);
/* 3 */ if (memcmp(lookup[idx], s.data(), size) == 0) {
            return value[idx];
        }
...

Once we know which bits are significant, we need to find which bytes have to be loaded (1); a byte is the minimum amount of data we can transfer. As we can see, for five-letter keywords it is sufficient to load just one byte, but for six-letter keywords we need three bytes; in practice it is still a single load, done on a 32-bit entity.

Then, we extract the bits using the instruction PEXT (2) — these bits form an n-bit index.

We need auxiliary tables of size 2^n, having exactly the same meaning as in hashing. We compare the input string with the keyword (3), and if they are equal, we return the associated value.

This is a snippet from the procedure that looks up the Go keywords:

int lookup_go_pext(std::string_view s) {
    switch (s.size()) {
        // ...
        case 5: {
            static char lookup[4][5] = {
                {'b', 'r', 'e', 'a', 'k'},
                {'r', 'a', 'n', 'g', 'e'},
                {'d', 'e', 'f', 'e', 'r'},
                {'c', 'o', 'n', 's', 't'},
            };
            static int value[4] = {
                0,
                18,
                6,
                3,
            };
            const uint8_t w2 = s[4];
            const uint64_t idx = _pext_u64(w2, 0x14);
            if (memcmp(lookup[idx], s.data(), sizeof(lookup[0])) == 0) {
                return value[idx];
            }
        }
        break;
        case 6: {
            static char lookup[8][6] = {
                {}, // no match
                {'s', 'w', 'i', 't', 'c', 'h'},
                {}, // no match
                {'r', 'e', 't', 'u', 'r', 'n'},
                {'s', 'e', 'l', 'e', 'c', 't'},
                {'s', 't', 'r', 'u', 'c', 't'},
                {'i', 'm', 'p', 'o', 'r', 't'},
                {}, // no match
            };
            static int value[8] = {
                -1,
                22,
                -1,
                19,
                20,
                21,
                14,
                -1,
            };
            uint32_t w3 = 0;
            memcpy(&w3, &s[2], 4);
            const uint64_t idx = _pext_u64(w3, 0x10101000);
            if (memcmp(lookup[idx], s.data(), sizeof(lookup[0])) == 0) {
                return value[idx];
            }
        }
        break;
        // ...
    }
    return -1;
}

Variable-length set of words

Having strings of the same length is not a strict requirement. We can always pad the past-the-end characters with some easy-to-compute values; a zero-padding sketch follows the list below. The following values were tested during the writing of this text:

  • zero (0x00),
  • the first character,
  • the last character,
  • the input length,
  • the first/last character combined with the input length.
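A minimal sketch of the zero-padding variant (the helper load_padded is made up for this example; it assumes strings of at most eight bytes):

#include <cstdint>
#include <cstring>
#include <string_view>

uint64_t load_padded(std::string_view s) {
    uint8_t buf[8] = {0};                 // past-the-end bytes become 0x00
    const size_t n = s.size() < 8 ? s.size() : 8;
    memcpy(buf, s.data(), n);
    uint64_t word;
    memcpy(&word, buf, sizeof(word));     // assemble the final 64-bit word
    return word;
}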

Depending on the chosen method and the string set, the outcome differs. Including the length and one of the characters seems to help. However, pre-classifying inputs by their length appeared to be easier and faster, thus this topic is not covered here.

Finding the mask

Finding the mask is surprisingly straightforward. We start with a full mask and iterate over all its bits. For each bit, we clear it and then check whether the set of masked words is still unique. If so, we keep the bit cleared; otherwise we revert the change.

Below are the two procedures that compute the mask.

func computeMask(words []Keyword) (mask []byte, err error) {
    max := 0
    for _, w := range words {
        if n := len(w.word); n > max {
            max = n
        }
    }

    mask = make([]byte, max)
    for i := range mask {
        mask[i] = 0xff
    }

    ok := checksamesize(words, mask)
    if !ok {
        err = fmt.Errorf("set of words is not unique")
        return
    }

    for byteIdx := 0; byteIdx < max; byteIdx++ {
        for bitIdx := 0; bitIdx < 8; bitIdx++ {
            old := mask[byteIdx]
            mask[byteIdx] &= ^(1 << bitIdx)
            if !checksamesize(words, mask) {
                mask[byteIdx] = old
            }
        }
    }

    return
}

func checksamesize(words []Keyword, mask []byte) bool {
    M := make(map[string]struct{})
    word := make([]byte, len(mask))
    for i := range words {
        for j := range mask {
            word[j] = words[i].word[j] & mask[j]
        }

        s := string(word)
        _, ok := M[s]
        if ok {
            return false
        }
        M[s] = struct{}{}
    }

    return true
}

Loads

As the examples above have shown, often a single load is sufficient, as the significant bits fit in an 8-, 16-, 32-, or 64-bit word.

However, sometimes more loads are needed. In the generic case we concatenate the results of individual loads, forming a 32- or 64-bit word. Below is an example of such code, where we combine 8-bit and 16-bit input words into a single 32-bit word, which is the final argument to PEXT. Note that we only need to concatenate the values at runtime; the mask for PEXT is known at compile time.

uint32_t w1;
uint16_t w2 = 0;
memcpy(&w2, &s[0], 2);
w1 = uint32_t(w2);
const uint8_t w3 = s[2];
w1 |= (uint32_t(w3) << 16);
const uint64_t idx = _pext_u64(w1, 0x1e1104);

Another possibility arises when we have exactly two loads and their masks do not collide, that is, their bitwise AND is zero. Then we can merge the two words; sample code for such a case is shown below.

uint16_t w7 = 0;
memcpy(&w7, &s[10], 2);
const uint8_t w8 = s[2];
const uint16_t w9 = (uint16_t(w7) & 0x410) | (uint16_t(w8) & 0x4);
const uint64_t idx = _pext_u64(w9, 0x414);

Compile-time switches

To complete the picture, we present a better alternative to compile-time trie expansion. As the first step, the method splits the input by length. Then we build a switch statement for each subset of same-length strings; each subset is processed separately.

Like in the trie approach, we use a single character to select further processing. But since we deal with a subset of strings of the same size, we can pick a character at any index. We select an index that splits the subset into the greatest number of smaller subsets, ideally into single-element subsets (singletons). The process is repeated recursively; we only need to remember which indices were already used.

For example, the six-letter keywords of Java can be fully classified by the second character. Five-letter keywords are classified by the last character, except for one case (the letter 't'), where more tests are needed.

Below is the corresponding fragment of the lookup procedure.

int lookup_java_split(std::string_view s) {
    switch (s.size()) {
        // ...
        case 5: {
            switch (s[4]) {
                case 'w': { // 0x77
                    if (s == "throw") {
                        return 4;
                    }
                }
                break;
                case 'k': { // 0x6b
                    if (s == "break") {
                        return 10;
                    }
                }
                break;
                case 't': { // 0x74
                    switch (s[0]) {
                        case 's': { // 0x73
                            if (s == "short") {
                                return 23;
                            }
                        }
                        break;
                        case 'f': { // 0x66
                            if (s == "float") {
                                return 36;
                            }
                        }
                        break;
                        case 'c': { // 0x63
                            if (s == "const") {
                                return 39;
                            }
                        }
                        break;
                    }
                }
                break;
                case 'h': { // 0x68
                    if (s == "catch") {
                        return 25;
                    }
                }
                break;
                case 'l': { // 0x6c
                    if (s == "final") {
                        return 26;
                    }
                }
                break;
                case 'r': { // 0x72
                    if (s == "super") {
                        return 33;
                    }
                }
                break;
                case 'e': { // 0x65
                    if (s == "while") {
                        return 34;
                    }
                }
                break;
                case 's': { // 0x73
                    if (s == "class") {
                        return 35;
                    }
                }
                break;
            }
        }
        break;
        case 6: {
            switch (s[1]) {
                case 'h': { // 0x68
                    if (s == "throws") {
                        return 9;
                    }
                }
                break;
                case 'o': { // 0x6f
                    if (s == "double") {
                        return 11;
                    }
                }
                break;
                case 'm': { // 0x6d
                    if (s == "import") {
                        return 12;
                    }
                }
                break;
                case 'u': { // 0x75
                    if (s == "public") {
                        return 13;
                    }
                }
                break;
                case 'e': { // 0x65
                    if (s == "return") {
                        return 18;
                    }
                }
                break;
                case 't': { // 0x74
                    if (s == "static") {
                        return 28;
                    }
                }
                break;
                case 'a': { // 0x61
                    if (s == "native") {
                        return 37;
                    }
                }
                break;
                case 'w': { // 0x77
                    if (s == "switch") {
                        return 38;
                    }
                }
                break;
            }
        }
        break;
        // ...
    }
    return -1;
}

Experiments

Input sets

The input sets contain mostly keywords from programming languages.

Input sets
name            number of words  notes
ada-pred                     54  gperf tests
ada-res                      63  gperf tests
ada                          63  gperf tests
adadefs                      54  gperf tests
c                            32  gperf tests
countries                   192  English names of countries
cplusplus                   106  gperf tests
cpp                          47  gperf tests
go                           25
gpc                          35  gperf tests
irc                          45  gperf tests
java                         47  gperf tests
jscript                      53  gperf tests
modula                       53  gperf tests
modula2                      40  gperf tests
modula3                      53  gperf tests
pascal                       36  gperf tests
sneller                     240  SQL keywords and some functions from Sneller
snellerbuiltin              119  builtin functions from Sneller
states                       50  USA state names

Input data

The input words were repeated in random order to produce an approximately 1 MB block of text, not counting newline characters.

Keywords were mutated by changing a single letter (at a random position) into a random character or digit. The chance that a keyword was added intact was set to 0%, 25%, 50%, or 75%. This is what we call the "valid words density".

Tested procedures

Tested procedures
procedure   comments
pext        PEXT-based, bit-level function
split       Compile-time switches
hash1       Compile-time hash table with hash1 function
hash2       Compile-time hash table with hash2 function
hash3       Compile-time hash table with hash3 function
hash_sum0   Compile-time hash table with hash_sum0 function
hash_sumN   Compile-time hash table with hash_sumN function
hash_djb2   Compile-time hash table with hash_djb2 function
hash_sdb    Compile-time hash table with hash_sdb function

Skylake timings

CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Compiler: g++-9 (Ubuntu 9.3.0-10ubuntu2~16.04) 9.3.0

The numbers are given in CPU cycles; the speed-up columns are relative to the pext procedure.

                                valid words density
procedure   0%     speed-up   25%    speed-up   50%    speed-up   75%    speed-up
dataset ada
pext 33.20 1.000 41.47 1.000 45.67 1.000 39.57 1.000
split 61.81 0.537 67.50 0.614 67.69 0.675 65.62 0.603
hash1 75.38 0.440 77.63 0.534 73.33 0.623 64.66 0.612
hash2 72.78 0.456 76.73 0.540 72.23 0.632 64.01 0.618
hash3 55.63 0.597 67.57 0.614 68.79 0.664 63.79 0.620
hash_sum0 47.63 0.697 60.64 0.684 69.10 0.661 69.96 0.566
hash_sumN 50.68 0.655 61.91 0.670 70.02 0.652 68.14 0.581
hash_sdb 39.91 0.832 50.90 0.815 61.03 0.748 56.61 0.699
hash_djb2 41.22 0.805 51.66 0.803 60.32 0.757 55.12 0.718
dataset ada-pred
pext 37.41 1.000 45.55 1.000 46.60 1.000 41.09 1.000
split 63.65 0.588 68.54 0.665 66.71 0.699 62.59 0.657
hash1 87.09 0.430 88.23 0.516 84.02 0.555 70.16 0.586
hash2 88.20 0.424 89.78 0.507 79.61 0.585 70.49 0.583
hash3 64.07 0.584 64.80 0.703 60.81 0.766 49.78 0.825
hash_sum0 49.99 0.748 62.74 0.726 70.37 0.662 71.98 0.571
hash_sumN 52.95 0.706 64.19 0.710 75.90 0.614 74.72 0.550
hash_sdb 49.55 0.755 61.30 0.743 72.63 0.642 66.59 0.617
hash_djb2 46.79 0.800 58.65 0.777 66.92 0.696 64.21 0.640
dataset ada-res
pext 33.28 1.000 41.21 1.000 45.13 1.000 39.77 1.000
split 65.65 0.507 71.41 0.577 67.72 0.666 69.91 0.569
hash1 74.25 0.448 76.05 0.542 73.33 0.615 62.88 0.632
hash2 73.24 0.454 78.54 0.525 72.33 0.624 65.06 0.611
hash3 54.56 0.610 66.09 0.623 69.05 0.654 62.02 0.641
hash_sum0 47.62 0.699 58.82 0.700 72.94 0.619 68.58 0.580
hash_sumN 50.34 0.661 63.12 0.653 70.00 0.645 65.14 0.610
hash_sdb 40.05 0.831 51.05 0.807 61.46 0.734 56.13 0.709
hash_djb2 41.53 0.801 52.13 0.790 60.56 0.745 56.34 0.706
dataset adadefs
pext 37.85 1.000 45.68 1.000 46.70 1.000 41.30 1.000
split 62.59 0.605 67.59 0.676 67.50 0.692 63.26 0.653
hash1 90.23 0.419 91.55 0.499 82.56 0.566 72.14 0.573
hash2 85.91 0.441 86.50 0.528 82.34 0.567 67.33 0.613
hash3 66.51 0.569 67.66 0.675 58.06 0.804 52.81 0.782
hash_sum0 48.92 0.774 60.24 0.758 74.57 0.626 68.25 0.605
hash_sumN 53.62 0.706 66.31 0.689 73.98 0.631 78.04 0.529
hash_sdb 50.43 0.750 62.91 0.726 72.20 0.647 64.81 0.637
hash_djb2 45.90 0.824 58.03 0.787 69.26 0.674 62.68 0.659
dataset c
pext 31.99 1.000 40.50 1.000 44.03 1.000 38.06 1.000
split 50.58 0.632 55.28 0.733 58.78 0.749 53.15 0.716
hash1 61.26 0.522 62.69 0.646 58.98 0.746 49.40 0.770
hash2 61.59 0.519 62.57 0.647 53.50 0.823 48.16 0.790
hash3 51.56 0.620 58.94 0.687 56.15 0.784 45.65 0.834
hash_sum0 50.07 0.639 62.08 0.652 69.28 0.635 67.87 0.561
hash_sumN 45.62 0.701 58.30 0.695 66.23 0.665 61.98 0.614
hash_sdb 42.02 0.761 52.80 0.767 61.86 0.712 52.40 0.726
hash_djb2 40.45 0.791 51.62 0.784 58.87 0.748 55.34 0.688
dataset countries
pext 36.51 1.000 45.13 1.000 47.62 1.000 41.73 1.000
split 97.77 0.373 103.25 0.437 102.72 0.464 94.96 0.439
hash1 132.88 0.275 129.03 0.350 116.18 0.410 100.02 0.417
hash2 129.50 0.282 126.89 0.356 118.28 0.403 99.82 0.418
hash3 92.53 0.395 97.42 0.463 88.54 0.538 79.79 0.523
hash_sum0 62.07 0.588 78.25 0.577 86.10 0.553 88.92 0.469
hash_sumN 63.41 0.576 80.48 0.561 84.76 0.562 89.56 0.466
hash_sdb 50.96 0.716 64.70 0.698 76.18 0.625 76.97 0.542
hash_djb2 63.47 0.575 74.61 0.605 87.23 0.546 86.02 0.485
dataset cplusplus
pext 33.82 1.000 42.58 1.000 46.32 1.000 40.12 1.000
split 67.07 0.504 72.82 0.585 72.88 0.636 71.07 0.565
hash1 109.82 0.308 109.25 0.390 101.32 0.457 90.27 0.444
hash2 96.47 0.351 97.92 0.435 92.32 0.502 81.82 0.490
hash3 86.25 0.392 91.60 0.465 87.83 0.527 78.80 0.509
hash_sum0 51.95 0.651 66.86 0.637 76.79 0.603 80.85 0.496
hash_sumN 57.29 0.590 70.77 0.602 81.49 0.568 87.37 0.459
hash_sdb 44.25 0.764 56.91 0.748 67.76 0.684 59.27 0.677
hash_djb2 45.77 0.739 56.94 0.748 66.75 0.694 62.54 0.642
dataset cpp
pext 31.13 1.000 39.15 1.000 43.77 1.000 37.27 1.000
split 60.01 0.519 64.33 0.609 61.03 0.717 61.17 0.609
hash1 79.09 0.394 80.43 0.487 79.58 0.550 64.79 0.575
hash2 77.41 0.402 79.82 0.490 71.89 0.609 64.33 0.579
hash3 52.87 0.589 58.08 0.674 54.88 0.798 45.60 0.817
hash_sum0 47.12 0.661 60.08 0.652 67.08 0.653 65.49 0.569
hash_sumN 48.65 0.640 62.08 0.631 68.92 0.635 67.31 0.554
hash_sdb 42.10 0.740 52.74 0.742 61.94 0.707 55.81 0.668
hash_djb2 40.41 0.770 50.86 0.770 58.90 0.743 55.69 0.669
dataset go
pext 34.09 1.000 42.02 1.000 44.33 1.000 38.98 1.000
split 53.48 0.638 57.68 0.729 53.66 0.826 52.37 0.744
hash1 77.18 0.442 80.18 0.524 73.74 0.601 67.21 0.580
hash2 75.88 0.449 79.44 0.529 77.16 0.575 64.39 0.605
hash3 50.11 0.680 60.97 0.689 52.59 0.843 49.99 0.780
hash_sum0 45.55 0.748 55.76 0.754 64.22 0.690 59.05 0.660
hash_sumN 43.86 0.777 54.65 0.769 60.94 0.728 56.83 0.686
hash_sdb 44.85 0.760 56.65 0.742 65.52 0.677 55.36 0.704
hash_djb2 47.05 0.725 57.45 0.731 65.85 0.673 59.28 0.658
dataset gpc
pext 31.63 1.000 39.37 1.000 43.57 1.000 37.99 1.000
split 54.04 0.585 58.34 0.675 60.28 0.723 55.08 0.690
hash1 67.64 0.468 77.23 0.510 77.72 0.561 67.73 0.561
hash2 56.41 0.561 61.56 0.640 55.65 0.783 49.95 0.761
hash3 36.41 0.869 51.94 0.758 53.33 0.817 44.76 0.849
hash_sum0 48.02 0.659 59.74 0.659 68.07 0.640 69.51 0.547
hash_sumN 48.15 0.657 59.59 0.661 67.78 0.643 64.33 0.591
hash_sdb 39.74 0.796 51.98 0.757 60.72 0.718 55.70 0.682
hash_djb2 45.31 0.698 56.10 0.702 62.26 0.700 56.72 0.670
dataset irc
pext 30.92 1.000 39.45 1.000 43.61 1.000 35.90 1.000
split 61.98 0.499 66.63 0.592 67.25 0.649 63.39 0.566
hash1 91.91 0.336 91.33 0.432 84.69 0.515 75.90 0.473
hash2 73.70 0.420 76.54 0.515 75.02 0.581 61.11 0.587
hash3 49.63 0.623 58.92 0.670 50.82 0.858 46.29 0.776
hash_sum0 52.71 0.587 65.81 0.600 73.05 0.597 67.20 0.534
hash_sumN 57.54 0.537 71.23 0.554 75.28 0.579 70.94 0.506
hash_sdb 41.56 0.744 52.76 0.748 61.69 0.707 56.09 0.640
hash_djb2 41.20 0.751 52.06 0.758 62.05 0.703 56.49 0.636
dataset java
pext 32.81 1.000 41.97 1.000 44.86 1.000 40.05 1.000
split 61.79 0.531 63.11 0.665 67.20 0.668 60.71 0.660
hash1 84.10 0.390 87.53 0.479 80.63 0.556 70.61 0.567
hash2 79.72 0.412 79.53 0.528 74.78 0.600 61.47 0.652
hash3 54.22 0.605 61.39 0.684 52.91 0.848 48.92 0.819
hash_sum0 50.32 0.652 64.39 0.652 73.70 0.609 79.53 0.504
hash_sumN 51.68 0.635 64.23 0.654 70.61 0.635 72.76 0.550
hash_sdb 43.18 0.760 55.42 0.757 64.60 0.694 57.78 0.693
hash_djb2 43.35 0.757 53.60 0.783 64.25 0.698 59.08 0.678
dataset jscript
pext 32.62 1.000 41.07 1.000 45.48 1.000 39.26 1.000
split 55.46 0.588 60.59 0.678 63.60 0.715 57.62 0.681
hash1 76.26 0.428 78.06 0.526 76.60 0.594 63.68 0.617
hash2 74.21 0.440 75.27 0.546 67.39 0.675 60.76 0.646
hash3 53.13 0.614 60.49 0.679 57.44 0.792 49.12 0.799
hash_sum0 50.57 0.645 62.13 0.661 71.23 0.639 71.21 0.551
hash_sumN 53.58 0.609 64.87 0.633 76.94 0.591 76.87 0.511
hash_sdb 46.41 0.703 57.42 0.715 68.15 0.667 58.71 0.669
hash_djb2 44.39 0.735 54.69 0.751 63.35 0.718 59.26 0.663
dataset modula
pext 32.74 1.000 40.76 1.000 45.30 1.000 39.60 1.000
split 60.07 0.545 64.43 0.633 63.79 0.710 61.26 0.646
hash1 69.22 0.473 71.27 0.572 66.81 0.678 58.11 0.682
hash2 71.44 0.458 76.04 0.536 71.22 0.636 62.85 0.630
hash3 43.89 0.746 56.34 0.723 51.91 0.873 45.92 0.862
hash_sum0 48.84 0.670 60.48 0.674 69.19 0.655 69.94 0.566
hash_sumN 50.78 0.645 61.46 0.663 70.09 0.646 70.17 0.564
hash_sdb 44.64 0.733 55.30 0.737 63.62 0.712 57.84 0.685
hash_djb2 42.33 0.773 52.38 0.778 61.94 0.731 56.49 0.701
dataset modula2
pext 31.87 1.000 40.22 1.000 44.70 1.000 38.37 1.000
split 58.01 0.549 62.07 0.648 57.86 0.773 58.33 0.658
hash1 69.44 0.459 74.37 0.541 67.05 0.667 60.74 0.632
hash2 69.66 0.458 78.43 0.513 77.69 0.575 65.48 0.586
hash3 41.98 0.759 56.11 0.717 51.12 0.874 47.33 0.811
hash_sum0 45.71 0.697 57.02 0.705 71.58 0.624 64.89 0.591
hash_sumN 53.05 0.601 63.66 0.632 74.12 0.603 68.24 0.562
hash_sdb 45.22 0.705 55.33 0.727 63.21 0.707 59.78 0.642
hash_djb2 40.88 0.780 51.74 0.777 61.84 0.723 56.80 0.676
dataset modula3
pext 32.15 1.000 40.28 1.000 45.30 1.000 39.02 1.000
split 56.67 0.567 61.46 0.655 63.68 0.711 58.65 0.665
hash1 70.24 0.458 73.43 0.549 70.79 0.640 60.96 0.640
hash2 71.59 0.449 76.39 0.527 69.77 0.649 63.72 0.612
hash3 43.70 0.736 56.68 0.711 55.22 0.820 46.28 0.843
hash_sum0 50.16 0.641 61.09 0.659 68.90 0.657 70.24 0.556
hash_sumN 50.20 0.640 61.30 0.657 69.35 0.653 69.73 0.560
hash_sdb 40.66 0.791 51.76 0.778 61.05 0.742 51.66 0.755
hash_djb2 40.96 0.785 51.04 0.789 59.26 0.764 56.20 0.694
dataset pascal
pext 32.61 1.000 40.32 1.000 44.18 1.000 38.97 1.000
split 56.47 0.578 61.20 0.659 58.03 0.761 58.26 0.669
hash1 66.03 0.494 75.19 0.536 70.14 0.630 65.05 0.599
hash2 57.36 0.569 59.72 0.675 56.49 0.782 46.41 0.840
hash3 39.59 0.824 55.37 0.728 51.04 0.865 48.82 0.798
hash_sum0 42.79 0.762 52.50 0.768 59.01 0.749 54.74 0.712
hash_sumN 49.75 0.656 60.97 0.661 67.10 0.658 66.58 0.585
hash_sdb 45.78 0.712 56.05 0.719 63.83 0.692 56.62 0.688
hash_djb2 41.33 0.789 52.38 0.770 62.20 0.710 55.77 0.699
dataset sneller
pext 36.32 1.000 45.18 1.000 48.15 1.000 41.59 1.000
split 84.98 0.427 91.96 0.491 91.27 0.528 85.14 0.489
hash1 111.37 0.326 109.84 0.411 101.95 0.472 87.03 0.478
hash2 107.71 0.337 108.25 0.417 99.64 0.483 84.69 0.491
hash3 85.25 0.426 91.38 0.494 86.67 0.556 76.52 0.543
hash_sum0 63.46 0.572 79.08 0.571 87.85 0.548 90.40 0.460
hash_sumN 68.17 0.533 81.57 0.554 90.65 0.531 92.30 0.451
hash_sdb 50.37 0.721 64.13 0.705 77.16 0.624 80.36 0.518
hash_djb2 67.26 0.540 80.46 0.562 92.01 0.523 94.85 0.438
dataset snellerbuiltin
pext 39.24 1.000 47.75 1.000 49.16 1.000 43.15 1.000
split 67.34 0.583 72.82 0.656 70.72 0.695 70.04 0.616
hash1 118.59 0.331 116.16 0.411 109.08 0.451 89.63 0.481
hash2 123.17 0.319 120.14 0.397 105.14 0.468 90.62 0.476
hash3 105.75 0.371 109.16 0.437 102.81 0.478 87.78 0.492
hash_sum0 62.24 0.630 74.92 0.637 84.30 0.583 86.59 0.498
hash_sumN 59.27 0.662 70.96 0.673 82.69 0.594 81.20 0.531
hash_sdb 53.43 0.734 68.37 0.698 81.32 0.604 74.38 0.580
hash_djb2 59.08 0.664 72.51 0.658 83.62 0.588 86.05 0.501
dataset states
pext 37.22 1.000 45.84 1.000 47.53 1.000 41.11 1.000
split 66.25 0.562 71.06 0.645 69.24 0.686 64.75 0.635
hash1 88.44 0.421 88.77 0.516 80.99 0.587 67.90 0.605
hash2 95.52 0.390 95.42 0.480 87.17 0.545 73.57 0.559
hash3 66.76 0.557 67.49 0.679 60.97 0.779 52.47 0.784
hash_sum0 48.00 0.775 62.88 0.729 69.39 0.685 64.14 0.641
hash_sumN 51.32 0.725 63.15 0.726 68.72 0.692 64.42 0.638
hash_sdb 45.26 0.822 59.31 0.773 71.25 0.667 60.47 0.680
hash_djb2 48.32 0.770 59.57 0.770 70.71 0.672 64.98 0.633

Summary

  • The PEXT-based hash is the fastest.
  • Fast hashes (hash1, hash2, hash3) indeed do not touch all input bytes, but their quality is poor and, as a result, their performance is not good.
  • Surprisingly, summing all input bytes is comparable to the more complex djb2 or sdbm hashes.

Sources

All sources are available on GitHub: https://github.com/WojciechMula/toys/tree/master/lookup-in-string-set.