Efficient trie representation

Author:Wojciech Muła
Last update:2011-03-31
Added on:2011-03-26

Trie is a multi-way tree, where each edge is labelled with a single letter. In this text several trie representations are shown.

The article is the side effect of work on the pyahocorasic module.

Contents

Introduction

Each node of a tree store two kinds of data: user data and trie-related data. Let say an alphabet has at most 256 letters, thus a letter could be saved on a byte.

During maintaining tree structure following issues appear:

In experiments Polish, English, French and German words lists were used. User data has got two pointers, i.e. additional 8 bytes per node.

list words trie nodes
english 138 622 312 855
german 162 032 610 470
french 629 420 1 297 080
polish 3 588 729 5 933 658

The sample program were linked against GNU libc, and procedure malloc_stats from malloc.h was used to obtain statistics about the real memory usage.

Simple layouts

Array of all possible edges [version 1]

[image]

The simplest solution is to store all possible pointers to children nodes in array of constant size. C-structure:

struct TrieNode {
        /* user data */

        struct TrieNode* children[256];
}

Procedure that returns child node or NULL depending on a letter works in constant time:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        return node->children[letter]
}

Cons

  • Huge memory requirements — size of node 256 * size of pointer + size of user data.

Pros

  • Very simple implementation and use.
  • Theoretically constant time addressing, but because of large data cache misses may be visible.

Experiment results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 1,032 322,866,360 1032.00 325,369,216 2,502,856 0.77
german 1,032 630,005,040 1032.00 634,888,816 4,883,776 0.77
french 1,032 1,338,586,560 1032.00 1,348,963,216 10,376,656 0.77
polish n/a n/a n/a n/a n/a n/a

Pointer to array of all possible edges [version 1a]

Instead of a static array of pointers allocate such array only when needed, i.e. leaf nodes don't occupy much memory. This cost additional test when accessing children.

Node structure:

struct TrieNode {
        /* user data */

        struct TrieNode** children;
}

Next function:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        if (node->children)
                return node->children[letter];
        else
                return NULL;
}

Pros

  • smaller memory, but still large

Experiments results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 12 232,249,620 742.36 235,286,176 3,036,556 1.29
german 12 495,588,296 811.81 501,844,744 6,256,448 1.25
french 12 894,746,784 689.82 906,803,728 12,056,944 1.33
polish n/a n/a n/a n/a n/a n/a

Pointer to array of limited set of edges [version 1b]

Do not store the array of all possible pointers, but only a continuous range between first and last non-null pointer.

Node structure:

struct TrieNode {
        /* user data */

        uint8_t lo;
        uint8_t hi;
        struct TrieNode** children;
}

The array has size hi - lo + 1.

Next function:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        if (node->children) {
                if (letter >= node->lo and letter <= node->hi)
                        return node->children[letter - node->lo];
                else
                        return NULL;
        }
        else
                return NULL;
}

Pros

  • much smaller memory (but still quite large and in degenerated cases allocated memory could be as large as in version 1a)

Cons

  • more tests in order to retrieve a child node
  • inserting/deleting nodes require more operations

Experiments results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 16 12,688,512 40.56 17,552,000 4,863,488 27.71
german 16 18,303,588 29.98 28,461,208 10,157,620 35.69
french 16 44,657,336 34.43 63,965,320 19,307,984 30.19
polish 16 312,236,524 52.62 393,812,632 81,576,108 20.71

Pointer to array of limited set of edges — lookup tables

This method extends 1b approach. The size of chars set stored in entire trie may be smaller than 256, for example words in english dictionary contains small and big letters and apostrophe, i.e. about 53 chars. Thus the size of array in each node could be much smaller, with cost of additional translation table(s).

Such translation tables can be applied for all nodes or be specific for nodes at the same depth. Sample functions retrieving child node:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        const uint16_t L = trie->translate[letter];
        if (L == trie->INVALID_LETTER)
                return NULL;
        else
                return node->children[L]
}


TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        /* depth could be parameter of function! */
        const size_t depth = get_depth(node);

        const uint16_t L = trie->translate[depth][letter];
        if (L == trie->INVALID_LETTER)
                return NULL;
        else
                return node->children[L]
}

Pros

  • even less memory (but still large)

Cons

  • 3 or 4 memory accesses, additional tests
  • translation tables can be constructed when a trie already exists
  • adding new words containing invalid characters forces to rebuild lookup tables

Experiments results

Global translation table has 54 elements, while size of translation tables created for each depth vary from 1 to 53:

Results for tries with additional lookups.

method node struct size total size bytes/node
All possible edges 1036 324,117,780 1036.00
Global translation table 16 72,582,360 232.00
Depth-local translation table 16 41,797,316 133.60

Dynamic table of pointers [version 2]

[image]

Dynamic table of pointers — store only non-null pointers. This require store also size of a table and a edge label.

struct TrieNode {
        /* user data */

        uint8_t letter; // edge label
        uint16_t n;             // array size
        TrieNode** children;    // array
}

Procedure returns next node:

TrieNode* g(TrieNode* node, uint8_t letter) {
        int i;
        const int n = node->n;
        for (i=0; i < n; i++)
                if (node->child[i]->letter == letter)
                        return node->child[i];

        return NULL;
}

Pros

  • Simple to use, less memory usage than the linked list version.

Cons

  • Linear time to obtain a child node.
  • Many indirect memory operations.
  • Adding/removing require reallocating memory, this brings internal fragmentation.

Improvements

  • Keep a table sorted and thus allow logarithmic time to obtain a child (binary search).
  • Or use move-to-front strategy.
  • For n = 1 do not allocate 1-element table, but save pointer directly in a node structure (version 2a).
struct TrieNode {
        /* user data */

        uint8_t letter;
        uint16_t n;
        union {
                TrieNode*       single_child;   // valid when n = 1
                TrieNode**      ptr_array;
        }
}

Experiments results

Version 2:

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 16 6,257,096 20.00 11,193,376 4,936,280 44.10
german 16 12,209,396 20.00 22,448,216 10,238,820 45.61
french 16 25,941,596 20.00 45,420,392 19,478,796 42.89
polish 16 118,673,156 20.00 201,448,152 82,774,996 41.09

Version 2a:

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 16 5,563,904 17.78 8,419,680 2,855,776 33.92
german 16 10,607,980 17.38 16,040,816 5,432,836 33.87
french 16 23,480,024 18.10 35,573,008 12,092,984 33.99
polish 16 109,473,048 18.45 164,644,264 55,171,216 33.51

Linked list of pointers [version 2b]

[image]

Use single-linked list instead of dynamic table. This requires also store a letter (label's edge) on a list.

struct TrieNode {
        /* user data */

        ListItem* head;
};

ListItem {
        uint8_t         letter;
        TrieNode*       child;
        ListItem*       next;
};

Procedure returns next node:

TrieNode* g(TrieNode* node, uint8_t letter) {
        ListItem* item = node->head;

        while (item) {
                if (item->letter == letter)
                        return item->child;
                else
                        item = item->next;
        }

        return NULL
}

Pros

  • Store only required links.

Cons

  • Linear time to obtain a child node.
  • Many indirect memory operations.
  • Each node of linked list has small size (2 pointers = 8 bytes on 32-bit machine), thus large internal memory fragmentation when a standard C library routines are used (malloc/free).

Improvements

  • Use move-to-front strategy, i.e. move last found item to the front of a list — next search will find this item faster.
  • Use memory pools to reduce memory wasting. See section Memory allocators

Experiments results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 12 7,508,508 24.00 10,011,360 2,502,852 25.00
german 12 14,651,268 24.00 19,535,040 4,883,772 25.00
french 12 31,129,908 24.00 41,506,560 10,376,652 25.00
polish 12 142,407,780 24.00 189,877,056 47,469,276 25.00

Dynamic table modifications

Table of letters and pointers [version 3]

[image]

The dynamic table require two memory reads: the first gets a pointer to child, then get a letter from a child. In this representation letters are separated from the node structure and saved in front of the table of pointers. Sample memory layout for node with 3 children:

|     letters   |       pointers        |
+---+---+---+---+-------+-------+-------+
| a | b | c | _ | ptr_0 | ptr_1 | ptr_2 |
+---+---+---+---+-------+-------+-------+
  0   1   2   x     0       1       2     <- index

Letters area has size rounded to alignment, in this case 4 bytes.

Structure of node is simple:

struct TrieNode {
        /* user data */

        uint16_t n;
        void* children;
}

Procedure returns next node:

TrieNode* NEXT_v3(TrieNode* node, uint8_t letter) {
        const int n = node->n;
        const letters_size = round_up(n, aligment);

        char* letters =
                (char*)(node->children);

        TrieNode* pointers =
                (TrieNode*)(node->children + letters_size);

        /* find index */
        for (i=0; i < n; i++)
                if (chars[i] == letter)
                        return pointers[i];

        return NULL;
}

Pros

  • Letters are stored in a single memory area, thus specialized routines to locating chars could be used, for example functions that use SSE instructions.

Cons

  • Linear time to obtain a child node.
  • Adding/removing words require reallocating memory.
  • Internal memory fragmentation in letters area.
  • Because of structure packing size of node don't become smaller.

Improvements

  • Keep a table sorted and thus allow logarithmic time to obtain child nodes.
  • Or use move-to-front strategy.
  • After all structure packing could be ally. If alignment is 4 bytes, then there are two free bytes after n element. If n == 1 or n == 2 we don't need to allocate letters part of dynamic table, but use these spare bytes. (version 3a)
struct TrieNode {
        /* user data */

        uint16_t n;
        uint8_t  byte0; // n == 1 or n == 2
        uint8_t  byte1; // n == 2
        void*    children;
}

Procedure returns next node:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        const int n = node->n;
        switch (n) {
                case 0:
                        return NULL;

                case 1:
                        if (node->byte0 == letter)
                                return (TrieNode**)(node->children)[0];
                        else
                                return NULL;

                case 2:
                        if (node->byte0 == letter)
                                return (TrieNode**)(node->children)[0];
                        else if (node->byte1 == letter)
                                return (TrieNode**)(node->children)[1];
                        else
                                return NULL;

                default:
                        return NEXT_v3(node, letter);
        }
}

Experiments results

Version 3:

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 16 7,588,476 24.26 11,349,160 3,760,684 33.14
german 16 18,070,716 24.17 28,197,544 10,126,828 35.91
french 16 62,388,760 24.20 97,167,832 34,779,072 35.79
polish 16 310,656,956 24.24 480,136,984 169,480,028 35.30

Version 3a:

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 16 6,406,060 20.48 11,349,160 4,943,100 43.55
german 16 15,098,296 20.19 28,122,600 13,024,304 46.31
french 16 52,107,676 20.21 96,885,336 44,777,660 46.22
polish 16 259,933,364 20.28 478,677,552 218,744,188 45.70

Table of letters, children count and pointers [version 3b]

[image]

A modification of the previous scheme — move counter n to a dynamic array. Also type of n could be change to uint8_t. If node has no children, then pointer is NULL, and n never get value 0, so we have values ouside range 1..256.

Structure of node is simple:

struct TrieNode {
        /* user data */

        void* children;
}

Procedure returns next node:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        const int n = (uint_8)(node->children[0]);
        const letters_size = round_up(n, aligment);

        char* letters =
                (char*)(node->children + 1);

        TrieNode* pointers =
                (TrieNode*)(node->children + letters_size + 1);

        /* find index */
        for (i=0; i < n; i++)
                if (chars[i] == letter)
                        return pointers[i];

        return NULL;
}

Sample memory layout for node with 3 children:

| n |  letters  |       pointers        |
+---+---+---+---+-------+-------+-------+
| 3 | a | b | c | ptr_0 | ptr_1 | ptr_2 |
+---+---+---+---+-------+-------+-------+
      0   1   2     0       1       2     <- index

Pros

  • smaller size of a node

Cons

see previous section

Experiments results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 12 6,297,676 20.13 8,841,424 2,543,748 28.77
german 12 12,267,360 20.09 17,782,664 5,515,304 31.02
french 12 26,152,132 20.16 35,824,896 9,672,764 27.00
polish 12 120,114,864 20.24 158,666,648 38,551,784 24.30

Array of children nodes [version 4]

[image]

Instead of storing pointers to children it's possible to store all children nodes in a single array and save only one pointer to that array.

Structure of node:

struct TrieNode {
        /* user data */

        uint16_t  n
        uint8_t   letter;
        TrieNode* array;
}

Pros

  • much smaller memory requirements
  • memory locality (spatial locality)

Cons

  • adding/removing nodes require reallocation, this bring internal memory fragmentation

Size improvements

If a trie is not modified, then memory could be even more compacted, but following constraints are set:

  • all nodes are stored in a continuous area of memory (array)
  • there are up to 223 − 1 nodes

Structure of node:

struct TrieNode {
        /* user data */

        uint32_t        letter:8;       // 8-bit for letter
        uint32_t        eol:1;          // 1-bit for End-of-list marker
        uint32_t        index:23;       // index of subarray of children
}

Procedure thar returns child node:

TrieNode nodes[k];

TrieNode* NEXT(TrieNode* node, const uint8_t letter) {
        if (node->index == 0)
                return NULL;

        TrieNode* children = &nodes[index];
        size_t i = 0;

        while (1) {
                if (children[i].letter == letter)
                        return &children[i];
                else {
                        if (children[i].eol)
                                // last element of array
                                return NULL;
                        else
                                i += 1;
                }
        }
}

Such trie takes exactly nodes count * sizeof(TrieNode).

Experiments results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 16 5,005,680 16.00 6,814,952 1,809,272 26.55
german 16 9,767,520 16.00 13,623,080 3,855,560 28.30
french 16 20,753,280 16.00 27,680,536 6,927,256 25.03
polish 16 94,938,528 16.00 122,940,568 28,002,040 22.78

Table of letter and array of children nodes [version 4a]

This is merge of version 4 and version 3b — children nodes are saved in an array (v4), a children counter and edge lables are also stored in the table (v3b).

Structure of node:

struct TrieNode {
        /* user data */

        uint8_t* array;
}

(Pseudo)structure od array:

pseudostruct array {
        uint8_t n;
        uint8_t letters[n];
        uint8_t __alignment__[?];

        TrieNode next[n];
}

Procedure returns next node:

TrieNode* NEXT(TrieNode* node, uint8_t letter) {
        const int n = (int)(node->array[0]);
        const letters_size = round_up(n + 1, aligment);

        char* letters =
                (char*)(node->array + 1);

        TrieNode* next =
                (TrieNode*)(node->array + letters_size);

        /* find index */
        for (i=0; i < n; i++)
                if (letters[i] == letter)
                        return &next[i];

        return NULL;
}

Pros & cons

same as for version 4

Experiments results

language node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
english 12 4'687'400 14.98 6'340'624 1'653'224 26.07
german 12 9'290'880 15.22 12'901'616 3'610'736 28.00
french 12 19'209'800 14.81 25'460'408 6'250'608 25.55
polish 12 86'460'392 14.57 111'249'824 24'789'432 22.28

Experiments results summary

English words:

layout node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
version 1 1,032 322,866,360 1032.00 325,369,216 2,502,856 0.77
version 1a 12 232,249,620 742.36 235,286,176 3,036,556 1.29
version 1b 16 12,688,512 40.56 17,552,000 4,863,488 27.71
version 2 16 6,257,096 20.00 11,193,376 4,936,280 44.10
version 2a 16 5,563,904 17.78 8,419,680 2,855,776 33.92
version 2b 12 7,508,508 24.00 10,011,360 2,502,852 25.00
version 3 16 7,588,476 24.26 11,349,160 3,760,684 33.14
version 3a 16 6,406,060 20.48 11,349,160 4,943,100 43.55
version 3b 12 6,297,676 20.13 8,841,424 2,543,748 28.77
version 4 16 5,005,680 16.00 6,814,952 1,809,272 26.55
version 4a 12 4'687'400 14.98 6'340'624 1'653'224 26.07

German words:

layout node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
version 1 1,032 630,005,040 1032.00 634,888,816 4,883,776 0.77
version 1a 12 495,588,296 811.81 501,844,744 6,256,448 1.25
version 1b 16 18,303,588 29.98 28,461,208 10,157,620 35.69
version 2 16 12,209,396 20.00 22,448,216 10,238,820 45.61
version 2a 16 10,607,980 17.38 16,040,816 5,432,836 33.87
version 2b 12 14,651,268 24.00 19,535,040 4,883,772 25.00
version 3 16 18,070,716 24.17 28,197,544 10,126,828 35.91
version 3a 16 15,098,296 20.19 28,122,600 13,024,304 46.31
version 3b 12 12,267,360 20.09 17,782,664 5,515,304 31.02
version 4 16 9,767,520 16.00 13,623,080 3,855,560 28.30
version 4a 12 9'290'880 15.22 12'901'616 3'610'736 28.00

French words:

layout node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
version 1 1,032 1,338,586,560 1032.00 1,348,963,216 10,376,656 0.77
version 1a 12 894,746,784 689.82 906,803,728 12,056,944 1.33
version 1b 16 44,657,336 34.43 63,965,320 19,307,984 30.19
version 2 16 25,941,596 20.00 45,420,392 19,478,796 42.89
version 2a 16 23,480,024 18.10 35,573,008 12,092,984 33.99
version 2b 12 31,129,908 24.00 41,506,560 10,376,652 25.00
version 3 16 62,388,760 24.20 97,167,832 34,779,072 35.79
version 3a 16 52,107,676 20.21 96,885,336 44,777,660 46.22
version 3b 12 26,152,132 20.16 35,824,896 9,672,764 27.00
version 4 16 20,753,280 16.00 27,680,536 6,927,256 25.03
version 4a 12 19'209'800 14.81 25'460'408 6'250'608 25.55

Polish words:

layout node size [bytes] trie size [bytes] bytes/node allocated [bytes] wasted [bytes] wasted [%]
version 1 n/a n/a n/a n/a n/a n/a
version 1a n/a n/a n/a n/a n/a n/a
version 1b 16 312,236,524 52.62 393,812,632 81,576,108 20.71
version 2 16 118,673,156 20.00 201,448,152 82,774,996 41.09
version 2a 16 109,473,048 18.45 164,644,264 55,171,216 33.51
version 2b 12 142,407,780 24.00 189,877,056 47,469,276 25.00
version 3 16 310,656,956 24.24 480,136,984 169,480,028 35.30
version 3a 16 259,933,364 20.28 478,677,552 218,744,188 45.70
version 3b 12 120,114,864 20.24 158,666,648 38,551,784 24.30
version 4 16 94,938,528 16.00 122,940,568 28,002,040 22.78
version 4a 12 86'460'392 14.57 111'249'824 24'789'432 22.28

Speed

Procedure tests if word is present in trie:

bool exists(Trie* trie, char* string) {
        char* c = string;
        TrieNode* node = trie->root;
        while (*c and node)
                node = NEXT(node, *c++);

        return (node && node->end_of_word)
}

All representations has been tested using similar procedure, and there are two conclusion:

Memory allocators

Experiments clearly shows that internal memory fragmentation is significant, at least when GNU libc is used. The answer to this problem are specialized allocators.

All representations require allocate (or free) single nodes, thus memory pools could be used. For example GNU libc provides obstacks, but such stack-based memory pool is quite simple to write.

Representations using dynamic arrays require more complex memory pool, thus I wrote simple bitmap-based allocator. There is a linked list of arenas:

typedef struct MemPool {
        MemPool*        next_arena;
        size_t          free_count;
        uint8_t         bitmap[MAX_SIZE/8];
} MemPool;

The size of allocated array isn't stored in arena structure, because it's already saved in nodes. Thanks to that memory required by allocator is much smaller, but API is slightly different and isn't safe:

void* mempool_alloc(MemPool** pool, size_t n);
void  mempool_free(MemPool* pool, size_t n);

Experiments

Two representations were tested with the English dictionary:

  • linked list — separate stack-based memory pools for nodes and for list elements (both served 1024 elements per arena);
  • dynamic array — stack-based memory pool for nodes (1024 elements per arena) and bitmap-based allocator for arrays elements (256 elements per arena).

Linked list

  • memory pool for nodes:
    • arenas: 306
    • memory allocated for user: 3'760'128 bytes (306 * 1024 * 12)
    • memory used: 3'754'260 bytes
    • memory consumed by allocator structures: 4'896 bytes (306 * 16)
  • memory pool for list elements:
    • arenas: 306
    • memory allocated for user: 3'760'128 bytes (306 * 1024 * 12)
    • memory used: 3'754'248 bytes
    • memory consumed by allocator structures: 4'896 bytes (306 * 16)
  • theoretical size of tree: 7'508'508 bytes
  • malloc used bytes: 7'534'960 bytes
  • wasted: 36'244 bytes, 0.48% (26'452 - malloc, 2 * 4'896 - pools)

Dynamic array

  • memory pool for nodes:
    • arenas: 306
    • memory allocated for user: 5'013'504 bytes (305 * 1024 * 16)
    • memory used by user: 5'005'680 bytes
    • memory consumed by allocator structures: 4'896 bytes (306 * 16)
  • memory pool for arrays:
    • arenas: 1224
    • memory allocated for user: 1'253'376 bytes (1224 * 256 * 4)
    • memory used by user: 1'251'416 bytes (312854 * 4)
    • memory consumed by allocator structures: 48'960 bytes (1224 * 40)
  • theoretical size of tree: 6'257'096 bytes
  • malloc used bytes: 6'332'992 bytes
  • wasted: 129'752 bytes, 2.05% (75'896 - malloc, 4'896 and 48'960 - pools)

Updates

2011-03-31
new representation [v4a], info about speed
2011-03-30
added 2 new representations [1a, 1b], reorganization of text
2011-03-28
added experiments results for memory pools