Author: | Wojciech Muła |
---|---|
Added on: | 2008-12-03 |
man locatedb says: "Databases can not be concatenated together, even if the first (dummy) entry is trimmed from all but the first database. This is because the offset-differential count in the first entry of the second and following databases will be wrong".
That is true if we take the man page literally, but concatenation is possible without re-encoding any database.
For details about the compression scheme please refer to Wikipedia; the file format is described in man locatedb. In short, compression is based on common prefix elimination in a sequence of strings: when a string shares a prefix with the previous string, we store the pair (length of prefix, rest of string). For example, if the previous string is "aaabbb" and the current one is "aaabcd", then the output is (4, "cd"), where 4 is the length of the common prefix "aaab". Locate files also store the differences between consecutive prefix lengths rather than the lengths themselves; for example (4, "..."), (5, "..."), (2, "...") is encoded as (4, "..."), (5-4=1, "..."), (2-5=-3, "..."). This is the reason why we can't simply join database files.
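The scheme described above can be sketched in a few lines of Python. This is only an illustration on in-memory (difference, suffix) pairs, not a reader or writer for the real binary format from man locatedb; the names `common_prefix_len`, `encode` and `decode` are made up for this example.

```python
def common_prefix_len(a, b):
    """Return the number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(strings):
    """Turn a list of strings into (difference, suffix) pairs.

    'difference' is the change in common-prefix length relative to the
    previous entry, 'suffix' is the part of the string after the prefix.
    """
    entries = []
    prev = ""
    prev_len = 0
    for s in strings:
        n = common_prefix_len(prev, s)
        entries.append((n - prev_len, s[n:]))
        prev, prev_len = s, n
    return entries


def decode(entries):
    """Inverse of encode(): rebuild the strings from (difference, suffix) pairs."""
    strings = []
    prev = ""
    prefix_len = 0
    for diff, suffix in entries:
        prefix_len += diff              # running prefix length
        prev = prev[:prefix_len] + suffix
        strings.append(prev)
    return strings


print(encode(["aaabbb", "aaabcd"]))     # [(0, 'aaabbb'), (4, 'cd')]
```

The decoder keeps a running prefix length, so each entry only makes sense relative to the entries before it.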
However, joining locate files isn't very complicated and, as stated above, does not require re-encoding the databases. We have to set the difference value of the first entry of the appended file to the negative of the common prefix length of the last entry of the first file. This brings the running prefix length back to zero, which is the state the entries of the appended file expect.
For example, when the first file contains three entries (0, "..."), (10, "..."), (-2, "..."), the last prefix length is 0+10-2 = 8. The second file contains (0, "..."), (5, "..."). After the join: (0, "..."), (10, "..."), (-2, "..."), (-8, "..."), (5, "...").
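A minimal sketch of this join on in-memory entry lists is given below. It is not the code of the utility mentioned in this post, and it ignores the on-disk header and byte encoding; `join_entries` is a hypothetical name. It assumes each list holds (difference, suffix) pairs and that the first entry of the appended list carries a complete string.

```python
def join_entries(first, second):
    """Join two lists of (difference, suffix) pairs without re-encoding.

    Only the difference of the first entry of 'second' is changed.
    """
    if not second:
        return list(first)

    # Prefix length of the last entry of 'first' is the sum of all differences.
    last_len = sum(diff for diff, _ in first)

    # Normally the appended list starts with difference 0, so this becomes
    # -last_len, which resets the running prefix length to zero.
    first_diff, first_suffix = second[0]
    patched = (first_diff - last_len, first_suffix)

    return list(first) + [patched] + list(second[1:])


a = [(0, "..."), (10, "..."), (-2, "...")]
b = [(0, "..."), (5, "...")]
print(join_entries(a, b))
# [(0, '...'), (10, '...'), (-2, '...'), (-8, '...'), (5, '...')]
```

All other entries are copied unchanged, which is why no re-encoding is needed.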
Some time ago I wrote a Python utility/library, and I have now extended it to perform this task. Implementation details:
I've tested the joined database with native Linux locate (under Cygwin) and didn't notice any problems.