# Benefits from the obsession

Author: Wojciech Muła 2015-12-13

Everything has started few years ago when I found John Regher's blog. If you don't know the blog I highly recommend it. Among other things (I like the photos!) the author studies bugs in compilers, undefined behaviours and similar things. The word "overflow" appears quite often in his posts due to the great number of errors related to improper use of the integer arithmetic. Well, I don't know when the obsession has exactly started, but recently I realized that I am alert of all integer operations in my programs. If a variable has type int and later it is incremented then I'm pretty sure that the application will eventually blow up. When I can't prevent it, I left comments like: "XXX: possible overflow". This post shows three cases when I haven't left a comment, but fixed the issue.

# Case 1

On the other day I saw such innocent code:

string decode_base64(string& base64) {

// some validation

string result;
const size_t n = 3*base64.size() / 4;
result.resize(n);

// decoding stuff

return result;
}

The size of result is three fouth of the base64's size, so it is obviously smaller than the maximum value of size_t which is used to store the string size. However, the overflow can occur during multiplying by 3. On 64-bit machines the size of base64 have to be really huge to trigger the error — it is 1.23e+19. But on 32-bit machines it's "merely" 28 GB (try to imagine base64-encoded movie sent via e-mail...) Despite the CPU architecture, the problem still exists. And the solution is not very complicated.

Expression 3/4 * x is equivalent to x - 1/4 * x. Dividing x by 4 never cause an overflow, but since we operate on integers these two expressions are not equal. The latter expression have to be corrected (rounded up) with following conditional expression (x % 4 != 0) ? 1 : 0. Thus the final expression is:

const size_t x = base64.size();
const size_t n = x - x/4 + ((x % 4 != 0) ? 1 : 0);

The expression requires: 2 additions, 1 right shift, 1 comparison and 1 condition expression. And it's perfectly safe.

# Case 2

Later I spotted a similar mistake in a procedure which encodes data to base64. The expression calculating the size of an output buffer:

string base64_encode(const string& input) {

const size_t x = input.size();
const size_t n = 4*x / 3;

string result;
result.resize(n);

// ...

return result;
}

Here the overflow is clearly visible, because value 4/3 * x could be greater than the maximum value of an integer. Again, to trigger the error on 64-bit CPUs the input size have to be greater than 4.61e+18. But on 32-bit CPUs the overflow will occur for 1GB inputs. OK, so how to detect the overflow in this case? After rewriting expression to x + x/3 we can easily detect overflow during addition:

size_t add(size_t x, size_t y) {
if (std::limits<size_t>::max() - x < y) {
throw "overflow";
}

return x + y;
}

string base64_encode(const string& input) {

const size_t x = input.size();
const size_t n = add(x, x/3);

// ...
}

In this case the only additional cost is a single comparison before the addition. And, what is more important, the procedure detects the overflows, while the original version silently ignored them.

# Case 3

The last example: joining a list of strings. The naive implementation looks like this:

string join(const string& sep, const vector<string>& items) {

if (items.empty()) {
return "";
}

string result;

result = items[0];
for (size_t i=1; i < items.size(); i++) {
result.append(sep)
result.append(items[i]);
}

return result;
}

In my version, before actual appending the items to the results, there is an extra loop which calculates the size of the result string, like this:

string join(const string& sep, const vector<string>& items) {

// ...

size_t size = items[0].size();
for (size_t i=1; i < items.size(); i++) {