C++ — how to read a file into a string

Author: Wojciech Muła
Added on: 2019-01-07
Updated on: 2019-01-17 (performance of POSIX read)

Introduction

To my surprise, I quite often need to read the whole contents of a file into a string. Sometimes it is easier to generate data with an external program, sometimes unit tests need to read a generated file, etc.

The signature of such a loader function is:

std::string load_file(const std::string& path);

In C++ the official way to deal with files is streams. There are at least two ways to load data into a string with them:

#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

std::string load1(const std::string& path) {
    std::ifstream file(path);
    return std::string((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
}


std::string load2(const std::string& path) {
    auto ss = std::ostringstream{};
    std::ifstream file(path);
    ss << file.rdbuf();
    return ss.str();
}

Both functions do their job, but are reportedly slow. Since C++ still exposes the good old C API, i.e. fread (libc) or read (POSIX), I compared the performance of all the solutions. Although the C solution using fread, shown below, is much longer than its C++ counterparts, its performance is significantly better than anything based on C++ streams: in the measurements below it is 10 to 33 times faster than the istreambuf_iterator version and roughly 2 to 4 times faster than the rdbuf version. The performance of read is almost identical to that of fread; the differences are negligible.

Of course, the performance boost depends heavily on the machine, the hard drive, etc., but clearly the overhead of C++ streams is huge compared to libc and POSIX calls.

Implementation using fread:

#include <cstdio>
#include <memory>
#include <string>

std::string load3(const std::string& path) {

    auto close_file = [](FILE* f){fclose(f);};

    auto holder = std::unique_ptr<FILE, decltype(close_file)>(fopen(path.c_str(), "rb"), close_file);
    if (!holder)
        return "";

    FILE* f = holder.get();

    // in C++17 the following lines can be replaced with a call to std::filesystem::file_size
    if (fseek(f, 0, SEEK_END) < 0)
        return "";

    const long size = ftell(f);
    if (size < 0)
        return "";

    if (fseek(f, 0, SEEK_SET) < 0)
        return "";

    std::string res;
    res.resize(size);

    // before C++17 data() returns a const pointer, so write through &res[0]
    const size_t n = fread(&res[0], 1, res.size(), f);

    // keep only the bytes that were actually read
    res.resize(n);

    return res;
}
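The two comments in load3 mention C++17 features. The sketch below shows how the same function might look with them; it assumes a C++17 compiler, and the name load3_cpp17 is made up for this illustration (it is not one of the benchmarked functions).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <memory>
#include <string>
#include <system_error>

// Hypothetical C++17 variant of load3; not part of the measured test programs.
std::string load3_cpp17(const std::string& path) {

    // The non-throwing overload reports failure through the error code.
    std::error_code ec;
    const std::uintmax_t size = std::filesystem::file_size(path, ec);
    if (ec)
        return "";

    auto close_file = [](std::FILE* f){ std::fclose(f); };

    auto holder = std::unique_ptr<std::FILE, decltype(close_file)>(
        std::fopen(path.c_str(), "rb"), close_file);
    if (!holder)
        return "";

    std::string res;
    res.resize(static_cast<std::size_t>(size));

    // Since C++17 data() returns a non-const pointer for a non-const string.
    const std::size_t n = std::fread(res.data(), 1, res.size(), holder.get());

    // Keep only the bytes that were actually read.
    res.resize(n);

    return res;
}
```

Like the other loaders, this returns an empty string on failure, so a caller cannot tell a missing file from an empty one. Whether this variant performs the same as load3 was not measured.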

Implementation using read:

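#include <fcntl.h>      // open
#include <sys/stat.h>   // fstat
#include <unistd.h>     // read, close
#include <string>

std::string load4(const std::string& path) {

    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return "";

    struct stat sb;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return "";
    }

    std::string res;
    res.resize(sb.st_size);

    // read may fail or return fewer bytes than requested
    const ssize_t n = read(fd, &res[0], res.size());
    close(fd);

    res.resize(n < 0 ? 0 : n);

    return res;
}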

Evaluation

Performance tests were run on three Linux machines. Files of different sizes were tested; each file was read 10 times and the minimum time was noted. The speed-up columns are relative to the istreambuf_iterator time.
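The measurement procedure (repeat the read and keep the minimum time) can be sketched as follows. This is only an illustration, not the actual test program: the helper name min_time_us and the choice of std::chrono::steady_clock are assumptions.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: calls loader(path) `repeats` times and returns
// the shortest wall-clock time of a single call, in microseconds.
template <typename Loader>
std::int64_t min_time_us(Loader loader, const std::string& path, int repeats = 10) {
    using clock = std::chrono::steady_clock;

    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repeats; i++) {
        const auto t1 = clock::now();
        const std::string data = loader(path);
        const auto t2 = clock::now();

        // Touch the result so the call cannot be treated as unused.
        volatile std::size_t sink = data.size();
        (void)sink;

        const std::int64_t us =
            std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
        best = std::min(best, us);
    }
    return best;
}
```

For example, min_time_us(load3, "data.bin") would time the fread variant on a file named data.bin (a placeholder name). Taking the minimum rather than the average reduces the influence of unrelated system activity on the result.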

Computer #1

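size [MB] | istreambuf_iterator | stream::rdbuf        | LibC fread           | POSIX read
          | time [us]           | time [us] | speed-up | time [us] | speed-up | time [us] | speed-up
----------+---------------------+-----------+----------+-----------+----------+-----------+---------
        1 |                5301 |       845 |     6.27 |       207 |    25.61 |       218 |    24.32
        2 |               10780 |      2123 |     5.08 |       854 |    12.62 |       833 |    12.94
        4 |               22971 |      4327 |     5.31 |      2280 |    10.07 |      2206 |    10.41
        8 |               47589 |      8520 |     5.59 |      4550 |    10.46 |      4550 |    10.46
       16 |               98424 |     17620 |     5.59 |      9381 |    10.49 |      9336 |    10.54
       32 |              202870 |     52075 |     3.90 |     18755 |    10.82 |     18756 |    10.82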

Computer #2

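size [MB] | istreambuf_iterator | stream::rdbuf        | LibC fread           | POSIX read
          | time [us]           | time [us] | speed-up | time [us] | speed-up | time [us] | speed-up
----------+---------------------+-----------+----------+-----------+----------+-----------+---------
        1 |                3817 |       438 |     8.71 |       171 |    22.32 |       163 |    23.42
        2 |                7214 |       874 |     8.25 |       362 |    19.93 |       358 |    20.15
        4 |               14586 |      2156 |     6.77 |       813 |    17.94 |       803 |    18.16
        8 |               28785 |      4746 |     6.07 |      2034 |    14.15 |      2028 |    14.19
       16 |               59732 |      9553 |     6.25 |      4213 |    14.18 |      4337 |    13.77
       32 |              114713 |     31419 |     3.65 |      8028 |    14.29 |      8017 |    14.31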

Computer #3

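size [MB] | istreambuf_iterator | stream::rdbuf        | LibC fread           | POSIX read
          | time [us]           | time [us] | speed-up | time [us] | speed-up | time [us] | speed-up
----------+---------------------+-----------+----------+-----------+----------+-----------+---------
        1 |                2544 |       184 |    13.83 |        76 |    33.47 |        75 |    33.92
        2 |                4801 |       448 |    10.72 |       151 |    31.79 |       149 |    32.22
        4 |                9688 |      1103 |     8.78 |       372 |    26.04 |       365 |    26.54
        8 |               19749 |      2484 |     7.95 |      1088 |    18.15 |      1087 |    18.17
       16 |               39414 |      5149 |     7.65 |      2524 |    15.62 |      2526 |    15.60
       32 |               78949 |     19692 |     4.01 |      5059 |    15.61 |      5051 |    15.63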

Source code

Test programs are available on GitHub.