Author: | Wojciech Muła |
---|---|
Added on: | 2019-01-07 |
Updated on: | 2019-01-17 (performance of POSIX read) |
To my surprise, I quite often need to read the whole contents of a file into a string. Sometimes it's easier to generate data with an external program, sometimes unit tests need to read a generated file, etc.
The signature of such a loader function is:
std::string load_file(const std::string& path);
In C++ the official way to deal with files is streams. There are at least two methods to load data into a string:

    #include <fstream>
    #include <iterator>
    #include <sstream>
    #include <string>

    std::string load1(const std::string& path) {
        std::ifstream file(path);
        return std::string((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
    }

    std::string load2(const std::string& path) {
        auto ss = std::ostringstream{};
        std::ifstream file(path);
        ss << file.rdbuf();
        return ss.str();
    }

Both functions do their job, but streams are reportedly slow. Since C++ still exposes the good old C API, i.e. fread (libc) and read (POSIX), I compared the performance of all four solutions. Although the C solution using fread, shown below, is much longer than its C++ counterparts, its performance is significantly better than anything based on C++ streams. The performance of read is almost identical to that of fread; the differences are negligible.
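Of course, the performance boost highly depends on the machine type, hard drive, etc., but clearly the overhead of C++ streams is huge compared to libc and POSIX calls: in the measurements below, fread and read are roughly 10 to 34 times faster than the istreambuf_iterator version and about 2 to 4 times faster than the rdbuf version.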
Implementation using fread:

    #include <cstdio>
    #include <memory>
    #include <string>

    std::string load3(const std::string& path) {
        auto close_file = [](FILE* f) { fclose(f); };
        auto holder = std::unique_ptr<FILE, decltype(close_file)>(
            fopen(path.c_str(), "rb"), close_file);
        if (!holder)
            return "";

        FILE* f = holder.get();

        // in C++17 the following lines can be folded into
        // a std::filesystem::file_size invocation
        if (fseek(f, 0, SEEK_END) < 0)
            return "";

        const long size = ftell(f);
        if (size < 0)
            return "";

        if (fseek(f, 0, SEEK_SET) < 0)
            return "";

        std::string res;
        res.resize(size);

        // &res[0] is writable since C++11; C++17 also defines
        // a .data() overload which returns a non-const pointer
        const size_t n = fread(&res[0], 1, res.size(), f);
        res.resize(n); // keep only the bytes actually read

        return res;
    }

Implementation using read:
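
    #include <string>

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    std::string load4(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0)
            return "";

        struct stat sb;
        if (fstat(fd, &sb) < 0) {
            close(fd);
            return "";
        }

        std::string res;
        res.resize(sb.st_size);

        // a single read() may return fewer bytes than requested
        const ssize_t n = read(fd, &res[0], res.size());
        close(fd);

        res.resize(n > 0 ? n : 0); // keep only the bytes actually read
        return res;
    }
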
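Performance tests were run on three Linux machines; each table below shows the results from one machine. Files of different sizes were tested; each file was read 10 times and the minimum time was recorded. Speed-ups are given relative to the istreambuf_iterator variant.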
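The measurement procedure described above (repeat the load several times, keep the shortest time) can be sketched as follows. The helper name `min_time_us` and the choice of `std::chrono::steady_clock` are assumptions made for illustration; the actual test programs may measure time differently.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: calls `loader(path)` `repeat` times and returns
// the shortest wall-clock time of a single call, in microseconds.
template <typename Loader>
std::int64_t min_time_us(Loader loader, const std::string& path, int repeat = 10) {
    using clock = std::chrono::steady_clock;

    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repeat; i++) {
        const auto t1 = clock::now();
        const std::string data = loader(path);
        const auto t2 = clock::now();
        (void)data; // only the elapsed time matters here

        const std::int64_t us =
            std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
        best = std::min(best, us);
    }

    return best;
}
```

Taking the minimum rather than the average filters out runs disturbed by other system activity. A call could look like `min_time_us(load3, "data.bin")`, where `data.bin` is a placeholder file name.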
size [MB] | istreambuf_iterator time [us] | stream::rdbuf time [us] | stream::rdbuf speed-up | LibC fread time [us] | LibC fread speed-up | POSIX read time [us] | POSIX read speed-up |
---|---|---|---|---|---|---|---|
1 | 5301 | 845 | 6.27 | 207 | 25.61 | 218 | 24.32 |
2 | 10780 | 2123 | 5.08 | 854 | 12.62 | 833 | 12.94 |
4 | 22971 | 4327 | 5.31 | 2280 | 10.07 | 2206 | 10.41 |
8 | 47589 | 8520 | 5.59 | 4550 | 10.46 | 4550 | 10.46 |
16 | 98424 | 17620 | 5.59 | 9381 | 10.49 | 9336 | 10.54 |
32 | 202870 | 52075 | 3.90 | 18755 | 10.82 | 18756 | 10.82 |

size [MB] | istreambuf_iterator time [us] | stream::rdbuf time [us] | stream::rdbuf speed-up | LibC fread time [us] | LibC fread speed-up | POSIX read time [us] | POSIX read speed-up |
---|---|---|---|---|---|---|---|
1 | 3817 | 438 | 8.71 | 171 | 22.32 | 163 | 23.42 |
2 | 7214 | 874 | 8.25 | 362 | 19.93 | 358 | 20.15 |
4 | 14586 | 2156 | 6.77 | 813 | 17.94 | 803 | 18.16 |
8 | 28785 | 4746 | 6.07 | 2034 | 14.15 | 2028 | 14.19 |
16 | 59732 | 9553 | 6.25 | 4213 | 14.18 | 4337 | 13.77 |
32 | 114713 | 31419 | 3.65 | 8028 | 14.29 | 8017 | 14.31 |

size [MB] | istreambuf_iterator time [us] | stream::rdbuf time [us] | stream::rdbuf speed-up | LibC fread time [us] | LibC fread speed-up | POSIX read time [us] | POSIX read speed-up |
---|---|---|---|---|---|---|---|
1 | 2544 | 184 | 13.83 | 76 | 33.47 | 75 | 33.92 |
2 | 4801 | 448 | 10.72 | 151 | 31.79 | 149 | 32.22 |
4 | 9688 | 1103 | 8.78 | 372 | 26.04 | 365 | 26.54 |
8 | 19749 | 2484 | 7.95 | 1088 | 18.15 | 1087 | 18.17 |
16 | 39414 | 5149 | 7.65 | 2524 | 15.62 | 2526 | 15.60 |
32 | 78949 | 19692 | 4.01 | 5059 | 15.61 | 5051 | 15.63 |

Test programs are available on GitHub.