Tue 16 Sep 2025
This should be an easy question to answer, right? It's a fun little adventure on my Apple M3 Pro that I fell into while watching some videos about the 1 billion row challenge.
A quick Google search provides conflicting answers. Lots of people say read is faster than fread, or that you should just use mmap. Just as many say memory mapping is not great due to page faults. Even a source I consider very reputable gives numbers that do not match my experience. Why is such a simple question so hard to get an answer for?
```
Nuuk;12.4
Yellowknife;15.4
Vladivostok;14.3
Indianapolis;7.8
Belize City;19.2
Kunming;14.4
...
```
```c
#include <stdint.h>
#include <stdio.h>
#include <mach/mach_time.h>

#define TIMER_SETUP mach_timebase_info_data_t _timebase; \
                    mach_timebase_info(&_timebase); \
                    uint64_t _timer_start; \
                    uint64_t _timer_end; \
                    char _done
#define TIMER for (_done = 0, _timer_start = mach_absolute_time(); !_done; _timer_end = mach_absolute_time(), _done = 1)
#define TIMER_NSEC ((_timer_end - _timer_start) * _timebase.numer / _timebase.denom)
#define TIMER_SEC (TIMER_NSEC / 1e9)
#define GB_SEC(bytes) ((bytes / 1e9) / TIMER_SEC)

typedef struct {
    uint8_t *str;
    int64_t len;
} String;

String function() {
    String result;
    //
    // implementation we are testing goes here
    //
    return result;
}

int main(void) {
    TIMER_SETUP;

    String file_data;
    TIMER {
        file_data = function();
    }
    printf("Read %lld bytes in %.3f seconds (%.2f GB/s)\n",
           file_data.len, TIMER_SEC, GB_SEC(file_data.len));

    uint32_t sum = 0;
    TIMER {
        for (int64_t i = 0; i < file_data.len; i++) {
            if (file_data.str[i] == '\n') ++sum;
        }
    }
    printf("Touch all bytes: %.3f seconds (%.2f GB/s), sum=%d\n",
           TIMER_SEC, GB_SEC(file_data.len), sum);

    return 0;
}
```
```
sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
```
to allow compiler optimizations and to ensure the disk cache is purged (flushed and emptied) before each run. The final score is the best of ten runs.

```c
#include <stdlib.h>

String function() {
    String result;
    FILE *file;

    file = fopen("measurements.txt", "rb");
    fseek(file, 0, SEEK_END);
    result.len = ftell(file);
    fseek(file, 0, SEEK_SET);

    result.str = malloc(result.len + 1);
    fread(result.str, result.len, 1, file);
    result.str[result.len] = 0;

    return result;
}
```

Result:
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 2.113 seconds (6.53 GB/s)
Touch all bytes: 0.853 seconds (16.18 GB/s), sum=1000000000
```
Using setvbuf with a large, small, or no buffer has no tangible effect on the result, despite advice online stating otherwise.
```c
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

String function() {
    String result;

    int fd = open("measurements.txt", O_RDONLY);
    result.len = lseek(fd, 0, SEEK_END);
    lseek(fd, 0, SEEK_SET);

    result.str = malloc(result.len + 1);
    uint64_t bytes = 0;
    while (bytes < result.len) {
        bytes += read(fd, result.str + bytes, INT_MAX); // largest size we can read
    }
    result.str[result.len] = 0;

    return result;
}
```

Result:
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 2.107 seconds (6.55 GB/s)
Touch all bytes: 0.860 seconds (16.03 GB/s), sum=1000000000
```
Using fcntl(fd, F_NOCACHE, 1) and/or fcntl(fd, F_RDADVISE, &ra) makes it slower, despite advice online stating otherwise.
```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

String function() {
    String result;

    int fd = open("measurements.txt", O_RDONLY);
    result.len = lseek(fd, 0, SEEK_END);
    lseek(fd, 0, SEEK_SET);

    result.str = mmap(NULL, result.len, PROT_READ, MAP_PRIVATE, fd, 0);

    return result;
}
```

Result:
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 0.000 seconds (759409.31 GB/s)
Touch all bytes: 22.475 seconds (0.61 GB/s), sum=1000000000
```
madvise flags have no significant impact. mmap is not "reading" anything up front, so it trips over itself servicing a flood of page faults (~842,008 faults × 16 KiB pages == file size). Apple specifically states that reading a file sequentially via a mapping is bad. Maybe it fares better on Linux with the magic MAP_POPULATE flag I see mentioned.
```
% sudo purge && /usr/bin/time wc -l measurements.txt
 1000000000 measurements.txt
       11.41 real        10.36 user         1.04 sys
```
```
% sudo purge && /usr/bin/time ruby -e 'x = 0; File.foreach("measurements.txt") {|i| x+=1}; puts x'
1000000000
       79.11 real        76.13 user         2.82 sys
```
The Ruby version does not return in a reasonable time when using File#readlines.
```
% sudo purge && /bin/dd if=measurements.txt of=/dev/null bs=1M
13156+1 records in
13156+1 records out
13795429575 bytes transferred in 2.103454 secs (6558465065 bytes/sec)
```
This is a sanity check of how fast the SSD can do a raw sequential read.
| | file to memory (GB/s) | touch bytes (GB/s) |
|---|---|---|
| read(2) | ~6.5 | ~16 |
| fread(3) | ~6.5 | ~16 |
| mmap(2) | - | 0.61 |
| wc(1) | 1.21 | |
| ruby(1) | 0.17 | |
| dd(1) | 6.56 | - |

Higher is better
So what do we take away from this? At least on my laptop the choice is obvious: use fread if you aren't doing anything fancy. It needs the least twiddling and does everything for you while still giving you maximum speed.
```c
uint64_t sum = 0; // was originally uint32_t
TIMER {
    for (int64_t i = 0; i < file_data.len; i++) {
        if (file_data.str[i] == '\n') ++sum;
    }
}
printf("Touch all bytes: %.3f seconds (%.2f GB/s), sum=%llu\n",
       TIMER_SEC, GB_SEC(file_data.len), sum);
```
```
% sudo purge && gcc -O3 -std=c99 -march=native brc.c && ./a.out
Read 13795429575 bytes in 2.114 seconds (6.52 GB/s)
Touch all bytes: 1.673 seconds (8.25 GB/s), sum=1000000000 -! was 16 GB/s before !-
```
A brief look at the generated assembly shows the compiler doing much more complicated handling of the value, despite this being a 64-bit machine. I don't know enough to explain why it isn't optimized away. Another reminder that the compiler is not magic!
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 3.533 seconds (3.90 GB/s) -! about 1.5x slower !-
Touch all bytes: 1.291 seconds (10.69 GB/s), sum=1000000000 -! also 1.5x slower !-
```