Tue 16 Sep 2025
This should be an easy question to answer, right? It's a fun little adventure on my Apple M3 Pro that I fell into while watching some videos about the 1 billion row challenge.
A quick Google search provides conflicting answers. Lots of people say read is faster than fread, or that you should just use mmap. Just as many say memory mapping is not great due to page faults. Even a source I consider very reputable gives numbers that do not match my experience. Why is such a simple question so hard to get an answer for?
```
Nuuk;12.4
Yellowknife;15.4
Vladivostok;14.3
Indianapolis;7.8
Belize City;19.2
Kunming;14.4
...
```
```c
#include <stdint.h>
#include <stdio.h>
#include <mach/mach_time.h>

#define TIMER_SETUP mach_timebase_info_data_t _timebase; \
                    mach_timebase_info(&_timebase); \
                    uint64_t _timer_start; \
                    uint64_t _timer_end; \
                    char _done
#define TIMER for (_done = 0, _timer_start = mach_absolute_time(); !_done; _timer_end = mach_absolute_time(), _done = 1)
#define TIMER_NSEC ((_timer_end - _timer_start) * _timebase.numer / _timebase.denom)
#define TIMER_SEC (TIMER_NSEC / 1e9)
#define GB_SEC(bytes) ((bytes / 1e9) / TIMER_SEC)

typedef struct {
    uint8_t *str;
    int64_t len;
} String;

String function() {
    String result;
    //
    // implementation we are testing goes here
    //
    return result;
}

int main(void) {
    TIMER_SETUP;

    String file_data;
    TIMER {
        file_data = function();
    }
    printf("Read %lld bytes in %.3f seconds (%.2f GB/s)\n",
           file_data.len, TIMER_SEC, GB_SEC(file_data.len));

    uint32_t sum = 0;
    TIMER {
        for (int64_t i = 0; i < file_data.len; i++) {
            if (file_data.str[i] == '\n') ++sum;
        }
    }
    printf("Touch all bytes: %.3f seconds (%.2f GB/s), sum=%d\n",
           TIMER_SEC, GB_SEC(file_data.len), sum);

    return 0;
}
```
```
sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
```
to allow compiler optimizations and to ensure the disk cache is purged (flushed and emptied) before each run. The final score is the best of ten runs.

```c
#include <stdlib.h>

String function() {
    String result;
    FILE *file;

    file = fopen("measurements.txt", "rb");
    fseek(file, 0, SEEK_END);
    result.len = ftell(file);
    fseek(file, 0, SEEK_SET);

    result.str = malloc(result.len + 1);
    fread(result.str, result.len, 1, file);
    result.str[result.len] = 0;

    return result;
}
```

Result:
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 2.113 seconds (6.53 GB/s)
Touch all bytes: 0.853 seconds (16.18 GB/s), sum=1000000000
```
Using setvbuf with a large, small, or no buffer has no tangible effect on the result, despite advice online stating otherwise.
```c
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

String function() {
    String result;

    int fd = open("measurements.txt", O_RDONLY);
    result.len = lseek(fd, 0, SEEK_END);
    lseek(fd, 0, SEEK_SET);

    result.str = malloc(result.len + 1);
    uint64_t bytes = 0;
    while (bytes < result.len) {
        bytes += read(fd, result.str + bytes, INT_MAX); // largest size we can read
    }
    result.str[result.len] = 0;

    return result;
}
```

Result:
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 2.107 seconds (6.55 GB/s)
Touch all bytes: 0.860 seconds (16.03 GB/s), sum=1000000000
```
Using fcntl(fd, F_NOCACHE, 1) and/or fcntl(fd, F_RDADVISE, &ra) makes it slower, despite advice online stating otherwise.
```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

String function() {
    String result;

    int fd = open("measurements.txt", O_RDONLY);
    result.len = lseek(fd, 0, SEEK_END);
    lseek(fd, 0, SEEK_SET);

    result.str = mmap(NULL, result.len, PROT_READ, MAP_PRIVATE, fd, 0);

    return result;
}
```

Result:
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 0.000 seconds (759409.31 GB/s)
Touch all bytes: 22.475 seconds (0.61 GB/s), sum=1000000000
```
madvise flags have no significant impact. mmap is not "reading" anything up front, so it trips over itself servicing a flood of page faults (~842,008 faults × 16 KiB pages == file size). Apple specifically states that reading a file sequentially via a mapping is bad. Maybe it fares better on Linux with the magic MAP_POPULATE flag I see mentioned.
```
% sudo purge && /usr/bin/time wc -l measurements.txt
 1000000000 measurements.txt
       11.41 real        10.36 user         1.04 sys
```
```
% sudo purge && /usr/bin/time ruby -e 'x = 0; File.foreach("measurements.txt") {|i| x+=1}; puts x'
1000000000
       79.11 real        76.13 user         2.82 sys
```
The Ruby version does not return in a reasonable time when using File#readlines.
```
% sudo purge && /bin/dd if=measurements.txt of=/dev/null bs=1M
13156+1 records in
13156+1 records out
13795429575 bytes transferred in 2.103454 secs (6558465065 bytes/sec)
```
This is a sanity check of how fast the SSD can do a raw sequential read.
| | file to memory (GB/s) | touch bytes (GB/s) |
|---|---|---|
| read(2) | ~6.5 | ~16 |
| fread(3) | ~6.5 | ~16 |
| mmap(2) | - | 0.61 |
| wc(1) | 1.21 | |
| ruby(1) | 0.17 | |
| dd(1) | 6.56 | - |

Higher is better
So what do we take away from this? At least on my laptop the choice is obvious: use fread if you aren't doing anything fancy. It needs the least twiddling and does everything for you while still giving you maximum speed.
```c
uint64_t sum = 0; // was originally uint32_t
TIMER {
    for (int64_t i = 0; i < file_data.len; i++) {
        if (file_data.str[i] == '\n') ++sum;
    }
}
printf("Touch all bytes: %.3f seconds (%.2f GB/s), sum=%llu\n",
       TIMER_SEC, GB_SEC(file_data.len), sum);
```
```
% sudo purge && gcc -O3 -std=c99 -march=native brc.c && ./a.out
Read 13795429575 bytes in 2.114 seconds (6.52 GB/s)
Touch all bytes: 1.673 seconds (8.25 GB/s), sum=1000000000 -! was 16 GB/s before !-
```
A brief look at the generated assembly shows the compiler doing much more complicated handling of the value, despite this being a 64-bit machine. I don't know enough to explain why it isn't optimized away. Another reminder that the compiler is not magic!
```
% sudo purge && gcc -O3 -std=c99 -march=native test.c && ./a.out
Read 13795429575 bytes in 3.533 seconds (3.90 GB/s) -! about 1.5x slower !-
Touch all bytes: 1.291 seconds (10.69 GB/s), sum=1000000000 -! also 1.5x slower !-
```