Read and write data in C

The first C function that most people learn is the puts function to print a simple string. That’s a great function to print information to the user, but if you want to do more than that, you’ll need to explore other functions.
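
For reference, here’s about the simplest complete program you can write with puts (just a quick illustration); it prints a fixed string followed by a newline:

#include <stdio.h>

int main(void)
{
    /* puts writes its argument to standard output and adds a newline */
    puts("Hello, world");
    return 0;
}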

Let’s learn how to read and write data in C by writing a simple implementation of a common Linux command. The cp command copies one file to another, as in this simple example:

$ cp file.txt copy.txt

You can write your own version of the cp command by using only a few basic C functions to read and write data.

The simplest case: reading one character at a time

A trivial way to read input and write output is with the fgetc and fputc functions, which read and write data one character at a time. Both are declared in stdio.h like this:

int fgetc(FILE *stream);
int fputc(int c, FILE *stream);

Copying one file to another then becomes a matter of opening the source and destination files, reading one character at a time from the first file, and writing that character to the second file. The fgetc function returns either the single character read from the input file or the end of file (EOF) marker when the file is done. Once you’ve read EOF, you’ve finished copying and you can close both files.

At the core of this program is a do loop that reads one character at a time from the input file, then writes each character to the output file:

    do {
        ch = fgetc(infile);
        if (ch != EOF) {
            fputc(ch, outfile);
        }
    } while (ch != EOF);

The full implementation needs to open the source and destination files in binary mode, so the “new line” characters don’t get translated when the files are used on other systems that might have a different “new line” character:

#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *infile;
    FILE *outfile;
    int ch;    /* an int, not a char, so it can also hold the EOF value */

    /* parse the command line */

    /* usage: cp infile outfile */

    if (argc != 3) {
        fprintf(stderr, "Incorrect usage\n");
        fprintf(stderr, "Usage: cp infile outfile\n");
        return 1;
    }

    /* open the input file */

    infile = fopen(argv[1], "rb");
    if (infile == NULL) {
        fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
        return 2;
    }

    /* open the output file */

    outfile = fopen(argv[2], "wb");
    if (outfile == NULL) {
        fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
        fclose(infile);
        return 3;
    }

    /* copy one file to the other */

    /* use fgetc and fputc */

    do {
        ch = fgetc(infile);
        if (ch != EOF) {
            fputc(ch, outfile);
        }
    } while (ch != EOF);

    /* done */

    fclose(infile);
    fclose(outfile);

    return 0;
}

Let’s save this program as 1cp.c and compile it using the GNU C compiler:

$ gcc -Wall -o 1cp 1cp.c

Programming your own cp command by reading and writing data one character at a time does the job, but it’s not very fast. You might not notice when copying “everyday” files like documents and text files, but you’ll really notice the difference when copying large files. That’s because working on one character at a time requires significant overhead.

Here’s an example of copying the program’s source code and comparing it with diff. We don’t see any output from diff, which means the two files are the same, as we would expect:

$ ./1cp 1cp.c /tmp/1cp.c

$ diff 1cp.c /tmp/1cp.c

The better way: reading and writing in blocks

A better way to write this cp command is by reading a chunk of the input into memory (called a buffer), then writing that collection of data to the second file. This is much faster because the program can read more of the data at one time, which requires fewer “reads” from the file.

You can read a file into a variable by using the fread function. This function takes several arguments: the array or memory buffer to read data into, the size of the smallest thing you want to read, how many of those things you want to read, and the file to read from:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

The different options provide quite a bit of flexibility for more advanced file input and output, such as reading and writing files with a certain data structure. But in the simple case of reading data from one file and writing data to another file, you can use a buffer that is just an array of characters.

You can write the buffer to another file using the fwrite function. This uses a similar set of options to the fread function: the array or memory buffer to write data from, the size of the smallest thing you need to write, how many of those things you need to write, and the file to write to:

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
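
As an example of that kind of structured input and output, here’s a short sketch (separate from the cp program, using a made-up record type and file name) that writes an array of structures to a file with fwrite and reads it back with fread:

#include <stdio.h>

struct record {
    int id;
    double value;
};

int main(void)
{
    struct record out[3] = { {1, 1.5}, {2, 2.5}, {3, 3.5} };
    struct record in[3];
    FILE *fp;

    /* write all three records in a single call */
    fp = fopen("records.dat", "wb");
    if (fp == NULL) {
        return 1;
    }
    fwrite(out, sizeof(struct record), 3, fp);
    fclose(fp);

    /* read the records back into a second array */
    fp = fopen("records.dat", "rb");
    if (fp == NULL) {
        return 1;
    }
    if (fread(in, sizeof(struct record), 3, fp) == 3) {
        printf("first record: %d, %f\n", in[0].id, in[0].value);
    }
    fclose(fp);

    return 0;
}

Keep in mind that a file written this way mirrors the compiler’s in-memory layout of the structure, so it isn’t a portable file format. That doesn’t matter for the cp program, which only copies raw bytes.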

In the case where the program reads a file into a buffer, then writes that buffer to another file, the array can be of a fixed size. For example, you can use a char array that is 200 characters long:

    char buffer[200];

With that fixed-size buffer, we only need to change the loop in the cp program to read data from the file into the buffer, then write that buffer to the other file:

    while (!feof(infile)) {
        buffer_length = fread(buffer, sizeof(char), 200, infile);
        fwrite(buffer, sizeof(char), buffer_length, outfile);
    }
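
This loop is good enough for a straightforward copy, but it leans on feof and never checks for read or write errors. A common variation (shown here only as a sketch; it isn’t what the program below uses) drives the loop from fread’s return value instead, then checks ferror once the loop ends:

    while ((buffer_length = fread(buffer, sizeof(char), 200, infile)) > 0) {
        fwrite(buffer, sizeof(char), buffer_length, outfile);
    }

    if (ferror(infile)) {
        fprintf(stderr, "Error while reading the input file\n");
    }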

Here’s the full source code to the updated cp program, which now uses a buffer to read and write data:

#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *infile;
    FILE *outfile;
    char buffer[200];
    size_t buffer_length;

    /* parse the command line */

    /* usage: cp infile outfile */

    if (argc != 3) {
        fprintf(stderr, "Incorrect usage\n");
        fprintf(stderr, "Usage: cp infile outfile\n");
        return 1;
    }

    /* open the input file */

    infile = fopen(argv[1], "rb");
    if (infile == NULL) {
        fprintf(stderr, "Cannot open file for reading: %s\n", argv[1]);
        return 2;
    }

    /* open the output file */

    outfile = fopen(argv[2], "wb");
    if (outfile == NULL) {
        fprintf(stderr, "Cannot open file for writing: %s\n", argv[2]);
        fclose(infile);
        return 3;
    }

    /* copy one file to the other */

    /* use fread and fwrite */

    while (!feof(infile)) {
        buffer_length = fread(buffer, sizeof(char), 200, infile);
        fwrite(buffer, sizeof(char), buffer_length, outfile);
    }

    /* done */

    fclose(infile);
    fclose(outfile);

    return 0;
}

Let’s save this source code as cpbuf.c and compile it using the GNU C compiler:

$ gcc -Wall -o cpbuf cpbuf.c

As before, we can use this new program to copy its source code and compare it with the diff program to verify that the original and the copy are the same:

$ ./cpbuf cpbuf.c /tmp/cpbuf.c

$ diff cpbuf.c /tmp/cpbuf.c

Yes, it really is faster

Reading and writing data using buffers is the better way to write this version of the cp program. Because it reads chunks of a file into memory at once, the program doesn’t need to read data as often. You might not notice a difference between the two methods on smaller files, but you’ll really see the difference if you need to copy something that’s much larger, or when copying data over slower media such as a network connection.

I ran a comparison using the Linux time command, which runs another program and tells you how long that program took to complete. For my test, I wanted to see the difference in time, so I copied a big ISO file I had on my system.

I first copied the image file using the standard Linux cp command to see how long that takes. Running the Linux cp command first also loaded the file into Linux’s built-in file cache, so the cache couldn’t give either of my own programs a false performance boost over the other. The test with Linux cp took much less than one second to run:

$ ls -sh install.iso 
2.7G install.iso

$ time cp install.iso copy.iso
real    0m0.008s
user    0m0.002s
sys 0m0.006s

Copying the same file one character at a time with my 1cp program took significantly longer. Reading the data into a buffer and then writing that buffer to the output file with cpbuf was much faster:

$ time ./1cp install.iso copy1.iso
real    0m21.974s
user    0m18.120s
sys 0m3.651s

$ time ./cpbuf install.iso copy2.iso
real    0m4.079s
user    0m1.104s
sys 0m2.895s

And in all cases, the copies are exactly the same as the original. Using a checksum program like md5sum or sha256sum verifies that each file is the same:

$ md5sum *.iso
96b917d39c83ae579994752d2051091a  copy1.iso
96b917d39c83ae579994752d2051091a  copy2.iso
96b917d39c83ae579994752d2051091a  copy.iso
96b917d39c83ae579994752d2051091a  install.iso

My demonstration cp program used a buffer of only 200 characters. I’m sure the program would run much faster if it read more of the file into memory at once. But for this comparison, you can already see the huge difference in performance, even with a small 200-character buffer.
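
If you want to experiment with that, one approach (a sketch of the idea, using an arbitrary 64 KB size) is to replace the 200-character array and the copy loop in cpbuf.c with something like this:

    #define COPY_BUFFER_SIZE 65536    /* 64 KB instead of 200 bytes */

    char buffer[COPY_BUFFER_SIZE];
    size_t buffer_length;

    while (!feof(infile)) {
        buffer_length = fread(buffer, sizeof(char), COPY_BUFFER_SIZE, infile);
        fwrite(buffer, sizeof(char), buffer_length, outfile);
    }

A 64 KB automatic array is still comfortably within the default stack size on Linux; for much larger buffers, you would allocate the memory with malloc and free it after the copy finishes.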


This article is adapted from Learn how file input and output works in C by Jim Hall, and is republished with the author’s permission.
