You don’t have to go far in certain programming circles before you find the classic “tabs versus spaces” argument. Some people like to use tabs to provide indenting in their program source code; others live like animals and use spaces. You can see which side of the fence I live on.
I’m joking! I might use tabs to indent my code when I’m working in the editor, then convert those tabs to spaces when I need to share my work with someone else. The indent program can reformat C source files, including changing tabs to spaces.
But if you need to work with other files that aren’t source code, you need another tool. The expand command is a standard Unix program that should be available on every Linux system. This command converts tabs to spaces. For example, I often use expand if I need to display the contents of a file in an article or book chapter; those tabs can “get in the way” in a publishing system.
Writing your own
If you’ve ever wondered how the expand program works, it’s not a very difficult program to write on your own. Writing your own version also means you can create a similar program for other operating systems that lack an expand command, such as DOS.
Let’s call our program untab because it converts tabs to spaces. The program only needs to track a few things: how “wide” a tab is, what column it’s printing to, and what character to print. In the simplest case, we can use a constant value for the tab width, as TABSIZE:
```c
#define TABSIZE 8
```
The other values can be tracked as variables in a program: c for the character to print, and col for the column it’s printing to.
```c
short col = TABSIZE;
int c;
```
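We can get away with a short variable type for the column, because it only ever holds values from 0 to TABSIZE (8). However, the c variable needs to be an int, because getchar returns an int: each character comes back as a non-negative value, and the end of the file is signaled by EOF, a negative value that can’t be confused with any character.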
The program reads the file one character at a time. As it prints each character, the program needs to keep track of its position on the line, which is just a matter of updating a counter until the program encounters a tab. When it finds a tab, it expands the tab into spaces:
```c
while ((c = getchar()) != EOF) {
    if (c == '\t') {
        while (col > 0) {
            putchar(' ');
            col--;
        }
    }
    else {
        putchar(c);
        col--;
    }
    if ((c == '\n') || (col == 0)) {
        col = TABSIZE;
    }
}
```
In fact, the program doesn’t need to know the actual column it’s printing to. Instead, it only tracks how many spaces remain until the next tab stop, and counts down. That’s why every time the program prints a character, it subtracts one from the col variable.
If the program finds the end of a line, or when the program reaches the next tab stop, it resets the col counter to the tab size, ready to count down to the next tab stop.
Putting it all together
Let’s put everything together into one source file, so we can compile our own program that expands tabs to spaces. Save this as untab.c on your system:
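```c
#include <stdio.h>

#define TABSIZE 8

int main(void)
{
    short col = TABSIZE;    /* spaces left until the next tab stop */
    int c;                  /* int, so it can also hold EOF */

    while ((c = getchar()) != EOF) {
        if (c == '\t') {
            /* pad with spaces up to the next tab stop */
            while (col > 0) {
                putchar(' ');
                col--;
            }
        }
        else {
            /* any other character takes up one column */
            putchar(c);
            col--;
        }

        /* start a new countdown at the end of a line or at a tab stop */
        if ((c == '\n') || (col == 0)) {
            col = TABSIZE;
        }
    }

    return 0;
}
```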
Compile it using your favorite C compiler, such as GCC on Linux. This is a very simple program, so you don’t need a special compiler or library to make it work:
```
$ gcc -o untab untab.c
```
Now we need to test the program. It’s difficult to show tabs in an article like this, so let’s work around that limitation by writing a test file that uses a different character as a placeholder, which we can turn into a tab later with the tr command. In this case, let’s use # as the “tab” placeholder:
```
123456789012345678901234567890
#|tab#|tab#|tab
,,,,,,,,|space,,|space,,|space
a#|tab##|tab
b,,,,,,,|space,,,,,,,,,,|space
a##|tab#|tab
b,,,,,,,,,,,,,,,|space,,|space
```
Save this file as tab.txt. I’ve started the test file with a list of numbers so you can see the first 30 columns. For each line with a tab, I’ve also written another line that has the correct number of extra spaces (typed as commas) to reach the next tab stop.
If you convert the # placeholders to tabs, then run the output through the new untab program, everything should line up:
```
$ tr '#' '\t' < tab.txt | ./untab
123456789012345678901234567890
        |tab    |tab    |tab
,,,,,,,,|space,,|space,,|space
a       |tab            |tab
b,,,,,,,|space,,,,,,,,,,|space
a               |tab    |tab
b,,,,,,,,,,,,,,,|space,,|space
```
We can verify that this is the correct behavior by replacing the untab program with the standard expand program:
```
$ tr '#' '\t' < tab.txt | expand
123456789012345678901234567890
        |tab    |tab    |tab
,,,,,,,,|space,,|space,,|space
a       |tab            |tab
b,,,,,,,|space,,,,,,,,,,|space
a               |tab    |tab
b,,,,,,,,,,,,,,,|space,,|space
```
Make it your own
A program like this one is a great starting point if you want to learn about programming. There’s not much going on here: the program is quite short and only needs to keep track of a few values, yet it performs a very useful function for working with text files.
If you want to learn about programming, try adapting this program to add other features. For example, you could replace TABSIZE with a variable, and use a command line value to set the tab size. That makes the program more useful, especially if you prefer to have tabs displayed as four spaces instead of eight spaces. Feel free to make this program your own.