Both.org

News, Opinion, Tutorials, and Community for Linux Users and SysAdmins

Regular Expressions #2: An example

David Both | May 6, 2024 | 10-minute read

Last Updated on February 3, 2025 by David Both

“BXP135671” by tableatny is licensed under CC BY 2.0

Dive right into a real-world example in this second of four articles about regular expressions.

In the previous article, Regular Expressions #1: Introduction, I covered what they are and why they’re useful. Now, we need a real-world example to use as a learning tool. Here is one I encountered several years ago.

This example highlights the power and flexibility of the Linux command line, and of regular expressions in particular, to automate common tasks. I have administered several listservs during my career and still do. People send me email addresses to add to those lists. In more than one case, I have received a list of names and email addresses in Microsoft Word format to be added to one of the lists.

The troublesome list

The list itself was not very long, but it was inconsistent in its formatting. An abbreviated version of that list, with name and domain changes, is shown in Figure 1.

Team 1	Apr 3 
Leader  Virginia Jones  vjones88@example.com	
Frank Brown  FBrown398@example.com	
Cindy Williams  cinwill@example.com	
Marge smith   msmith21@example.com 
 [Fred Mack]   edd@example.com	

Team 2	March 14
leader  Alice Wonder  Wonder1@example.com	
John broth  bros34@example.com	
Ray Clarkson  Ray.Clarks@example.com	
Kim West    kimwest@example.com	
[JoAnne Blank]  jblank@example.com	

Team 3	Apr 1 
Leader  Steve Jones  sjones23876@example.com	
Bullwinkle Moose bmoose@example.com	
Rocket Squirrel RJSquirrel@example.com	
Julie Lisbon  julielisbon234@example.com	
[Mary Lastware) mary@example.com

Figure 1: A sample taken from the problematic list.

It was obvious that I needed to manipulate the data in order to mangle it into an acceptable format for inputting to the list. It would be possible to use a text editor or a word processor such as LibreOffice Writer to make the necessary changes to this small file. However, people send me files like this quite often, so making these changes in a word processor becomes a chore. Although Writer has a good search-and-replace function, each character or string must be replaced individually, and there is no way to save previous searches.

LibreOffice Writer does have a powerful macro feature, but I am not familiar with either of its two languages: LibreOffice Basic or Python. I do know Bash shell programming. I did what comes naturally to a sysadmin—I automated the task. The first thing I did was to copy the address data to a text file so I could work on it using command-line tools. After a few minutes of work, I developed the Bash command-line program shown in the first article of this series and shown again in Figure 2.

$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/\]//g" -e "s/)//g" | awk '{print $1" "$2" <"$3">"}' > addresses.txt

Figure 2: The solution to my list problem involves some interesting regular expressions.

This code produced the desired output as the file addresses.txt. I used my normal approach to writing command-line programs like this by building up the pipeline one command at a time.

Let’s break this pipeline down into its component parts to see how it works and how everything fits together. All of the experiments in this series should be performed as a non-privileged user.

Getting started with the sample file

First, we need to create the sample file. Create a directory named testing on your local machine, then copy the text from Figure 1 into a new text file named Experiment_6-1.txt in that directory.

Removing unnecessary lines with grep

The first things I see that can be done are a couple of easy ones. Since the team names and dates are on lines by themselves, we can use the following command to remove the lines that contain the word “Team”:

[student@studentvm1 testing]$  cat Experiment_6-1.txt | grep -v Team

I won’t reproduce the results of each stage of building this Bash program, but you should be able to see the changes in the data stream as it shows up on STDOUT, the terminal session. We won’t save it in a file until the end.

In this first step in transforming the data stream into one that is usable, we use the grep command with a simple literal pattern, Team. Literals are the most basic type of pattern we can use as a regular expression, because there is only a single possible match in the data stream being searched, and that is the string Team.
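To see the effect of a literal pattern in isolation, here is a tiny stand-alone demonstration. The two printf lines are a made-up sample, not the full file:

```shell
# grep -v inverts the match: lines containing the literal "Team" are dropped
printf 'Team 1\tApr 3\nFrank Brown  FBrown398@example.com\n' | grep -v Team
```

Only the address line reaches the output; the header line is discarded.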

We also need to discard the empty lines, so we can use another grep statement to eliminate them. Enclosing the regular expression for the second grep command in quotes prevents the shell from interpreting its special characters before grep sees them:

[student@studentvm1 testing]$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$"
Leader  Virginia Jones  vjones88@example.com
Frank Brown  FBrown398@example.com
Cindy Williams  cinwill@example.com
Marge smith   msmith21@example.com 
 [Fred Mack]   edd@example.com  
leader  Alice Wonder  Wonder1@example.com
John broth  bros34@example.com  
Ray Clarkson  Ray.Clarks@example.com
Kim West    kimwest@example.com 
[JoAnne Blank]  jblank@example.com
Leader  Steve Jones  sjones23876@example.com
Bullwinkle Moose bmoose@example.com
Rocket Squirrel RJSquirrel@example.com  
Julie Lisbon  julielisbon234@example.com
[Mary Lastware) mary@example.com
[student@studentvm1 testing]$

The expression "^\s*$" illustrates the use of anchors and of the backslash (\) as an escape character, which changes the meaning of a literal “s” (in this case) to a metacharacter that matches any whitespace, such as spaces, tabs, or other unprintable characters. We cannot see these characters in the file, but it does contain some of them.

The asterisk, aka splat (*), specifies that we are to match zero or more of the whitespace characters. This addition would match multiple tabs, multiple spaces, or any combination of those in an otherwise empty line.
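A quick way to convince yourself that "^\s*$" matches blank-looking lines is to feed grep a few hand-made lines containing only spaces or a tab. Note that \s is a GNU grep extension; on other grep implementations you may need [[:space:]] instead:

```shell
# The second and third input lines contain only spaces and a tab;
# grep -v "^\s*$" removes them along with the truly empty line.
printf 'alpha\n\n   \n\t\nbeta\n' | grep -v "^\s*$"
```

Only the alpha and beta lines survive.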

Viewing extra whitespace with Vim

Next, I configured my Vim editor to display whitespace using visible characters. Do this by adding the line in Figure 3 to your own ~/.vimrc file, or to the global /etc/vimrc configuration file.

set listchars=eol:$,nbsp:_,tab:<->,trail:~,extends:>,space:+

Figure 3: Add this line to your own ~/.vimrc file, or to the global /etc/vimrc configuration file.

Then, start—or restart—Vim.

I have found a lot of bad, incomplete, and contradictory information on the internet in my searches for how to do this. The built-in Vim help has the best information, and the listchars line above, which I built from that help, works for me.

Note: In the example below, regular spaces are shown as +. Tabs are shown as >, <>, <-->, and so on, filling the full width that each tab covers. The end-of-line (EOL) character is shown as $.

The result, before any operation on the file, is shown in Figure 4.

Team+1<>Apr+3~$
Leader++Virginia+Jones++vjones88@example.com<-->$
Frank+Brown++FBrown398@example.com<---->$
Cindy+Williams++cinwill@example.com<--->$
Marge+smith+++msmith21@example.com~$
+[Fred+Mack]+++edd@example.com<>$
$
Team+2<>March+14$
leader++Alice+Wonder++Wonder1@example.com<----->$
John+broth++bros34@example.com<>$
Ray+Clarkson++Ray.Clarks@example.com<-->$
Kim+West++++kimwest@example.com>$
[JoAnne+Blank]++jblank@example.com<---->$
$
Team+3<>Apr+1~$
Leader++Steve+Jones++sjones23876@example.com<-->$
Bullwinkle+Moose+bmoose@example.com<--->$
Rocket+Squirrel+RJSquirrel@example.com<>$
Julie+Lisbon++julielisbon234@example.com<------>$
[Mary+Lastware)+mary@example.com$

Figure 4: Viewing “whitespace” in Vim.

Removing unnecessary characters with sed

You can see that there are a lot of whitespace characters that need to be removed from our file. We also need to get rid of the word “leader,” which appears twice and is capitalized once. Let’s get rid of “leader” first. This time, we will use sed (stream editor) to perform this task by substituting a new string—or a null string in our case—for the pattern it matches.

Adding sed -e "s/[Ll]eader//" to the pipeline does just what we want.

[student@studentvm1 testing]$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//"

In this sed command, -e specifies that the quote-enclosed expression is a script that produces the desired result. In the expression, the s means that this is a substitution. The basic form of a substitution is s/<regex>/<replacement string>/, so [Ll]eader is our search pattern. The set [Ll] matches L or l, so [Ll]eader matches either leader or Leader. The replacement string here is empty, which is why the expression ends with two adjacent forward slashes (//).
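You can test the substitution on a single line before adding it to the pipeline. The input line here is a made-up sample:

```shell
# The set [Ll] matches either capitalization; the empty replacement
# deletes the word but leaves the surrounding whitespace in place.
echo "Leader  Virginia Jones" | sed -e "s/[Ll]eader//"
```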

Let’s also get rid of some of the extraneous characters like []() that will not be needed.

[student@studentvm1 testing]$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/]//g" -e "s/)//g" -e "s/(//g" 

We have added four new expressions to the sed statement. Each one removes every occurrence of a single character. The first of these additional expressions is a bit different, because the left square bracket ([) can mark the beginning of a set. We need to escape that bracket to ensure that sed interprets it as a literal character and not as a special one.
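Here is the bracket removal in isolation, showing that the opening bracket must be escaped as \[ while a bare ] works for the closing one. The input line is a made-up sample:

```shell
# \[ is a literal opening bracket; an unescaped [ would start a set.
# The closing ] needs no escape outside of a set.
echo "[Fred Mack]   edd@example.com" | sed -e "s/\[//g" -e "s/]//g"
```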

Tidying up with awk

We could use sed to remove the leading spaces from some of the lines, but the awk command can do that, reorder the fields if necessary, and add the <> characters around the email address:

[student@studentvm1 testing]$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/]//g" -e "s/)//g" -e "s/(//g" | awk '{print $1" "$2" <"$3">"}'

The awk utility is actually a powerful programming language that can accept data streams on its STDIN. This makes it extremely useful in command-line programs and scripts. The awk utility works on data fields, and the default field separator is whitespace—any number of spaces or tabs. The data stream we have created so far has three fields separated by whitespace: <first>, <last>, and <email>.

awk '{print $1" "$2" <"$3">"}' 

This little program takes each of the three fields ($1, $2, and $3) and extracts them without leading or trailing whitespace. It then prints them in sequence, adding a single space between each as well as the <> characters needed to enclose the email address.
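A one-line test shows how awk discards the extra whitespace and wraps the third field in angle brackets. The input line is a made-up sample:

```shell
# awk splits on runs of whitespace and ignores leading/trailing space,
# so the irregular spacing in the input disappears from the output.
echo "  Fred   Mack   edd@example.com  " | awk '{print $1" "$2" <"$3">"}'
# → Fred Mack <edd@example.com>
```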

Wrapping up

The last step would be to redirect the output data stream to a file, but that is trivial, so I leave it to you. It is not strictly necessary for this experiment.

I saved the Bash program in an executable file, and now I can run this program anytime I receive a new list. Some of those lists are fairly short, as is the one in this example. Others have been quite long, sometimes containing up to several hundred addresses and many lines of “stuff” that do not contain addresses to be added to the list.
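As a sketch of how the pipeline might be saved as an executable program, here is one possible wrapper script. The script name, the use of a command-line argument for the input file, and writing to STDOUT rather than a fixed output file are my assumptions, not the author's actual script:

```shell
#!/usr/bin/env bash
# clean-addresses.sh (hypothetical name): run the Figure 2 pipeline
# against the file named as the first argument, writing to STDOUT.
grep -v Team "$1" | grep -v "^\s*$" \
    | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/]//g" -e "s/)//g" -e "s/(//g" \
    | awk '{print $1" "$2" <"$3">"}'
```

It could then be run as ./clean-addresses.sh Experiment_6-1.txt > addresses.txt each time a new list arrives.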


Here’s a list of all the articles in this series.

  1. Regular Expressions #1: Introduction
  2. Regular Expressions #2: An example
  3. Regular Expressions #3: grep — Data flow and building blocks
  4. Regular Expressions #4: Pulling it all together

Note: This series is a slightly modified version from Chapter 25 of Volume 2 of my Linux self-study trilogy, Using and Administering Linux: Zero to SysAdmin, 2nd Edition.

Tags: Data Streams, Regular Expressions
