Regular Expressions #1: Introduction

David Both May 4, 2024 7 minutes read

Last Updated on February 3, 2025 by David Both

Regular expressions don’t have to invoke anxiety and fear, although they do for many of us. The function of regular expressions is to provide a highly flexible tool for matching strings of characters in a stream of data. When a match is found, the program’s action can be as simple as to pass the line of data in which it’s found on to STDOUT, or as complex as replacing that string with another before sending it to STDOUT.

This article, part one of four, introduces you to the need for regular expressions and shows you how to create a simple REGEX with the grep command.

Why we need Regular Expressions

We have all used file globbing with wildcard characters like * and ? as a means to select specific files or lines of data from a data stream. These tools are powerful and I use them many times a day. Yet, there are things that cannot be done with wildcards.
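One way to see the limitation: globs match whole filenames, while regexes match patterns within the lines of a data stream. A minimal sketch, using a throwaway file in /tmp, selects only the lines that begin with a digit, which no wildcard can express:

```shell
# Create a small test file, then select only lines that start with a digit.
printf 'alpha\n1 beta\n2 gamma\ndelta\n' > /tmp/sample.txt
grep '^[0-9]' /tmp/sample.txt
# 1 beta
# 2 gamma
```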

Regular expressions (regexes or REs) provide us with more complex and flexible pattern matching capabilities. Just as certain characters take on special meaning when using file globbing, REs also have special characters. There are two main types of regular expressions (REs), Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs).

The first thing we need is a set of definitions. There are many definitions of the term regular expressions, but many are dry and uninformative. Here are mine.

Regular Expressions are strings of literal and metacharacters that can be used as patterns by various Linux utilities to match strings of ASCII plain text data in a data stream. When a match occurs, it can be used to extract or eliminate a line of data from the stream, or to modify the matched string in some way.

Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs) are not significantly different in terms of functionality. (See the grep info page’s Section 3.6, “Basic vs. Extended Regular Expressions.”) The primary difference is in the syntax used and how metacharacters are specified. In basic regular expressions, the metacharacters ?, +, {, |, (, and ) lose their special meaning. Instead, it is necessary to use the backslashed versions: \?, \+, \{, \|, \(, and \). The ERE syntax is believed by many to be easier to use.
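The syntax difference is easiest to see with alternation. With GNU grep, the same match can be written both ways (a small sketch for illustration):

```shell
# BRE: the alternation metacharacter must be backslashed as \|
echo "cat and dog" | grep 'cat\|dog'
# ERE: the -E option enables the bare | metacharacter
echo "cat and dog" | grep -E 'cat|dog'
# Both commands print: cat and dog
```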

When I talk about regular expressions, in a general sense I usually mean to include both basic and extended regular expressions. If there is a differentiation to be made I will use the acronyms BRE for basic regular expressions or ERE for extended regular expressions.

Regular expressions (REs) take the concept of using metacharacters to match patterns in data streams much further than file globbing, and give us even more control over the items we select from a data stream. REs are used by various tools to parse[1] a data stream to match patterns of characters in order to perform some transformation on the data.

Regular expressions have a reputation for being obscure and arcane incantations that only those with special wizardly sysadmin powers use. The single line of code in Figure 1, which I used to transform a file that was sent to me into a usable form, would seem to confirm that.

$ cat Experiment_6-1.txt | grep -v Team | grep -v "^\s*$" | sed -e "s/[Ll]eader//" -e "s/\[//g" -e "s/\]//g" -e "s/)//g" | awk '{print $1" "$2" <"$3">"}' > addresses.txt

Figure 1: A rather complex Regular Expression like this one can seem obscure until we learn how REGEXs work.

This command pipeline appears to be an intractable sequence of meaningless gibberish to anyone without the knowledge of regex. It certainly seemed that way to me the first time I encountered something similar early in my career. As you will see, regexes are relatively simple once they are explained.
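To preview how such a pipeline decomposes, here is the same sequence of stages run against a tiny made-up input file (the names and addresses are invented for this illustration, not the author's original data), with each stage commented with its purpose:

```shell
# A small invented sample in the same general shape as the original input.
printf 'Team Roster\n\nAlice Jones Leader alice@example.com)\nBob Smith [bob@example.com]\n' > /tmp/team.txt

cat /tmp/team.txt |
  grep -v Team |              # drop lines containing "Team"
  grep -v "^\s*$" |           # drop empty or whitespace-only lines
  sed -e "s/[Ll]eader//" |    # remove the word "Leader" or "leader"
  sed -e "s/\[//g" -e "s/\]//g" -e "s/)//g" |   # strip brackets and parens
  awk '{print $1" "$2" <"$3">"}'                # reformat as: First Last <address>
# Alice Jones <alice@example.com>
# Bob Smith <bob@example.com>
```

Each stage does one small, comprehensible job; the apparent gibberish is just several simple regexes chained together.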

A Simple REGEX

If you use Unix or Linux on the command line, you use regular expressions whether you know it or not. One of the first tools most of us learn is the grep command.

The grep command is used to select lines that match a specified pattern from a stream of data. grep is one of the most commonly used filter utilities and can be used in some very creative and interesting ways. The grep command is one of the few that can correctly be called a filter because it does filter out all the lines of the data stream that you do not want; it leaves only the lines that you do want in the remaining data stream.

According to Seth Kenlon, reviewer for many of my books and articles, “One of the classic Unix commands, developed way back in 1974 by Ken Thompson, is the Global Regular Expression Print (grep) command. It’s so ubiquitous in computing that it’s frequently used as a verb (“grepping through a file”) and, depending on how geeky your audience, it fits nicely into real-world scenarios, too. (For example, “I’ll have to grep my memory banks to recall that information.”) In short, grep is a way to search through a file for a specific pattern of characters. If that sounds like the modern Find function available in any word processor or text editor, then you’ve already experienced grep’s effects on the computing industry.”[2]

We need to create a file with some random data in it. We can use a tool that generates random passwords but we first need to install it as root. I use dnf on my Fedora host.

# dnf -y install pwgen

Now, as a non-root user, let’s generate some random data and create a file with it. I suggest doing this in the /tmp directory, although you could use your home directory if you have enough space. The following command creates a stream of 5000 lines of random data, each 75 characters long, and stores them in the random.txt file.

$ pwgen 75 5000 > random.txt

Considering that there are so many passwords, it is very likely that some character strings in them are the same. Use the grep command to locate some short, randomly selected strings from the last ten passwords on the screen. I saw the words “see” and “loop” in one of those ten passwords, so my command looked like this.

$ grep see random.txt

You can try that, but you should also pick some strings of your own to search for. Short strings of 2 to 4 characters work best. I also used grep to locate all of the lines in the output from dmesg that contain the string cpu. (grep is case-sensitive by default, so this matches cpu but not CPU.) You need to be root to run the dmesg command.
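Two options worth trying during these experiments: -c reports a count of matching lines instead of printing them, and -i makes the match case-insensitive. A quick check against known data (the file /tmp/demo.txt is created only for this demonstration):

```shell
# Create a tiny file with known contents, then count matches two ways.
printf 'see\nSaw\nSEEn\nloop\n' > /tmp/demo.txt
grep -c see /tmp/demo.txt    # case-sensitive: matches only "see"
# 1
grep -ci see /tmp/demo.txt   # -i also matches "SEEn"
# 2
```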

# dmesg | grep cpu

Do a long listing of all of the directories in your home directory with this command.

$ ls -la | grep ^d

This works because each directory has a “d” as the first character of its long listing entry. The caret ( ^ ) is used by grep and other tools to anchor the text being searched to the beginning of the line.
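
The dollar sign ( $ ) is the complementary anchor for the end of a line. A short sketch of both anchors against a throwaway file:

```shell
# ^ anchors the pattern to the start of a line; $ anchors it to the end.
printf 'dog\nhotdog\ndogma\n' > /tmp/anchors.txt
grep '^dog' /tmp/anchors.txt   # lines that start with dog
# dog
# dogma
grep 'dog$' /tmp/anchors.txt   # lines that end with dog
# dog
# hotdog
```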

To list all of the files that are not directories, reverse the meaning of the previous grep command with the -v option.

$ ls -la | grep -v ^d
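
Adding -c to either command turns the listing into a quick census. Note that ls -la includes the . and .. entries and a “total” summary line, so those are included in the counts:

```shell
# Count directories versus everything else in the current directory.
ls -la | grep -c ^d     # directories, including . and ..
ls -la | grep -cv ^d    # files, links, and the "total" summary line
```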

Final Thoughts

We can only begin to touch upon all of the possibilities opened to us by regular expressions in a single article (even in a single series). There are entire books devoted exclusively to regular expressions, so we will explore the basics in this series of articles here on Both.org. By the end, you will know just enough to get started with tasks common to sysadmins. Hopefully, you’ll be hungry to learn more on your own after that.


Here’s a list of all the articles in this series.

  1. Regular Expressions #1: Introduction
  2. Regular Expressions #2: An example
  3. Regular Expressions #3: grep — Data flow and building blocks
  4. Regular Expressions #4: Pulling it all together

Note: This series is a slightly modified version from Chapter 25 of Volume 2 of my Linux self-study trilogy, Using and Administering Linux: Zero to SysAdmin, 2nd Edition.

  1. One general meaning of parse is to examine something by studying its component parts. For our purposes, we parse a data stream to locate sequences of characters that match a specified pattern.
  2. Kenlon, Seth, a.k.a. Klaatu, Opensource.com, “Practice using the Linux grep command,” 18 Mar 2021
