Introduction to the BtrFS Filesystem

Over the nearly 30 years I have been using Linux, the EXT series of filesystems, the default for Red Hat Linux (not RHEL) and Fedora, has evolved considerably. EXT2, the second extended filesystem, was the default when I started using Linux. It had many drawbacks, not the least of which was that it could take hours, and sometimes days, to recover from an improper shutdown such as a power failure. Now at EXT4, the extended filesystem can recover from many such events in only seconds. It is fast and works very well in concert with logical volume management (LVM) to provide a flexible and powerful filesystem structure that suits many storage use cases.

What it is

BtrFS, or the B-Tree Filesystem, is a relatively new filesystem that employs a Copy-on-Write (CoW) strategy. Copy-on-Write differs significantly from the EXT4 journaling strategy for committing data to the storage device medium. The next two items are extremely simplified, conceptual summaries of how they work:

  • With a journaling filesystem, new or revised data is stored in the fixed-size journal, and when all of the data has been committed to the journal, it is then written to the main data space of the storage device, either to replace modified blocks where possible, or into newly allocated blocks. The journal is marked as having been committed when the write operation is completed.
  • In a BtrFS Copy-on-Write filesystem, the original data is not touched. New or revised data is written to a completely new location on the storage device. When the data has been completely written to the storage device, the pointer to the now old data is simply changed to point to the new data in an atomic operation that minimizes the possibility of data corruption. The storage space containing the old data is then released for reuse.
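The Copy-on-Write sequence described above can be illustrated with a small Python sketch. This is a toy model only, not how BtrFS is implemented: the `CowStore` class, its dictionaries, and the file name are all hypothetical names invented for the illustration.

```python
class CowStore:
    """Toy copy-on-write store: existing blocks are never modified in place."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes (stands in for the device)
        self.pointers = {}    # file name -> block number holding current data
        self.free = []        # block numbers released for reuse
        self.next_block = 0   # next never-used block number

    def _allocate(self):
        # Reuse a released block if one exists, otherwise take a fresh one.
        if self.free:
            return self.free.pop()
        block = self.next_block
        self.next_block += 1
        return block

    def write(self, name, data):
        new_block = self._allocate()
        self.blocks[new_block] = data        # 1. write to a new location
        old_block = self.pointers.get(name)
        self.pointers[name] = new_block      # 2. switch the pointer in one step
        if old_block is not None:
            del self.blocks[old_block]       # 3. release the old data's space
            self.free.append(old_block)
        return old_block, new_block

    def read(self, name):
        return self.blocks[self.pointers[name]]


store = CowStore()
first = store.write("notes.txt", b"version 1")   # no old block yet
second = store.write("notes.txt", b"version 2")  # old block 0 is released
print(first, second, store.read("notes.txt"))
```

Until step 2 runs, a reader still sees the complete old data, which is the property that limits the window for corruption.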

What’s a B-tree?

A B-tree is a self-balancing tree data structure that keeps its keys sorted, so that even massive amounts of data can be found and retrieved quickly. BtrFS uses B-trees to organize its data and metadata and to manage chunks of the filesystem across one or several storage devices such as disk drives or SSDs. BtrFS is designed with integral volume management, so it does not need a separate LVM layer to provide that flexibility, as EXT4 does.
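To give a feel for why lookups in such a tree are fast, here is a minimal Python sketch of searching a small, hand-built B-tree-like structure. It assumes values are kept only in the leaves and it omits insertion and rebalancing entirely; the `Node` class and `search` function are hypothetical names for this illustration and do not reflect the BtrFS on-disk format.

```python
from bisect import bisect_left, bisect_right


class Node:
    """A tree node: internal nodes have children, leaves have values."""

    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted keys (separators in internal nodes)
        self.children = children  # list of child nodes, or None for a leaf
        self.values = values      # list of values, used only in leaves


def search(node, key):
    """Return (value, nodes_visited); value is None if the key is absent."""
    visited = 1
    while node.children is not None:
        # Separator keys[i] is the smallest key stored under children[i + 1].
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], visited
    return None, visited


leaf_a = Node([1, 5], values=["a", "b"])
leaf_b = Node([10, 15], values=["c", "d"])
leaf_c = Node([20, 30], values=["e", "f"])
root = Node([10, 20], children=[leaf_a, leaf_b, leaf_c])

found = search(root, 15)    # key present in the middle leaf
missing = search(root, 7)   # key absent
print(found, missing)
```

Each step down the tree discards most of the remaining keys, so the number of nodes visited grows very slowly as the amount of stored data grows.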

A Tiny Bit of History

Based on a 2007 paper by IBM researcher Ohad Rodeh, BtrFS was designed at Oracle Corporation for use in their version of Linux. In addition to being a general-purpose filesystem, it was intended to address a different and more specific set of problems than the EXT filesystem. BtrFS is designed to accommodate huge storage devices with capacities that do not even exist yet, and large amounts of data, especially massively large databases in highly transactional environments.

A very complete set of BtrFS documentation is available from the BtrFS project website.

More Features

BtrFS is also designed to be fault tolerant, and it is self-healing in case errors occur. It is intended to be easy to maintain. It has built-in volume management, which means the separate logical volume management (LVM) tool used to provide that functionality behind the EXT4 filesystem is not needed.

Copy-on-Write (CoW) filesystems like BtrFS have many advantages, but they also have disadvantages, one of which is much greater data fragmentation than the EXT4 filesystem. BtrFS stores the data sequentially when possible as files are written to the disk for the first time, but a CoW design implies that any subsequent modification to the file must not be written on top of the old data. It must instead be placed in a free block, which causes fragmentation. Additionally, BtrFS suffers the fragmentation problems common to all filesystems.
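The fragmentation effect can be shown with a few lines of Python. This is a simplified sketch under stated assumptions: a file is modeled as a list mapping logical block positions to physical block numbers, and the block numbers (100 to 107 for the first write, 200 onward for free space) are made up for the example.

```python
def count_extents(block_map):
    """Count runs of physically contiguous blocks, in logical file order."""
    if not block_map:
        return 0
    extents = 1
    for prev, cur in zip(block_map, block_map[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


# A file of 8 logical blocks, first written sequentially to blocks 100..107.
block_map = list(range(100, 108))
initial_extents = count_extents(block_map)

# Copy-on-write: rewriting logical blocks 2 and 5 puts the new data into
# free blocks elsewhere (200 and 201) instead of overwriting in place.
next_free = 200
for logical in (2, 5):
    block_map[logical] = next_free
    next_free += 1

after_extents = count_extents(block_map)
print(initial_extents, after_extents)
```

Two small in-place edits turn one contiguous run into five separate pieces, which is the pattern that defragmentation is meant to undo.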

The BtrFS filesystem has a defragmentation tool that can be used from the command line, btrfs filesystem defragment. BtrFS is designed to support automatic, on-line defragmentation which, while important to installations with huge masses of data, is irrelevant to most typical use cases. The BtrFS filesystem must be mounted with the autodefrag option to enable this feature.

I’ve not used BtrFS long enough to get a feel for how much it fragments during normal operation and disk loads. As a result, I’ve not used either defragmentation method.

The BtrFS filesystem supports filesystems and files in sizes up to 16 exbibytes (EiB). An exbibyte is 2^60 bytes, or about 1.15*10^18 bytes, so the limit is 2^64 bytes, which is many orders of magnitude greater than any current disk or SSD can support. To attain data structures of that size, multiple devices can be added to a BtrFS volume as needed.
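The arithmetic behind that limit is easy to check in Python. The sketch assumes the 16-exabyte figure uses binary units of 2^60 bytes each, and the 20 TB drive size is a hypothetical value chosen only for comparison.

```python
EIB = 2**60                      # bytes in one binary exabyte (exbibyte)
limit_bytes = 16 * EIB           # the stated maximum size

# 16 * 2^60 is exactly 2^64, the range of a 64-bit byte offset.
is_64_bit_limit = limit_bytes == 2**64

# How many hypothetical 20 TB (20 * 10^12 byte) drives would that take?
drive_bytes = 20 * 10**12
drives_needed = -(-limit_bytes // drive_bytes)   # ceiling division

print(limit_bytes, is_64_bit_limit, drives_needed)
```

Under these assumptions, filling a single maximum-size BtrFS volume would take roughly 920,000 such drives.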

Another of its many interesting features is that BtrFS supports subvolumes. That simply means that a BtrFS volume can be subdivided into multiple subvolumes, each with its own data structures such as directories and inodes. Thus one subvolume can be managed separately from the others. The primary volume is a single storage pool for all of the subvolumes. This is why we see results like those in Figure 1 when using traditional disk management commands. The df command shows the same size and usage, as well as the same device special file, for the / (root) and /home directories. This indicates that they are subvolumes.

# df -hTt btrfs
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/nvme0n1p3 btrfs  475G  7.2G  465G   2% /
/dev/nvme0n1p3 btrfs  475G  7.2G  465G   2% /home
/dev/sda1      btrfs  466G   23G  442G   5% /var/Pictures

Figure 1: The df command shows that the / (root) and /home directories show the same size and usage as well as the same device special file.
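The same check can be automated. The following Python sketch groups the mount points in the Figure 1 listing by device and reports devices that appear more than once. The function name `mounts_by_device` is hypothetical, and sharing a device is only a hint: bind mounts can produce the same pattern, so treat the result as a starting point rather than proof of subvolumes.

```python
from collections import defaultdict

# The df -hTt btrfs output from Figure 1, pasted in as text.
DF_OUTPUT = """\
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/nvme0n1p3 btrfs  475G  7.2G  465G   2% /
/dev/nvme0n1p3 btrfs  475G  7.2G  465G   2% /home
/dev/sda1      btrfs  466G   23G  442G   5% /var/Pictures
"""


def mounts_by_device(df_text):
    """Map each device special file to the list of its mount points."""
    groups = defaultdict(list)
    for line in df_text.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 6)           # 7 columns; mount point is last
        if len(fields) < 7:
            continue                           # ignore malformed lines
        device, mount_point = fields[0], fields[6]
        groups[device].append(mount_point)
    return dict(groups)


# Devices mounted in more than one place are likely hosting subvolumes.
shared = {
    device: mount_points
    for device, mount_points in mounts_by_device(DF_OUTPUT).items()
    if len(mount_points) > 1
}
print(shared)
```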

Which Distros Use BtrFS?

Now that I’ve converted one of my systems to BtrFS and am in the process of stress-testing, this seems to be a good time to explore why some distributions offer it as an option or even make it the default filesystem. This list contains only a few of those but it does give you a good idea of its level of acceptance with the more popular mainstream distros.

However, many distros still use other default filesystems, and one of the EXT versions seems to be the most common. One of the most notable exceptions to this conversion to BtrFS is Red Hat Enterprise Linux. RHEL did include it as an experimental option for a couple of releases, but removed all support for BtrFS in release 8. RHEL currently uses the XFS filesystem as its default while continuing to offer EXT4 as an option.

What Does This All Mean to Users?

For me as a user, BtrFS means a simpler filesystem metadata structure and a simpler installation with fewer entries and choices to make. Because all of the available data blocks on a storage device form a single pool, I don’t need to worry about one volume or partition filling up because I didn’t make it large enough; storage can be allocated to any directory or subvolume on the filesystem. New devices can be added to a volume as they’re needed.

Just remember that BtrFS does not change the Linux Filesystem Hierarchy Standard (FHS). All of the same directories are present and store the same types of data as ever.
