The Truth About Disk Fragmentation
For many older PC filesystems, such as FAT (in all its variants) and NTFS, fragmentation is a significant problem that degrades disk performance. Defragmentation became an industry in itself, with brands of defragmentation software that ranged from very effective to only marginally so.
Linux’s extended filesystems use data allocation strategies that help to minimize fragmentation of files on the hard drive and reduce the effects of fragmentation when it does occur. You can use the fsck command on EXT filesystems to check total filesystem fragmentation. The following example checks the home filesystem on my main workstation, which was only 1.5% fragmented. Be sure to use the -n parameter; it prevents fsck from taking any action on the scanned filesystem.
fsck -fn /dev/mapper/vg_01-home
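The fragmentation figure appears in the summary line that fsck (actually e2fsck) prints at the end of its run. That line looks roughly like the following; the inode and block counts shown here are purely illustrative and will differ on your system, but the percentage in parentheses is the number of interest:

/dev/mapper/vg_01-home: 271794/18219008 files (1.5% non-contiguous), 13976544/72892416 blocks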
I once performed some theoretical calculations to determine whether disk defragmentation might result in any noticeable performance improvement. While I did make some assumptions, the disk performance data I used came from a new 300GB Western Digital hard drive with a 2.0ms track-to-track seek time. The number of files in this example is the actual number that existed in the filesystem on the day I did the calculation. I did assume that a fairly large proportion of the fragmented files (20%) would be touched each day. Figure 3 shows those calculations.
| Item | Value |
|------|-------|
| Total files | 271,794 |
| % fragmentation | 5.00% |
| Discontinuities | 13,590 |
| % fragmented files touched per day | 20% (assumed) |
| Number of additional seeks | 2,718 |
| Average seek time | 10.90 ms |
| Total additional seek time per day (average seek) | 29.63 sec = 0.49 min |
| Track-to-track seek time | 2.00 ms |
| Total additional seek time per day (track-to-track seek) | 5.44 sec = 0.091 min |
Figure 3: The theoretical effects of fragmentation on disk performance
I did two calculations of the total additional seek time per day: one based on the track-to-track seek time, which is the more likely scenario for most files because of the EXT file allocation strategies, and one based on the average seek time, which I assumed would represent a fair worst-case scenario.
As you can see from Figure 3, the impact of fragmentation on a modern EXT filesystem, even with a hard drive of modest performance, would be minimal and negligible for the vast majority of applications. You can plug the numbers from your environment into a similar spreadsheet of your own to see what you might expect in the way of performance impact. This type of calculation most likely will not represent actual performance, but it can provide a bit of insight into fragmentation and its theoretical impact on a system.
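If you prefer the command line to a spreadsheet, a quick awk one-liner reproduces the same arithmetic. The values below are the assumptions from Figure 3; substitute your own file count, fragmentation percentage, percentage of files touched, and seek time:

awk 'BEGIN { files=271794; frag=0.05; touched=0.20; seek_ms=2.0; seeks=files*frag*touched; printf "%.0f additional seeks, %.2f seconds of additional seek time per day\n", seeks, seeks*seek_ms/1000 }'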
Most of my EXT4 partitions are around 1.5% or 1.6% fragmented. I do have one that is 3.3% fragmented, but that is a large, 128GB filesystem with fewer than 100 very large ISO image files, and I’ve had to expand the partition several times over the years as it got too full.
That is not to say that some environments with huge data stores don’t require that fragmentation be minimized as much as possible. The EXT filesystem can be tuned with care by a knowledgeable admin who can adjust the parameters to compensate for specific workload types. This can be done when the filesystem is created or later using the tune2fs command. The results of each tuning change should be tested, meticulously recorded, and analyzed to ensure optimum performance for the target environment. In the worst case, where performance cannot be improved to desired levels, other filesystem types are available that may be more suitable for a particular workload. And remember that it is common to mix filesystem types on a single host system to match the load placed on each filesystem.
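As a rough sketch of what that tuning can look like in practice (the device name /dev/sdb1 is just a placeholder; read the man pages and test on a non-production filesystem before changing anything), you might list the current parameters, adjust one of them, or pick a usage type when the filesystem is created:

tune2fs -l /dev/sdb1
tune2fs -m 1 /dev/sdb1
mkfs.ext4 -T largefile4 /dev/sdb1

The first command lists the superblock parameters, the second reduces the reserved-blocks percentage to 1%, and the third selects a usage type from /etc/mke2fs.conf that allocates far fewer inodes, which suits a filesystem holding a small number of very large files.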
Because of the low level of fragmentation on most EXT filesystems, it is not necessary to defragment. In any event, there is no safe defragmentation tool for EXT filesystems. There are a few tools that allow you to check the fragmentation of an individual file or the fragmentation of the remaining free space in a filesystem. There is one tool, e4defrag, that will defragment a file, directory, or filesystem as much as the remaining free space allows. As its name implies, it works only on files in an EXT4 filesystem, and it does have some limitations.
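For example, the following commands report on fragmentation without changing anything; the file, device, and directory names here are placeholders. filefrag shows the extent map of a single file, e2freefrag reports on free-space fragmentation for a device, and e4defrag with the -c option calculates a fragmentation score without actually defragmenting (drop the -c to perform the defragmentation):

filefrag -v /home/user/somefile.iso
e2freefrag /dev/mapper/vg_01-home
e4defrag -c /home/user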
If it becomes necessary to perform a complete defragmentation of an EXT filesystem, only one method works reliably: move all the files off the filesystem to be defragmented, making sure they are deleted only after being safely copied to another location. If possible, increase the size of the filesystem to help reduce future fragmentation, then copy the files back onto the target filesystem. Even this does not guarantee that all the files will be completely defragmented.
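A minimal sketch of that procedure, assuming a scratch filesystem is mounted at /mnt/scratch and /home is the filesystem being defragmented, might look like this; verify the copy before deleting anything:

rsync -aAXHv /home/ /mnt/scratch/home/
diff -r /home /mnt/scratch/home
rm -rf /home/*
rsync -aAXHv /mnt/scratch/home/ /home/

The rsync options preserve permissions, ACLs, extended attributes, and hard links; deleting the files and copying them back is what gives the allocator a chance to lay them out contiguously again.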
The BtrFS filesystem
Copy-on-write (COW) filesystems like BtrFS have many advantages, but they also have disadvantages, one of which is much greater data fragmentation than in the EXT4 filesystem. BtrFS stores data sequentially when possible as a file is written to disk for the first time, but its COW design means that any subsequent modification to the file cannot be written on top of the old data; it must be placed in a free block instead, which causes fragmentation. In addition, BtrFS suffers from the fragmentation problems common to all filesystems.
The BtrFS filesystem has a defragmentation tool that can be run from the command line, btrfs filesystem defragment. BtrFS is also designed to support automatic, online defragmentation, which, while important for installations with huge masses of data, is irrelevant to most typical use cases. The filesystem must be mounted with the autodefrag option to enable this feature.
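As a sketch (the device name and mount point are placeholders), a manual defragmentation run and the corresponding /etc/fstab entry for automatic defragmentation look something like this:

btrfs filesystem defragment -r -v /home
/dev/sdb1  /home  btrfs  defaults,autodefrag  0 0

The -r option recurses through the directory tree; without it, only the metadata of the specified directory itself is defragmented.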
I’ve not used BtrFS long enough to get a feel for how much it fragments under normal operation and typical disk loads. As a result, I’ve not used either defragmentation method. As I continue to experiment with BtrFS, I’ll explore fragmentation and the tools used to minimize it.