{"id":13268,"date":"2026-01-15T01:02:00","date_gmt":"2026-01-15T06:02:00","guid":{"rendered":"https:\/\/www.both.org\/?p=13268"},"modified":"2026-01-14T06:26:58","modified_gmt":"2026-01-14T11:26:58","slug":"why-i-prefer-tar-to-zip","status":"publish","type":"post","link":"https:\/\/www.both.org\/?p=13268","title":{"rendered":"Why I prefer tar to zip"},"content":{"rendered":"<p>I love having choices when it comes to computing, and especially in the world of open source we&#8217;re spoilt when it comes to archiving files. There&#8217;s TAR, ZIP, GZIP, BZIP2, XZ, 7Z, AR, ZOO, and more. Of all these formats, it seems that ZIP has gained ubiquity. It&#8217;s the one you can use to archive and extract data on nearly every system, including Linux, UNIX, FreeDOS, Android, Windows, macOS, and more. The problem is, ZIP isn&#8217;t the best tool for the job of archival. Here&#8217;s why I use TAR instead of ZIP whenever possible.<\/p>\n\n\n\n<p>Each archiving format has an associated command, such as <code>tar<\/code>, <code>zip<\/code>, <code>gzip<\/code> and <code>gunzip<\/code>, <code>xz<\/code>, and so on. In terms of compression, they all tend to be basically the same at this point. You might save a few kilobytes or megabytes with one compression algorithm given a specific combination of file types, but it&#8217;s fair to say that they all produce broadly similar results. 
Where they differ is in what each command makes available, and what each file format retains.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The <code>tar<\/code> and <code>zip<\/code> command showdown<\/h2>\n\n\n\n<p>At first glance, <code>tar<\/code> and <code>zip<\/code> are similar in capability.<\/p>\n\n\n\n<p>By default, the <code>tar<\/code> command generates an archive that&#8217;s not compressed. It&#8217;s just a single file object that contains smaller file objects within it. The resulting object is basically the same size as the sum of its parts:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ tar --create --file archive.tar pic.jpg file.txt\n$ ls -lG\n-rw-r--r-- 1 tux 46049280 Jan  7 10:55 archive.tar\n-rw-r--r-- 1 tux 45965374 Jan  7 10:55 file.txt\n-rw-r--r-- 1 tux    77673 Jan  7 08:34 pic.jpg<\/code><\/pre>\n\n\n\n<p>You can use the <code>-0<\/code> option to simulate this with the <code>zip<\/code> command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ zip -0 archive.zip pic.jpg file.txt\n  adding: pic.jpg (stored 0%)\n  adding: file.txt (stored 0%)\n$ ls -lG\n-rw-r--r-- 1 tux 46049280 Jan  7 10:55 archive.tar\n-rw-r--r-- 1 tux 46043355 Jan  7 10:57 archive.zip\n-rw-r--r-- 1 tux 45965374 Jan  7 10:55 file.txt\n-rw-r--r-- 1 tux    77673 Jan  7 08:34 pic.jpg<\/code><\/pre>\n\n\n\n<p>The most common use case of each command, however, definitely includes compression.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Level of compression<\/h2>\n\n\n\n<p>Choosing either an algorithm (in the case of <code>tar<\/code>) or a compression level (in the case of <code>zip<\/code>) is a balance between compression speed and archive size. In theory, the slower you let the command compress, the smaller the resulting archive. 
The faster the compression, the bigger the archive.<\/p>\n\n\n\n<p>Both commands strive to provide you with some control over this.<\/p>\n\n\n\n<p>By default (without the <code>-0<\/code> option), the <code>zip<\/code> command also compresses the files it archives. You can adjust the amount of compression with an option ranging from <code>-0<\/code> to <code>-9<\/code>. The default level is <code>-6<\/code>.<\/p>\n\n\n\n<p>To add compression to the <code>tar<\/code> command, you can either use a separate command entirely to compress the resulting TAR file, or you can use one of several options to choose what compression algorithm gets applied to the TAR file during its creation. Here&#8217;s an incomplete list:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>-z<\/code> or <code>--gzip<\/code>: Filters the archive through <code>gzip<\/code><\/li>\n\n\n\n<li><code>-j<\/code> or <code>--bzip2<\/code>: Filters the archive through <code>bzip2<\/code><\/li>\n\n\n\n<li><code>-J<\/code> or <code>--xz<\/code>: Filters the archive through <code>xz<\/code><\/li>\n\n\n\n<li><code>--lzip<\/code>: Filters the archive through <code>lzip<\/code><\/li>\n\n\n\n<li><code>-Z<\/code> or <code>--compress<\/code>: Filters the archive through <code>compress<\/code><\/li>\n\n\n\n<li><code>--zstd<\/code>: Filters the archive through <code>zstd<\/code><\/li>\n\n\n\n<li><code>--no-auto-compress<\/code>: Prevents <code>tar<\/code> from using the archive suffix to determine the compression program so you can specify one (or not) yourself<\/li>\n<\/ul>\n\n\n\n<p>Decoupling the process of archiving from compression makes sense to me. While the <code>zip<\/code> command is stuck with basically the same old algorithm year after year, a TAR archive can be compressed using whatever compression algorithm you think is best. 
In some cases, you might make that determination based on the type of data you&#8217;re compressing, or you might be limited by the capabilities of your target system, or you might just want to test a hot new compression algorithm.<\/p>\n\n\n\n<p>Here&#8217;s what the <code>zip<\/code> command does with a 44 MB text file and a JPEG file, at maximum compression:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ zip -9 archive.zip file.txt pic.jpg \n  adding: file.txt (deflated 90%)\n  adding: pic.jpg (deflated 14%)\n$ ls -lGh\n-rw-r--r-- 1 tux 4.4M Jan  7 11:17 archive.zip\n-rw-r--r-- 1 tux  44M Jan  7 10:55 file.txt\n-rw-r--r-- 1 tux  76K Jan  7 08:34 pic.jpg<\/code><\/pre>\n\n\n\n<p>A compressed archive of 4.4 MB down from a little more than 44 MB isn&#8217;t bad.<\/p>\n\n\n\n<p>Similarly, the <code>tar<\/code> command with the <code>--gzip<\/code> option produces a 4.5 MB archive. However, filtering the archive through <code>xz<\/code> with the <code>--xz<\/code> option makes a significant improvement:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ tar --create --xz --file archive.tar.xz file.txt pic.jpg \n$ ls -lh\n-rw-r--r-- 1 tux users 3.3M Jan  7 11:17 archive.tar.xz\n-rw-r--r-- 1 tux users  44M Jan  7 10:55 file.txt\n-rw-r--r-- 1 tux users  76K Jan  7 08:34 pic.jpg<\/code><\/pre>\n\n\n\n<p>At 3.3 MB, it seems that a newer compression algorithm has outperformed ZIP, at least in this particular test. I&#8217;m the first to admit that compression tests are subject to many variables, so it&#8217;s not globally significant that XZ has done better than ZIP in this one example. With some experimentation, I could probably devise a test that gets better results from ZIP. 
However, this example does demonstrate that it&#8217;s useful having an archive tool that is modular enough to allow for the development of new algorithms.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Output manipulation<\/h2>\n\n\n\n<p>When you extract data from a TAR or ZIP archive, you can choose to either extract specific files or to extract everything all at once. I believe it&#8217;s most common to extract everything, because that&#8217;s the default behaviour on major desktops like GNOME and macOS. With both the <code>tar<\/code> and <code>unzip<\/code> commands, even when you choose to extract everything all at once, you still have a choice of where to put the files you&#8217;ve extracted.<\/p>\n\n\n\n<p>By default, both the <code>tar<\/code> and <code>unzip<\/code> commands extract all files into the current directory. If the archive itself contains a directory, then that directory serves as a &#8220;container&#8221; for the extracted files. Otherwise, the files appear in your current directory. This can get messy, but it&#8217;s a common enough problem that Linux and UNIX users call it a &#8220;tarbomb&#8221; because it sometimes feels like an archive has exploded and left file shrapnel in its wake.<\/p>\n\n\n\n<p>However, a tarbomb (or zipbomb) isn&#8217;t inherently bad. It&#8217;s a valid use case when you want to essentially overlay updated or additional files into an existing file system. For example, suppose you have a website consisting of several PHP files across several directories. You can take a copy of the site to your development machine to make updates, and then create an archive of the files you&#8217;ve updated. Extract the archive on your web server, and each new version of any file is extracted exactly where it originated from because both <code>tar<\/code> and <code>unzip<\/code> retain the filesystem&#8217;s structure. 
I use this feature when doing dot-release updates of several different content management systems, and it makes maintenance pleasantly simple.<\/p>\n\n\n\n<p>Both the <code>unzip<\/code> and <code>tar<\/code> commands provide an option to change directory before extraction so you can store an archive in one directory but send extracted files to a different location.<\/p>\n\n\n\n<p>Use the <code>--directory<\/code> option with the <code>tar<\/code> command:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ mkdir mytar\n$ tar --extract --file archive.tar.xz --directory .\/mytar\n$ ls .\/mytar\nfile.txt   pic.jpg<\/code><\/pre>\n\n\n\n<p>Use the <code>-d<\/code> option with <code>unzip<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ mkdir myzip\n$ unzip archive.zip -d .\/myzip\n$ ls .\/myzip\nfile.txt   pic.jpg<\/code><\/pre>\n\n\n\n<p>The feature <code>unzip<\/code> doesn&#8217;t have is the ability to drop leading directories from archived paths during extraction. For example, suppose you want to extract files directly into <code>myzip<\/code>, but you&#8217;ve been given an archive containing a leading directory called <code>chaff<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ unzip archive+chaff.zip -d .\/myzip\n$ ls .\/myzip\nchaff\n$ ls .\/myzip\/chaff\nfile.txt   pic.jpg<\/code><\/pre>\n\n\n\n<p>You don&#8217;t want <code>chaff<\/code>, but there&#8217;s no option in <code>unzip<\/code> to strip just that leading directory (the <code>-j<\/code> option discards every directory in the path, not only the first).<\/p>\n\n\n\n<p>Frustratingly, the <code>unzip<\/code> command essentially encourages this anti-pattern. In order to avoid delivering a zipbomb to someone, you thoughtfully nest your files in a useless folder. But by nesting everything in a useless folder, you&#8217;ve also prevented your user from extracting only the files required.<\/p>\n\n\n\n<p>The <code>tar<\/code> command solves this problem elegantly. 
You can protect your users from a tarbomb by nesting your files in a useless directory, because <code>tar<\/code> lets anyone skip over any number of leading directories during extraction.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ tar --extract --strip-components=1 \\\n  --file archive+chaff.tar.xz --directory .\/mytar\n$ ls .\/mytar\nfile.txt  pic.jpg<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Permission and ownership<\/h2>\n\n\n\n<p>The ZIP file format doesn&#8217;t preserve file ownership. The TAR file format does.<\/p>\n\n\n\n<p>You might not notice this when using ZIP or TAR archives just on your own personal systems. Once a file is extracted, you own the file. However, using <code>tar<\/code> as a superuser or with the <code>--same-owner<\/code> option extracts each file with the same ownership it had when archived, assuming the same user and group are available on the system. There&#8217;s no option for that with the <code>unzip<\/code> command because the ZIP file format doesn&#8217;t track ownership.<\/p>\n\n\n\n<p>The <code>zip<\/code> command can preserve file permissions, but again <code>tar<\/code> offers a lot more flexibility. The <code>--same-permissions<\/code>, <code>--no-same-permissions<\/code>, and <code>--mode<\/code> options let you control the permissions assigned to archived files.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Better archiving with tar<\/h2>\n\n\n\n<p>It&#8217;s easy to use ZIP and TAR interchangeably, because for most general-purpose activities their default behaviour is similar and suitable. However, if you&#8217;re using archives for mission-critical work involving disparate systems and a diverse set of people, TAR is the technically superior choice. Whether TAR is the &#8220;correct&#8221; choice depends entirely on your target audience, because there&#8217;s no doubt that ZIP has greater support. 
But all things being equal, TAR is the archive format and <code>tar<\/code> is the archive command I prefer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I love having choices when it comes to computing, and especially in the world of open source<\/p>\n","protected":false},"author":31,"featured_media":13269,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[5],"tags":[104,91,97],"class_list":["post-13268","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-linux","tag-command-line","tag-linux","tag-sysadmin"],"modified_by":"David Both","_links":{"self":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/13268","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/users\/31"}],"replies":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13268"}],"version-history":[{"count":2,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/13268\/revisions"}],"predecessor-version":[{"id":13271,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/13268\/revisions\/13271"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/media\/13269"}],"wp:attachment":[{"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13268"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13268"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13268"}],"curies"
:[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}