Tar: The ultimate archiving tool

Thu 1 Jul 2021

tar is an archiving tool that comes standard with most Linux distros. Its extensive set of options makes it daunting to learn by heart. The name of the program is derived from "tape archive".

Let me introduce tar in this article and break down the very basics to create, list, and extract an archive.

Let's play around with archives

Once you learn a handful of options, the commands become very easy to put together. Let's take a look at those.

-f will set the file name of the tarball

-v will set verbose mode, so you see all files as they are processed

-c will create an archive

-t will list the contents of an archive

-x will extract the contents of an archive

-z will compress using gzip, i.e. .tar.gz

-J will compress using xz, i.e. .tar.xz

-j will compress using bzip2, i.e. .tar.bz2

There are also --lzip, --lzma and --lzop. Each offers distinct features and has its uses. However, for general use, gzip, xz and bzip2 will work best on average.

# Create a tar archive
rob@Rathalos ~ $ tar -cf archive.tar pictures/
# Create a tar archive verbosely
rob@Rathalos ~ $ tar -cvf archive.tar pictures/

# List contents of a tar archive
rob@Rathalos ~ $ tar -tf archive.tar
# List contents of a tar archive verbosely (includes metadata)
rob@Rathalos ~ $ tar -tvf archive.tar

# Extract contents of a tar archive
rob@Rathalos ~ $ tar -xvf archive.tar

# Create a gzipped tar archive
rob@Rathalos ~ $ tar -zcvf archive.tar.gz pictures/
# Create an xz tar archive
rob@Rathalos ~ $ tar -Jcvf archive.tar.xz pictures/
# Create a bzip2 tar archive
rob@Rathalos ~ $ tar -jcvf archive.tar.bz2 pictures/
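
To see create, list, and extract work together, here is a minimal sketch of a full round trip in a scratch directory. The sample files and the restored directory are made up for illustration, and -C (extract into another directory) is an extra option that is not in the list above.

```shell
#!/bin/sh
# Round trip in a scratch directory: create, list, extract, compare.
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample data standing in for pictures/
mkdir pictures
printf 'one\n' > pictures/a.txt
printf 'two\n' > pictures/b.txt

# Create a gzipped archive (-c create, -z gzip, -f file name)
tar -czf archive.tar.gz pictures/

# List the contents without extracting anything
listing=$(tar -tzf archive.tar.gz)
printf '%s\n' "$listing"

# Extract into a separate directory so the originals stay untouched
mkdir restored
tar -xzf archive.tar.gz -C restored

# The restored tree should match the original one
diff -r pictures restored/pictures && echo "round trip OK"
```

If diff prints nothing before the final message, the extracted files are identical to the ones that went in.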

A little on compression

Here is a little bit of info on compression. There are two compression types that we need to differentiate between: lossy and lossless compression.

Lossy compression discards information from the original file to compress the data. The best example is a simple JPG file. You can, for instance, choose to export an image in GIMP at 60% quality to shrink the file. This is lossy compression. The image is irreversibly reduced to a lesser quality.

Lossless compression, as the name suggests, does not lose any information. The compression operation is reversible. This type of compression is used when compressing archives.

Text and text-like files will have the highest compression ratios. They will, on average, compress to about 50% of their original size. Other files, like images, will compress less, as the algorithm will have a hard time finding a pattern.

The algorithm will look for patterns in the byte contents of the file. This is much easier on text and text-like files, where there is a higher chance of finding the same letters again. Images, due to the complexity of their byte contents, do not compress as well. For example, a PNG containing only one color will compress much more than a PNG containing gradients and complex artwork.

The reason a complex PNG won't compress well is the same reason re-compressing compressed files does not work well either. PNG data is already compressed, and compressing is only meant to be done once. The first pass removes the repeating patterns from the byte contents, so a second pass has almost nothing left to find.
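
The effect of patterns is easy to try out. The following is a rough sketch, not a benchmark: it uses gzip directly on three generated inputs and then compresses one result a second time. The file names and sizes are arbitrary choices for illustration, and it assumes head -c, base64, and /dev/urandom are available.

```shell
#!/bin/sh
# Compare how well gzip handles patterned, text-like, and random input.
set -eu
work=$(mktemp -d)
cd "$work"

# Print the size of a file in bytes
size() { wc -c < "$1" | tr -d ' '; }

# 1. Highly repetitive text: one line repeated 5000 times
i=0
while [ "$i" -lt 5000 ]; do
  echo "the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > repeated.txt

# 2. Text-like data: random bytes written with a 64-character alphabet
head -c 150000 /dev/urandom | base64 > letters.txt

# 3. Random bytes: no pattern for the algorithm to find
head -c 150000 /dev/urandom > random.bin

for f in repeated.txt letters.txt random.bin; do
  gzip -c "$f" > "$f.gz"
  echo "$f: $(size "$f") -> $(size "$f.gz") bytes"
done

# Compress an already compressed file a second time
gzip -c letters.txt.gz > letters.txt.gz.gz
echo "letters.txt.gz: $(size letters.txt.gz) -> $(size letters.txt.gz.gz) bytes"
```

The repeated line should shrink dramatically, the text-like file moderately, and the random bytes not at all. The second gzip pass should gain nothing, which is the re-compression effect described above.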

Extensions and compressions

You might already have seen different extensions on tar archives. These extensions are quite useful. Modern versions of tar will automatically recognize the algorithm when reading an archive. You can run file archive.tar.gz to get the gist of that kind of detection. In previous versions, you had to include the respective compression parameter to extract the data properly.
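
As a small sketch of that automatic recognition, the commands below read a gzipped archive without passing -z. This assumes a reasonably recent GNU tar or bsdtar; other implementations may behave differently. The sample file and the misnamed copy are made up for illustration.

```shell
#!/bin/sh
# Reading a compressed archive without naming the algorithm.
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample data
mkdir pictures
echo "sample" > pictures/note.txt

# Creating still needs the compression option
tar -czf archive.tar.gz pictures/

# No -z here: tar works out the compression while reading
detected=$(tar -tf archive.tar.gz)
printf '%s\n' "$detected"

# Detection looks at the file contents, so a misleading name still works
cp archive.tar.gz misnamed.tar
tar -tf misnamed.tar
```

The misnamed copy is only there to show where the detection happens. For people reading a directory listing, a proper extension is still the only hint they get.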

That is why it's important to name your archive accordingly. The name tells you what to expect and how to work with the file. I will list the most used extensions here; you can find a full list on the Wikipedia page.

Algorithm  Long      Short
gzip       .tar.gz   .taz, .tgz
xz         .tar.xz   .txz
bzip2      .tar.bz2  .tb2, .tbz, .tbz2, .tz2

Compatibility

Linux and macOS have supported tar out of the box for a very long time. Windows 10 has included it out of the box since build 17063. That makes tar one of the most widespread archiving tools and very worthwhile to dive into.

7zip

7zip has, I feel, surpassed zip by far, particularly among Windows users. This is because it uses better algorithms to compress files. There is a well-referenced page with benchmarks that show the differences between algorithms.

7zip is also open source, whereas WinZip is commercial software. However, tar also supports the algorithms they use. This makes the choice easy, for Linux users in particular. The consensus is that xz remains the best format in terms of compression ratio. xz is built on the same LZMA family of algorithms that 7zip uses, so its compression ratio is comparable.

Okay, so lots of choices

Exactly! It all comes down to the use case. You need backward compatibility? Oh, you want faster compression? Geez Louise, look at that memory usage, try this one! Need an algorithm that uses less CPU? Want to save disk space? Tar has got you covered! There is an algorithm for each of those needs. That will make the choice easier in some cases when dealing with very specific platforms. In general use, you will see gzip, bzip2, and xz the most though.
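
If you want to weigh the options on your own data, a quick sketch like the one below archives the same directory with each of the three common algorithms and prints the resulting sizes. The sample log file is made up for illustration, and any compressor that is not installed is skipped. Swap data/ for a directory of your own to get numbers that mean something for your use case.

```shell
#!/bin/sh
# Archive the same directory with gzip, bzip2, and xz and compare sizes.
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical sample data: a text file with 2000 similar lines
mkdir data
i=0
while [ "$i" -lt 2000 ]; do
  echo "line $i: some sample log text for the comparison"
  i=$((i + 1))
done > data/sample.log

# Uncompressed baseline
tar -cf archive.tar data/
echo "none   $(wc -c < archive.tar | tr -d ' ') bytes"

# Each entry: compressor, matching tar option, archive extension
for spec in "gzip -z tar.gz" "bzip2 -j tar.bz2" "xz -J tar.xz"; do
  set -- $spec
  tool=$1
  flag=$2
  ext=$3
  if command -v "$tool" > /dev/null 2>&1; then
    tar -c "$flag" -f "archive.$ext" data/
    echo "$tool   $(wc -c < "archive.$ext" | tr -d ' ') bytes"
  else
    echo "$tool is not installed, skipping"
  fi
done
```

Size is only one axis. Speed and memory use matter too, and this sketch does not measure them.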

Honestly, this article only covers the archiving and compression options that tar has to offer.