Archiving and Compression Utilities

Howto guide for Archiving and Compressing

tar, cpio, gzip, gunzip, bzip2 utilities


There are many different tools available for archiving data within your Linux system. Many of the popular tools are command line based, which makes it easier for an administrator to incorporate them into a script.
The most popular command line tools are "tar", "cpio", "gzip" and "bzip2".
The first of these tools, "tar", is not a compression utility. However, it is frequently used in conjunction with a compression utility such as "gzip" or "bzip2".




Linux tar command


The Linux "tar" command is used to convert a group of files into an archive or extract file(s) from an existing archive. An archive is generally a single file that contains a number of individual files and associated metadata which allows them to be restored to their original form. Archives are a convenient way to store and distribute data and programs.

The name "tar" is derived from the term "tape archive", as the tool was originally designed for backing up data to magnetic tape. Now tar is used to group files together in an archive on a filesystem. These archives are frequently referred to as a "tarball".

The basic syntax of the "tar" command is: tar OPTION... Archive_Name File_Name(s).


Basic tar examples


tar cf myarchive.tar file1 file2
In the above example, we are creating an archive called "myarchive.tar" which will contain the files "file1" and "file2".

tar tvf myarchive.tar
In this example, we instruct tar to display the contents of its archive.

tar xvf myarchive.tar
Here we extract all files verbosely from the archive "myarchive.tar"

tar uvf myarchive.tar file3
Here we add a file to the existing archive with the "u" (update) option. If a file of the same name is already present in the archive, the new copy is added only if it is newer than the copy in the archive.

tar rf myarchive.tar file4
To add a new file into an archive, the "r" option can be used. In this example, the file "file4" is added into the archive "myarchive.tar".

tar f myarchive.tar --delete file1 file2
This example will remove the files "file1" and "file2" from the archive "myarchive.tar".

tar cvf myarchive.tar ./mytest
In this example, we create the archive "myarchive.tar" containing the directory "mytest" and all of its contents.

tar xvf myarchive.tar file3
In this example, only file3 is extracted from the archive.

tar cvf - . |ssh -l john remote_server "cd /tmp/john && tar -xf-"
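In this example, tar writes an archive of the current directory to standard output ("-" is given in place of an archive name) and pipes it over ssh to the host "remote_server", logging in as the user "john". On the remote side, the shell changes to the directory "/tmp/john" and, if that succeeds, a second tar reads the archive from standard input and extracts it there. The data is streamed as it is read, so no intermediate archive file is written on either machine. This makes the method useful for moving large amounts of data from one host to another.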
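The same writer-and-reader pipe can be tried on a single machine before involving ssh. The following is a minimal sketch only: the directories "src" and "dest" are throwaway temporary directories created for illustration, and the file names are invented.

```shell
# Two scratch directories standing in for the local and remote ends.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir "$src/sub"
printf 'hello\n' > "$src/sub/a.txt"
printf 'world\n' > "$src/b.txt"

# Writer: archive the source directory to standard output.
# Reader: change to the destination and unpack from standard input.
(cd "$src" && tar -cf - .) | (cd "$dest" && tar -xf -)

# Show what arrived at the destination.
ls -R "$dest"
```

Replacing the right-hand side of the pipe with an ssh command, as in the example above, gives the remote form.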

Examples of the above tar commands in use:



john@john-desktop:~/test_examples$ ls -l
total 24
-rw-rw-r-- 1 john john   48 Feb  3 20:49 file1
-rw-rw-r-- 1 john john   48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john  135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john  135 Feb  6 21:41 file4
-rw-rw-r-- 1 john john 1946 Feb  3 21:10 largefile.txt
-rw-rw-r-- 1 john john   85 Feb  3 20:42 textfile.txt

john@john-desktop:~/test_examples$ tar -cf myarchive.tar file1 file2

john@john-desktop:~/test_examples$ ls -l
total 36
-rw-rw-r-- 1 john john    48 Feb  3 20:49 file1
-rw-rw-r-- 1 john john    48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john   135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john   135 Feb  6 21:41 file4
-rw-rw-r-- 1 john john  1946 Feb  3 21:10 largefile.txt
-rw-rw-r-- 1 john john 10240 Feb  6 21:41 myarchive.tar
-rw-rw-r-- 1 john john    85 Feb  3 20:42 textfile.txt

john@john-desktop:~/test_examples$ tar -tvf myarchive.tar
-rw-rw-r-- john/john        48 2013-02-03 20:49 file1
-rw-rw-r-- john/john        48 2013-02-03 20:50 file2

john@john-desktop:~/test_examples$ rm file1 file2

john@john-desktop:~/test_examples$ ls file*
file3  file4

john@john-desktop:~/test_examples$ tar -xvf myarchive.tar
file1
file2

john@john-desktop:~/test_examples$ ls file*
file1  file2  file3  file4

john@john-desktop:~/test_examples$ tar -uvf myarchive.tar file3
file3

john@john-desktop:~/test_examples$ tar -tvf myarchive.tar
-rw-rw-r-- john/john        48 2013-02-03 20:49 file1
-rw-rw-r-- john/john        48 2013-02-03 20:50 file2
-rw-rw-r-- john/john       135 2013-02-03 21:00 file3

john@john-desktop:~/test_examples$ tar -rf myarchive.tar file4

john@john-desktop:~/test_examples$ tar -tvf myarchive.tar
-rw-rw-r-- john/john        48 2013-02-03 20:49 file1
-rw-rw-r-- john/john        48 2013-02-03 20:50 file2
-rw-rw-r-- john/john       135 2013-02-03 21:00 file3
-rw-rw-r-- john/john       135 2013-02-06 21:41 file4
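The session above does not exercise "--delete" or the extraction of a single member, and it does not show what "u" does when a file really is newer. The sketch below is one way to observe this, assuming GNU tar and GNU touch. It runs in a temporary directory, and the archive name "demo.tar" and the directory "restore" are made up for the example.

```shell
# Work in a throwaway directory so nothing of value is touched.
work=$(mktemp -d)
cd "$work" || exit 1

printf 'one\n' > file1
printf 'two\n' > file2
# Give both files a fixed, old modification time before archiving.
touch -d '2020-01-01 00:00:00' file1 file2
tar -cf demo.tar file1 file2

# Make file1 newer than the archived copy, then update the archive.
printf 'one, revised\n' > file1
touch -d '2021-01-01 00:00:00' file1
tar -uf demo.tar file1 file2
tar -tf demo.tar          # the newer file1 is appended; file2 is left alone

# Remove file2 from the archive (GNU tar only).
tar --delete -f demo.tar file2

# Extract only file1 into a separate directory.
mkdir restore
tar -xf demo.tar -C restore file1
cat restore/file1
```

Note that "u" appends the newer copy rather than replacing the old one, so the archive grows with every update. On extraction the later copy overwrites the earlier one.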


Using tar with compression


As we mentioned earlier, the tar utility has no means of compression by itself. To add compression, we can use the tar command in conjunction with a compression utility:

The "j" flag selects "bzip2" compression. The "z" flag selects the "gzip" compression utility. The "Z" flag selects the "compress" utility.


Examples of using compression


tar cvfj myarchive.tar.bz2 file1 file2 file3
This example creates an archive using bzip2 for compression. An archive called "myarchive.tar.bz2" is created containing "file1", "file2" and "file3".

tar xvfj myarchive.tar.bz2
Here we decompress the archive and then extract the files from within.



john@john-desktop:~/test_examples$ ls -l
total 12
-rw-rw-r-- 1 john john  48 Feb  6 22:07 file1
-rw-rw-r-- 1 john john  48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3

john@john-desktop:~/test_examples$ tar cvfj myarchive.tar.bz2 file1 file2 file3
file1
file2
file3

john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  48 Feb  6 22:07 file1
-rw-rw-r-- 1 john john  48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john 245 Feb  7 20:37 myarchive.tar.bz2

john@john-desktop:~/test_examples$ rm file*

john@john-desktop:~/test_examples$ ls -l
total 4
-rw-rw-r-- 1 john john 245 Feb  7 20:37 myarchive.tar.bz2

john@john-desktop:~/test_examples$ tar xvfj myarchive.tar.bz2
file1
file2
file3

john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  48 Feb  6 22:07 file1
-rw-rw-r-- 1 john john  48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john 245 Feb  7 20:37 myarchive.tar.bz2
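The session above uses the "j" flag. The "z" flag works the same way with gzip. The following is a short sketch in a temporary directory; the file contents are invented for illustration.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'alpha\n' > file1
printf 'beta\n' > file2

# Create a gzip-compressed archive, then list its contents.
tar -czf myarchive.tar.gz file1 file2
tar -tzf myarchive.tar.gz

# Remove the originals and restore them from the archive.
rm file1 file2
tar -xzf myarchive.tar.gz
ls -l
```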


cpio command


The cpio command is used to process archive files. Its name is derived from the phrase "Copy in, Copy out". cpio has three modes of operation: copy-out ("-o"), which copies files into an archive; copy-in ("-i"), which extracts files from an archive; and pass-through ("-p"), which copies files to another directory tree. When creating an archive, cpio reads the list of file names to archive from standard input and writes the resulting archive to standard output.


Basic cpio examples


Create a basic cpio archive


john@john-desktop:~/test_examples$ ls -l
total 8
drwxrwxr-x 2 john john 4096 Feb  7 20:47 backup_archives
drwxrwxr-x 2 john john 4096 Feb  7 20:43 testcpio
john@john-desktop:~/test_examples$ cd testcpio/

john@john-desktop:~/test_examples/testcpio$ ls
file1  file2  file3

john@john-desktop:~/test_examples/testcpio$ ls | cpio -ov > ~/test_examples/backup_archives/myarchive.cpio
file1
file2
file3
89 blocks

In the above example we pipe the output of the "ls" command to "cpio", which archives each file named. The resulting archive is then redirected to the file "myarchive.cpio" in the "backup_archives" directory.

Extract files from a cpio archive


john@john-desktop:~/test_examples$ mkdir recover

john@john-desktop:~/test_examples$ ls -l
total 12
drwxrwxr-x 2 john john 4096 Feb  7 20:48 backup_archives
drwxrwxr-x 2 john john 4096 Feb  7 20:53 recover
drwxrwxr-x 2 john john 4096 Feb  7 20:43 testcpio

john@john-desktop:~/test_examples$ cd recover/

john@john-desktop:~/test_examples/recover$ cpio -idv < ~/test_examples/backup_archives/myarchive.cpio
file1
file2
file3
89 blocks

john@john-desktop:~/test_examples/recover$ ls -l
total 48
-rw-rw-r-- 1 john john 15078 Feb  7 20:56 file1
-rw-rw-r-- 1 john john 15078 Feb  7 20:56 file2
-rw-rw-r-- 1 john john 15078 Feb  7 20:56 file3

In the above example we extract our files from the cpio archive "myarchive.cpio" into the current directory.

Create a cpio archive with files that match a particular type



john@john-desktop:~/test_examples/testcpio$ ls -l
total 60
-rw-rw-r-- 1 john john   321 Feb  7 21:12 file1.bak
-rw-rw-r-- 1 john john 15078 Feb  7 20:43 file1.txt
-rw-rw-r-- 1 john john   321 Feb  7 21:12 file2.bak
-rw-rw-r-- 1 john john 15078 Feb  7 20:43 file2.txt
-rw-rw-r-- 1 john john   321 Feb  7 21:12 file3.bak
-rw-rw-r-- 1 john john 15078 Feb  7 20:43 file3.txt

$ find . -iname "*.txt" -print | cpio -ov > ~/test_examples/backup_archives/mytxtfiles.cpio
./file2.txt
./file3.txt
./file1.txt
89 blocks

john@john-desktop:~/test_examples/testcpio$ cd ../backup_archives/

john@john-desktop:~/test_examples/backup_archives$ ls -l
total 48
-rw-rw-r-- 1 john john 45568 Feb  7 21:19 mytxtfiles.cpio

In the above example, only files that match the pattern "*.txt" are selected for archiving.


Create a .tar archive with cpio -F



$ ls | cpio -ov -H tar -F mytar.tar

In this example, "-H tar" selects the tar archive format and the "-F" parameter names the archive file "mytar.tar" to write to, instead of standard output.

Extracting a .tar archive with cpio



$ cpio -idv -F mytar.tar

Viewing the contents of a .tar file with the cpio command



$ cpio -it -F mytar.tar

gzip and gunzip command


gzip is a popular compression tool that reduces the size of the named files using Lempel-Ziv coding (LZ77). Each compressed file replaces the original and is given the extension ".gz". gzip will only compress regular files; symbolic links are ignored. By default, gzip stores the original file name and timestamp within the compressed file.

gunzip can decompress files created by "gzip" and "compress". It takes a list of file names on the command line and processes those whose names end in ".gz", "-gz", ".z", "-z", "_z" or ".Z". gunzip also recognises the special extensions ".tgz" and ".taz" as shorthand for ".tar.gz" and ".tar.Z" respectively.
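The ".tgz" shorthand can be seen by running gunzip on such a file: the result is named with a ".tar" extension. A small sketch in a temporary directory follows; the name "bundle.tgz" is invented for the example.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'data\n' > file1

# Build a gzip-compressed tarball using the short .tgz extension.
tar -czf bundle.tgz file1

# gunzip treats .tgz as .tar.gz, so the output is named bundle.tar.
gunzip bundle.tgz
ls
tar -tf bundle.tar
```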


gzip examples


gzip file1
file1 is compressed creating a compressed version of the file called file1.gz. The original file now only exists in its compressed format! See example below:



john@john-desktop:~/test_examples/testcpio$ ls -l
total 48
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file1
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file2
-rw-rw-r-- 1 john john 15078 Feb  7 21:36 file3

john@john-desktop:~/test_examples/testcpio$ gzip file1

john@john-desktop:~/test_examples/testcpio$ ls -l
total 36
-rw-rw-r-- 1 john john  3229 Feb  7 21:35 file1.gz
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file2
-rw-rw-r-- 1 john john 15078 Feb  7 21:36 file3

gzip -c file1 > file1.gz
This example will retain the original file. See example below:



john@john-desktop:~/test_examples/testcpio$ ls -l
total 48
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file1
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file2
-rw-rw-r-- 1 john john 15078 Feb  7 21:36 file3

john@john-desktop:~/test_examples/testcpio$ gzip -c file1 > file1.gz

john@john-desktop:~/test_examples/testcpio$ ls -l
total 52
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file1
-rw-rw-r-- 1 john john  3229 Feb  7 21:39 file1.gz
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file2
-rw-rw-r-- 1 john john 15078 Feb  7 21:36 file3

gzip -r mydir
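This example will compress every file under the given directory, descending into subdirectories. Each file is compressed individually into its own ".gz" file; no single archive is created.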

gzip -d file1.gz
This will uncompress file1.gz. This is the equivalent of issuing "gunzip file1.gz"



john@john-desktop:~/test_examples/testcpio$ ls -l
total 36
-rw-rw-r-- 1 john john  3229 Feb  7 21:39 file1.gz
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file2
-rw-rw-r-- 1 john john 15078 Feb  7 21:36 file3

john@john-desktop:~/test_examples/testcpio$ gzip -d file1.gz

john@john-desktop:~/test_examples/testcpio$ ls -l
total 48
-rw-rw-r-- 1 john john 15078 Feb  7 21:39 file1
-rw-rw-r-- 1 john john 15078 Feb  7 21:35 file2
-rw-rw-r-- 1 john john 15078 Feb  7 21:36 file3


gzip option flags



Compress or uncompress Files (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c, --stdout      write on standard output, keep original files unchanged
  -d, --decompress  decompress
  -f, --force       force overwrite of output file and compress links
  -h, --help        give this help
  -l, --list        list compressed file contents
  -L, --license     display software license
  -n, --no-name     do not save or restore the original name and time stamp
  -N, --name        save or restore the original name and time stamp
  -q, --quiet       suppress all warnings
  -r, --recursive   operate recursively on directories
  -S, --suffix=SUF  use suffix SUF on compressed files
  -t, --test        test compressed file integrity
  -v, --verbose     verbose mode
  -V, --version     display version number
  -1, --fast        compress faster
  -9, --best        compress better
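Several of the flags listed above are often combined. The sketch below builds a sample file in a temporary directory and then compresses, tests, lists and decompresses it. The file name "sample.txt" and its contents are made up for illustration.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a compressible sample file of 1000 identical lines.
i=0
while [ "$i" -lt 1000 ]; do
  echo "the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > sample.txt

# Best compression, written to standard output so the original is kept.
gzip -9 -c sample.txt > sample.txt.gz

# Check the integrity of the compressed file.
gzip -t sample.txt.gz && echo "integrity OK"

# Show compressed size, uncompressed size and ratio.
gzip -l sample.txt.gz

# Decompress to standard output and compare with the original.
gzip -dc sample.txt.gz | cmp - sample.txt && echo "round trip matches"
```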

bzip2 examples


bzip2 file1
file1 is compressed creating a compressed version of the file called file1.bz2

bzip2 -c file1 > file1.bz2
This example of bzip2 will retain the original file.

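find mydir -type f -exec bzip2 {} +
Unlike gzip, bzip2 has no "-r" (recursive) option, so "find" is used here to pass every regular file under the directory "mydir" to bzip2. Each file is compressed individually.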

bzip2 -d file1.bz2
In this example, file1.bz2 is decompressed. This is the equivalent of issuing "bunzip2 file1.bz2".

bzip2 option flags


bzip2, a block-sorting file compressor.  Version 1.0.5, 10-Dec-2007.

   usage: bzip2 [flags and input files in any order]

   -h --help           print this message
   -d --decompress     force decompression
   -z --compress       force compression
   -k --keep           keep (don't delete) input files
   -f --force          overwrite existing output files
   -t --test           test compressed file integrity
   -c --stdout         output to standard out
   -q --quiet          suppress noncritical error messages
   -v --verbose        be verbose (a 2nd -v gives more)
   -L --license        display software version & license
   -V --version        display software version & license
   -s --small          use less memory (at most 2500k)
   -1 .. -9            set block size to 100k .. 900k
   --fast              alias for -1
   --best              alias for -9

   If invoked as `bzip2', default action is to compress.
              as `bunzip2',  default action is to decompress.
              as `bzcat', default action is to decompress to stdout.
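The "-k" and "-t" flags and the "bzcat" name from the help text above can be combined as follows. This is a minimal sketch assuming bzip2 is installed, run in a temporary directory with an invented sample file.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'some text to compress\n' > file1

# Compress but keep the input file.
bzip2 -k file1

# Check the integrity of the compressed file.
bzip2 -t file1.bz2 && echo "integrity OK"

# Decompress to standard output, leaving file1.bz2 in place.
bzcat file1.bz2
```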

Compress command


The compress command reduces the size of named files using adaptive "Lempel-Ziv" coding. Where possible, each file is replaced by one with the extension ".Z". The compressed file keeps the same ownership, modes, and access and modification times as the original. If no files are specified, the standard input is compressed to the standard output. compress will only attempt to compress regular files. To restore a file to its original form, the uncompress command is issued. For further information on the "compress" command, issue "compress --help" from the command line.

Example of the compress command



john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  48 Feb  6 22:07 file1
-rw-rw-r-- 1 john john  48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john 135 Feb  6 21:41 file4

john@john-desktop:~/test_examples$ compress file1

john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  41 Feb  6 22:07 file1.Z
-rw-rw-r-- 1 john john  48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john 135 Feb  6 21:41 file4

john@john-desktop:~/test_examples$ compress file2 file3 file4

john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  41 Feb  6 22:07 file1.Z
-rw-rw-r-- 1 john john  41 Feb  3 20:50 file2.Z
-rw-rw-r-- 1 john john 100 Feb  3 21:00 file3.Z
-rw-rw-r-- 1 john john 100 Feb  6 21:41 file4.Z

john@john-desktop:~/test_examples$ uncompress file3

john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  41 Feb  6 22:07 file1.Z
-rw-rw-r-- 1 john john  41 Feb  3 20:50 file2.Z
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john 100 Feb  6 21:41 file4.Z

john@john-desktop:~/test_examples$ uncompress file1.Z file2.Z file4.Z

john@john-desktop:~/test_examples$ ls -l
total 16
-rw-rw-r-- 1 john john  48 Feb  6 22:07 file1
-rw-rw-r-- 1 john john  48 Feb  3 20:50 file2
-rw-rw-r-- 1 john john 135 Feb  3 21:00 file3
-rw-rw-r-- 1 john john 135 Feb  6 21:41 file4