Ubuntu: Archiving a lot of files / one big file


I want to compress about 100'000 files (that's what find . -type f | wc -l says) with a total disk usage of 100 GB. Most of the files are small, but a handful of them make up about 70 GB of the 100 GB.

I don't want to use tar or tar.gz for this because if I want to access the archive, File Roller first has to read in the entire archive from the external HDD before I can even see the file list. Same thing if I try to list the files on the terminal.

I don't need the rights management of tar because I can remember the few files which need other rights than the others. What compression algorithm should I use?

And while I'm at it: I make full disk backups with this command:

dd if=/dev/sda bs=32M | gzip -9 > /location/dateAndMachineName.gz  

It does a pretty good compression. But do you know a better compression algorithm?


The only solution I am aware of is pixz (sudo apt-get install pixz), a variant of xz that uses a blocked encoder, which allows fast random access and indexing. It is also parallel, using multiple cores for compression.

Citing the docs:

The existing XZ Utils ( http://tukaani.org/xz/ ) provide great compression in the .xz file format, but they have two significant problems:

  • They are single-threaded, while most users nowadays have multi-core computers.
  • The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible.

With pixz, both these problems are solved.

Usage is simple:

To compress a folder foo:

tar -Ipixz -cf foo.tpxz foo

To list the files in it (fast!):

pixz -l foo.tpxz

To extract a single file, given its <file_path> in the archive:

pixz -x <file_path> < foo.tpxz | tar x

As a bonus, you will get access rights stored as well since the files are tarred first!
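If you want to try the commands above without touching real data, here is a minimal sketch that runs them end to end in a scratch directory. The function name demo_pixz_roundtrip and the file names under foo/ are made up for the demo, and it assumes GNU tar and an installed pixz.

```shell
#!/bin/sh
# Round trip with pixz in a throwaway directory.
# demo_pixz_roundtrip and the sample file names are hypothetical, for illustration only.

demo_pixz_roundtrip() {
    work=$(mktemp -d) || return 1

    # Build a tiny sample folder named foo.
    mkdir -p "$work/foo/sub"
    echo "hello" > "$work/foo/sub/a.txt"
    echo "world" > "$work/foo/b.txt"

    # Compress: -I makes GNU tar pipe the archive through pixz.
    tar -Ipixz -cf "$work/foo.tpxz" -C "$work" foo &&
    # List the contents from the pixz index, without decompressing everything.
    pixz -l "$work/foo.tpxz" &&
    mkdir "$work/out" &&
    # Extract one file: pixz emits a small tar stream, tar unpacks it.
    pixz -x foo/sub/a.txt < "$work/foo.tpxz" | tar -xf - -C "$work/out" &&
    cat "$work/out/foo/sub/a.txt"
    status=$?

    # Remove the scratch directory whether or not the steps succeeded.
    rm -rf "$work"
    return "$status"
}

if command -v pixz >/dev/null 2>&1; then
    demo_pixz_roundtrip
else
    echo "pixz not found; install it first (sudo apt-get install pixz)" >&2
fi
```

The path passed to pixz -x has to match the name shown by pixz -l, including the leading folder.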


I can only think of one solution for you: make a new partition with a btrfs filesystem and activate transparent compression. Keep in mind that some people still consider btrfs an "experimental" filesystem. That being said, my secondary backup HDD has been using btrfs for a little over 2 years, and so far it has given me 0 issues. But as usual, YMMV.

This and this should get you started with btrfs, if you are not familiar with it already.
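As a rough sketch of what that setup could look like: the device /dev/sdX1 and the mount point /mnt/backup are placeholders, the helper names run and setup_btrfs_compressed are made up here, and compress=zlib is just one of the algorithms btrfs accepts. Because mkfs.btrfs destroys whatever is on the partition, the script only prints the commands unless you set DRY_RUN=0 and run it as root.

```shell
#!/bin/sh
# Sketch: format a partition as btrfs and mount it with transparent compression.
# /dev/sdX1 and /mnt/backup are placeholders; replace them with your own values.
# WARNING: mkfs.btrfs erases existing data on the device.

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

setup_btrfs_compressed() {
    device=$1
    mountpoint=$2
    run mkfs.btrfs "$device"
    run mkdir -p "$mountpoint"
    # compress=zlib turns on transparent compression for newly written files.
    run mount -o compress=zlib "$device" "$mountpoint"
}

setup_btrfs_compressed /dev/sdX1 /mnt/backup
```

Files copied to the mount point afterwards are compressed on write and still show up as ordinary files, so listing them or opening a single one does not require reading an entire archive.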
