Ubuntu: I use dd to back up my Ubuntu system, but the backup image is too large


I use the command dd if=/dev/sda | gzip > /media/test/system_backup.img.gz to back up my disk. But system_backup.img.gz is up to 5GB, and when I compress this file, the system_backup.img is only 3.5GB. Please help me!


One pitfall of this backup technique is that, regardless of how much data is actually stored on the partition, you're likely copying a lot of junk data as well. Why? Well, as you add files to your drive and then delete them, the data of your old files does not just disappear (unless you explicitly overwrite those blocks). It's all still there. When you do a block device copy (i.e. use dd to copy the data), dd has no clue that the remnants of your deleted Titanic DVD are no longer something you care about. It just mindlessly copies the bytes. And even though you don't care about that DVD, gzip doesn't know that, so it will dutifully compress the fairly incompressible DVD data even though it's essentially garbage.

How do you fix this?

Erm, well, you could do a file-level backup, to the tune of

tar -zcvf /media/test/system_backup.tar.gz /wherever/sda/is/mounted  

But if you're going to do a file-level backup, you should look into incremental backup schemes.
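For example, GNU tar can run a simple incremental scheme by itself with its --listed-incremental option, which records the state of the files in a snapshot file so that a later run only archives what changed. Here is a minimal sketch assuming GNU tar; the function name incremental_backup and the level0/level1 file names are just illustrative choices:

```shell
#!/bin/sh
# incremental_backup SRC_DIR DEST_DIR  (hypothetical helper; assumes GNU tar)
# First call: full (level 0) archive, with file state recorded in level0.snar.
# Later calls: level 1 archive holding only what changed since level 0.
incremental_backup() {
    src=$1
    dest=$2
    snap0="$dest/level0.snar"
    parent=$(dirname "$src")
    name=$(basename "$src")
    mkdir -p "$dest" || return 1
    if [ ! -f "$snap0" ]; then
        # Level 0: full backup; tar creates level0.snar as it goes.
        tar --listed-incremental="$snap0" -czf "$dest/level0.tar.gz" -C "$parent" "$name"
    else
        # Level 1: work on a copy of the level-0 snapshot so that every
        # level-1 archive is relative to the same full backup.
        stamp=$(date +%Y%m%d%H%M%S)
        cp "$snap0" "$dest/level1-$stamp.snar" || return 1
        tar --listed-incremental="$dest/level1-$stamp.snar" \
            -czf "$dest/level1-$stamp.tar.gz" -C "$parent" "$name"
    fi
}
```

You'd run it as e.g. incremental_backup /home /media/test/home-backup once for the full backup and then periodically for the increments.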

If you want to stick with a block-level backup, you could write zeros to the empty space: determine how much free space is left on the drive and then use something like this (e.g. for 1 GB of free space):

dd if=/dev/zero of=/path/on/drive/zeros.bin bs=1M count=1024  

Then delete zeros.bin (rm /path/on/drive/zeros.bin) and run the same backup command again; the image should be smaller, because long runs of zeros are easy to compress.
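A minimal sketch of that zero-fill step as a shell function. It assumes you pass a directory on the partition you're about to image plus the number of MB to write; the helper name zero_fill is hypothetical. It syncs and deletes the zeros file afterwards so the space is usable again, while the freed blocks normally still hold zeros:

```shell
#!/bin/sh
# zero_fill DIR MB  (hypothetical helper name)
# Writes MB megabytes of zeros to DIR/zeros.bin, flushes them to disk, then
# deletes the file. The freed blocks normally keep their zero contents, so a
# later "dd if=/dev/sda | gzip" image compresses better.
zero_fill() {
    dir=$1
    mb=$2
    case $mb in
        ''|*[!0-9]*) echo "usage: zero_fill DIR MB" >&2; return 1 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "zero_fill: $dir is not a directory" >&2
        return 1
    fi
    # 1M blocks, so count is simply the number of megabytes to write.
    dd if=/dev/zero of="$dir/zeros.bin" bs=1M count="$mb" 2>/dev/null || return 1
    sync                       # make sure the zeros actually reach the disk
    rm -f "$dir/zeros.bin"     # give the space back; the blocks stay zeroed
}
```

For the 1 GB case that would be zero_fill /path/on/drive 1024, followed by the usual dd | gzip backup.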

But even then you're still copying junk you don't need (i.e. all those zeros). If you can guarantee that all the data you want is at the beginning of the partition (this is file-system dependent and may require a defrag), you could limit the backup to the amount of used space, with something like this (e.g. for 2 GB of used space):

dd if=/dev/sda bs=1M count=2048 | gzip > /media/test/system_backup.img.gz  

You can use df to get the amount of space used:

df -m /dev/sda | tail -n 1 | tr -s ' ' | cut -d ' ' -f 3

You could include it in the dd command like

dd if=/dev/sda bs=1M count=$(df -m /dev/sda | tail -n 1 | tr -s ' ' | cut -d ' ' -f 3) | gzip > /media/test/system_backup.img.gz
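The same idea wrapped in two small shell functions, using $( ) instead of backticks. This is a sketch under the answer's own assumption that all live data sits at the start of the partition; the helper names used_mb and backup_prefix are hypothetical, and df is pointed at a mounted partition because it only reports on mounted file systems:

```shell
#!/bin/sh
# used_mb TARGET  (hypothetical helper)
# Prints the "Used" column, in MB, that df -m reports for TARGET, which must be
# a mounted partition such as /dev/sda1 or its mount point.
used_mb() {
    df -m "$1" | tail -n 1 | tr -s ' ' | cut -d ' ' -f 3
}

# backup_prefix SOURCE MB OUTPUT  (hypothetical helper)
# Copies only the first MB megabytes of SOURCE and gzips them into OUTPUT.
# Only safe if all live data really sits in that first part of SOURCE.
backup_prefix() {
    dd if="$1" bs=1M count="$2" | gzip > "$3"
}

# Example (needs root; device names are illustrative):
#   backup_prefix /dev/sda1 "$(used_mb /dev/sda1)" /media/test/system_backup.img.gz
```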

Just some ideas, you'd have to try it out to see if it actually worked. :-)

Edit: Some additional notes (nothing you couldn't find by looking up the flags I used in the man pages):

In the dd command, bs specifies the block size used for copying. When copying off a hard drive, larger blocks (on the order of your drive's cache size) are generally much more efficient than dd's default of 512 bytes. You don't have to worry too much about that, though. In fact, you don't strictly have to specify the block size at all, but if you don't know the default, it's hard to work out how many blocks you need to copy the right amount of data. I chose 1MB blocks for convenience, and the count flag specifies how many blocks to copy (so a count of 10 copies 10 MB of data).
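To see why knowing the block size matters, here is the arithmetic for the 2 GB example, with 1M blocks versus dd's default 512-byte blocks (count_for is a hypothetical helper that rounds up):

```shell
#!/bin/sh
# count_for BYTES BS: number of BS-sized blocks needed to cover BYTES (rounded up).
count_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

two_gb=$((2048 * 1024 * 1024))          # 2048 MB, as in the dd example above
count_1m=$(count_for "$two_gb" 1048576)  # with bs=1M
count_512=$(count_for "$two_gb" 512)     # with dd's default 512-byte blocks
echo "bs=1M -> count=$count_1m; bs=512 -> count=$count_512"
```

It prints bs=1M -> count=2048; bs=512 -> count=4194304, so with 1M blocks the count is simply the size in MB.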

The df command reports the utilization of a storage device. The -m option shows it in MB (as opposed to the default of KB). Run that command on its own and the need for the tail, tr, and cut commands should be fairly evident: tail gets the last line, tr squeezes out the extra spaces inserted for column alignment, and cut splits the line on spaces and returns the third field (cut counts fields from 1: Filesystem, 1M-blocks, Used), which is the space used. Note that df only reports on mounted file systems, so in practice you'd point it at the mounted partition (e.g. /dev/sda1) or its mount point rather than the whole disk.
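Here is the same pipeline run over a captured sample of df -m output, so you can see which column each cut field picks (the numbers in the sample are made up):

```shell
#!/bin/sh
# Sample df -m output (numbers made up for illustration).
sample='Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/sda1          20029  3500     15491  19% /'

# tail keeps the data line, tr squeezes the padding spaces, and cut picks a
# 1-based field: 1 = Filesystem, 2 = 1M-blocks (size), 3 = Used.
size_mb=$(printf '%s\n' "$sample" | tail -n 1 | tr -s ' ' | cut -d ' ' -f 2)
used_mb=$(printf '%s\n' "$sample" | tail -n 1 | tr -s ' ' | cut -d ' ' -f 3)
echo "size: $size_mb MB, used: $used_mb MB"
```

It prints size: 20029 MB, used: 3500 MB; field 2 is the partition size, so field 3 is the one to feed into dd's count.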


You should use a Live CD/DVD/USB to perform this, to ensure a consistent image.

Your drive may be full of deleted data patterns, resulting in a larger image than necessary. If this is the case, you can fix it with dd as well.

1) dd if=/dev/zero of=/delete.me

This will create a file full of zeros (highly compressible) and will run until the disk is full, at which point it will fail due to running out of space. This is OK.

2) When it has finished creating the zero file (delete.me), delete it with

rm /delete.me

3) Try your backup again. Since the previously used space will now be zeroed out, the end result should be a smaller image.
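A minimal sketch of steps 1-3 as one function, expecting the mount point of the file system you're about to image. The name zero_free_space is hypothetical, and it adds bs=1M (not in step 1) only because writing in larger blocks is much faster than dd's 512-byte default:

```shell
#!/bin/sh
# zero_free_space MOUNTPOINT  (hypothetical helper)
# Fills the free space of the file system mounted at MOUNTPOINT with zeros
# until dd hits "No space left on device", flushes, then deletes the file.
zero_free_space() {
    mnt=$1
    if [ ! -d "$mnt" ]; then
        echo "usage: zero_free_space MOUNTPOINT" >&2
        return 1
    fi
    # dd is expected to fail once the disk is full, so ignore its exit status.
    dd if=/dev/zero of="$mnt/delete.me" bs=1M 2>/dev/null || true
    sync                        # flush the zeros to disk before deleting
    rm -f "$mnt/delete.me"
}
```

Run it as root (e.g. zero_free_space /); while dd runs, the file system fills up completely, so other programs writing at that moment may get errors.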


You shouldn't back up this way unless you know that nothing is writing to your disk while the backup runs; otherwise the resulting image will be not only large but also inconsistent, which under some circumstances renders it almost useless for recovery.

To overcome this you can use LVM2 snapshots, but it's much smarter not to back up the whole disk again and again.
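A minimal sketch of the LVM2 snapshot approach, assuming the system lives on a logical volume and the volume group has some free space for the snapshot; the volume group and logical volume names, the snapshot name, and the 1G snapshot size are all assumptions to adjust:

```shell
#!/bin/sh
# snapshot_backup VG LV OUTPUT  (hypothetical helper; needs root)
# Takes an LVM2 snapshot of /dev/VG/LV, images the point-in-time snapshot
# with dd | gzip, then removes the snapshot again.
snapshot_backup() {
    if [ "$#" -ne 3 ]; then
        echo "usage: snapshot_backup VG LV OUTPUT" >&2
        return 1
    fi
    vg=$1 lv=$2 out=$3
    snap="${lv}_backup_snap"    # hypothetical snapshot name
    # 1G of copy-on-write space is a guess: if more than that changes on the
    # origin during the backup, the snapshot becomes invalid.
    lvcreate --snapshot --size 1G --name "$snap" "/dev/$vg/$lv" || return 1
    dd if="/dev/$vg/$snap" bs=1M | gzip > "$out"
    status=$?
    lvremove -f "/dev/$vg/$snap"
    return "$status"
}
```

For example snapshot_backup vg0 root /media/test/system_backup.img.gz, where vg0 and root are made-up names; lvs shows your real ones.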
