
Tuesday, April 21, 2009

File compression tests

Slackware recently updated pkgtools to support additional compression formats: the standard tgz (tar + gzip), plus tbz (tar + bzip2), tlz (tar + lzma), and txz (tar + xz). In Slackware-current, lzma and xz files are handled by a new package called xz. I wanted to see whether the choice of compression algorithm makes any real difference. For this test, I'm using a folder that contains 9,264 files and 806 subfolders, with an uncompressed size of 25,112,411 bytes (~23.9MB). I measured compressed size, compression time, and decompression time for gzip, bzip2, lzma, xz, and p7zip. p7zip (7zip) is another LZMA compression utility I already had on my system.

Commands used -

time tar -cf - fluxbox-themes | gzip -9 >fluxbox-themes.tar.gz
time tar -cf - fluxbox-themes | bzip2 -9 >fluxbox-themes.tar.bz2
time tar -cf - fluxbox-themes | lzma -9 >fluxbox-themes.tar.lzma
time tar -cf - fluxbox-themes | lzma -5 >fluxbox-themes-5.tar.lzma
time tar -cf - fluxbox-themes | xz -9 >fluxbox-themes.tar.xz
time tar -cf - fluxbox-themes | xz -5 >fluxbox-themes-5.tar.xz
time tar -cf - fluxbox-themes | 7z a -si -mx=9 fluxbox-themes.tar.7z
time tar -cf - fluxbox-themes | 7z a -si -mx=5 fluxbox-themes-5.tar.7z
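The commands above only show the compression side. As a rough sketch, the same runs can be wrapped in one bash function that also prints each archive's size and times decompression. The function name `bench` and the output file names are hypothetical, and the decompression step (decompress to stdout, pipe into `tar -xf -` in a scratch directory) is my assumption, since the post doesn't list those commands. Tools that aren't installed are skipped.

```shell
#!/bin/bash
# Sketch of the benchmark above: compress, report size, then decompress.
# Assumptions: bash (its `time` keyword times a whole pipeline), GNU tar,
# and a "decompress to stdout | tar -xf -" step, which the post does not show.
bench() {    # usage: bench DIR
    local src=$1 spec tool level ext out scratch
    for spec in "gzip 9 gz" "bzip2 9 bz2" "lzma 9 lzma" "lzma 5 lzma" \
                "xz 9 xz" "xz 5 xz" "7z 9 7z" "7z 5 7z"; do
        set -- $spec                      # split "tool level ext"
        tool=$1 level=$2 ext=$3
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "skip: $tool is not installed"
            continue
        fi
        out="$src-$level.tar.$ext"
        rm -f "$out"    # 7z a would otherwise add to an existing archive
        echo "== $tool -$level: compress"
        if [ "$tool" = 7z ]; then
            time tar -cf - "$src" | 7z a -si -mx="$level" "$out" >/dev/null
        else
            time tar -cf - "$src" | "$tool" -"$level" > "$out"
        fi
        echo "size: $(wc -c < "$out") bytes"
        # Extract into a throwaway directory so the source tree is untouched.
        scratch=$(mktemp -d)
        echo "== $tool -$level: decompress"
        if [ "$tool" = 7z ]; then
            time 7z x -so "$out" 2>/dev/null | tar -xf - -C "$scratch"
        else
            time "$tool" -dc "$out" | tar -xf - -C "$scratch"
        fi
        rm -rf "$scratch"
    done
}
```

Run it from the directory that holds the test folder, e.g. `bench fluxbox-themes`; the archives land next to it as fluxbox-themes-9.tar.gz, fluxbox-themes-5.tar.xz, and so on, and the timings go to stderr.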

All of the tools except 7zip follow the same option structure, which is one nice thing about xz compared to 7zip. 7zip also prints out some junk while compressing:

P-7zip 4.58 beta Copyright (c) 1999-2008 Igor Pavlov 2008-05-05
p7zip Version 4.58 (locale=en_US,Utf16=on,HugeFiles=on,1 CPU)
Creating archive fluxbox-themes.tar.7z

Nothing horrible, but it is nice to have LZMA compression tools that follow the same conventions as the others. There are other LZMA-based compression tools for Linux besides the three here; however, they are reported to be unstable, quite slow, or inconsistent in their compression. On to the test results, ordered by compressed size (smallest first):

Tool | Compress time (real / user / sys) | Compressed size (bytes) | Decompress time (real / user / sys)
7z -mx=9 | 30.865s / 27.943s / 0.967s | 7,388,688 | 10.011s / 1.461s / 1.210s
lzma -9 | 28.085s / 25.833s / 1.178s | 7,388,924 | 12.952s / 1.452s / 1.354s
xz -9 | 28.184s / 25.904s / 1.106s | 7,390,384 | 10.748s / 1.462s / 1.281s
7z -mx=5 | 24.188s / 21.998s / 0.714s | 7,427,311 | 11.924s / 1.514s / 1.216s
lzma -5 | 23.208s / 19.469s / 0.739s | 7,507,885 | 10.020s / 1.425s / 1.317s
xz -5 | 21.079s / 19.559s / 0.674s | 7,511,708 | 10.614s / 1.457s / 1.265s
bzip2 -9 | 15.024s / 14.212s / 0.325s | 8,807,485 | 13.550s / 3.767s / 1.355s
gzip -9 | 5.258s / 4.729s / 0.357s | 9,456,727 | 6.796s / 0.467s / 0.867s

7zip, lzma, and xz were all almost exactly the same size. lzma -9 was the slowest of the three to decompress, and 7zip was the slowest to compress. gzip, of course, is the speed king here: it compresses at least 4x faster than any of the LZMA tools and ~3x faster than bzip2. Given bzip2's larger output and its compression and decompression times compared to the LZMA formats, it just doesn't seem like a good choice here. Where bzip2 does come in is compatibility: 7zip, lzma, and xz are not that popular yet, and your tools may or may not support them.
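The speed and size comparisons above can be rechecked with a little arithmetic on the table. Below is a sketch of a hypothetical `ratios` helper with the table's numbers typed in by hand (real times only, gzip -9 as the baseline). It prints each result as a percentage of the 25,112,411-byte original and as a multiple of gzip's compress and decompress times.

```shell
#!/bin/bash
# Hypothetical helper: recompute size and speed ratios from the results table.
# Columns: tool, level, compress real (s), compressed size (bytes), decompress real (s).
ratios() {
    awk -v orig=25112411 -v gz_c=5.258 -v gz_d=6.796 '{
        printf "%-9s %5.1f%% of original, %.1fx gzip compress, %.1fx gzip decompress\n",
               $1 " " $2, 100 * $4 / orig, $3 / gz_c, $5 / gz_d
    }' <<'EOF'
7z -mx=9 30.865 7388688 10.011
lzma -9 28.085 7388924 12.952
xz -9 28.184 7390384 10.748
7z -mx=5 24.188 7427311 11.924
lzma -5 23.208 7507885 10.020
xz -5 21.079 7511708 10.614
bzip2 -9 15.024 8807485 13.550
gzip -9 5.258 9456727 6.796
EOF
}
```

For example, it reports xz -5 at 29.9% of the original and 4.0x gzip's compression time, and bzip2 -9 at 35.1% and 2.9x, which is where the "at least 4x" and "~3x" figures come from.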

On my system (Slackware with KDE 4.2.2), Ark and xarchiver could open 7zip files but did not know how to handle lzma or xz files. The standard tar command tar xf $file.tar.$ did not work either.
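Until the desktop tools and tar catch up, one workaround is to run the decompressor yourself and pipe its output into tar, i.e. the benchmark commands in reverse. Below is a minimal sketch of a hypothetical `extract_tarball` helper that picks the decompressor from the suffix, covering both the long (.tar.xz) and the short Slackware-style (.txz) names. It assumes the gzip, bzip2, lzma, and xz commands are installed.

```shell
#!/bin/bash
# Hypothetical helper: choose a decompressor from the file suffix and pipe
# its output into tar, so tar never has to recognize lzma or xz itself.
extract_tarball() {   # usage: extract_tarball ARCHIVE [DEST]
    local file=$1 dest=${2:-.}
    case $file in
        *.tar.gz|*.tgz)   gzip  -dc "$file" | tar -xf - -C "$dest" ;;
        *.tar.bz2|*.tbz)  bzip2 -dc "$file" | tar -xf - -C "$dest" ;;
        *.tar.lzma|*.tlz) lzma  -dc "$file" | tar -xf - -C "$dest" ;;
        *.tar.xz|*.txz)   xz    -dc "$file" | tar -xf - -C "$dest" ;;
        *) echo "extract_tarball: unknown suffix: $file" >&2; return 1 ;;
    esac
}
```

For example, `extract_tarball fluxbox-themes.tar.xz /tmp/restore`. GNU tar later gained a `-J` option for xz, so on a newer system `tar -xJf file.tar.xz` may work directly.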

In my personal opinion, bzip2 loses here. I don't see much difference between xz and lzma, other than the lzma file extension being two characters longer. 7zip, lzma, and xz were all virtually identical in compression time, decompression time, and compressed size. 7zip does not support the same common command-line switches that all of the other tools do. There are also warnings against using 7zip to back up Linux file systems: it does not correctly handle symlinks, nor does it keep permissions. You could, in theory, tar the directory first and then pipe it into 7zip. Plus, 7zip supports multiple cores.
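The tar-first approach used in the commands above is also the workaround the p7zip manual page recommends for backups: tar records ownership, permissions, and symlinks, and 7zip only ever sees a single stream. A minimal sketch as a pair of hypothetical bash functions, assuming GNU tar and the `7z` command from p7zip:

```shell
#!/bin/bash
# Hypothetical helpers: tar stores owners, modes and symlinks; 7z only
# compresses the resulting stream. Requires p7zip's `7z` and GNU tar.
backup_7z() {    # usage: backup_7z DIR ARCHIVE.tar.7z
    rm -f "$2"   # `7z a` would otherwise update an existing archive
    tar -cf - "$1" | 7z a -si -mx=9 "$2" >/dev/null   # hide 7z's banner
}

restore_7z() {   # usage: restore_7z ARCHIVE.tar.7z [DEST]
    # -p restores the recorded permissions instead of applying the umask
    7z x -so "$1" | tar -xpf - -C "${2:-.}"
}
```

For example, `backup_7z fluxbox-themes fluxbox-themes.tar.7z` and later `restore_7z fluxbox-themes.tar.7z /tmp/restore`. File ownership is only restored when tar runs as root.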

I will still use the standard gzip for the vast majority of my archiving. If I were running a mirror or another bandwidth-hungry service, you can believe I would quickly switch over to one of the LZMA compression tools. For home use, my archive space isn't limited enough to justify the 3-6x increase in compression time and 2x increase in decompression time.
