Speed Up Archives with Pigz: Tips & Best Practices
What Pigz is
Pigz (parallel implementation of gzip) splits the input into blocks (128 KiB by default), compresses those blocks on multiple CPU cores at once, and writes standard gzip output. On a multi-core machine this makes it much faster than gzip while staying gzip-compatible.
When to use it
- Large files or many files where compression is CPU-bound (not I/O-bound).
- Multi-core machines (more cores → larger speedup).
- Workflows that require gzip-compatible archives (pigz output can be decompressed by gzip).
Basic usage
- Compress a single file:
bash
pigz file.txt # creates file.txt.gz
- Decompress:
bash
pigz -d file.txt.gz # or gunzip file.txt.gz
- Specify threads:
bash
pigz -p 8 file.txt # use 8 threads
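To check the gzip-compatibility claim on your own data, a minimal round-trip sketch follows. It compresses with pigz, decompresses with plain gzip, and compares the result against the original. The function name `roundtrip_check` is hypothetical, and the fallback to gzip when pigz is missing is an assumption made so the script runs anywhere.

```bash
#!/usr/bin/env bash
# Sketch: compress with pigz, decompress with plain gzip, confirm the bytes match.
# roundtrip_check is a hypothetical helper name, not part of pigz.

roundtrip_check() {
    local src=$1
    local compressor=pigz
    # Assumption: fall back to gzip so the script still runs where pigz is absent.
    command -v pigz >/dev/null 2>&1 || compressor=gzip

    local tmp
    tmp=$(mktemp) || return 1

    # -c writes to stdout and leaves the source file in place.
    "$compressor" -c "$src" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Decompress with gzip (not pigz) and compare against the original.
    local status=0
    gzip -dc < "$tmp" | cmp -s - "$src" || status=1
    rm -f "$tmp"
    return "$status"
}
```

Usage: `roundtrip_check file.txt && echo "round trip OK"`. The source file is left untouched because the compressed copy goes to a temporary file.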
Tips for best performance
- Match threads to cores: pigz defaults to one thread per online logical core, so -p N is mainly needed to lower the count on a shared system or an I/O-bound job.
- Avoid excessive threads: Too many threads on an I/O-bound system yields little benefit and increases contention.
- Use faster storage: For I/O-bound workloads, SSDs or local disks improve throughput.
- Tune compression level: Lower levels (e.g., -1) are much faster with slightly larger output; higher levels (-9) slow down significantly. Example:
bash
pigz -1 -p 8 bigfile # fast, reasonable compression
pigz -9 -p 8 bigfile # best compression, slower
- Combine with tar for directories: Preserve metadata and compress a stream:
bash
tar cf - dir/ | pigz -p 8 > dir.tar.gz
# extract:
pigz -d -c dir.tar.gz | tar xf -
- Compress many small files efficiently: Archive first with tar, then pigz; compressing each small file separately has overhead.
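The speed versus size trade-off between levels depends heavily on the data, so it is worth measuring on a representative file. The sketch below prints the compressed size and elapsed whole seconds for levels 1, 6, and 9 without modifying the input. The helper names `pick_threads` and `compare_levels` are hypothetical, and the fallback to single-threaded gzip is an assumption for machines without pigz.

```bash
#!/usr/bin/env bash
# Sketch: report compressed size and wall time for a few compression levels.
# pick_threads and compare_levels are hypothetical helper names.

pick_threads() {
    # Logical core count: nproc (GNU coreutils), then getconf, then 1 as a last resort.
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

compare_levels() {
    local src=$1
    local threads
    threads=$(pick_threads)

    local -a cmd=(pigz -p "$threads")
    # Assumption: fall back to single-threaded gzip where pigz is not installed.
    command -v pigz >/dev/null 2>&1 || cmd=(gzip)

    local level start bytes
    for level in 1 6 9; do
        start=$SECONDS
        # -c streams to stdout, so the source file is never replaced.
        bytes=$("${cmd[@]}" "-$level" -c "$src" | wc -c)
        printf 'level=%s bytes=%s seconds=%s\n' \
            "$level" "$((bytes))" "$((SECONDS - start))"
    done
}
```

Usage: `compare_levels bigfile`. Bash's `SECONDS` counter has one-second resolution, so use a file large enough to take several seconds per level.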
Resource management
- Limit CPU impact: Use nice/ionice to lower priority on shared systems:
bash
nice -n 10 pigz -p 4 bigfile
ionice -c2 -n7 pigz -p 4 bigfile
- Monitor system load: Use top/htop/iostat to ensure neither CPU nor disk is saturated.
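The nice and ionice commands can also be combined in one wrapper. The sketch below lowers CPU priority with nice and, where ionice is usable, I/O priority as well. The name `low_priority` is hypothetical; ionice comes from util-linux and is typically Linux-only, so the wrapper probes for it first and skips it otherwise.

```bash
#!/usr/bin/env bash
# Sketch: run any command at reduced CPU priority, and reduced I/O priority
# when ionice is usable. low_priority is a hypothetical helper name.

low_priority() {
    local -a prefix=(nice -n 10)
    # Probe ionice by running a no-op; this covers both "not installed"
    # and "not permitted in this environment".
    if ionice -c2 -n7 true >/dev/null 2>&1; then
        # Class 2 is best-effort; level 7 is the lowest priority in that class.
        prefix=(ionice -c2 -n7 "${prefix[@]}")
    fi
    # The wrapped command's exit status is passed through unchanged.
    "${prefix[@]}" "$@"
}
```

Usage: `low_priority pigz -p 4 bigfile`.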
Advanced workflows
- Streaming over network: Use pigz with ssh for parallel compression during transfer:
bash
tar cf - dir/ | pigz -p 8 | ssh host 'cat > /path/dir.tar.gz'
# to fetch the remote archive and unpack it locally as it arrives:
ssh host 'cat /path/dir.tar.gz' | pigz -d | tar xf -
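- Decompression is mostly single-threaded: a deflate stream cannot be split for parallel decoding, so pigz -d decompresses on one thread and uses up to three helper threads for reading, writing, and checksum calculation. Expect only a modest gain over gzip -d, regardless of which tool compressed the file.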
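One practical detail for the streaming pipelines in this section: by default a shell pipeline reports only the exit status of its last command, so a failure in tar or pigz can go unnoticed. The sketch below shows the same tar and pigz pipeline with `set -o pipefail`, run locally so it can be tried without a remote host. The names `pack_dir`, `unpack_archive`, and `GZ` are hypothetical, and the gzip fallback is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: the tar | pigz pipeline with failure detection.
# pack_dir, unpack_archive and GZ are hypothetical names.

# Assumption: use pigz when installed, otherwise plain gzip.
GZ=pigz
command -v pigz >/dev/null 2>&1 || GZ=gzip

pack_dir() {
    # usage: pack_dir <dir> <archive.tar.gz>
    local dir=$1 archive=$2
    (
        # pipefail makes the pipeline fail if tar or the compressor fails,
        # not only the last command. The subshell keeps the setting local.
        set -o pipefail
        tar cf - -C "$(dirname "$dir")" "$(basename "$dir")" | "$GZ" -c > "$archive"
    )
}

unpack_archive() {
    # usage: unpack_archive <archive.tar.gz> <destination-dir>
    local archive=$1 dest=$2
    (
        set -o pipefail
        "$GZ" -dc < "$archive" | tar xf - -C "$dest"
    )
}
```

Usage: `pack_dir dir dir.tar.gz && unpack_archive dir.tar.gz /tmp/restore`. The same `set -o pipefail` line placed before a tar, pigz, and ssh pipeline makes a failure in any of the three stages visible to the calling script.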
Caveats
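- Compatibility: Pigz writes a single standard gzip stream that any gzip-compatible tool can decompress. By default each block is compressed using the last 32 KiB of the previous block as a dictionary, so blocks are not independent; the -i (--independent) option makes them independent at a small cost in compression ratio. The compressed bytes are not identical to gzip's output for the same input, so tools that compare compressed files byte for byte or inspect deflate block structure may report differences.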
- Checksums and streaming: pigz preserves gzip checksums; when splitting archives or using concatenated streams, verify integrity with gzip -t or pigz -t.
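As an illustration of the integrity check, the sketch below tests one or more .gz files and reports each as OK or CORRUPT. A file made by concatenating several gzip members is itself valid gzip data, so the same test applies to it. The name `verify_gz` is hypothetical, and the fallback to gzip -t is an assumption for machines without pigz.

```bash
#!/usr/bin/env bash
# Sketch: integrity check for gzip files, including concatenated members.
# verify_gz is a hypothetical helper name.

verify_gz() {
    local tester=pigz
    # Assumption: fall back to gzip -t where pigz is not installed.
    command -v pigz >/dev/null 2>&1 || tester=gzip

    local f status=0
    for f in "$@"; do
        # -t decompresses in memory and checks the stored CRC-32 and length.
        if "$tester" -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            status=1
        fi
    done
    # Non-zero if any file failed the test.
    return "$status"
}
```

Usage: `verify_gz part1.gz part2.gz combined.gz`, where the file names are placeholders.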
Quick checklist
- Use tar for many small files.
- Start with one thread per logical core (the pigz default); lower -p if the job is I/O-bound or the machine is shared.
- Lower compression level for speed.
- Use nice/ionice on shared systems.
- Monitor CPU and disk to find the bottleneck.