pcompress/params.json

{"note":"Don't delete this file! It's used internally to help with page regeneration.","tagline":"A Parallel Compression and Deduplication utility","google":"UA-36422648-1","body":"Introduction\r\n============\r\nPcompress is an attempt to revisit **Data Compression** using unique combinations of existing and some new techniques. Both high compression ratio and performance are key goals along with the ability to leverage all the cores on a multi-core CPU. It also aims to bring to the table scalable, high-throughput Global **Deduplication** of archival storage. The deduplication capability is also available for single-file compression modes providing very interesting capabilities. Other projects providing some of these features include [Lrzip](http://ck.kolivas.org/apps/lrzip/), [eXdupe](http://www.exdupe.com/). Full archivers providing some of the similar features include the excellent [FreeArc](http://freearc.org/) and [PeaZIP](http://peazip.sourceforge.net/). Pcompress is not an archiver but provides a unique combination of features to both maximize compression ratio and provide high speed.\r\n\r\nPcompress can do both compression and decompression in parallel by splitting input data into chunks. It has a modular structure and includes support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN/SHA checksums for data integrity. It can also do Lempel-Ziv-Prediction pre-compression (derived from libbsc) to improve compression ratios across the board. SSE optimizations for the bundled LZMA are included. It also implements chunk-level Content-Aware Deduplication and Delta Compression features\r\nbased on a rolling hash algorithm derived from the Rabin Fingerprinting approach. Other open-source deduplication software like [OpenDedup](http://opendedup.org/) and [LessFS](http://www.lessfs.com/wordpress/) use fixed block dedupe while [BackupPC](http://backuppc.sourceforge.net/) does file-level dedupe only (single-instance storage). Of course OpenDedup and LessFS are Fuse based filesystems doing inline dedupe of primary storage while Pcompress is only meant for archival storage as of today.\r\n\r\nDelta Compression is implemented via the widely popular bsdiff algorithm. Chunk Similarity is detected using an adaptation of [MinHashing](http://en.wikipedia.org/wiki/MinHash). It has low metadata overhead and overlaps I/O and compression to achieve maximum parallelism. It also bundles a simple mempool allocator to speed repeated allocation of similar chunks. It can work in pipe mode, reading from stdin and writing to stdout. It also provides adaptive compression modes in which some simple data heuristics are applied in an attempt to select a near-optimal algorithm per chunk.\r\n\r\nPcompress also supports encryption via AES and uses Scrypt from [Tarsnap](http://www.tarsnap.com/) for secure Password Based Key generation.\r\n\r\nNOTE: This utility is Not an archiver. It compresses only single files or datastreams. 
Pcompress also supports encryption via AES and uses Scrypt from [Tarsnap](http://www.tarsnap.com/) for secure password-based key generation.

NOTE: This utility is not an archiver. It compresses only single files or data streams. To archive multiple files, first use something like tar, cpio or pax.

Blog articles
=============
See [Pcompress blogs](https://moinakg.wordpress.com/tag/pcompress/).

Usage
=====

To compress a file:

    pcompress -c <algorithm> [-l <compress level>] [-s <chunk size>] <file>

Where <algorithm> can be one of the following:

    lzfx   - Very fast and small algorithm based on LZF.
    lz4    - Ultra fast, high-throughput algorithm reaching RAM B/W at level 1.
    zlib   - The base Zlib format compression (not Gzip).
    lzma   - The LZMA (Lempel-Ziv Markov) algorithm from 7-Zip.
    lzmaMt - Multithreaded version of LZMA. This is a faster version but
             uses more memory for the dictionary. Thread count is balanced
             between chunk processing threads and algorithm threads.
    bzip2  - Bzip2 algorithm from libbzip2.
    ppmd   - The PPMd algorithm, excellent for textual data. PPMd requires
             at least 64MB x CPUs more memory than the other modes.
    libbsc - A Block Sorting Compressor using the Burrows Wheeler Transform.
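As a concrete example (the file name and option values below are illustrative, chosen within the documented option syntax), compressing a tar archive with multithreaded LZMA at a high compression level using 64MB chunks would look like:

    pcompress -c lzmaMt -l 14 -s 64m archive.tar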