pcompress/params.json

1 line
12 KiB
JSON
Raw Normal View History

2012-11-18 12:40:55 +00:00
{"google":"","name":"Pcompress","body":"Pcompress\r\n=========\r\n\r\nCopyright (C) 2012 Moinak Ghosh. All rights reserved.\r\nUse is subject to license terms.\r\nmoinakg (_at) gma1l _dot com.\r\nComments, suggestions, code, rants etc are welcome.\r\n\r\nPcompress is a utility to do compression and decompression in parallel by\r\nsplitting input data into chunks. It has a modular structure and includes\r\nsupport for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN\r\nchecksums for data integrity. It can also do Lempel-Ziv pre-compression\r\n(derived from libbsc) to improve compression ratios across the board. SSE\r\noptimizations for the bundled LZMA are included. It also implements\r\nchunk-level Content-Aware Deduplication and Delta Compression features\r\nbased on a Semi-Rabin Fingerprinting scheme. Delta Compression is done\r\nvia the widely popular bsdiff algorithm. Similarity is detected using a\r\ncustom hashing of maximal features of a block. When doing chunk-level\r\ndedupe it attempts to merge adjacent non-duplicate blocks index entries\r\ninto a single larger entry to reduce metadata. In addition to all these it\r\ncan internally split chunks at rabin boundaries to help dedupe and\r\ncompression.\r\n\r\nIt has low metadata overhead and overlaps I/O and compression to achieve\r\nmaximum parallelism. It also bundles a simple slab allocator to speed\r\nrepeated allocation of similar chunks. It can work in pipe mode, reading\r\nfrom stdin and writing to stdout. It also provides some adaptive compression\r\nmodes in which multiple algorithms are tried per chunk to determine the best\r\none for the given chunk. Finally it supports 14 compression levels to allow\r\nfor ultra compression modes in some algorithms.\r\n\r\nPcompress also supports encryption via AES and uses Scrypt from Tarsnap\r\nfor Password Based Key generation.\r\n\r\nNOTE: This utility is Not an archiver. It compresses only single files or\r\n datastreams. To archive use something else like tar, cpio or pax.\r\n\r\nUsage\r\n=====\r\n\r\n To compress a file:\r\n pcompress -c <algorithm> [-l <compress level>] [-s <chunk size>] <file>\r\n Where <algorithm> can be the folowing:\r\n lzfx - Very fast and small algorithm based on LZF.\r\n lz4 - Ultra fast, high-throughput algorithm reaching RAM B/W at level1.\r\n zlib - The base Zlib format compression (not Gzip).\r\n lzma - The LZMA (Lempel-Ziv Markov) algorithm from 7Zip.\r\n lzmaMt - Multithreaded version of LZMA. This is a faster version but\r\n uses more memory for the dictionary. Thread count is balanced\r\n between chunk processing threads and algorithm threads.\r\n bzip2 - Bzip2 Algorithm from libbzip2.\r\n ppmd - The PPMd algorithm excellent for textual data. PPMd requires\r\n at least 64MB X CPUs more memory than the other modes.\r\n\r\n libbsc - A Block Sorting Compressor using the Burrows Wheeler Transform\r\n like Bzip2 but runs faster and gives better compression than\r\n Bzip2 (See: libbsc.com).\r\n\r\n adapt - Adaptive mode where ppmd or bzip2 will be used per chunk,\r\n depending on heuristics. If at least 50% of the input data is\r\n 7-bit text then PPMd will be used otherwise Bzip2.\r\n adapt2 - Adaptive mode which includes ppmd and lzma. If at least 80% of\r\n the input data is 7-bit text then PPMd will be used otherwise\r\n LZMA. It has significantly more memory usage than adapt.\r\n none - No compression. This is only meaningful with -D and -E so Dedupe\r\n can be done for post-processing with an external utility.\r\n <chunk_size> - This can be in bytes or can use the following suffixes:\r\n g - Gigabyte, m - Megabyte, k - Kilobyte.\r\n Larger chunks produce better compression at the cost of memory.\r\n <compress_level> - Can be a number from 0 meaning minimum and 14 meaning\r\n