Update README.

Moinak Ghosh 2012-11-18 23:19:22 +05:30
parent 393ced991a
commit 2909a3abff


@@ -8,29 +8,29 @@ Comments, suggestions, code, rants etc are welcome.
 Pcompress is a utility to do compression and decompression in parallel by
 splitting input data into chunks. It has a modular structure and includes
-support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN
-checksums for data integrity. It can also do Lempel-Ziv pre-compression
+support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN/
+SHA checksums for data integrity. It can also do Lempel-Ziv pre-compression
 (derived from libbsc) to improve compression ratios across the board. SSE
 optimizations for the bundled LZMA are included. It also implements
 chunk-level Content-Aware Deduplication and Delta Compression features
 based on a Semi-Rabin Fingerprinting scheme. Delta Compression is done
 via the widely popular bsdiff algorithm. Similarity is detected using a
-custom hashing of maximal features of a block. When doing chunk-level
-dedupe it attempts to merge adjacent non-duplicate blocks index entries
-into a single larger entry to reduce metadata. In addition to all these it
-can internally split chunks at rabin boundaries to help dedupe and
-compression.
+technique based on MinHashing. When doing chunk-level dedupe it attempts
+to merge adjacent non-duplicate blocks index entries into a single larger
+entry to reduce metadata. In addition to all these it can internally split
+chunks at rabin boundaries to help dedupe and compression.
 
 It has low metadata overhead and overlaps I/O and compression to achieve
 maximum parallelism. It also bundles a simple slab allocator to speed
 repeated allocation of similar chunks. It can work in pipe mode, reading
-from stdin and writing to stdout. It also provides some adaptive compression
-modes in which multiple algorithms are tried per chunk to determine the best
-one for the given chunk. Finally it supports 14 compression levels to allow
+from stdin and writing to stdout. It also provides adaptive compression
+modes in which data analysis heuristics are used to identify near-optimal
+algorithms per chunk. Finally it supports 14 compression levels to allow
 for ultra compression modes in some algorithms.
 
 Pcompress also supports encryption via AES and uses Scrypt from Tarsnap
-for Password Based Key generation.
+for Password Based Key generation. A unique key is generated per session
+even if the same password is used and HMAC is used to do authentication.
 
 NOTE: This utility is Not an archiver. It compresses only single files or
 datastreams. To archive use something else like tar, cpio or pax.
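The README mentions splitting chunks at rabin boundaries so cut points depend on content rather than fixed offsets. The idea can be sketched as follows; this is a minimal illustration using a simple polynomial rolling hash, not Pcompress's actual semi-rabin fingerprint, and the window and mask parameters are arbitrary choices for the example.

```python
def split_at_boundaries(data: bytes, window: int = 16,
                        mask: int = (1 << 12) - 1) -> list:
    """Content-defined splitting: cut wherever a rolling hash over a
    sliding window matches a bit pattern (hash & mask == mask).
    Because cuts depend only on local content, inserting bytes early in
    the stream does not shift the boundaries found later, which is what
    makes downstream dedupe effective.
    """
    B = 31                      # base of the polynomial rolling hash
    Bw = pow(B, window, 1 << 32)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * B + byte) % (1 << 32)
        if i >= window:
            # drop the byte that just left the window
            h = (h - data[i - window] * Bw) % (1 << 32)
        # enforce a minimum chunk size of one window before cutting
        if i - start + 1 >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

With the default mask of 12 one-bits, a boundary fires on average once every 4 KiB of sufficiently varied input; concatenating the chunks always reproduces the original data.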
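The commit changes the similarity-detection description to "a technique based on MinHashing". A toy version of that idea is shown below; the shingle size, number of hash slots, and use of BLAKE2b as the seeded hash family are all assumptions for illustration, not Pcompress's implementation.

```python
import hashlib

def minhash_signature(data: bytes, shingle_size: int = 8,
                      num_hashes: int = 16) -> tuple:
    """MinHash signature of a block: for each of num_hashes seeded hash
    functions, keep the minimum hash value seen over all byte shingles.
    Blocks sharing many shingles tend to agree in many signature slots.
    """
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")      # seeds a distinct hash function
        lo = None
        for i in range(len(data) - shingle_size + 1):
            h = hashlib.blake2b(data[i:i + shingle_size],
                                digest_size=8, salt=salt).digest()
            v = int.from_bytes(h, "big")
            if lo is None or v < lo:
                lo = v
        sig.append(lo)
    return tuple(sig)

def similarity(sig_a: tuple, sig_b: tuple) -> float:
    """Fraction of matching slots; estimates the Jaccard similarity of
    the two blocks' shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two blocks whose estimated similarity crosses a threshold would then be candidates for bsdiff delta compression rather than being stored independently.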
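The updated text notes that a unique key is generated per session even for the same password, with HMAC for authentication. The sketch below shows one standard way to get that behavior, using Python's stdlib scrypt and HMAC-SHA256; the cost parameters, salt size, and function names are illustrative assumptions, and Pcompress itself uses Tarsnap's scrypt implementation rather than this code.

```python
import hashlib
import hmac
import os

def derive_session_key(password: bytes) -> tuple:
    """Derive a fresh key per session: a random salt is drawn each call,
    so the same password yields a different key every run. The salt must
    be stored alongside the output so decryption can re-derive the key.
    (Cost parameters n, r, p here are illustrative, not Pcompress's.)
    """
    salt = os.urandom(16)                   # new salt -> new session key
    key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, key

def authenticate(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 tag over the processed data, verified on decompression
    with a constant-time comparison."""
    return hmac.new(key, data, hashlib.sha256).digest()
```

A verifier re-derives the key from the stored salt and the password, recomputes the tag, and compares it with `hmac.compare_digest` before trusting the payload.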