diff --git a/README.md b/README.md
index 1ff9bb1..7886a43 100644
--- a/README.md
+++ b/README.md
@@ -8,29 +8,29 @@ Comments, suggestions, code, rants etc are welcome.
 
 Pcompress is a utility to do compression and decompression in parallel by
 splitting input data into chunks. It has a modular structure and includes
-support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN
-checksums for data integrity. It can also do Lempel-Ziv pre-compression
+support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN/
+SHA checksums for data integrity. It can also do Lempel-Ziv pre-compression
 (derived from libbsc) to improve compression ratios across the board. SSE
 optimizations for the bundled LZMA are included. It also implements
 chunk-level Content-Aware Deduplication and Delta Compression features
 based on a Semi-Rabin Fingerprinting scheme. Delta Compression is done
 via the widely popular bsdiff algorithm. Similarity is detected using a
-custom hashing of maximal features of a block. When doing chunk-level
-dedupe it attempts to merge adjacent non-duplicate blocks index entries
-into a single larger entry to reduce metadata. In addition to all these it
-can internally split chunks at rabin boundaries to help dedupe and
-compression.
+technique based on MinHashing. When doing chunk-level dedupe it attempts
+to merge adjacent non-duplicate blocks index entries into a single larger
+entry to reduce metadata. In addition to all these it can internally split
+chunks at rabin boundaries to help dedupe and compression.
 
 It has low metadata overhead and overlaps I/O and compression to achieve
 maximum parallelism. It also bundles a simple slab allocator to speed
 repeated allocation of similar chunks. It can work in pipe mode, reading
-from stdin and writing to stdout. It also provides some adaptive compression
-modes in which multiple algorithms are tried per chunk to determine the best
-one for the given chunk. Finally it supports 14 compression levels to allow
+from stdin and writing to stdout. It also provides adaptive compression
+modes in which data analysis heuristics are used to identify near-optimal
+algorithms per chunk. Finally it supports 14 compression levels to allow
 for ultra compression modes in some algorithms.
 
 Pcompress also supports encryption via AES and uses Scrypt from Tarsnap
-for Password Based Key generation.
+for Password Based Key generation. A unique key is generated per session
+even if the same password is used and HMAC is used to do authentication.
 
 NOTE: This utility is Not an archiver. It compresses only single files or
 datastreams. To archive use something else like tar, cpio or pax.