Create gh-pages branch via GitHub

Moinak Ghosh 2012-11-27 06:30:44 -08:00
parent 139381231a
commit 02196bc61e
2 changed files with 2 additions and 2 deletions

@@ -46,7 +46,7 @@
<p>Pcompress can do both compression and decompression in parallel by splitting input data into chunks. It has a modular structure and includes support for multiple algorithms such as LZMA, Bzip2, and PPMD, with SKEIN/SHA checksums for data integrity. It can also do Lempel-Ziv Prediction pre-compression (derived from libbsc) to improve compression ratios across the board. SSE optimizations for the bundled LZMA are included. It also implements chunk-level Content-Aware Deduplication and Delta Compression features based on a rolling-hash algorithm derived from the Rabin Fingerprinting approach. Other open-source deduplication software like <a href="http://opendedup.org/">OpenDedup</a> and <a href="http://www.lessfs.com/wordpress/">LessFS</a> use fixed-block dedupe, while <a href="http://backuppc.sourceforge.net/">BackupPC</a> does file-level dedupe only (single-instance storage). Of course, OpenDedup and LessFS are FUSE-based filesystems doing inline dedupe of primary storage, while Pcompress is meant only for archival storage as of today.</p>
-<p>Delta Compression is implemented via the widely used bsdiff algorithm. Chunk Similarity is detected using an adaptation of <a href="http://en.wikipedia.org/wiki/MinHash">MinHashing</a>. Pcompress has low metadata overhead and overlaps I/O with compression to achieve maximum parallelism. It also bundles a simple mempool allocator to speed up repeated allocation of similar chunks. It can work in pipe mode, reading from stdin and writing to stdout. It also provides adaptive compression modes in which simple data heuristics are applied in an attempt to select a near-optimal algorithm per chunk.</p>
+<p>Delta Compression is implemented via the widely used bsdiff algorithm. Chunk Similarity is detected using an adaptation of <a href="http://en.wikipedia.org/wiki/MinHash">MinHashing</a>. Pcompress has low metadata overhead and overlaps I/O with compression to achieve maximum parallelism. It also bundles a simple mempool allocator to speed up repeated allocation of similar chunks. It can work in pipe mode, reading from stdin and writing to stdout. It also provides adaptive compression modes in which simple data heuristics are applied in an attempt to select a good algorithm per chunk.</p>
<p>Pcompress also supports encryption via AES and uses scrypt from <a href="http://www.tarsnap.com/">Tarsnap</a> for secure password-based key generation.</p>
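
The rolling-hash chunking mentioned in the page text above works by hashing a small sliding window at every byte offset and declaring a chunk boundary wherever the hash matches a fixed bit pattern, so boundaries follow the data rather than fixed offsets. Below is a minimal C sketch of the idea; the window size, mask, minimum chunk size, and polynomial hash are arbitrary choices for illustration, not values taken from Pcompress's source.

/* Minimal content-defined chunking with a rolling hash, in the spirit
 * of the Rabin-fingerprinting approach described above. Illustrative
 * only: WINDOW, MASK, MIN_CHUNK, and the polynomial hash are assumed
 * values, not Pcompress's. */
#include <stdio.h>
#include <stdint.h>

#define WINDOW    48                /* sliding-window width in bytes (assumed) */
#define MASK      0x1FFFULL         /* 13 bits set: ~8 KiB average chunks      */
#define MIN_CHUNK 2048              /* suppress pathologically small chunks    */
#define PRIME     1099511628211ULL  /* 64-bit FNV prime used as multiplier     */

/* Print the chunk boundaries a rolling hash would pick in buf[0..len). */
static void chunk_boundaries(const uint8_t *buf, size_t len) {
    /* pop = PRIME^(WINDOW-1): weight of the byte leaving the window. */
    uint64_t pop = 1;
    for (int i = 0; i < WINDOW - 1; i++) pop *= PRIME;

    uint64_t h = 0;
    size_t start = 0;
    for (size_t i = 0; i < len; i++) {
        if (i >= WINDOW)
            h -= (uint64_t)buf[i - WINDOW] * pop;  /* drop the oldest byte */
        h = h * PRIME + buf[i];                    /* admit the newest byte */
        /* Cut wherever the low bits of the hash hit a fixed pattern. */
        if (i + 1 - start >= MIN_CHUNK && (h & MASK) == MASK) {
            printf("chunk at [%zu, %zu)\n", start, i + 1);
            start = i + 1;
        }
    }
    if (start < len)
        printf("chunk at [%zu, %zu)\n", start, len);
}

int main(void) {
    static uint8_t data[1 << 16];
    for (size_t i = 0; i < sizeof data; i++)       /* deterministic test data */
        data[i] = (uint8_t)((i * 2654435761ULL) >> 7);
    chunk_boundaries(data, sizeof data);
    return 0;
}

Because each boundary depends only on the window contents around it, an insertion early in a file shifts every byte offset, yet most downstream boundaries, and therefore most duplicate chunks, are rediscovered unchanged.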

File diff suppressed because one or more lines are too long
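
The MinHash adaptation mentioned above for Chunk Similarity rests on a standard estimator: for a random hash function, the probability that two sets of shingles share the same minimum hash equals their Jaccard similarity, so the fraction of agreeing minima across many hash functions approximates it. A self-contained C sketch follows, with the signature size, shingle width, and hash mixer chosen arbitrarily for illustration, not Pcompress's actual parameters.

/* Minimal MinHash similarity sketch. NHASH, SHINGLE, and the mixing
 * function are assumed values for illustration only. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NHASH   16      /* signature size: number of hash functions (assumed) */
#define SHINGLE 8       /* bytes per shingle; 8 so it fits a uint64_t (assumed) */

/* One 64-bit mix per (shingle, seed) pair; splitmix64-style finalizer. */
static uint64_t mix(uint64_t x, uint64_t seed) {
    x += seed * 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Fill sig[] with the minimum hash of every shingle under NHASH seeds. */
static void minhash(const uint8_t *buf, size_t len, uint64_t sig[NHASH]) {
    for (int k = 0; k < NHASH; k++) sig[k] = UINT64_MAX;
    for (size_t i = 0; i + SHINGLE <= len; i++) {
        uint64_t sh = 0;
        memcpy(&sh, buf + i, SHINGLE);             /* raw 8-byte shingle */
        for (int k = 0; k < NHASH; k++) {
            uint64_t h = mix(sh, (uint64_t)k + 1);
            if (h < sig[k]) sig[k] = h;
        }
    }
}

/* Estimated Jaccard similarity: fraction of positions where minima agree. */
static double similarity(const uint64_t a[NHASH], const uint64_t b[NHASH]) {
    int same = 0;
    for (int k = 0; k < NHASH; k++) same += (a[k] == b[k]);
    return (double)same / NHASH;
}

int main(void) {
    uint8_t x[4096], y[4096];
    for (size_t i = 0; i < sizeof x; i++) x[i] = (uint8_t)(i & 0xFF);
    memcpy(y, x, sizeof y);
    memset(y + 1000, 0xAA, 200);                   /* perturb part of the copy */

    uint64_t sx[NHASH], sy[NHASH];
    minhash(x, sizeof x, sx);
    minhash(y, sizeof y, sy);
    printf("estimated similarity: %.2f\n", similarity(sx, sy));
    return 0;
}

Chunks whose signatures agree beyond some threshold are natural candidates for the Delta Compression described above: one chunk is stored whole and the other as a bsdiff patch against it.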