diff --git a/Changelog b/Changelog
index 490ca76..c5b65f0 100644
--- a/Changelog
+++ b/Changelog
@@ -1,3 +1,32 @@
+== 2.1 Update Release ==
+Add more tests covering Segmented Global Dedupe.
+Fix some tests.
+Switch location of Dedupe context creation to allow correct index memory sizing.
+Update README with details of Global Dedupe block hash selection.
+Add SSE2 optimizations for Segmented Dedupe.
+Fix segment offset sorting.
+Get rid of incorrect duplicate checks in index.
+Allow SKEIN to be used as a Global Dedupe chunk lookup hash.
+Add a qsort variant optimized for integers and use in global dedupe.
+Cleanup LZMA CRC64/32 declarations and add a header.
+Fix heapq header.
+Use openmp parallelism always for chunk hash computation during Global Dedupe.
+Use SHA256 for Global Dedupe chunk lookup hash by default.
+Allow changing Global Dedupe chunk lookup hash via env variable.
+Fix crash with some older GCC versions. Reported in issue #7.
+Fix issue #7.
+Ensure tempfile cleanup even with error abort.
+Fix bugs and improve accuracy in Segmented Dedupe.
+Fix segment hashlist size computation.
+Remove unnecessary sync of segment hashlist file writes.
+Pass correct number of threads to index creation routine.
+Add more error checks.
+Handle correct positioning of segment hashlist file offset on write error.
+Add missing semaphore signaling at dedupe abort points with global dedupe.
+Use closer min-values sampling for improved segmented dedupe accuracy.
+Update proper checksum info in README.
+Fix sizing of similarity hash buffer.
+Tweak index size computation.
 == 2.0 Major Release ==
 Add test cases for Global Deduplication.
 Update documentation and code comments.
diff --git a/README.md b/README.md
index b098be0..383f549 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 Pcompress
 =========
 
-Copyright (C) 2012 Moinak Ghosh. All rights reserved.
+Copyright (C) 2012-2013 Moinak Ghosh. All rights reserved.
 Use is subject to license terms.
 moinakg (_at) gma1l _dot com.
 Comments, suggestions, code, rants etc are welcome.
@@ -12,13 +12,13 @@ support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN/
 SHA checksums for data integrity. It can also do Lempel-Ziv pre-compression
 (derived from libbsc) to improve compression ratios across the board. SSE
 optimizations for the bundled LZMA are included. It also implements
-chunk-level Content-Aware Deduplication and Delta Compression features
-based on a Semi-Rabin Fingerprinting scheme. Delta Compression is done
-via the widely popular bsdiff algorithm. Similarity is detected using a
-technique based on MinHashing. When doing chunk-level dedupe it attempts
-to merge adjacent non-duplicate blocks index entries into a single larger
-entry to reduce metadata. In addition to all these it can internally split
-chunks at rabin boundaries to help dedupe and compression.
+Variable Block Deduplication and Delta Compression features based on a
+Semi-Rabin Fingerprinting scheme. Delta Compression is done via the widely
+popular bsdiff algorithm. Similarity is detected using a technique based
+on MinHashing. When doing Dedupe it attempts to merge adjacent non-
+duplicate block index entries into a single larger entry to reduce metadata.
+In addition to all these it can internally split chunks at rabin boundaries
+to help Dedupe and compression.
 
 It has low metadata overhead and overlaps I/O and compression to achieve
 maximum parallelism. It also bundles a simple slab allocator to speed
diff --git a/pcompress.h b/pcompress.h
index 56f4060..e9f6c1d 100644
--- a/pcompress.h
+++ b/pcompress.h
@@ -44,7 +44,7 @@ extern "C" {
 #define FLAG_DEDUP 1
 #define FLAG_DEDUP_FIXED 2
 #define FLAG_SINGLE_CHUNK 4
-#define UTILITY_VERSION "2.0"
+#define UTILITY_VERSION "2.1"
 #define MASK_CRYPTO_ALG 0x30
 #define MAX_LEVEL 14