Moinak Ghosh
f3f472b860
Implement K-min-values Sketch for Similarity detection.
2012-09-11 20:26:36 +05:30
Moinak Ghosh
e6f042aaf8
Allow user-specified minimum Dedupe block size.
...
Compute similarity sketch only if Delta Compression enabled.
2012-09-05 22:43:54 +05:30
Moinak Ghosh
560fa85aab
Fix secondary sketch computation, some more accuracy in diff detection.
2012-09-04 23:28:02 +05:30
Moinak Ghosh
262566b59a
Add xxHash for Rabin block checksums, slightly faster than CRC64.
...
Fix missing initialization of character counts table.
Some file reorganization.
2012-09-02 20:40:32 +05:30
Moinak Ghosh
eda312ce1e
Add support for Skein512 and Skein256 checksums
...
Import Skein code from NIST CD submission
Make checksum algorithms pluggable
Fix handling of huge buffers (>2GB) in LZP
Cleanup of some buffer sizing code
Speed up CRC64 calculation in dedupe chunking
2012-08-31 22:36:06 +05:30
Moinak Ghosh
bf149e880d
Add LZP Pre-Compression support ported from libbsc.
...
Add generic pre-processing wrappers for future support of other pre-processors.
Clean up computation of Rabin block sizes.
Compute Rabin scratch space accurately to avoid RAM wastage.
2012-08-23 22:58:44 +05:30
Moinak Ghosh
023dcae19a
Speed up sort comparator function.
2012-08-17 13:18:50 +05:30
Moinak Ghosh
2dadf411fa
Reduce memory consumption and improve performance in dedupe
...
Re-introduce crc64 for dedup blocks to avoid wasted memcpy-s
Restructure block array to be an array of pointers allocated on demand
Fix a corner case issue when splitting chunks at a dedup boundary
2012-08-17 11:03:02 +05:30
Moinak Ghosh
55d0485d34
Improve Rabin computations using an irreducible polynomial
...
Slight improvement to similarity computation
A simple mechanism to include DEBUG mode stats
Include stdint for common int types
2012-08-15 20:13:40 +05:30
Moinak Ghosh
3150bdbed7
Implement secondary sketch based on character counts to refine similarity checksum.
...
Proper checksum update for last block.
Update comments.
2012-08-12 13:06:49 +05:30
Moinak Ghosh
bde917c8e9
Fix handling of compression flags in adaptive mode
...
Fix error handling when chunk size is too small for dedupe
Bump version to 0.6
2012-08-10 10:47:11 +05:30
Moinak Ghosh
f2ffcad2fd
Compute and compare Mean sketch cksum to improve similarity comparison
...
Fix optflags settings in Makefile
Small optimization in zero RLE encoder to avoid scanning during lookahead
Some minor fixes
2012-08-09 23:57:24 +05:30
Moinak Ghosh
400d0bfa72
Bias fingerprint value with occurrence counts for a better sketch
...
Fix latent bug when calling algo deinit in decompression code
Reduce diff threshold for slightly greater delta encoding
Limit similar buffer size difference for less wasted diffing
Change zlib compression wrapper to use faster deflateReset mechanism
Reduce optimization level for Dedupe code; it runs faster
2012-08-08 22:40:58 +05:30
Moinak Ghosh
927da81562
Remove unneeded checks in qsort comparator.
2012-08-05 18:58:40 +05:30
Moinak Ghosh
203008def9
Further improve LZMA compression parameters to utilize all the 14 levels.
...
Tweak some Rabin parameters for better reduction with zlib and Bzip2.
2012-07-30 23:30:13 +05:30
Moinak Ghosh
94563a7ecd
Fix buffer size computation when allocating Rabin block array.
...
Reduce memory usage of Rabin block array.
Add an SSE optimization for bsdiff.
Move integer hashing function to utils file.
More updates to README.
2012-07-28 23:55:24 +05:30
Moinak Ghosh
c7cc7b469c
Update chunk size computation to reduce memory usage.
...
Implement runtime bypass of custom allocator.
Update README.
2012-07-27 22:03:24 +05:30
Moinak Ghosh
53d4311534
Make LZFX Hash size dynamic.
...
Use smaller min rabin block when using fast compression algos.
Add missing check for algo init function return value.
2012-07-23 21:43:12 +05:30
Moinak Ghosh
8cfd54fe34
Add LZFX Compression support, a very fast lightweight compressor.
...
Avoid a branch in the rabin loop.
2012-07-23 00:15:08 +05:30
Moinak Ghosh
7e14909ad1
Separate initial rabin boundary detection and block splitting for performance.
...
Also fix a rare corner case latent bug.
2012-07-22 21:27:44 +05:30
Moinak Ghosh
962a2cae8a
Compress Dedup index only if it is at least 90 bytes to avoid expansion.
...
Some minor cleanup.
2012-07-22 00:00:41 +05:30
Moinak Ghosh
b69dcf4d55
Remove debug statements.
2012-07-20 21:38:39 +05:30
Moinak Ghosh
fd7c7e9a65
Use 4-byte ints for header values instead of 8-byte size_t.
...
Use RLE on control data if it reduces the size.
Update some comments.
Use scratch space at end of data chunk, if available.
2012-07-20 20:53:46 +05:30
Moinak Ghosh
e788eb43b8
Implement Delta Encoding based on modified bsdiff.
...
Change to more accurate Sketch value computation approach.
2012-07-19 21:41:07 +05:30
Moinak Ghosh
1da2c40888
Use a rolling checksum based sketch value for a rabin chunk instead of a CRC64 checksum.
...
Avoids additional table-lookup memory access.
Reduce Rabin window size to avoid overflows in sketch value.
No need to maintain rolling checksum in Rabin context.
A few comment cleanups.
2012-07-13 22:06:55 +05:30
Moinak Ghosh
0091a0da02
Remove debug messages.
...
Fix last segment detection.
2012-07-10 21:11:31 +05:30
Moinak Ghosh
a873f92e41
Fix crash when decompressing deduped archive.
...
Ensure correct level is passed to lzma.
Avoid branch when wrapping rabin window position and check for rabin window size to be power of 2.
Update rabin parameters check for adaptive modes.
Add detection of 7-bit text/8-bit binary data for later use.
2012-07-10 20:14:23 +05:30
Moinak Ghosh
db0c9ea9ac
Improve LZMA compression parameters at extreme levels.
...
Fix incorrect thread calculation.
Remove some cruft.
2012-07-09 23:28:11 +05:30
Moinak Ghosh
010f49f412
Implement ability to partition chunks at the last rabin boundary instead of fixed size.
2012-07-08 21:44:08 +05:30
Moinak Ghosh
d3f5287ee5
Update License info to LGPLv3.
2012-07-07 22:18:29 +05:30
Moinak Ghosh
172432698e
Adjust Rabin parameters for chunksize > LZMA window size.
2012-07-07 00:01:13 +05:30
Moinak Ghosh
ea923b84f0
Use different min block size and Rabin break pattern depending on compression algo.
...
Cleanup some cruft.
2012-07-06 23:24:12 +05:30
Moinak Ghosh
f5ce45b16e
Techniques to better reduce Rabin Metadata.
...
Fix wrong chunk sizing with dedup enabled.
2012-07-06 00:16:02 +05:30
Moinak Ghosh
774384c204
Remove debug statement.
2012-07-04 23:39:03 +05:30
Moinak Ghosh
3f0e1952ef
Fix compile warnings.
2012-07-04 23:20:10 +05:30
Moinak Ghosh
1eee08040f
Use different average Rabin block sizes depending on compression algorithm.
...
Misc cleanups.
2012-07-03 22:47:24 +05:30
Moinak Ghosh
a13c61e926
Change rabin index encoding scheme for better metadata compression.
2012-07-02 22:08:03 +05:30
Moinak Ghosh
a1825a2305
Implement Parallel deduplication support.
...
Restructure compression functions to take chunk flag as argument.
Add missing error flag printing in LZMA.
Create only as many threads as needed by chunk size and file size.
Minor cleanups and variable name changes.
2012-07-01 21:44:02 +05:30
Moinak Ghosh
f9c3644459
Updates to Rabin based Dedup.
...
Change command line option.
2012-06-29 23:45:06 +05:30
Moinak Ghosh
cbf9728278
Implement Deduplication based on Rabin Fingerprinting: work in progress.
...
Fix bug that prevented pipe mode from being used.
Allow building without specialized allocator.
Use basic optimize flag in debug build.
2012-06-29 18:23:55 +05:30
Moinak Ghosh
8f5f531967
Add license and other minor fixes.
2012-06-21 20:40:43 +05:30
Moinak Ghosh
733923cbf2
Add ability to adjust chunk boundary based on Rabin Fingerprinting to improve compression.
...
Remove unnecessary checks in compression loop.
2012-06-21 20:27:05 +05:30