283 commits

Author SHA1 Message Date
Moinak Ghosh
8386e72566 Rewrite core dedupe logic to simplify code and improve performance.
Hashtable based chunk-level deduplication instead of Quicksort.
Fix a corner case bug in Dedupe decompression.
2012-09-23 14:57:09 +05:30
Moinak Ghosh
99a8e4cd98 Speed up Hash computation for dedupe blocks.
Add missing initialization of sliding window.
Update help text.
2012-09-19 20:29:44 +05:30
Moinak Ghosh
e3befd9e16 Add support for Fixed-Block deduplication.
More refactoring of symbol names.
2012-09-16 11:12:58 +05:30
Moinak Ghosh
b9355a5dcc Reduce dedupe loop checks for slight speed edge.
Beginnings of Fixed-block dedupe.
Update variable name for clarity.
2012-09-15 11:14:58 +05:30
Moinak Ghosh
a6b3719d89 Fix conditional in heapq function. 2012-09-11 21:59:08 +05:30
Moinak Ghosh
f3f472b860 Implement K-min-values Sketch for Similarity detection. 2012-09-11 20:26:36 +05:30
Moinak Ghosh
117382c141 Update README to reflect current features. 2012-09-07 21:32:20 +05:30
Moinak Ghosh
05a010a9dd Bump version to 0.8.1
Update Changelog for 0.8.1 release.
2012-09-07 19:38:36 +05:30
Moinak Ghosh
fb0aef0bd6 Fix return code handling in LZP pre-compression, crashed adaptive modes. 2012-09-07 19:31:35 +05:30
Moinak Ghosh
e6f042aaf8 Allow user-specified minimum Dedupe block size.
Compute similarity sketch only if Delta Compression enabled.
2012-09-05 22:43:54 +05:30
Moinak Ghosh
560fa85aab Fix secondary sketch computation, some more accuracy in diff detection. 2012-09-04 23:28:02 +05:30
Moinak Ghosh
262566b59a Add xxHash for Rabin block checksums, slightly faster than CRC64.
Fix missing initialization of character counts table.
Some file reorganization.
2012-09-02 20:40:32 +05:30
Moinak Ghosh
4ba840b255 Add ASM version of Skein for x64 platforms with auto-detection
Error checking for checksum flag when decompressing
Update comments and READMEs
2012-09-01 14:40:15 +05:30
Moinak Ghosh
eda312ce1e Add support for Skein512 and Skein256 checksums
Import Skein code from NIST CD submission
Make checksum algorithms pluggable
Fix handling of huge buffers (>2GB) in LZP
Cleanup of some buffer sizing code
Speed up CRC64 calculation in dedupe chunking
2012-08-31 22:36:06 +05:30
Moinak Ghosh
f03834278a Bump version to 0.8.
Update Changelog for 0.8 beta release.
2012-08-27 22:53:39 +05:30
Moinak Ghosh
a222772940 Fix single chunk flag handling during decompression.
Update docs.
2012-08-27 22:24:23 +05:30
Moinak Ghosh
d75535bc7e Add support for libbsc, a high-performance block sorting compressor.
Enable external algorithm threading for single chunk compressed files.
Update docs.
2012-08-27 21:51:55 +05:30
Moinak Ghosh
3b83bc2d4e Bump file version. 2012-08-26 15:01:18 +05:30
Moinak Ghosh
5f41057f9c Add config script to generate Makefile based on flags.
Add install target and installation readme file.
A few comment changes.
2012-08-26 14:09:24 +05:30
Moinak Ghosh
d4e9cd0140 Improve memory efficiency when total file size < total size of chunks.
Fix freeing of Zlib structures.
2012-08-24 20:16:21 +05:30
Moinak Ghosh
bf149e880d Add LZP Pre-Compression support ported from libbsc.
Add generic pre-processing wrappers for future support of other pre-processors.
Clean up computation of Rabin block sizes.
Compute Rabin scratch space accurately to avoid RAM wastage.
2012-08-23 22:58:44 +05:30
Moinak Ghosh
3851c9c6cc Delay allocation of per-thread chunks for performance and memory efficiency.
Avoid allocating double-buffer for single-chunk files.
Introduce lzmaMt option to indicate multithreaded LZMA.
Update README.
2012-08-18 22:00:14 +05:30
Moinak Ghosh
9eac774eb1 Add multithreaded LZMA port from p7zip
Compute balanced thread count between chunk threads and algo threads
Generic way to handle querying algorithm parameters
Clean up unnecessary includes
2012-08-18 10:20:52 +05:30
Moinak Ghosh
023dcae19a Speed up sort comparator function. 2012-08-17 13:18:50 +05:30
Moinak Ghosh
2dadf411fa Reduce memory consumption and improve performance in dedupe
Re-introduce crc64 for dedup blocks to avoid wasted memcpy-s
Restructure block array to be an array of pointers allocated on demand
Fix a corner case issue when splitting chunks at a dedup boundary
2012-08-17 11:03:02 +05:30
Moinak Ghosh
55d0485d34 Improve Rabin computations using an irreducible polynomial
Slight improvement to similarity computation
A simple mechanism to include DEBUG mode stats
Include stdint for common int types
2012-08-15 20:13:40 +05:30
Moinak Ghosh
3150bdbed7 Implement secondary sketch based on character counts to refine similarity checksum.
Proper checksum update for last block.
Update comments.
2012-08-12 13:06:49 +05:30
Moinak Ghosh
7b7007d6c5 Update version to 0.7 (0.6 was alpha). 2012-08-10 11:01:42 +05:30
Moinak Ghosh
eb2ee30d0d Update Changelog for 0.6 beta. 2012-08-10 10:57:40 +05:30
Moinak Ghosh
bde917c8e9 Fix handling of compression flags in adaptive mode
Fix error handling when chunk size is too small for dedupe
Bump version to 0.6
2012-08-10 10:47:11 +05:30
Moinak Ghosh
6b6e564886 Fix initialization of adaptive modes. 2012-08-10 10:15:20 +05:30
Moinak Ghosh
f2ffcad2fd Compute and compare Mean sketch cksum to improve similarity comparison
Fix optflags settings in Makefile
Small optimization in zero RLE encoder to avoid scanning during lookahead
Some minor fixes
2012-08-09 23:57:24 +05:30
Moinak Ghosh
400d0bfa72 Bias fingerprint value with occurrence counts for a better sketch
Fix latent bug when calling algo deinit in decompression code
Reduce diff threshold for slightly greater delta encoding
Limit similar buffer size difference for less wasted diffing
Change zlib compression wrapper to use faster deflateReset mechanism
Reduce optimization level for Dedupe code, it goes faster
2012-08-08 22:40:58 +05:30
Moinak Ghosh
a4311f2ede Fix handling of incompressible chunks.
Fix handling of various dedup failures.
Add NULL compression option for dedup only compression.
2012-08-05 22:35:51 +05:30
Moinak Ghosh
927da81562 Remove unneeded checks in qsort comparator. 2012-08-05 18:58:40 +05:30
Moinak Ghosh
2cbcb0c9e4 Fix buffer sizing for LZ4.
Fix exit condition checks in LZ4 decompression wrapper.
2012-08-04 17:55:20 +05:30
Moinak Ghosh
f9215b53fb Fix buffer size calculation when decompressing LZ4, Zlib and Bzip2 compressed chunks.
Slight SSE optimization in LZ4HC.
2012-08-03 23:19:38 +05:30
Moinak Ghosh
636ab4a3d8 Update Changelog for 0.6. 2012-08-02 22:01:05 +05:30
Moinak Ghosh
2c516c009c Fix crash when algo init function returns error.
Fix LZFX error handling.
More updates to README.
2012-07-31 21:07:35 +05:30
Moinak Ghosh
203008def9 Further improve LZMA compression parameters to utilize all the 14 levels.
Tweak some Rabin parameters for better reduction with zlib and Bzip2.
2012-07-30 23:30:13 +05:30
Moinak Ghosh
7ff2cb74c4 Increase the small size slabs a bit.
Move 64Bit integer hashing function to common file for use in other places.
2012-07-29 15:02:51 +05:30
Moinak Ghosh
a6f3756e68 Fix slab sizing. 2012-07-29 00:36:20 +05:30
Moinak Ghosh
94563a7ecd Fix buffer size computation when allocating Rabin block array.
Reduce memory usage of Rabin block array.
Add an SSE optimization for bsdiff.
Move integer hashing function to utils file.
More updates to README.
2012-07-28 23:55:24 +05:30
Moinak Ghosh
f83652aa90 Update README. 2012-07-27 22:25:44 +05:30
Moinak Ghosh
bc71caffc3 Display release version in usage text. 2012-07-27 22:07:56 +05:30
Moinak Ghosh
c7cc7b469c Update chunk size computation to reduce memory usage.
Implement runtime bypass of custom allocator.
Update README.
2012-07-27 22:03:24 +05:30
Moinak Ghosh
9c3423530c Fix huge chunk handling in zlib compression routines.
Fix Zlib error messages.
Remove extra variable in Bzip2 decompress routine.
2012-07-27 00:11:01 +05:30
Moinak Ghosh
b586d30359 Fix huge buffer (>2GB) handling in Bzip2 compress routine. 2012-07-26 21:47:33 +05:30
Moinak Ghosh
5ad944e368 Update usage text. 2012-07-25 21:12:30 +05:30
Moinak Ghosh
296e2ab6b2 Add support for LZ4 compression including multi-pass LZ4.
Add missing Read_Adjusted() declaration, was causing a crash with 2GB chunks.
Fix minor cut-paste issues in comments.
2012-07-25 21:07:36 +05:30