Commit graph

  • 969e242b31 Update README with details of Global Dedupe block hash selection. Moinak Ghosh 2013-05-06 23:50:56 +0530
  • 3ac09c1760 Create gh-pages branch via GitHub Moinak Ghosh 2013-05-06 07:01:20 -0700
  • c27317d7da Add SSE2 optimizations for Segmented Dedupe. Moinak Ghosh 2013-05-05 23:34:26 +0530
  • 6ecc400571 Fix segment offset sorting. Get rid of incorrect duplicate checks in index. Moinak Ghosh 2013-05-05 18:50:52 +0530
  • c6da2325e3 Allow SKEIN to be used as a Global Dedupe chunk lookup hash. Moinak Ghosh 2013-05-04 15:59:29 +0530
  • 0cf94c308a Add a qsort variant optimized for integers and use in global dedupe. Cleanup LZMA CRC64/32 declarations and add a header. Fix heapq header. Moinak Ghosh 2013-05-03 22:06:55 +0530
  • c43e99f422 Use openmp parallelism always for chunk hash computation during Global Dedupe. Moinak Ghosh 2013-05-02 23:24:43 +0530
  • 120877348c Use SHA256 for Global Dedupe chunk lookup hash by default. Allow changing Global Dedupe chunk lookup hash via env variable. Moinak Ghosh 2013-05-02 00:05:05 +0530
  • 6e4d45b644 Fix crash with some older GCC versions. Reported in issue #7. Moinak Ghosh 2013-05-01 19:27:43 +0530
  • eae16b82d3 Fix issue #7. Ensure tempfile cleanup even with error abort. Moinak Ghosh 2013-05-01 18:01:17 +0530
  • b23b5789fb Fix bugs and improve accuracy in Segmented Dedupe. Fix segment hashlist size computation. Remove unnecessary sync of segment hashlist file writes. Pass correct number of threads to index creation routine. Add more error checks. Handle correct positioning of segment hashlist file offset on write error. Add missing semaphore signaling at dedupe abort points with global dedupe. Use closer min-values sampling for improved segmented dedupe accuracy. Update proper checksum info in README. Moinak Ghosh 2013-04-30 19:35:18 +0530
  • 074e265f70 Fix sizing of similarity hash buffer. [2.0Major] Moinak Ghosh 2013-04-26 22:36:14 +0530
  • 2f2fc23771 Tweak index size computation. Moinak Ghosh 2013-04-26 19:21:11 +0530
  • c4f3bd14c0 Update Changelog for 2.0 release. Moinak Ghosh 2013-04-26 19:04:35 +0530
  • f05c7905b2 Bump version for release. Moinak Ghosh 2013-04-26 18:47:52 +0530
  • eb964b0bde Update README. Moinak Ghosh 2013-04-26 18:46:14 +0530
  • aed69b2d53 Add test cases for Global Deduplication. Update documentation and code comments. Remove tempfile pathname after creation to ensure clean removal after process exit. Moinak Ghosh 2013-04-26 18:32:00 +0530
  • 75f62d6a36 Simplify segment lookup loop. Fix assertion. Moinak Ghosh 2013-04-26 10:56:29 +0530
  • 5bb028fe03 Change Segmented Dedupe flow to improve parallelism. Periodically sync writes to segcache file. Use simple insertion sort for small numbers of elements. Moinak Ghosh 2013-04-25 23:42:32 +0530
  • 79a6e7f770 Capability to output data to stdout when compressing. Always use segmented similarity based dedupe when using -G option in pipe mode. Standardize on average 8MB segment size for segmented dedupe. Fix hashtable sizing. Some miscellaneous cleanups. Update README with details of new features. Moinak Ghosh 2013-04-24 23:03:58 +0530
  • 6c5d8d9e18 Optimize index lookup for 8-byte keys. More cleanups. Moinak Ghosh 2013-04-24 19:49:43 +0530
  • 5d6ffd969d More tweaks to slightly improve segment dedupe efficiency. Use on average 8MB segments for all cases. Some minor cleanups. Moinak Ghosh 2013-04-24 19:13:07 +0530
  • eabd670790 Improve segment similarity detection and drastically reduce index size. Moinak Ghosh 2013-04-23 23:15:32 +0530
  • b32f4b3f9a Improve duplicate segment match detection. Moinak Ghosh 2013-04-23 20:51:12 +0530
  • 6b7d883393 Tweak percentage intervals computation to improve segmented dedupe ratio. Avoid repeat processing of already processed segments. Moinak Ghosh 2013-04-23 18:53:56 +0530
  • d29f125ca7 Clean up temp cache dir handling. Allow temp dir setting via specific env variable to point to fast devices like ramdisk, SSD. Moinak Ghosh 2013-04-22 22:57:31 +0530
  • 2c4024792a Several bugfixes. Avoid matching with self during hash lookup. Moinak Ghosh 2013-04-22 22:07:07 +0530
  • 6b23f6a73a Several fixes and optimizations. Moinak Ghosh 2013-04-22 19:52:18 +0530
  • c0b4aa0116 Many optimizations and changes to Segmented Global Dedupe. Use chunk hash based similarity matching rather than content based. Use sorting to order hash buffer rather than min-heap for better accuracy. Use fast CRC64 for similarity hash for speed and lower memory requirements. Moinak Ghosh 2013-04-21 18:11:16 +0530
  • 3b8a5813fd Many optimizations to segmented global dedupe. Use chunk hash based cumulative similarity matching instead of chunk content. Moinak Ghosh 2013-04-19 22:51:51 +0530
  • 2f6ccca6e5 Update usage text and add minor tweaks. Moinak Ghosh 2013-04-18 22:55:49 +0530
  • 426c0d0bf2 Properly cleanup global dedupe state. Moinak Ghosh 2013-04-18 21:36:36 +0530
  • 8ae571124d Complete implementation for Segmented Global Deduplication. Moinak Ghosh 2013-04-18 21:26:24 +0530
  • a22b52cf08 Work in progress changes for Segmented Global Deduplication. Moinak Ghosh 2013-04-14 23:51:54 +0530
  • 50251107de Work in progress changes for Segmented Global Deduplication. Moinak Ghosh 2013-04-09 22:23:51 +0530
  • 3d7a179a77 Work in progress changes for scalable segmented global deduplication. Allow user-specified environment setting to control in-memory index size. Moinak Ghosh 2013-04-06 15:15:27 +0530
  • c357452079 Implement global dedupe in pipe mode. Update hash index calculations to use up to 75% memavail when file size is not known. Use little-endian nonce format for Salsa20. Moinak Ghosh 2013-03-29 15:18:25 +0530
  • 19b304f30c Add global index cleanup function. Fix location of sem_wait(). More comments. Moinak Ghosh 2013-03-25 21:04:16 +0530
  • 1143207cd5 Add check to disable Delta Compression with Global deduplication for now. Moinak Ghosh 2013-03-24 23:30:40 +0530
  • fbf4658635 Implement Global Deduplication. Moinak Ghosh 2013-03-24 23:21:17 +0530
  • 876796be5c Work in progress changes for global dedupe. Moinak Ghosh 2013-03-21 22:00:38 +0530
  • b7fdeb08bc Work in progress global dedupe changes. Moinak Ghosh 2013-03-20 22:47:03 +0530
  • f2806d4ffa Work in progress global dedupe changes. Moinak Ghosh 2013-03-19 20:13:44 +0530
  • f61d9993be Create gh-pages branch via GitHub Moinak Ghosh 2013-03-07 08:53:20 -0800
  • f8f23e5200 Major License text cleanup. Moinak Ghosh 2013-03-07 20:26:48 +0530
  • 370e84f2be Update Changelog and bump version for 1.4 release. Add license text to Salsa20 files. [1.4Update] Moinak Ghosh 2013-03-06 17:15:03 +0530
  • 45aa726474 Update couple more test parameters with new crypto options. Moinak Ghosh 2013-03-06 00:04:15 +0530
  • e41f156beb Update README and test cases with new crypto options. Update usage text. Moinak Ghosh 2013-03-05 21:07:54 +0530
  • fa9fbdb7a4 Cleanup more stack parameters after use in various crypto functions. Fix comment. Moinak Ghosh 2013-03-05 20:46:18 +0530
  • cf053c0257 Fix increment of XSalsa20 192-bit nonce value. Handle nonce bytes in endian neutral way. Moinak Ghosh 2013-03-04 23:48:12 +0530
  • dce424ec85 Use 128-bit key length when decompressing older version archives. Moinak Ghosh 2013-03-04 22:35:33 +0530
  • 20250aa5dc Add XSalsa20 encryption algorithm from the NaCl library. Include 128-bit key support based on the Salsa20 eSTREAM submission. Allow variable-length nonces. Use random bytes for initial nonce value. Increase PBE hash rounds to 50000. Moinak Ghosh 2013-03-04 21:56:07 +0530
  • e16b408061 Move Scrypt helper function out of AES module. Fix a compiler warning. Moinak Ghosh 2013-03-03 21:55:59 +0530
  • 7a29c7be1e Change default encryption key length to 256 bits. Add optional ability to change key length at runtime via cli option. Include key length property in archive header. Fix header HMAC to include salt, nonce and key length properties. Retain backward compatibility to handle older format archives. Fix compilation of AES ASM code. Moinak Ghosh 2013-03-03 20:02:14 +0530
  • 72b23dac1a Add AES-NI optimized code derived from latest OpenSSL upstream. Add AES instruction set detection. Add missing license headers to a few files. Moinak Ghosh 2013-02-25 19:23:51 +0530
  • 532cd2a941 Add Vector Permute AES from OpenSSL 1.0.1e. Remain compatible with older OpenSSL versions. Moinak Ghosh 2013-02-24 23:52:34 +0530
  • efe5232cdc Add compatibility to decode old-format parallel hashes created with version 1.2. Bump archive version to 7 as parallel hashes are now merkle style hashes. Moinak Ghosh 2013-02-24 20:05:16 +0530
  • 5f6217bb1f Add lookup and insert functionality for global index. Make global dedupe code buildable. Moinak Ghosh 2013-02-21 23:07:07 +0530
  • cb853821c7 Use PPMd fallback for adapt2 if BSC is not enabled. Moinak Ghosh 2013-02-17 22:01:29 +0530
  • f41ea40bb9 Improve XML detection in adaptive mode. Moinak Ghosh 2013-02-17 21:36:20 +0530
  • 6badbcaea7 Make global dedupe bits buildable and fix errors. Rename Adaptive compression type constants to avoid conflict with global constants. Moinak Ghosh 2013-02-17 21:05:40 +0530
  • 7386f82a4f Work in progress global dedupe. Moinak Ghosh 2013-02-16 23:33:06 +0530
  • f89473d29c Fixes for issues/warnings reported in issue #4. Moinak Ghosh 2013-02-15 22:53:17 +0530
  • 24d62bfde9 Global dedupe work in progress. Moinak Ghosh 2013-02-14 23:10:53 +0530
  • 1eae57c8a2 Starting changes for single-file global dedupe. Moinak Ghosh 2013-02-12 21:53:04 +0530
  • 3e1737b4ab Use OpenMP parallelism when computing xxHashes for chunks. Moinak Ghosh 2013-02-02 09:27:58 +0530
  • 6bfd044311 Use 2-stage Merkle Tree hashing for parallel hashes for better crypto properties. Update xxhash comment. Moinak Ghosh 2013-02-01 22:07:28 +0530
  • af4c6e1d84 Reduce dedupe hash table collisions by half. Moinak Ghosh 2013-01-31 00:38:41 +0530
  • 3d8f3ada1c Improve Deduplication performance by another 95%. Start sliding window scanning near minimum chunk size boundaries to avoid scanning whole chunk. Moinak Ghosh 2013-01-30 22:41:13 +0530
  • 02ac25e560 Create gh-pages branch via GitHub Moinak Ghosh 2013-01-29 18:59:19 -0800
  • 9983d79e62 Bump version and update changelog for 1.3.0 release. Fix issue #3. [1.3Perf] Moinak Ghosh 2013-01-29 21:42:54 +0530
  • a03e3ba41b Fix return from parallel versions of Keccak_Hash() function. Moinak Ghosh 2013-01-28 00:51:47 +0530
  • 468044d816 Add parallel versions of various checksums for single-segment, single-thread compression. Moinak Ghosh 2013-01-27 23:47:55 +0530
  • 2da0d0950b Use BLAKE2 parallel version for single-chunk archives (whole file in one chunk). Set decompression threads correctly for single-chunk archives. Moinak Ghosh 2013-01-26 18:28:13 +0530
  • 68f60ba50b Update default checksum to BLAKE256. Moinak Ghosh 2013-01-26 17:40:23 +0530
  • d08b5ea399 Add optimized BLAKE2 implementations with runtime detection of CPU capability (SSE/AVX). Minor cleanups. Moinak Ghosh 2013-01-26 15:39:10 +0530
  • 43af97042a Major changes to use Intel's optimized SHA512 code for SHA512 and SHA512/256. Remove earlier SHA256 code which is slower than SHA512/256 (on 64-bit CPU). Use HMAC from Alan Saddi's implementation for cleaner, faster code. Moinak Ghosh 2013-01-25 22:55:55 +0530
  • 26bb137257 Changes for generalized runtime SSE/AVX/XOP detection. Multi instruction set XXhash build with runtime selection. Extend CPUID code to detect more instruction sets. Add options for BLAKE2 hash. Move GCC builtins into utils header. Bump file format version number due to extended digest flags. Add descriptions to digest list. Moinak Ghosh 2013-01-25 00:10:12 +0530
  • 7b7c85dab4 Rationalize XXHash implementation to deal with 32-byte blocks instead of 16-byte. Fix XXHash performance degradation for small keys. Modify a data analysis loop in adaptive compress to make it auto-vectorizable. Moinak Ghosh 2013-01-23 20:58:39 +0530
  • 5c8704c5bb Improve Deduplication throughput by 90%. Use SSE4 register as sliding window for default 16-byte window size. Use local variable for sliding window position to avoid spurious memory access in non-SIMD case. Avoid computing breakpoint check value if processed length < minimum block length. Moinak Ghosh 2013-01-22 15:54:42 +0530
  • e9e3e1e632 Improve SSE version detection. Add SSE4 detection. Fix setting of some opt flags in Makefile.in. Moinak Ghosh 2013-01-20 22:53:36 +0530
  • 3888c8d316 Many optimization tweaks. Optimize Rabin Deduplication and Bsdiff. Vectorize XXHash using SSE4. Moinak Ghosh 2013-01-20 22:02:26 +0530
  • 455c8107d5 Use pre-increment for shorter instruction length and slight speed. Moinak Ghosh 2013-01-17 22:54:30 +0530
  • 49ec3a054d Add SSE2 improvements to CTR mode AES. Add debug print of encryption and HMAC throughput. Fix error message for invalid option. Moinak Ghosh 2013-01-16 19:52:46 +0530
  • 39dbc4be43 Implement algo-specific minimum distance match for Delta Compression. Moinak Ghosh 2013-01-14 13:20:07 +0530
  • d49a088eea Fixes and performance improvements for Dedupe Delta Compression. Avoid using fingerprints in minhash computation and fix write amplification. Modify min-heap to use 64bit values. Improve bsdiff performance. Fix pointer comparison in bsdiff. Use 32bit offsets in bsdiff to reduce memory usage. Improve Zero RLE Encoder performance. Add more buffer overflow checks in Zero RLE Decoder. Moinak Ghosh 2013-01-13 22:04:59 +0530
  • 87aa12206e Use SSE3 lddqu in the matchfinder if SSE3 is enabled. Moinak Ghosh 2013-01-06 19:28:40 +0530
  • 976a12afbe Remove outdated LZP note. Moinak Ghosh 2013-01-05 19:59:17 +0530
  • e05a58f035 Create gh-pages branch via GitHub Moinak Ghosh 2013-01-05 05:08:15 -0800
  • 254f03a7a2 Update Changelog for last fixes. [1.2Stable] Moinak Ghosh 2013-01-05 17:03:41 +0530
  • a8fd60fb06 Fix issue #1 and issue #2. Enable building with older openssl (at least 0.9.8e). Add variants of missing functions in older openssl versions. Allow proper linking with libraries in alternate locations and setting RUNPATH. Increase hash rounds for nonce generation. Moinak Ghosh 2013-01-05 00:16:15 +0530
  • 16b1d9e7a3 Bump version, update Changelog and documentation for 1.2 release. Moinak Ghosh 2013-01-03 23:40:21 +0530
  • 47ebd5b752 Fix calculation of extra scratch space for Dedupe. Add missing return value check in small buffer Delta2. Update test failure detection in test driver. Moinak Ghosh 2013-01-03 22:25:14 +0530
  • d9eb82e0e8 Fix numeric parsing. Fix dedupe bug introduced in last commit. Reset valid flag when resetting dedupe context. Cleanup test suites. Do not abort test suite on failure of a test case. Moinak Ghosh 2013-01-03 00:27:18 +0530
  • 6b756eb165 Fix Delta2 handling for small buffers. Fix LZP handling during preprocessing. Fix type flag handling during preprocessing. Update test cases to use configurable list of test corpus files. Update some more int64_t datatypes to uint64_t. Add a gitignore. Moinak Ghosh 2013-01-02 22:56:21 +0530
  • 43f5acfa2d Add more comments to code. Moinak Ghosh 2012-12-31 23:27:31 +0530
  • 13d9378acd Update to latest XXHash version. Moinak Ghosh 2012-12-31 11:53:47 +0530
  • 8bfa49fc66 Fix Keccak invocation and reset default checksum back to SKEIN 256. Moinak Ghosh 2012-12-30 01:13:55 +0530
  • 28224d29d3 Improve Dedupe performance. Add more debug timing stats. Change default checksum to Keccak 256 (SIMD version 4x faster than Skein). Fix compiler warning in allocator code. Moinak Ghosh 2012-12-29 23:43:41 +0530
  • 36d95276ee Further improvements to Delta2 performance. Fix the byteswap macros. Start adding assertions. Moinak Ghosh 2012-12-28 22:12:38 +0530