344 commits

Author SHA1 Message Date
Moinak Ghosh
c43e99f422 Use openmp parallelism always for chunk hash computation during Global Dedupe. 2013-05-02 23:24:43 +05:30
Moinak Ghosh
120877348c Use SHA256 for Global Dedupe chunk lookup hash by default.
Allow changing Global Dedupe chunk lookup hash via env variable.
2013-05-02 00:05:05 +05:30
Moinak Ghosh
6e4d45b644 Fix crash with some older GCC versions. Reported in issue #7. 2013-05-01 19:27:43 +05:30
Moinak Ghosh
eae16b82d3 Fix issue #7.
Ensure tempfile cleanup even with error abort.
2013-05-01 18:01:17 +05:30
Moinak Ghosh
b23b5789fb Fix bugs and improve accuracy in Segmented Dedupe.
Fix segment hashlist size computation.
Remove unnecessary sync of segment hashlist file writes.
Pass correct number of threads to index creation routine.
Add more error checks.
Handle correct positioning of segment hashlist file offset on write error.
Add missing semaphore signaling at dedupe abort points with global dedupe.
Use closer min-values sampling for improved segmented dedupe accuracy.
Update proper checksum info in README.
2013-04-30 19:35:18 +05:30
Moinak Ghosh
074e265f70 Fix sizing of similarity hash buffer. 2013-04-26 22:36:14 +05:30
Moinak Ghosh
2f2fc23771 Tweak index size computation. 2013-04-26 19:21:11 +05:30
Moinak Ghosh
c4f3bd14c0 Update Changelog for 2.0 release. 2013-04-26 19:04:35 +05:30
Moinak Ghosh
f05c7905b2 Bump version for release. 2013-04-26 18:47:52 +05:30
Moinak Ghosh
eb964b0bde Update README. 2013-04-26 18:46:14 +05:30
Moinak Ghosh
aed69b2d53 Add test cases for Global Deduplication.
Update documentation and code comments.
Remove tempfile pathname after creation to ensure clean removal after process exit.
2013-04-26 18:32:00 +05:30
Moinak Ghosh
75f62d6a36 Simplify segment lookup loop.
Fix assertion.
2013-04-26 10:56:29 +05:30
Moinak Ghosh
5bb028fe03 Change Segmented Dedupe flow to improve parallelism.
Periodically sync writes to segcache file.
Use simple insertion sort for small numbers of elements.
2013-04-25 23:42:32 +05:30
Moinak Ghosh
79a6e7f770 Capability to output data to stdout when compressing.
Always use segmented similarity based dedupe when using -G option in pipe mode.
Standardize on average 8MB segment size for segmented dedupe.
Fix hashtable sizing.
Some miscellaneous cleanups.
Update README with details of new features.
2013-04-24 23:03:58 +05:30
Moinak Ghosh
6c5d8d9e18 Optimize index lookup for 8-byte keys.
More cleanups.
2013-04-24 19:49:43 +05:30
Moinak Ghosh
5d6ffd969d More tweaks to slightly improve segment dedupe efficiency.
Use on average 8MB segments for all cases.
Some minor cleanups.
2013-04-24 19:13:07 +05:30
Moinak Ghosh
eabd670790 Improve segment similarity detection and drastically reduce index size. 2013-04-23 23:15:32 +05:30
Moinak Ghosh
b32f4b3f9a Improve duplicate segment match detection. 2013-04-23 20:51:12 +05:30
Moinak Ghosh
6b7d883393 Tweak percentage intervals computation to improve segmented dedupe ratio.
Avoid repeat processing of already processed segments.
2013-04-23 18:53:56 +05:30
Moinak Ghosh
d29f125ca7 Clean up temp cache dir handling.
Allow temp dir setting via specific env variable to point to fast devices like ramdisk or SSD.
2013-04-22 22:57:31 +05:30
Moinak Ghosh
2c4024792a Several bugfixes.
Avoid matching with self during hash lookup.
2013-04-22 22:07:07 +05:30
Moinak Ghosh
6b23f6a73a Several fixes and optimizations. 2013-04-22 19:52:18 +05:30
Moinak Ghosh
c0b4aa0116 Many optimizations and changes to Segmented Global Dedupe.
Use chunk hash based similarity matching rather than content based.
Use sorting to order hash buffer rather than min-heap for better accuracy.
Use fast CRC64 for similarity hash for speed and lower memory requirements.
2013-04-21 18:11:16 +05:30
Moinak Ghosh
3b8a5813fd Many optimizations to segmented global dedupe.
Use chunk hash based cumulative similarity matching instead of chunk content.
2013-04-19 22:51:51 +05:30
Moinak Ghosh
2f6ccca6e5 Update usage text and add minor tweaks. 2013-04-18 22:55:49 +05:30
Moinak Ghosh
426c0d0bf2 Properly cleanup global dedupe state. 2013-04-18 21:36:36 +05:30
Moinak Ghosh
8ae571124d Complete implementation for Segmented Global Deduplication. 2013-04-18 21:26:24 +05:30
Moinak Ghosh
a22b52cf08 Work in progress changes for Segmented Global Deduplication. 2013-04-14 23:51:54 +05:30
Moinak Ghosh
50251107de Work in progress changes for Segmented Global Deduplication. 2013-04-09 22:23:51 +05:30
Moinak Ghosh
3d7a179a77 Work in progress changes for scalable segmented global deduplication.
Allow user-specified environment setting to control in-memory index size.
2013-04-06 15:15:27 +05:30
Moinak Ghosh
c357452079 Implement global dedupe in pipe mode.
Update hash index calculations to use up to 75% of memavail when file size is not known.
Use little-endian nonce format for Salsa20.
2013-03-29 15:18:25 +05:30
Moinak Ghosh
19b304f30c Add global index cleanup function.
Fix location of sem_wait().
More comments.
2013-03-25 21:04:16 +05:30
Moinak Ghosh
1143207cd5 Add check to disable Delta Compression with Global deduplication for now. 2013-03-24 23:30:40 +05:30
Moinak Ghosh
fbf4658635 Implement Global Deduplication. 2013-03-24 23:21:17 +05:30
Moinak Ghosh
876796be5c Work in progress changes for global dedupe. 2013-03-21 22:00:38 +05:30
Moinak Ghosh
b7fdeb08bc Work in progress global dedupe changes. 2013-03-20 22:47:03 +05:30
Moinak Ghosh
f2806d4ffa Work in progress global dedupe changes. 2013-03-19 20:13:44 +05:30
Moinak Ghosh
f8f23e5200 Major License text cleanup. 2013-03-07 20:26:48 +05:30
Moinak Ghosh
370e84f2be Update Changelog and bump version for 1.4 release.
Add license text to Salsa20 files.
2013-03-06 17:15:03 +05:30
Moinak Ghosh
45aa726474 Update a couple more test parameters with new crypto options. 2013-03-06 00:04:15 +05:30
Moinak Ghosh
e41f156beb Update README and test cases with new crypto options.
Update usage text.
2013-03-05 21:07:54 +05:30
Moinak Ghosh
fa9fbdb7a4 Cleanup more stack parameters after use in various crypto functions.
Fix comment.
2013-03-05 20:46:18 +05:30
Moinak Ghosh
cf053c0257 Fix increment of XSalsa20 192-bit nonce value.
Handle nonce bytes in an endian-neutral way.
2013-03-04 23:48:12 +05:30
Moinak Ghosh
dce424ec85 Use 128-bit key length when decompressing older version archives. 2013-03-04 22:35:33 +05:30
Moinak Ghosh
20250aa5dc Add XSalsa20 encryption algorithm from the NaCl library.
Include 128-bit key support based on the Salsa20 eSTREAM submission.
Allow variable-length nonces.
Use random bytes for initial nonce value.
Increase PBE hash rounds to 50000.
2013-03-04 21:56:07 +05:30
Moinak Ghosh
e16b408061 Move Scrypt helper function out of AES module.
Fix a compiler warning.
2013-03-03 21:55:59 +05:30
Moinak Ghosh
7a29c7be1e Change default encryption key length to 256 bits.
Add optional ability to change key length at runtime via cli option.
Include key length property in archive header.
Fix header HMAC to include salt, nonce and key length properties.
Retain backward compatibility to handle older format archives.
Fix compilation of AES ASM code.
2013-03-03 20:02:14 +05:30
Moinak Ghosh
72b23dac1a Add AES-NI optimized code derived from latest OpenSSL upstream.
Add AES instruction set detection.
Add missing license headers to a few files.
2013-02-25 19:23:51 +05:30
Moinak Ghosh
532cd2a941 Add Vector Permute AES from OpenSSL 1.0.1e. Remain compatible with older OpenSSL versions. 2013-02-24 23:52:34 +05:30
Moinak Ghosh
efe5232cdc Add compatibility to decode old-format parallel hashes created with version 1.2.
Bump archive version to 7 as parallel hashes are now Merkle-style hashes.
2013-02-24 20:05:16 +05:30