Moinak Ghosh
aed69b2d53
Add test cases for Global Deduplication.
...
Update documentation and code comments.
Remove tempfile pathname after creation to ensure clean removal after process exit.
2013-04-26 18:32:00 +05:30
Moinak Ghosh
5bb028fe03
Change Segmented Dedupe flow to improve parallelism.
...
Periodically sync writes to segcache file.
Use simple insertion sort for small numbers of elements.
2013-04-25 23:42:32 +05:30
Moinak Ghosh
79a6e7f770
Capability to output data to stdout when compressing.
...
Always use segmented similarity bases dedupe when using -G option in pipe mode.
Standardize on average 8MB segment size for segmented dedupe.
Fix hashtable sizing.
Some miscellaneous cleanups.
Update README with details of new features.
2013-04-24 23:03:58 +05:30
Moinak Ghosh
6c5d8d9e18
Optimize index lookup for 8-byte keys.
...
More cleanups.
2013-04-24 19:49:43 +05:30
Moinak Ghosh
5d6ffd969d
More tweaks to slightly improve segment dedupe efficiency.
...
Use on average 8MB segments for all cases.
Some minor cleanps.
2013-04-24 19:13:07 +05:30
Moinak Ghosh
eabd670790
Improve segment similarity detection and drastically reduce index size.
2013-04-23 23:15:32 +05:30
Moinak Ghosh
6b7d883393
Tweak percentage intervals computation to improve segmented dedupe ratio.
...
Avoid repeat processing of already processed segments.
2013-04-23 18:53:56 +05:30
Moinak Ghosh
d29f125ca7
Clean up temp cache dir handling.
...
Allow temp dir setting via specific env variable to point to fast devices like ramdisk,ssd.
2013-04-22 22:57:31 +05:30
Moinak Ghosh
2c4024792a
Several bugfixes.
...
Avoid matching with self during hash lookup.
2013-04-22 22:07:07 +05:30
Moinak Ghosh
6b23f6a73a
Several fixes and optimizations.
2013-04-22 19:52:18 +05:30
Moinak Ghosh
c0b4aa0116
Many optimizations and changes to Segmented Global Dedupe.
...
Use chunk hash based similarity matching rather than content based.
Use sorting to order hash buffer rather than min-heap for better accuracy.
Use fast CRC64 for similarity hash for speed and lower memory requirements.
2013-04-21 18:11:16 +05:30
Moinak Ghosh
3b8a5813fd
Many optimizations to segmented global dedupe.
...
Use chunk hash based cumulative similarity matching instead of chunk content.
2013-04-19 22:51:51 +05:30
Moinak Ghosh
2f6ccca6e5
Update usage text and add minor tweaks.
2013-04-18 22:55:49 +05:30
Moinak Ghosh
8ae571124d
Complete implementation for Segmented Global Deduplication.
2013-04-18 21:26:24 +05:30
Moinak Ghosh
a22b52cf08
Work in progress changes for Segmented Global Deduplication.
2013-04-14 23:51:54 +05:30
Moinak Ghosh
50251107de
Work in progress changes for Segmented Global Deduplication.
2013-04-09 22:23:51 +05:30