Moinak Ghosh
ef98422bd4
Add basic file format documentation.
...
Reduce memory threshold for switching to Similarity based Deduplication.
2013-08-18 20:11:20 +05:30
Moinak Ghosh
58f3113558
Avoid unnecessary re-hashing of 64-bit keys of the segment index.
2013-08-17 22:08:55 +05:30
Moinak Ghosh
d31c6433c2
Update free memory computation to include cached buffers.
...
Fix a potential rare corner case.
2013-08-17 11:31:44 +05:30
Moinak Ghosh
413a2a2fb1
Update Changelog and bump version for 2.3 release.
2013-08-10 10:25:41 +05:30
Moinak Ghosh
f35d0ff4ef
Fix multiple crashes for some corner cases.
...
Increase max block size for variable dedup block sizes greater than 16KB.
Update test cases and fix a test script bug.
2013-08-09 21:55:06 +05:30
Moinak Ghosh
fe18afbcf4
Use wrapper script to set paths when launching pcompress from build directory.
...
Use smaller max block size when doing global dedupe.
Fix init of executable name.
2013-08-07 22:03:52 +05:30
Moinak Ghosh
f34cfb1aa6
Make data partitioning between threads more effective.
...
Remove unnecessary computation to make Fixed block chunking faster.
2013-07-21 09:31:59 +05:30
Moinak Ghosh
2a218e9da5
Fix Dedupe Mode initialization.
2013-07-12 18:21:49 +05:30
Moinak Ghosh
8b73303488
Some minor code cleanup.
2013-07-05 22:22:11 +05:30
Moinak Ghosh
e10a13ad94
Improve accuracy of the KMV sketch computation and speed it up.
2013-07-03 19:24:06 +05:30
Moinak Ghosh
6b67e98747
Reduce similarity indicators to reduce memory use with low impact on dedupe ratio.
2013-06-30 22:38:05 +05:30
Moinak Ghosh
de0695e2c5
Add missing init of rabin block size.
2013-06-29 19:13:22 +05:30
Moinak Ghosh
17db67564d
Reduce a rolling hash parameter for a slight speedup with no side effect.
2013-06-24 21:13:32 +05:30
Moinak Ghosh
6432c76b4b
Update README formatting yet again - ugh.
2013-06-16 21:12:04 +05:30
Moinak Ghosh
52723cbbac
Update README formatting.
2013-06-16 21:09:04 +05:30
Moinak Ghosh
92be5a17f0
Update README with pointers to relevant analysis and documentation.
2013-06-16 20:46:17 +05:30
Moinak Ghosh
c0dd0102a5
A few minor fixes.
2013-06-14 22:25:01 +05:30
Moinak Ghosh
63370caee9
Remove an rpath entry meant for testing.
2013-06-03 21:28:20 +05:30
Moinak Ghosh
7743792018
Make default symbol visibility hidden, with public visibility specified explicitly.
...
Add missing static scope to a few more places.
2013-06-03 20:51:00 +05:30
Moinak Ghosh
c859cf35d5
Make Pcompress functionality into a library - initial changes.
2013-06-02 20:54:33 +05:30
Moinak Ghosh
8db0bef184
Bump version and update Changelog for 2.2 release.
2013-05-28 21:45:19 +05:30
Moinak Ghosh
ab1ced942d
Update invalid environment variable handling to actually fail rather than auto-correct.
2013-05-28 21:38:35 +05:30
Moinak Ghosh
e9ce7a5ed2
Fix a crash with invalid PCOMPRESS_CHUNK_HASH_GLOBAL.
...
Update testcase to correctly detect core files.
2013-05-26 23:38:10 +05:30
Moinak Ghosh
ddaa3b6b6d
Drastic simplification of Min-heap code and resultant Delta speedup.
2013-05-25 17:34:38 +05:30
Moinak Ghosh
0a1e3b39ef
Make segment size correspond to chunk size for Segmented Dedupe for better accuracy.
2013-05-15 22:20:45 +05:30
Moinak Ghosh
d89c95225a
Add a testcase for issue #10.
2013-05-12 14:19:26 +05:30
Moinak Ghosh
cbc0c84b12
Fix issue #10.
2013-05-12 11:54:40 +05:30
Moinak Ghosh
41b036adac
Fix issue #8.
2013-05-10 19:51:24 +05:30
Moinak Ghosh
8b3761ee81
Update Changelog, docs and bump version for 2.1 release.
2013-05-09 18:53:11 +05:30
Moinak Ghosh
a755d59dff
Add more tests covering Segmented Global Dedupe.
...
Fix some tests.
2013-05-07 22:30:36 +05:30
Moinak Ghosh
2740a00c76
Switch location of Dedupe context creation to allow correct index memory sizing.
2013-05-07 20:50:13 +05:30
Moinak Ghosh
969e242b31
Update README with details of Global Dedupe block hash selection.
2013-05-06 23:50:56 +05:30
Moinak Ghosh
c27317d7da
Add SSE2 optimizations for Segmented Dedupe.
2013-05-05 23:34:26 +05:30
Moinak Ghosh
6ecc400571
Fix segment offset sorting.
...
Get rid of incorrect duplicate checks in index.
2013-05-05 18:50:52 +05:30
Moinak Ghosh
c6da2325e3
Allow SKEIN to be used as a Global Dedupe chunk lookup hash.
2013-05-04 15:59:29 +05:30
Moinak Ghosh
0cf94c308a
Add a qsort variant optimized for integers and use in global dedupe.
...
Cleanup LZMA CRC64/32 declarations and add a header.
Fix heapq header.
2013-05-03 22:06:55 +05:30
Moinak Ghosh
c43e99f422
Always use OpenMP parallelism for chunk hash computation during Global Dedupe.
2013-05-02 23:24:43 +05:30
Moinak Ghosh
120877348c
Use SHA256 for Global Dedupe chunk lookup hash by default.
...
Allow changing Global Dedupe chunk lookup hash via env variable.
2013-05-02 00:05:05 +05:30
Moinak Ghosh
6e4d45b644
Fix crash with some older GCC versions. Reported in issue #7.
2013-05-01 19:27:43 +05:30
Moinak Ghosh
eae16b82d3
Fix issue #7.
...
Ensure tempfile cleanup even with error abort.
2013-05-01 18:01:17 +05:30
Moinak Ghosh
b23b5789fb
Fix bugs and improve accuracy in Segmented Dedupe.
...
Fix segment hashlist size computation.
Remove unnecessary sync of segment hashlist file writes.
Pass correct number of threads to index creation routine.
Add more error checks.
Handle correct positioning of segment hashlist file offset on write error.
Add missing semaphore signaling at dedupe abort points with global dedupe.
Use closer min-values sampling for improved segmented dedupe accuracy.
Update proper checksum info in README.
2013-04-30 19:35:18 +05:30
Moinak Ghosh
074e265f70
Fix sizing of similarity hash buffer.
2013-04-26 22:36:14 +05:30
Moinak Ghosh
2f2fc23771
Tweak index size computation.
2013-04-26 19:21:11 +05:30
Moinak Ghosh
c4f3bd14c0
Update Changelog for 2.0 release.
2013-04-26 19:04:35 +05:30
Moinak Ghosh
f05c7905b2
Bump version for release.
2013-04-26 18:47:52 +05:30
Moinak Ghosh
eb964b0bde
Update README.
2013-04-26 18:46:14 +05:30
Moinak Ghosh
aed69b2d53
Add test cases for Global Deduplication.
...
Update documentation and code comments.
Remove tempfile pathname after creation to ensure clean removal after process exit.
2013-04-26 18:32:00 +05:30
Moinak Ghosh
75f62d6a36
Simplify segment lookup loop.
...
Fix assertion.
2013-04-26 10:56:29 +05:30
Moinak Ghosh
5bb028fe03
Change Segmented Dedupe flow to improve parallelism.
...
Periodically sync writes to segcache file.
Use simple insertion sort for small numbers of elements.
2013-04-25 23:42:32 +05:30
Moinak Ghosh
79a6e7f770
Add capability to output data to stdout when compressing.
...
Always use segmented similarity based dedupe when using -G option in pipe mode.
Standardize on average 8MB segment size for segmented dedupe.
Fix hashtable sizing.
Some miscellaneous cleanups.
Update README with details of new features.
2013-04-24 23:03:58 +05:30