Commit graph

53 commits

Author SHA1 Message Date
Moinak Ghosh
6a757ddb2c Multitude of tweaks and improvements.
* Use BSC for PNM type and Markup containing binary data.
* Change thresholds in analyzer.
* Properly use double precision in analyzer for accuracy.
* Indicate BSC processing of packPNM output
* Bring back raw-block Dispack for files not processed by Dispack filter.
2015-03-22 23:36:04 +05:30
Moinak Ghosh
d5e1d2cdef Some fixes in the Dictionary preprocessor.
Fix checking of data type flags.
Allow file-level filters to change output data type.
Tweak analyzer threshold for markup type.
2015-01-13 19:59:09 +05:30
Moinak Ghosh
077da83d5d A bunch of small fixes in Dict.
Improve text analysis for markup tags.
Use Libbsc for plain text and PPMd for markup mixed text.
Change thresholds.
2015-01-11 17:36:46 +05:30
Moinak Ghosh
753360e479 Tweak some data type settings.
2014-11-19 20:19:16 +05:30
Moinak Ghosh
507e7c75d3 Centralise data analysis routine for optimum performance and leverage.
Utilise buffer data analysis for preprocessing filters.
2014-11-06 22:23:33 +05:30
Moinak Ghosh
2e5f2d8aab Make DICT filter useful.
Improve data analysis in adaptive_compress.
2014-09-20 21:49:06 +05:30
Moinak Ghosh
c1411a6af6 More whitespace cleanup and MPLv2 licensing support.
2014-07-24 23:48:42 +05:30
Moinak Ghosh
6fba8aa8ac More OSX compatibility code.
Fix new warnings with GCC 4.8.
2014-04-28 00:12:51 +05:30
Moinak Ghosh
33281a2257 Fix issue #17.
Use LZ4 and Libbsc extra padding space for the compression buffer in adaptive modes.
2014-01-29 00:12:04 +05:30
Moinak Ghosh
683c3e48b5 Detect some DICOM formats and use BSC for DICOM data.
2014-01-01 19:44:58 +05:30
Moinak Ghosh
ea345a902a Overhaul documentation part #1
Detect and handle uncompressed PDF files using libbsc.
Force binary/text data detection for tar archives.
Get rid of unnecessary CLI option.
Add full pipeline mode check when archiving.
2013-12-30 23:24:37 +05:30
Moinak Ghosh
4c75a2da48 Fix issue #12.
Fix issue #13.
Create output directory with correct mode.
Fix the flow where pathname list is not sorted.
Fix ppmd decompression bug introduced in previous commit.
Reduce compression level for automatic pathname sorting.
Change to extraction directory only after opening archive.
2013-12-27 23:49:47 +05:30
Moinak Ghosh
a022a958c3 Free PPMD buffer after compression, rather than caching.
Introduce new API in allocator to release buffer to OS.
Release LZMA buffers after use.
2013-12-21 23:32:27 +05:30
Moinak Ghosh
271414535e Drastically reduce memory consumption of PPMD8 in adaptive mode (Use lower max model order).
2013-12-21 20:42:38 +05:30
Moinak Ghosh
733e6f8245 Do not use Libbsc for TIFFs. Not all TIFFs compress well with Libbsc.
Fix DEBUG-STATS build for Dispack.
2013-12-06 22:53:41 +05:30
Moinak Ghosh
5e484f0694 Use libbsc for AVI and MP4 files.
2013-12-04 20:07:52 +05:30
Moinak Ghosh
3f62cdf7d5 Use Libbsc for MP4 and FLAC files.
Change some rare file type codes to indicate some common types.
2013-12-03 21:56:07 +05:30
Moinak Ghosh
958bdf7edc Use Libbsc for TIFF images.
Workaround for packJPG limitation.
2013-12-02 21:50:19 +05:30
Moinak Ghosh
c4c4b47138 Use Libbsc for DNA Sequence data instead of PPMD. Faster, better compression.
Fix pz extension handling for real.
2013-11-30 09:58:21 +05:30
Moinak Ghosh
306f145f22 Use libbsc/ppmd for BMP files.
Fix extension based hashing.
Do not append .pz extension to filenames already having it.
Some code formatting changes.
2013-11-28 22:42:51 +05:30
Moinak Ghosh
0192790c02 Add Dispack filter with auto-detection of x86 executables in archive mode.
More elaborate magic header based detection of 32-bit and 64-bit x86 binaries.
Always use fast-mode LZ4 in Adaptive modes.
2013-11-24 19:45:58 +05:30
Moinak Ghosh
1e2c3e479a Optimize preprocessed compression and avoid a bunch of memory copies.
Fix a crash.
Add a few more file types.
More comments.
2013-11-22 20:44:26 +05:30
Moinak Ghosh
11584cab52 Add fast handling of totally incompressible data (like Jpegs) in adaptive modes.
Add function to indicate totally incompressible data when archiving.
Reformat if statements in some places to reduce branching.
2013-11-15 21:06:23 +05:30
Moinak Ghosh
6aacd903ff Structured handling of file types.
Handling of already compressed data based on compression algorithm.
Add a few more extension types.
2013-11-09 16:46:19 +05:30
Moinak Ghosh
cae9de9b2e Leverage file type detection (archiver) to improve compression performance.
Use detected file/data type (archiver) for Adaptive compression modes.
Update type flags and add more extensions.
2013-11-08 23:50:28 +05:30
Moinak Ghosh
8c1f4ebe61 Add a simple log facility.
Refactor all printfs to use log facility.
2013-10-02 20:45:33 +05:30
Moinak Ghosh
cbc0c84b12 Fix issue #10.
2013-05-12 11:54:40 +05:30
Moinak Ghosh
f8f23e5200 Major License text cleanup.
2013-03-07 20:26:48 +05:30
Moinak Ghosh
5f6217bb1f Add lookup and insert functionality for global index.
Make global dedupe code buildable.
2013-02-21 23:07:07 +05:30
Moinak Ghosh
cb853821c7 Use PPMd fallback for adapt2 if BSC is not enabled.
2013-02-17 22:01:29 +05:30
Moinak Ghosh
f41ea40bb9 Improve XML detection in adaptive mode.
2013-02-17 21:36:20 +05:30
Moinak Ghosh
6badbcaea7 Make global dedupe bits buildable and fix errors.
Rename Adaptive compression type constants to avoid conflict with global constants.
2013-02-17 21:05:40 +05:30
Moinak Ghosh
7b7c85dab4 Rationalize XXHash implementation to deal with 32-byte blocks instead of 16-byte.
Fix XXHash performance degradation for small keys.
Modify a data analysis loop in adaptive compress to make it auto-vectorizable.
2013-01-23 20:58:39 +05:30
Moinak Ghosh
39dbc4be43 Implement algo-specific minimum distance match for Delta Compression.
2013-01-14 13:20:07 +05:30
Moinak Ghosh
26a4f42506 Introduce strict compiler flags and fix scores of warnings/issues.
Avoid different optimization flags for Dedupe sources.
Fix liberal mixing of uint64_t and int64_t (should all be uint64_t).
Fix corner case crash when decompressing.
2012-12-27 23:06:48 +05:30
Moinak Ghosh
b0f41c2888 Add matrix transpose to Delta2 encoding.
Change confusing structure member name.
2012-12-13 21:18:16 +05:30
Moinak Ghosh
224fb529e9 Get rid of size_t in places where 64-bitness is assumed.
2012-12-09 10:15:06 +05:30
Moinak Ghosh
29b0d8fd7b Implement Adaptive Delta Encoding.
2012-12-05 00:09:47 +05:30
Moinak Ghosh
33c727e6e7 Fix building without Libbsc support.
Add more tests for corrupted encrypted files.
2012-11-26 20:21:03 +05:30
Moinak Ghosh
d054e0f713 Zlib optimizations. Use raw deflate streams to avoid unnecessary adler32.
Change some function signatures to improve algo init function behavior.
Fix corner case dedupe bug in error handling flow.
Bump archive version signature.
2012-11-22 21:02:50 +05:30
Moinak Ghosh
3aa33f5b94 Minor fix for adapt mode.
2012-11-04 21:46:04 +05:30
Moinak Ghosh
f0c7ba87a3 Use Libbsc for XML data in adapt2 mode.
2012-11-04 21:13:26 +05:30
Moinak Ghosh
8f8af7ed6b Update adaptive mode heuristic based on algorithms.
Remove incorrect check in PPMd decompression code.
More refactoring of variable names.
2012-09-27 22:29:08 +05:30
Moinak Ghosh
449dc35675 Speed up adaptive modes by using heuristics to select compression algorithm.
Select similarity percentage based on dedupe block size for effectiveness.
2012-09-26 19:47:32 +05:30
Moinak Ghosh
9eac774eb1 Add multithreaded LZMA port from p7zip
Compute balanced thread count between chunk threads and algo threads
Generic way to handle querying algorithm parameters
Clean up unnecessary includes
2012-08-18 10:20:52 +05:30
Moinak Ghosh
bde917c8e9 Fix handling of compression flags in adaptive mode
Fix error handling when chunk size is too small for dedupe
Bump version to 0.6
2012-08-10 10:47:11 +05:30
Moinak Ghosh
6b6e564886 Fix initialization of adaptive modes.
2012-08-10 10:15:20 +05:30
Moinak Ghosh
d3f5287ee5 Update License info to LGPLv3.
2012-07-07 22:18:29 +05:30
Moinak Ghosh
a1825a2305 Implement Parallel deduplication support.
Restructure compression functions to take chunk flag as argument.
Add missing error flag printing in LZMA.
Only create as many threads as needed by chunk size and file size.
Minor cleanups and variable name changes.
2012-07-01 21:44:02 +05:30
Moinak Ghosh
7e9f636f8d Change adaptive modes.
2012-06-01 22:04:08 +05:30