Moinak Ghosh
7ef20ec5be
Specialized dictionary encoding for FASTA files.
2015-02-01 16:20:03 -08:00
Moinak Ghosh
30dee9a1a9
Improve check in E8E9 filter to avoid unnecessary encodes.
...
Allow small increase in output chunk size for transform preprocessing (E8E9).
Ensure chunk allocations always include overhead size.
2015-01-25 23:17:23 -08:00
Moinak Ghosh
d5e1d2cdef
Some fixes in the Dictionary preprocessor.
...
Fix checking of data type flags.
Allow file-level filters to change output data type.
Tweak analyzer threshold for markup type.
2015-01-13 19:59:09 +05:30
Moinak Ghosh
077da83d5d
A bunch of small fixes in Dict.
...
Improve text analysis for markup tags.
Use Libbsc for plain text and PPMd for markup mixed text.
Change thresholds.
2015-01-11 17:36:46 +05:30
Moinak Ghosh
66a482c968
A new Dictionary preprocessor for text files.
2015-01-09 22:13:24 +05:30
Moinak Ghosh
73307c3996
Multiple checks and balances in Dispack to avoid buffer overflow.
...
Allow filter variants to omit the standard header.
Use E8E9 in Dispack filter as a fallback.
Fix integer overflow for type value in thread data struct.
Do not inline functions in DEBUG build.
2014-12-21 14:13:58 +05:30
Moinak Ghosh
1db822d866
Add Dispack file-level filter in the libarchive chain.
...
Add new file type for Win32-PE executables (Dispack).
Reset file type flag after filter processing for better compression.
Fix array index handling for file type list.
2014-12-20 11:24:09 +05:30
Moinak Ghosh
2cd41ec257
Revamp Filter handling code.
...
1) Really avoid adding filter xattr for non-processed files.
2) Clean up filter error handling.
3) Avoid libarchive data writes in filter callbacks.
4) Have libarchive data writes in a single place.
5) Properly handle skipping filter processing for a file.
6) Fix temporary file pathname handling.
2014-12-14 23:37:40 +05:30
Moinak Ghosh
dfe18ef48f
Fix missed archive entry record.
...
Fix enabling of metadata stream feature.
Fix log message text.
Use macro for path separator.
2014-12-11 23:16:26 +05:30
Moinak Ghosh
f970b41e34
A bunch of improvements and fixes.
...
- Fix heap corruption in DICT Filter.
- Make default Dedup block size 8KB.
- Revamp executable file handling: Part#1.
- Developed new E8E9 filter that works better than Dispack on raw data blocks.
- Remove block-based Dispack encoding. File-specific Dispack filter to be added.
- Improve file header based executable file detection.
- Introduce new sorting algorithm for filenames without extension.
2014-12-11 19:15:36 +05:30
Moinak Ghosh
b257c83f33
Detect a few mozilla file signatures.
...
Add missing option to suppress pathname sorting.
Fix chunk sizing to properly auto-enable deduplication.
Fix default dedupe block size to 8KB.
2014-11-16 22:57:47 +05:30
Moinak Ghosh
507e7c75d3
Centralise data analysis routine for optimum performance and leverage.
...
Utilise buffer data analysis for preprocessing filters.
2014-11-06 22:23:33 +05:30
Moinak Ghosh
848010fbb5
Tweak LZP and Dict to mostly avoid non-text files.
2014-11-05 22:05:19 +05:30
Moinak Ghosh
b2ad225fbb
Implement fast TOC listing for metadata streams.
...
Fix help text.
Removed redundant allocator code.
Actually free memory on exit.
2014-11-03 20:20:05 +05:30
Moinak Ghosh
cc68550670
Add metadata stream flag for archive.
...
Change flag bit to not collide with checksum id.
Handle '-T' option properly.
2014-10-25 22:57:31 +05:30
Moinak Ghosh
e7081eb5a3
Git commit - rehash. Incorrect earlier commit.
...
Implement Separate metadata stream.
Fix blatantly wrong check in Bzip2 compressor.
Implement E8E9 filter fallback in Dispack.
Improve dict buffer size checks.
Reduce thread count to control memory usage in archive mode.
2014-10-24 23:30:40 +05:30
Moinak Ghosh
e3c32ed6d6
Remove unneeded archive writing function.
...
Improve filter scratch buffer handling.
Improve memory accounting.
Remove delayed allocation when compressing. Allows better memory estimation.
Some cstyle fixes.
2014-09-24 21:54:36 +05:30
Moinak Ghosh
2e5f2d8aab
Make DICT filter useful.
...
Improve data analysis in adaptive_compress.
2014-09-20 21:49:06 +05:30
Moinak Ghosh
071a9e2b26
Update, simplify analyzer function to indicate text data for Dict filter.
...
Fix archive header writing bug.
Strip ^M chars from dict filter files.
Include DICT preprocessing type.
Fix a bunch of bugs found by Xcode.
2014-09-20 12:49:00 +05:30
Moinak Ghosh
4fedebc607
Dict filter work in progress.
2014-09-18 22:51:25 +05:30
Moinak Ghosh
af39994a59
Working Wavpack filter for compressing WAV files.
...
Improved error handling of filter routines.
Improved verbose logging.
2014-09-17 20:34:38 +05:30
Moinak Ghosh
376a56622b
Several fixes for issue #21.
2014-08-28 22:48:36 +05:30
Moinak Ghosh
d5ceda559e
Update Licensing notes and build notes.
...
More whitespace fixes.
2014-07-26 15:28:40 +05:30
Moinak Ghosh
c1411a6af6
More whitespace cleanup and MPLv2 licensing support.
2014-07-24 23:48:42 +05:30
Moinak Ghosh
10f40e1c6f
Part 1 changes to allow dual licensing to MPLV2.
...
Make external LGPL code/features disabled in MPLV2 variant.
Nuke some unwanted whitespace (cstyle).
2014-07-24 22:20:30 +05:30
Moinak Ghosh
63bef473cc
Working MAC OS X port.
...
Compatibility layer for semaphore handling.
2014-05-04 21:11:31 +05:30
Moinak Ghosh
6fba8aa8ac
More OSX compatibility code.
...
Fix new warnings with Gcc 4.8.
2014-04-28 00:12:51 +05:30
Moinak Ghosh
935717373b
Capability to list offset and length of each block when deduplicating, for external use.
2014-03-30 17:35:21 +05:30
Moinak Ghosh
c15957b990
Avoid auto-selecting variable chunking for buffer sizes below threshold.
2014-03-02 17:13:31 +05:30
Moinak Ghosh
9d40f3c2fb
Do not auto-select Global Dedupe for below threshold buffers.
2014-02-22 23:34:26 +05:30
Moinak Ghosh
8a1f47917f
Fix issue #18.
...
Do not try to generate a target filename in pipe mode.
2014-02-05 23:43:07 +05:30
Moinak Ghosh
2702544d3f
Scale default compression buffer size for levels > 8.
2014-01-29 23:27:53 +05:30
Moinak Ghosh
33281a2257
Fix issue #17.
...
Use LZ4 and Libbsc extra padding space for the compression buffer in adaptive modes.
2014-01-29 00:12:04 +05:30
Moinak Ghosh
62568e9066
Basic capability to list contents of an archive without extracting to disk.
2014-01-12 20:38:20 +05:30
Moinak Ghosh
3ddaf6d45f
Bump version and update command help text.
2014-01-04 21:45:23 +05:30
Moinak Ghosh
16da0b0339
Fix handling of some options.
...
Update README with additional option details.
2014-01-03 22:51:02 +05:30
Moinak Ghosh
ea345a902a
Overhaul documentation part #1
...
Detect and handle uncompressed PDF files using libbsc.
Force binary/text data detection for tar archives.
Get rid of unnecessary CLI option.
Add full pipeline mode check when archiving.
2013-12-30 23:24:37 +05:30
Moinak Ghosh
4c75a2da48
Fix issue #12.
...
Fix issue #13.
Create output directory with correct mode.
Fix the flow where pathname list is not sorted.
Fix ppmd decompression bug introduced in previous commit.
Reduce compression level for automatic pathname sorting.
Change to extraction directory only after opening archive.
2013-12-27 23:49:47 +05:30
Moinak Ghosh
5521955a94
Detect AR archives and set the type.
...
Re-use a less common type code for AR.
Use Dispack generically for all executables and AR archives.
2013-12-18 23:00:39 +05:30
Moinak Ghosh
a741f34f78
Move MSDOS COM single-byte magic number checks to last in the list.
...
Move advanced options flag into context structure.
Include dtd files as text type.
2013-12-18 00:09:32 +05:30
Moinak Ghosh
a851bac247
Check harder with more strides in Delta2 for extreme compression levels.
2013-12-13 19:53:14 +05:30
Moinak Ghosh
bb08b24989
Make LibArchive filter process buffer more generic.
...
Include explicit CLI flags for PackJPG and Dispack.
Avoid auto-selection of filters if advanced options are specified.
2013-12-12 00:22:15 +05:30
Moinak Ghosh
393fd790b0
Add more robust checks for Jpeg and packJPG format files in filter routine.
...
Use case-insensitive checks for extension names.
Enable more features based on compression level, when archiving.
2013-12-08 23:24:06 +05:30
Moinak Ghosh
36ed5d5a78
Use adapt2 as default compression in archive mode.
...
Add more filter auto-selection by compression level in archive mode.
Replace odd stride lengths in Delta2 with standard numeric type lengths and improve performance.
2013-12-05 22:20:01 +05:30
Moinak Ghosh
5e484f0694
Use libbsc for AVI and MP4 files.
2013-12-04 20:07:52 +05:30
Moinak Ghosh
c4c4b47138
Use Libbsc for DNA Sequence data instead of PPMD. Faster, better compression.
...
Fix pz extension handling for real.
2013-11-30 09:58:21 +05:30
Moinak Ghosh
dfeea8c19b
Avoid Delta2, LZP for TIFF files. They negatively impact compression.
2013-11-29 19:47:57 +05:30
Moinak Ghosh
306f145f22
Use libbsc/ppmd for BMP files.
...
Fix extension based hashing.
Do not append .pz extension to filenames already having it.
Some code formatting changes.
2013-11-28 22:42:51 +05:30
Moinak Ghosh
4923551570
Fix Dispack decoding.
2013-11-24 22:44:35 +05:30
Moinak Ghosh
0192790c02
Add Dispack filter with auto-detection of x86 executables in archive mode.
...
More elaborate magic header based detection of 32-bit and 64-bit x86 binaries.
Always use fast-mode LZ4 in Adaptive modes.
2013-11-24 19:45:58 +05:30