Commit graph

  • a851bac247 Check harder with more strides in Delta2 for extreme compression levels. Moinak Ghosh 2013-12-13 19:53:14 +0530
  • bb08b24989 Make LibArchive filter process buffer more generic. Include explicit CLI flags for PackJPG and Dispack. Avoid auto-selection of filters if advanced options are specified. Moinak Ghosh 2013-12-12 00:22:15 +0530
  • 393fd790b0 Add more robust checks for Jpeg and packJPG format files in filter routine. Use case-insensitive checks for extension names. Enable more features based on compression level, when archiving. Moinak Ghosh 2013-12-08 23:24:06 +0530
  • 733e6f8245 Do not use Libbsc for TIFFs. Not all TIFFs compress well with Libbsc. Fix DEBUG-STATS build for Dispack. Moinak Ghosh 2013-12-06 22:53:41 +0530
  • 36ed5d5a78 Use adapt2 as default compression in archive mode. Add more filter auto-selection by compression level in archive mode. Replace odd stride lengths in Delta2 with standard numeric type lengths and improve performance. Moinak Ghosh 2013-12-05 22:20:01 +0530
  • 316d5aa4a8 Remove fast path exit to allow compressing headers and zero paddings via LZ4. Moinak Ghosh 2013-12-04 23:05:18 +0530
  • 5e484f0694 Use libbsc for AVI and MP4 files. Moinak Ghosh 2013-12-04 20:07:52 +0530
  • 3f62cdf7d5 Use Libbsc for MP4 and FLAC files. Change some rare file type codes to indicate some common types. Moinak Ghosh 2013-12-03 21:56:07 +0530
  • 958bdf7edc Use Libbsc for TIFF images. Workaround for packJPG limitation. Moinak Ghosh 2013-12-02 21:50:19 +0530
  • 5a49252bb9 Remove external Libbsc option. Moinak Ghosh 2013-11-30 22:43:31 +0530
  • fb25e53b4f Add forked and optimized copy of LGPL version of Libbsc. Strip out Sort Transform from Libbsc copy. Reduce Libbsc memory use. Avoid redundant adler32 of data block in Libbsc. Moinak Ghosh 2013-11-30 22:13:33 +0530
  • c4c4b47138 Use Libbsc for DNA Sequence data instead of PPMD. Faster, better compression. Fix pz extension handling for real. Moinak Ghosh 2013-11-30 09:58:21 +0530
  • dfeea8c19b Avoid Delta2,LZP for TIFF files. Negatively impacts compression. Moinak Ghosh 2013-11-29 19:47:57 +0530
  • 306f145f22 Use libbsc/ppmd for BMP files. Fix extension based hashing. Do not append .pz extension to filenames already having it. Some code formatting changes. Moinak Ghosh 2013-11-28 22:42:51 +0530
  • bd530e3393 Get rid of nagging warning. Moinak Ghosh 2013-11-26 21:43:17 +0530
  • 7bf967b572 Fix PackJPG library usage. PackJPG interface doc is incomplete, ugh! Handle the case where PackJPG expands the file rather than compressing. Moinak Ghosh 2013-11-26 21:24:01 +0530
  • 4923551570 Fix Dispack decoding. Moinak Ghosh 2013-11-24 22:44:35 +0530
  • 0192790c02 Add Dispack filter with auto-detection of x86 executables in archive mode. More elaborate magic header based detection of 32-bit and 64-bit x86 binaries. Always use fast-mode LZ4 in Adaptive modes. Moinak Ghosh 2013-11-24 19:45:58 +0530
  • 1e2c3e479a Optimize preprocessed compression and avoid a bunch of memory copies. Fix a crash. Add a few more file types. More comments. Moinak Ghosh 2013-11-22 20:44:26 +0530
  • 664c8ef75b Fix fd leak. Moinak Ghosh 2013-11-15 23:06:31 +0530
  • c09a2b7b81 Fix issues when handling Jpegs where packJPG borks. Moinak Ghosh 2013-11-15 23:02:09 +0530
  • 11584cab52 Add fast handling of totally incompressible data (like Jpegs) in adaptive modes. Add function to indicate totally incompressible data when archiving. Reformat if statements in some places to reduce branching. Moinak Ghosh 2013-11-15 21:06:23 +0530
  • c567a1d2f5 Enable auto-filtering of archive entries based on compression level. Miscellaneous fixes. Moinak Ghosh 2013-11-14 21:54:46 +0530
  • e90c52e516 Work in progress changes for packJPG encoding and decoding. Enhance custom LibArchive filter functionality. Moinak Ghosh 2013-11-13 23:28:01 +0530
  • 75dfa6a6fb Add basic framework for file type based filters during libarchive stage. Add packJPG filter for Jpeg files (not active yet). Directory format changes for clarity. Moinak Ghosh 2013-11-10 23:09:42 +0530
  • a5f1624a33 Add own implementation of archive entry extraction to allow custom filters. Fix magic number check for endianness. Moinak Ghosh 2013-11-09 21:55:18 +0530
  • 6aacd903ff Structured handling of file types. Handling of already compressed data based on compression algorithm. Add a few more extension types. Moinak Ghosh 2013-11-09 16:46:19 +0530
  • cae9de9b2e Leverage file type detection(archiver) to improve compression performance. Use detected file/data type(archiver) for Adaptive compression modes. Update type flags and add more extensions. Moinak Ghosh 2013-11-08 23:50:28 +0530
  • b7facc929e Add file type detection based on magic values. Add more comments. Add more extensions. Moinak Ghosh 2013-11-07 23:57:15 +0530
  • 991482403b Add extension based file type detection and setting segment data type. Use Bob Jenkins Minimal Perfect Hash to check for known extensions. Use semaphore signaling and direct buffer copy for extraction. Miscellaneous fixes. Moinak Ghosh 2013-11-07 21:48:54 +0530
  • 489b97cc79 Clear off private xattrs when extracting. Enable pathname sorting only for high compression levels. Moinak Ghosh 2013-11-04 18:35:22 +0530
  • 448890a014 Replace slow pipe with direct memory copy for archive extraction. Miscellaneous corrections and tweaks. Moinak Ghosh 2013-11-03 23:15:55 +0530
  • 7ed532133e Avoid using pipe during archive creation. Use semaphores and direct memory copy. Moinak Ghosh 2013-11-02 23:43:59 +0530
  • a374ca5909 Use mmap to read from the pathlist file for performance. Moinak Ghosh 2013-11-02 12:14:46 +0530
  • dcccffd7fa Archiving support using Libarchive: Fully functional archiving and extraction. Functionality to sort pathnames based on file extension and size. Moinak Ghosh 2013-11-01 23:15:40 +0530
  • e09d8a485c Archiving support using Libarchive: Working archive extraction. Moinak Ghosh 2013-10-31 00:15:17 +0530
  • 8e4b774c8c More changes for archiving. Allow multiple filenames on command line when archiving. Remove unneeded small block writes with libarchive. Moinak Ghosh 2013-10-27 20:36:48 +0530
  • 46b11def08 Archiving support using Libarchive: Work in progress changes #3. Make log_msg() add newline by default. Moinak Ghosh 2013-10-24 00:16:04 +0530
  • bc451aba36 Archiving support using Libarchive: Work in progress changes #2. Moinak Ghosh 2013-10-22 23:41:51 +0530
  • 7f81869874 Archiving support using Libarchive: Work in progress changes. Change all perror() calls to use logger. Make the config script a little verbose. Moinak Ghosh 2013-10-20 23:54:27 +0530
  • 28fd9848f9 Ability to specify output compressed pathname. Fix log level handling. Trim commented code. Moinak Ghosh 2013-10-10 21:19:44 +0530
  • 8c1f4ebe61 Add a simple log facility. Refactor all printfs to use log facility. Moinak Ghosh 2013-10-02 20:45:33 +0530
  • fa78621cbf Cleanup pointer casting in code to use macros. Moinak Ghosh 2013-09-22 20:11:15 +0530
  • 38c0869f5c Update Changelog and tweak free memory detection for 2.4 release. Add identifiers to error messages for clarity. Fix init of dedupe block size. Tweak free memory detection to include swap and shared memory consideration. 2.4Bugfix Moinak Ghosh 2013-09-05 21:12:37 +0530
  • a61fea75da Fix incorrect chunk size initialization from a previous commit. Moinak Ghosh 2013-09-03 23:23:11 +0530
  • b236638e72 Remove confusing option with little practical utility. Update test cases and documentation. Moinak Ghosh 2013-09-01 15:02:28 +0530
  • 12a2b8ed63 Additional error checks in RLE encoding for bsdiff extra data. Add a buffer overflow check in RLE encoder. Avoid calling RLE encoding if extra data length is zero. Make 2KB block size default for non-global deduplication. Update test cases for new 2KB block size support. Moinak Ghosh 2013-08-30 19:51:43 +0530
  • 2e62be3c9c Truncate password file after zeroing. Moinak Ghosh 2013-08-29 22:03:08 +0530
  • 9a7a8e84fe Add more example usage. Moinak Ghosh 2013-08-28 21:01:25 +0530
  • be1d0857a6 Avoid calling compression routine when dedupe reduces data size to zero. Moinak Ghosh 2013-08-28 09:46:10 +0530
  • cee8d88ded Bump version for upcoming release. Moinak Ghosh 2013-08-27 21:41:16 +0530
  • 7685adefb2 Default compression level only when compressing. Moinak Ghosh 2013-08-24 23:15:07 +0530
  • fc65111bae Fix issue #11. Increase default chunk size to 8MB. Use default compression level of 1 (fast mode) for LZ4. Moinak Ghosh 2013-08-24 22:58:50 +0530
  • 3db5188445 Support for deduplication using 2KB block size. Moinak Ghosh 2013-08-19 13:38:52 +0530
  • ef98422bd4 Add basic file format documentation. Reduce memory threshold for switching to Similarity based Deduplication. Moinak Ghosh 2013-08-18 20:11:20 +0530
  • 58f3113558 Avoid unnecessary re-hashing of 64-bit keys of the segment index. Moinak Ghosh 2013-08-17 22:08:55 +0530
  • d31c6433c2 Update free memory computation to include cached buffers. Fix a potential rare corner case. Moinak Ghosh 2013-08-17 11:31:44 +0530
  • 413a2a2fb1 Update Changelog and bump version for 2.3 release. 2.3Update Moinak Ghosh 2013-08-10 10:25:41 +0530
  • f35d0ff4ef Fix multiple crashes for some corner cases. Increase max block size for variable dedup block sizes greater than 16KB. Update test cases and fix a test script bug. Moinak Ghosh 2013-08-09 21:55:06 +0530
  • fe18afbcf4 Use wrapper script to set paths when launching pcompress from build directory. Use smaller max block size when doing global dedupe. Fix init of executable name. Moinak Ghosh 2013-08-07 22:03:52 +0530
  • f34cfb1aa6 Make data partitioning between threads more effective. Remove unnecessary computation to make Fixed block chunking faster. Moinak Ghosh 2013-07-21 09:31:59 +0530
  • 043cdfc05c Fix Dedupe Mode initialization. algo-analysis Moinak Ghosh 2013-07-12 18:23:31 +0530
  • 2a218e9da5 Fix Dedupe Mode initialization. Moinak Ghosh 2013-07-12 18:21:49 +0530
  • 8b73303488 Some minor code cleanup. Moinak Ghosh 2013-07-05 22:22:11 +0530
  • e10a13ad94 Improve accuracy of the KMV sketch computation and speed it up. Moinak Ghosh 2013-07-03 19:24:06 +0530
  • 6b67e98747 Reduce similarity indicators to reduce memory use with low impact on dedupe ratio. Moinak Ghosh 2013-06-30 22:38:05 +0530
  • ae3ba0858c Avoid CRC64 for Similarity IDs when using 256-bit hash. Moinak Ghosh 2013-06-29 19:42:45 +0530
  • de0695e2c5 Add missing init of rabin block size. Moinak Ghosh 2013-06-29 19:13:22 +0530
  • a2d74dab50 Print index type. Moinak Ghosh 2013-06-29 11:57:32 +0530
  • 17db67564d Reduce a rolling hash parameter for a slight speedup with no side effect. Moinak Ghosh 2013-06-24 21:13:32 +0530
  • e732e86b91 New option to capture statistics for all rolling hash breakpoints. Moinak Ghosh 2013-06-23 21:30:32 +0530
  • 84944932b0 Add more statistics. Moinak Ghosh 2013-06-22 23:54:54 +0530
  • b8f4a5d411 Compute rolling hash coverage metrics. Moinak Ghosh 2013-06-20 23:57:18 +0530
  • 916f31d62b Add measurements for Chunking properties. Moinak Ghosh 2013-06-20 22:08:07 +0530
  • 6432c76b4b Update README formatting yet again - ugh. Moinak Ghosh 2013-06-16 21:12:04 +0530
  • 52723cbbac Update README formatting. Moinak Ghosh 2013-06-16 21:09:04 +0530
  • 92be5a17f0 Update README with pointers to relevant analysis and documentation. Moinak Ghosh 2013-06-16 20:46:17 +0530
  • c0dd0102a5 A few minor fixes. Moinak Ghosh 2013-06-14 22:25:01 +0530
  • 63370caee9 Remove an rpath entry meant for testing. Moinak Ghosh 2013-06-03 21:28:20 +0530
  • 7743792018 Make default symbol visibility to hidden with explicit public visibility specified. Add missing static scope to a few more places. Moinak Ghosh 2013-06-03 20:51:00 +0530
  • c859cf35d5 Make Pcompress functionality into a library - initial changes. Moinak Ghosh 2013-06-02 20:54:33 +0530
  • 7014b12666 Update results3 analysis link. gh-pages Moinak Ghosh 2013-06-01 23:48:31 +0530
  • 01ebc67f84 Add 3rd set of Benchmark results. Moinak Ghosh 2013-06-01 23:03:30 +0530
  • 8db0bef184 Bump version and update Changelog for 2.2 release. 2.2Bugfix Moinak Ghosh 2013-05-28 21:45:19 +0530
  • ab1ced942d Update invalid environment variable handling to actually fail rather than auto-correct. Moinak Ghosh 2013-05-28 21:38:35 +0530
  • 0945e79f2c Add blog cross links to index file. Moinak Ghosh 2013-05-28 00:00:11 +0530
  • b513f15d50 Update Index page with second results. Moinak Ghosh 2013-05-27 23:11:28 +0530
  • c6bcc6dae8 Add second set of benchmark results. Moinak Ghosh 2013-05-27 23:10:20 +0530
  • e9ce7a5ed2 Fix a crash with invalid PCOMPRESS_CHUNK_HASH_GLOBAL. Update testcase to correctly detect core files. Moinak Ghosh 2013-05-26 23:38:10 +0530
  • 0a4b588cc5 Add benchmark 1 link to index page. Moinak Ghosh 2013-05-26 13:05:27 +0530
  • cee75d3d0a Fix some overlaps. Moinak Ghosh 2013-05-26 13:01:58 +0530
  • fff3d75407 Add benchmark results 1st set. Moinak Ghosh 2013-05-26 12:39:18 +0530
  • ddaa3b6b6d Drastic simplification of Min-heap code and resultant Delta speedup. Moinak Ghosh 2013-05-25 17:34:38 +0530
  • 0a1e3b39ef Correspond segment size to chunk size for Segmented Dedupe for better accuracy. Moinak Ghosh 2013-05-15 22:20:45 +0530
  • d89c95225a Add a testcase for issue #10. Moinak Ghosh 2013-05-12 14:19:26 +0530
  • cbc0c84b12 Fix issue #10. Moinak Ghosh 2013-05-12 11:54:40 +0530
  • 41b036adac Fix issue #8. Moinak Ghosh 2013-05-10 19:51:24 +0530
  • 8b3761ee81 Update Changelog, docs and bump version for 2.1 release. 2.1Update Moinak Ghosh 2013-05-09 18:53:11 +0530
  • a755d59dff Add more tests covering Segmented Global Dedupe. Fix some tests. Moinak Ghosh 2013-05-07 22:30:36 +0530
  • 2740a00c76 Switch location of Dedupe context creation to allow correct index memory sizing. Moinak Ghosh 2013-05-07 20:50:13 +0530