pcompress/Changelog
2013-04-26 19:04:35 +05:30

353 lines
17 KiB
Text

== 2.0 Major Release ==
Add test cases for Global Deduplication.
Update documentation and code comments.
Remove tempfile pathname after creation to ensure clean removal after process exit.
Simplify segment lookup loop.
Fix assertion.
Change Segmented Dedupe flow to improve parallelism.
Periodically sync writes to segcache file.
Use simple insertion sort for small numbers of elements.
Capability to output data to stdout when compressing.
Always use segmented similarity bases dedupe when using -G option in pipe mode.
Standardize on average 8MB segment size for segmented dedupe.
Fix hashtable sizing.
Some miscellaneous cleanups.
Update README with details of new features.
Optimize index lookup for 8-byte keys.
More cleanups.
More tweaks to slightly improve segment dedupe efficiency.
Some minor cleanps.
Improve segment similarity detection and drastically reduce index size.
Improve duplicate segment match detection.
Tweak percentage intervals computation to improve segmented dedupe ratio.
Avoid repeat processing of already processed segments.
Clean up temp cache dir handling.
Allow temp dir setting via specific env variable to point to fast devices like ramdisk,ssd.
Several bugfixes.
Avoid matching with self during hash lookup.
Several fixes and optimizations.
Many optimizations and changes to Segmented Global Dedupe.
Use chunk hash based similarity matching rather than content based.
Use sorting to order hash buffer rather than min-heap for better accuracy.
Use fast CRC64 for similarity hash for speed and lower memory requirements.
Many optimizations to segmented global dedupe.
Use chunk hash based cumulative similarity matching instead of chunk content.
Update usage text and add minor tweaks.
Properly cleanup global dedupe state.
Complete implementation for Segmented Global Deduplication.
Work in progress changes for Segmented Global Deduplication.
Work in progress changes for Segmented Global Deduplication.
Work in progress changes for scalable segmented global deduplication.
Allow user-specified environment setting to control in-memory index size.
Implement global dedupe in pipe mode.
Update hash index calculations to use upto 75% memavail when file size is not known.
Use little-endian nonce format for Salsa20.
Add global index cleanup function.
Fix location of sem_wait().
More comments.
Add check to disable Delta Compression with Global deduplication for now.
Implement Global Deduplication.
== 1.4.0 Update Release ==
Update couple more test parameters with new crypto options.
Update README and test cases with new crypto options.
Update usage text.
Cleanup more stack parameters after use in various crypto functions.
Fix increment of XSalsa20 192-bit nonce value.
Handle nonce bytes in endian neutral way.
Use 128-bit key length when decompressing older version archives.
Add XSalsa20 encryption algorithm from the NaCL library.
Include 128-bit key support based on the Salsa20 eSTREAM submission.
Allow variable-length nonces.
Use random bytes for initial nonce value.
Increase PBE hash rounds to 50000.
Move Scrypt helper function out of AES module.
Change default encryption key length to 256 bits.
Add optional ability to change key length at runtime via cli option.
Include key length property in archive header.
Fix header HMAC to include salt, nonce and key length properties.
Retain backward compatibility to handle older format archives.
Fix compilation of AES ASM code.
Add AES-NI optimized code derived from latest OpenSSL upstream.
Add AES instruction set detection.
Add Vector Permute AES from OpenSSL 1.0.1e. Remain compatible with older OpenSSL versions.
Add compatibility to decode old-format parallel hashes created with version 1.2.
Bump archive version to 7 as parallel hashes are now merkle style hashes.
Add lookup and insert functionality for global index.
Use PPMd fallback for adapt2 if BSC is not enabled.
Improve XML detection in adaptive mode.
Make global dedupe bits buildable and fix errors.
Rename Adaptive compression type constants to avoid conflict with global constants.
Work in progress global dedupe.
Fixes for issues/warnings reported in issue #4.
Global dedupe work in progress.
Use OpenMP parallelism when computing xxHashes for chunks.
Use 2-stage Merkle Tree hashing for parallel hashes for better crypto properties.
Update xxhash comment.
Reduce dedupe hash table collisions by half.
Improve Deduplication performance by another 95%.
Start sliding window scanning near minimum chunk size boundaries to avoid scanning whole chunk.
== 1.3.0 Major Performance Release ==
Fix return from parallel versions of Keccak_Hash() function.
Add parallel versions of various checksums for single-segment, single-thread compression.
Use BLAKE2 parallel version for single-chunk archives (whole file in one chunk).
Set decompression threads correctly for single-chunk archives.
Update default checksum to BLAKE256.
Add optimized BLAKE2 implementations with runtime detection of CPU capability (SSE/AVX).
Major changes to use Intel's optimized SHA512 code for SHA512 and SHA512/256.
Remove earlier SHA256 code which is slower than SHA512/256 (on 64-bit CPU).
Use HMAC from Alan Saddi's implementation for cleaner, faster code.
Changes for generalized runtime SSE/AVX/XOP detection.
Multi instruction set XXhash build with runtime selection.
Extend CPUID code to detect more instruction sets.
Add options for BLAKE2 hash.
Move GCC builtins into utils header.
Bump file format version number due to extended digest flags.
Add descriptions to digest list.
Rationalize XXHash implementation to deal with 32-byte blocks instead of 16-byte.
Fix XXHash performance degradation for small keys.
Modify a data analysis loop in adaptive compress to make it auto-vectorizable.
Improve Deduplication throughtput by 90%.
Use SSE4 register as sliding window for default 16-byte window size.
Use local variable for sliding window position to avoid spurios memory access in non-SIMD case.
Avoid computing breakpoint check value if processed length < minimum block length.
Improve SSE version detection.
Add SSE4 detection.
Fix setting of some opt flags in Makefile.in.
Many optimization tweaks
Optimize Rabin Deduplication and Bsdiff
Vectorize XXHash using SSE4
Add SSE2 improvements to CTR mode AES.
Add debug print of encryption and HMAC throughput.
Fix error message for invalid option.
Implement algo-specific minimum distance match for Delta Compression.
Fixes and performance improvements for Dedupe Delta Compression
Avoid using fingerprints in minhash computation and fix write amplification
Modify min-heap to use 64bit values
Improve bsdiff performance
Fix pointer comparison in bsdiff
Use 32bit offsets in bsdiff to reduce memory usage
Improve Zero RLE Encoder performance
Add more buffer overflow checks in Zero RLE Decoder
Use SSE3 lddqu in the matchfinder if SSE3 is enabled.
Remove outdated LZP note.
== 1.2.0 Major Stable Release ==
Fix issue #1 and issue #2.
Enable building with older openssl (at least 0.9.8e).
Add variants of missing functions in older openssl versions.
Allow proper linking with libraries in alternate locations and setting RUNPATH.
Increase hash rounds for nonce generation.
Fix calculation of extra scratch space for Dedupe.
Add missing return value check in small buffer Delta2.
Update test failure detection in test driver.
Fix numeric parsing.
Fix dedupe bug introduced in last commit.
Reset valid flag when resetting dedupe context.
Cleanup test suites.
Do not abort test suite on failure of a test case.
Fix Delta2 handling for small buffers.
Fix LZP handling during preprocessing.
Fix type flag handling during preprocessing.
Update test cases to use configurable list of test corpus files.
Update some more int64_t datatypes to uint64_t
Update to latest XXHash version.
Fix Keccak invocation and reset default checksum back to SKEIN 256.
Improve Dedupe performance.
Fix compiler warning in allocator code.
Major enhancements to Delta2 encoding.
Fix the byteswap macros.
Start adding assertions.
Introduce strict compiler flags and fix scores of warnings/issues.
Avoid different optimization flags for Dedupe sources.
Fix liberal mixing of uint64_t and int64_t (should all be uint64_t).
Fix corner case crash when decompressing.
Fix Delta2 decode handling when not compressing.
More debug statements and other minor changes.
Avoid transposing below-threshold spans. Reduces compression ratio.
Use little-endian storage format for numbers to optimize for x86.
Improve embedded table detection.
Reduce Delta2 header sizes.
Fix preprocessing behavior when LZP does not compress but Delta2 works.
Improve Delta2 scanning speed and effectiveness.
Add destination buffer overflow check in Delta2.
Add rough speed computation.
More tweaks to Delta2 implementation.
Ability to run individual test suites from makefile.
Silence more Gcc warnings.
Update testing note.
Improve 64-bit compiler and platform checks.
Ensure core dumps are enabled during testing.
Enable building with alternate Zlib and Bzlib.
Fix correct setting of output size when using Delta2 without LZP.
README formatting.
Make Delta2 encoding independent of LZP.
Tweak Delta2 parameters.
Update README and test cases.
Update README to align with current features/behavior.
Fine tune transpose parameters.
Fix minor nits.
Change confusing structure member name.
Add Matrix Transpose of Dedupe index to compress it better.
Fix handling of Dedupe index compression failure.
Ensure intermediate file cleanup in tests.
Update to latest LZ4.
Get rid of size_t in places where 64-bitness is assumed.
Add support for 64-bit Keccak implementation.
Sanitize error message from tests.
Add more tests.
Improve platform detection in config script.
Add adaptive delta encoding test.
Implement Adaptive Delta Encoding (Delta2).
Work in progress global dedupe config setup.
== 1.1.0 Stable Release ==
Fix building without Libbsc support.
Add more tests for corrupted encrypted files.
Fix thread error reporting.
Update error condition tests to not truncate archive.
Fix issues with error handling.
Add new tests for out of range values and corrupted file.
Avoid CRC32 when decompressing older version archives.
Use HMAC on header and encrypted data, avoid regular digest when encrypting.
Add CRC32 header checksums in non-cryptographic mode.
Zlib optimizations. Use raw deflate streams to avoid unnecessary adler32.
Change some function signatures to improve algo init function behavior.
Fix corner case dedupe bug in error handling flow.
Bump archive version signature.
Work in progress global dedupe config loader.
Use fixed rolling-hash mask for better block size approximation.
== 1.0.0 Stable Release ==
Fix chunk flag setup when compression fails in adaptive mode.
Prevent display of non-fatal errors during compression.
Add buffer overflow check in Ppmd compression routines.
Fix pipe mode encryption check.
Change file difference check in tests.
Fix encryption with adaptive modes.
Add missing zero-out of algorithm data field.
Add a functionality test suite.
Tweak chunking parameters for better block size distribution and dedupe ratio.
Add some more debug mode info.
Minor fix for adapt mode.
Use Libbsc for XML data in adapt2 mode.
== 0.9.1 Minor update Release ==
Portability to Debian based distros.
Enable SSE4/AVX detection for AMD platforms (Bulldozer has both).
Portable long long int print formatting to silence gcc 4.6 warnings.
Add another example invocation.
== 0.9.0 Major Release ==
Implement HMAC verification for chunk header.
Do not ask for decryption passwd twice.
Refactor crypto utility functions in a separate file.
Add HMAC verification for file header.
Add a few extra input validation checks.
Include space in chunk header for HMAC.
Support for chunk-level encryption using AES and Scrypt based PBE.
Couple of minor fixes.
Incorporate SSE/AVX optimized Intel SHA-256 implementation.
Add support for runtime cpuid detection.
Add support for SHA256 and SHA512 digests from OpenSSL library.
== 0.8.6 Update Release ==
Fix polynomial computation.
Fix incorrect block length when doing fixed-block dedupe.
Remove unused structure member.
Switch to multiplicative rolling hash for good
== 0.8.5 Update Release ==
Update adaptive mode heuristic based on algorithms.
Remove incorrect check in PPMd decompression code.
Speed up adaptive modes by using heuristics to select compression algorithm.
Select similarity percentage based on dedupe block size for effectiveness.
Fix check for size reduction in dedupe.
Fix polynomial table computation.
Change hashing and length bias to reduce hashtable bucket collisions.
Add support for user-selectable 60% or 40% similarity for Delta Compression.
Overall slight speedup.
Rewrite core dedupe logic to simplify code and improve performance.
Hashtable based chunk-level deduplication instead of Quicksort.
Fix a corner case bug in Dedupe decompression.
Speed up Hash computation for dedupe blocks.
Add support for Fixed-Block deduplication.
Reduce dedupe loop checks for slight speed edge.
Implement K-min-values Sketch for Similarity detection.
== 0.8.1 Bugfix Release ==
Fix return code handling in LZP pre-compression, crashed adaptive modes.
Allow user-specified minimum Dedupe block size.
Compute similarity sketch only if Delta Compression enabled.
Fix secondary sketch computation, some more accuracy in diff detection.
Add xxHash for Rabin block checksums, slightly faster than CRC64.
Fix missing initialization of character counts table.
Some file reorganization.
Add ASM version of Skein for x64 platforms with auto-detection
Error checking for checksum flag when decompressing
Add support for Skein512 and Skein256 checksums
Import Skein code from NIST CD submission
Make checksum algorithms pluggable
Fix handling of huge buffers (>2GB) in LZP
Cleanup of some buffer sizing code
== 0.8 Beta release ==
Add support for libbsc a high-performance block sorting compressor.
Enable external algorithm threading for single chunk compressed files.
Add config script to generate Makefile based on flags.
Add install target and installation readme file.
Improve memory efficiency when total file size < total size of chunks.
Fix freeing of Zlib structures.
Add LZP Pre-Compression support ported from libbsc.
Add generic pre-processing wrappers for future support of other pre-processors.
Clean up computation of Rabin block sizes.
Compute Rabin scratch space accurately to avoid RAM wastage.
Delay allocation of per-thread chunks for performance and memory efficiency.
Avoid allocating double-buffer for single-chunk files.
Introduce lzmaMt option to indicate multithreaded LZMA.
Add multithreaded LZMA port from p7zip
Compute balanced thread count between chunk threads and algo threads
Generic way to handle querying algorithm parameters
Clean up unnecessary includes
Speed up sort comparator function.
Reduce memory consumption and improve performance in dedupe
Re-introduce crc64 for dedup blocks to avoid wasted memcpy-s
Restructe block array to be an array of pointers allocated on demand
Fix a corner case issue when splitting chunks at a dedup boundary
Improve Rabin computations using an irreducible polynomial
Slight improvement to similarity computation
A simple mechanism to include DEBUG mode stats
Implement secondary sketch based on character counts to refine similarity checksum.
Proper checksum update for last block.
== 0.7 Beta release ==
Bump version to 0.7
Fix handling of compression flags in adaptive mode
Fix error handling when chunk size is too small for dedupe
Fix initialization of adaptive modes.
Compute and compare Mean sketch cksum to improve similarity comparison
Fix optflags settings in Makefile
Small optimization in zero RLE encoder to avoid scanning during lookahead
Some minor fixes
Bias fingerprint value with occurrence counts for a better sketch
Fix latent bug when calling algo deinit in decompression code
Reduce diff threshold for slightly greater delta encoding
Limit similar buffer size difference for less wasted diffing
Change zlib compression wrapper to use faster deflateReset mechanism
Reduce optimization level for Dedupe code, it goes faster
Fix handling of incompressible chunks.
Fix handling of various dedup failures.
Add NULL compression option for dedup only compression.
Remove unneeded checks in qsort comparator.
Fix buffer sizing for LZ4.
Fix exit condition checks in LZ4 decompression wrapper.
Fix buffer size calculation when decompressing LZ4, Zlib and Bzip2 compressed chunks.
Slight SSE optimization in LZ4HC.
== 0.6 Alpha bugfix release ==
Fix crash when algo init function returns error.
Fix LZFX error handling.
More updates to README.
Further improve LZMA compression parameters to utilize all the 14 levels.
Tweak some Rabin parmeters for better reduction with zlib and Bzip2.
Increase the small size slabs a bit.
Fix slab sizing.
Fix buffer size computation when allocating Rabin block array.
Reduce memory usage of Rabin block array.
Add an SSE optimization for bsdiff.