Use chunk-hash-based cumulative similarity matching rather than matching on chunk content. Order the hash buffer by sorting rather than with a min-heap, for better accuracy. Use fast CRC64 as the similarity hash, for speed and lower memory requirements.
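
A minimal sketch of the idea, not the code from this change: it assumes the similarity hash is a CRC64 computed over the lowest few chunk hashes of a segment after a full sort. The names `crc64`, `similarity_hash` and the `keep` parameter are hypothetical, and the CRC64 variant (ECMA-182 polynomial, zero initial value, no reflection) is an assumption.

```python
CRC64_POLY = 0x42F0E1EBA9EA3693  # ECMA-182 polynomial (assumed variant)
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes, crc: int = 0) -> int:
    """Bitwise MSB-first CRC64 with zero init and no final XOR."""
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def similarity_hash(chunk_hashes, keep=4):
    """Hypothetical helper: similarity hash of one segment.

    Sorts the chunk hashes (instead of tracking minima in a heap),
    keeps the `keep` smallest, and runs CRC64 over them cumulatively,
    so no chunk content is read at this stage.
    """
    lowest = sorted(chunk_hashes)[:keep]
    return crc64(b"".join(lowest))


# Two segments that share their four smallest chunk hashes.
seg_a = [b"\x05aaa", b"\x01bbb", b"\x09ccc", b"\x03ddd", b"\x02eee", b"\xf0zzz"]
seg_b = [b"\x02eee", b"\x03ddd", b"\x01bbb", b"\x05aaa", b"\xeeyyy"]
# A segment whose lowest hashes differ by one byte.
seg_c = [b"\x01bbb", b"\x02eee", b"\x03ddd", b"\x05aab"]

same = similarity_hash(seg_a) == similarity_hash(seg_b)
different = similarity_hash(seg_a) != similarity_hash(seg_c)
```

Under these assumptions, segments that agree on their smallest chunk hashes map to the same 64-bit value regardless of chunk order, and only 8 bytes per segment need to be kept for matching.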