Add SSE4 detection. Fix setting of some optimization flags in Makefile.in.
Avoid using different optimization flags for Dedupe sources. Fix liberal mixing of uint64_t and int64_t (all should be uint64_t). Fix corner-case crash when decompressing.
Add support for runtime CPUID detection.