Remove the earlier SHA256 code, which is slower than SHA512/256 on 64-bit CPUs. Use the HMAC from Allan Saddi's implementation for cleaner, faster code.
Add support for runtime cpuid detection.
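
For reference, a minimal sketch of what runtime cpuid detection can look like. It assumes GCC or Clang with <cpuid.h> on x86; the names cpu_features_t and detect_cpu_features are hypothetical and are not taken from this change. On non-x86 targets the sketch reports no features.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* Hypothetical feature record; field names are illustrative only. */
typedef struct {
    int sse2;
    int ssse3;
    int sse4_1;
} cpu_features_t;

/* Query CPUID leaf 1 once at startup and decode a few feature bits. */
static cpu_features_t detect_cpu_features(void)
{
    cpu_features_t f = {0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2   = (edx >> 26) & 1;  /* CPUID.1:EDX bit 26 */
        f.ssse3  = (ecx >> 9)  & 1;  /* CPUID.1:ECX bit 9  */
        f.sse4_1 = (ecx >> 19) & 1;  /* CPUID.1:ECX bit 19 */
    }
#endif
    return f;
}
```

A caller would typically run detect_cpu_features() once during initialization and pick the hash routine (generic or SIMD-optimized) based on the returned flags, so a single binary works across CPUs.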