Add SSE4 detection. Fix the setting of some optimization flags in Makefile.in.
Enable SSE4/AVX detection for AMD platforms (Bulldozer has both). Use portable long long int print formatting to silence gcc 4.6 warnings.
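For illustration only, here are two common ways to print a 64-bit integer portably. The log does not say which form was adopted, so both helpers (format_i64 and format_i64_cast) are hypothetical names and not code from this change:

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: use the PRId64 macro from <inttypes.h>, which
 * expands to the correct conversion specifier for int64_t on the
 * target platform. */
int format_i64(char *buf, size_t len, int64_t v)
{
    return snprintf(buf, len, "%" PRId64, v);
}

/* Hypothetical helper: cast to long long so the argument type always
 * matches the %lld conversion, whatever int64_t is defined as. */
int format_i64_cast(char *buf, size_t len, int64_t v)
{
    return snprintf(buf, len, "%lld", (long long)v);
}
```

Either form keeps the argument type in step with the format string, which is what gcc's format checking complains about when the two disagree.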
Add support for runtime cpuid detection.
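A minimal sketch of what runtime detection of SSE4 and AVX can look like, assuming GCC or Clang and their <cpuid.h> header. The struct and function names are hypothetical and not taken from this change; the actual implementation may differ:

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical feature record; field names are illustrative only. */
struct cpu_features {
    int sse4_1;
    int sse4_2;
    int avx;
};

/* Query CPUID leaf 1 and decode the feature bits in ECX.
 * On non-x86 targets every field is left at 0. */
struct cpu_features detect_cpu_features(void)
{
    struct cpu_features f = {0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse4_1 = (ecx >> 19) & 1;   /* ECX bit 19: SSE4.1 */
        f.sse4_2 = (ecx >> 20) & 1;   /* ECX bit 20: SSE4.2 */

        /* AVX needs the CPU flag (bit 28) and OS support: OSXSAVE
         * (bit 27) set, and XCR0 bits 1 and 2 (SSE and AVX state)
         * enabled by the operating system. */
        if (((ecx >> 28) & 1) && ((ecx >> 27) & 1)) {
            unsigned int xcr0_lo, xcr0_hi;
            __asm__ volatile ("xgetbv"
                              : "=a"(xcr0_lo), "=d"(xcr0_hi)
                              : "c"(0));
            f.avx = (xcr0_lo & 0x6) == 0x6;
        }
    }
#endif
    return f;
}
```

These feature bits are reported the same way by Intel and AMD processors, so a check of this kind does not need to branch on the vendor string.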