That stage1 example is pretty ugly - using the __builtin_ia32_<x> builtins would work, but they are about the only things harder to read than the intrinsics themselves!
Plus some intrinsics are just macros (sets, masks, etc.), and setting the function target alone doesn't make the preprocessor define them for you.
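For comparison, here is a rough sketch of what per-function targeting looks like when the intrinsics are available. It assumes GCC 4.9 or later (or a recent Clang) on x86, where immintrin.h can be included without -mavx2; on older GCC the header contents are guarded by the command-line target and this won't compile. The function names (sum_i32, sum_i32_avx2, sum_i32_scalar) and the HAVE_AVX2_VARIANT macro are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_VARIANT 1   /* hypothetical macro, just for this sketch */

/* AVX2 variant: the target attribute enables AVX2 for this function only,
   so the rest of the file can still be built for a baseline CPU. */
__attribute__((target("avx2")))
static int32_t sum_i32_avx2(const int32_t *v, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;

    /* Add eight 32-bit lanes per iteration. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(v + i)));

    /* Horizontal reduction via a plain store, then the scalar tail. */
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int32_t s = 0;
    for (int k = 0; k < 8; ++k)
        s += lanes[k];
    for (; i < n; ++i)
        s += v[i];
    return s;
}
#endif

/* Portable fallback, used when AVX2 is missing or on non-x86 builds. */
static int32_t sum_i32_scalar(const int32_t *v, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Manual dispatch: ask the CPU at run time, then pick a variant. */
int32_t sum_i32(const int32_t *v, size_t n)
{
#ifdef HAVE_AVX2_VARIANT
    if (__builtin_cpu_supports("avx2"))
        return sum_i32_avx2(v, n);
#endif
    return sum_i32_scalar(v, n);
}
```

Callers only ever see sum_i32(); the check costs one branch per call, so for hot paths you would probably cache the choice in a function pointer instead.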
As an aside, that page really is dated - it only covers the early proposals after all, and SSE5 never saw the light of day in that form. VPCMOV ended up in AMD's XOP set.
ICC also has its own automatic and manual dispatch options:
auto: https://software.intel.com/en-us/node/682440
manual: https://software.intel.com/en-us/node/684505
I believe this is the area where Intel had their knuckles rapped for dispatching to the optimized paths only on "GenuineIntel" processors, and why there are big disclaimers on everything now. I've not tried using these myself as they aren't portable solutions.