Differing CPU feature sets were one of the original reasons you never ran a generic kernel: a generic kernel is just that, generic, so it makes no assumptions about the hardware. Windows, and later Linux and FreeBSD, gained kernel modules that can adapt behavior at runtime, but even that doesn't always extract maximum performance from integer code. Fortunately, the cost of the generic path isn't much of a burden.
That said, I'm sure HPC folks compile with all the bells and whistles enabled for the exact CPU model, options, and probably even stepping.
HPC guy here: nope, we don't bother recompiling the kernel for every piece of hardware we have. HPC code tends to spend 99.99% of its time in user space, so kernel performance doesn't matter that much.
End-user applications are another matter, and there using all the latest vector instructions etc. can make a difference, though usually less of one than you might hope. The really big win tends to come from using optimized libraries such as OpenBLAS, FFTW, or MKL instead of writing naive numerical linear algebra yourself or linking the reference Netlib BLAS.
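To make the library point concrete, here's a toy sketch (my own illustration, not from the thread) comparing a textbook triple-loop matrix multiply against NumPy's `@`, which dispatches to whatever optimized BLAS NumPy was built against (OpenBLAS, MKL, etc.):

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Textbook triple loop: no blocking, no vectorization, no cache awareness."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
B = rng.standard_normal((200, 200))

t0 = time.perf_counter()
C_naive = naive_matmul(A, B)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
C_blas = A @ B  # dispatches to the optimized BLAS NumPy links against
t_blas = time.perf_counter() - t0

assert np.allclose(C_naive, C_blas)
print(f"naive: {t_naive:.3f}s  BLAS: {t_blas:.5f}s")
```

Even at this small size the BLAS call is typically orders of magnitude faster, and the gap widens with matrix size; compiler flags tuned for your CPU stepping buy nothing close to that.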
Another very common problem we see is poor application I/O patterns. Every HPC site loves to brag about how many GB/s its Lustre system delivers, but divide that by the number of CPU cores in the cluster and the per-core figure is quite low. Additionally, like other clustered file systems, Lustre has relatively poor metadata performance, so applications banging on lots of small files can easily tank the performance of the entire Lustre system.
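A minimal illustration of the small-file anti-pattern (my own sketch, run against a local filesystem; on Lustre the gap is far larger, since every file create is a round trip to the metadata server):

```python
import os
import tempfile
import time

payload = b"x" * 512  # one tiny record, the kind apps often write per file
n = 2000

with tempfile.TemporaryDirectory() as d:
    # Anti-pattern: one file per record. Each iteration is a create,
    # an open, and a close -- n metadata operations in total.
    t0 = time.perf_counter()
    for i in range(n):
        with open(os.path.join(d, f"rec{i:05d}.dat"), "wb") as f:
            f.write(payload)
    t_many = time.perf_counter() - t0

    # Better: append all records to a single file. One metadata
    # operation, then sequential writes the file system can stream.
    t0 = time.perf_counter()
    with open(os.path.join(d, "records.dat"), "wb") as f:
        for i in range(n):
            f.write(payload)
    t_one = time.perf_counter() - t0

print(f"{n} tiny files: {t_many:.3f}s  single file: {t_one:.3f}s")
```

Same bytes on disk, very different load on the metadata server; container formats (HDF5, tar, object stores) are the production-grade version of the second pattern.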