SPARC is pretty much a lost cause for us. I believe we have had better performance with the system memcpy than with anything we've written.
Linux GCC was all over the place, depending on the Red Hat Enterprise Linux version. IIRC, on RHEL 4 our internal code worked better, but on 5 and 6 the included memcpy is generally better.
If you're writing code that has serious performance requirements, experimentation is key. There's absolutely no guarantee that the system's memcpy will be faster than a hand-rolled one, or the other way around.
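A rough way to run that kind of experiment is a small timing harness like the one below. This is only a sketch: `my_memcpy` is a hypothetical stand-in for whatever hand-rolled routine you want to test (it is not our internal code), `time_copy` and `compare_copies` are names made up for illustration, and `clock()` is a coarse timer, so treat the numbers as a first indication only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical hand-rolled copy: a plain byte loop standing in for the
   custom routine under test. Some optimizers recognize this loop and
   replace it with a call to memcpy, so check the generated code. */
static void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Time `iters` copies of `n` bytes with `fn`.
   Returns elapsed CPU seconds, or -1.0 on bad arguments or allocation failure. */
static double time_copy(copy_fn fn, size_t n, int iters)
{
    assert(fn != NULL);
    if (n == 0 || iters <= 0)
        return -1.0;

    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, n);

    volatile unsigned char sink = 0;  /* keeps the copies from being optimized away */
    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        fn(dst, src, n);
        sink = dst[(size_t)i % n];
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Print timings for both routines at one buffer size. */
static void compare_copies(size_t n, int iters)
{
    printf("%zu bytes x %d: system %.4fs, hand-rolled %.4fs\n",
           n, iters, time_copy(memcpy, n, iters), time_copy(my_memcpy, n, iters));
}
```

Call `compare_copies` from `main` with the buffer sizes your application actually uses (for example `compare_copies(64, 1000000)` and `compare_copies(1 << 20, 1000)`), and repeat the run on every platform, compiler version, and optimization level you ship on, since the winner can change between them.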