C requires that variable-argument-list functions be called with a correct prototype in scope, so it is perfectly feasible to use a "callee-cleans-up" ("Pascal") calling convention for regular C functions and a "caller-cleans-up" ("cdecl") convention for varargs functions.
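To make the distinction concrete, here is a minimal sketch in standard C. The names add_two and sum_ints are hypothetical, chosen only for illustration. Because the prototype of sum_ints is visible at every call site, the compiler knows the call is variadic and could pick a caller-cleans-up sequence for it, while remaining free to use a callee-cleans-up sequence for add_two, whose argument size is fixed. Only the caller knows how many arguments a given variadic call actually passed, which is why the cleanup has to sit on that side.

```c
#include <stdarg.h>

/* Prototype in scope: every call site can see that sum_ints is variadic. */
int sum_ints(int count, ...);

/* Fixed argument list: the callee knows exactly how much argument space
 * it was given, so a callee-cleans-up convention would be workable here. */
int add_two(int a, int b)
{
    return a + b;
}

/* Variadic: the callee cannot know at compile time how many arguments
 * were passed, so only the caller can reliably release them. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each extra argument is read as int */
    va_end(ap);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes four ints and sum_ints(0) passes one; which convention the compiler actually uses for either function is implementation-specific and is not visible in the source.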
The point is moot, though: the fact that an operation can be expressed as a single assembler mnemonic doesn't make it any faster than an alternative that uses several. A case in point is the "REP MOVS"-style string instructions, which until very recently were actually slower than open-coding the equivalents, since they trapped to microcode.
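As a rough illustration of the two styles being compared, the sketch below shows an open-coded byte copy next to a single-mnemonic one. The function names are hypothetical, and the REP MOVSB version assumes GCC or Clang extended inline assembly on x86-64 with the direction flag clear; on any other target it simply falls back to the loop. It makes no claim about which is faster on a given CPU, which can only be settled by measuring.

```c
#include <stddef.h>

/* Open-coded copy: an explicit loop that compiles to several instructions. */
static void copy_open_coded(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Single-mnemonic copy: REP MOVSB, where the toolchain and target allow it. */
static void copy_rep_movsb(unsigned char *dst, const unsigned char *src, size_t n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* RDI = destination, RSI = source, RCX = byte count; all three are
     * modified by the instruction, hence the "+" constraints. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    /* Assumption not met: fall back to the portable loop. */
    copy_open_coded(dst, src, n);
#endif
}
```

Both functions produce the same result for non-overlapping buffers. Note that an optimizing compiler may itself rewrite the loop into a library call or a string instruction, so any timing comparison should check the generated assembly first.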