I don’t know about Varnish, but in the other implementations I’ve worked on, you would usually have a timeout on the initial lock (semaphore) so that one slow connection doesn’t impact all clients.
But this is much, much harder to do once you are already streaming the response - if the time to first byte (TTFB) is quick, but the connection is low-throughput, you can’t do much at that point. But nearly all modern implementations stream the bytes to all clients immediately; they don’t try to fill the cache first (they do both simultaneously).
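To make the timeout-on-the-lock idea concrete, here is a minimal sketch in Python. It is not how Varnish or any particular proxy does it; `Coalescer`, `fetch`, and `wait_timeout` are names made up for illustration. It also only covers the easy case: followers wait for the whole body, so it says nothing about the harder streaming case where the first byte is fast and the rest is slow.

```python
import threading


class _Flight:
    """State shared between the leader and the followers of one origin fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class Coalescer:
    """Collapse concurrent requests for the same key into one origin fetch.

    `fetch` is a hypothetical callable (key -> body) standing in for the
    origin request; `wait_timeout` is how long a follower waits on the
    leader before giving up and going to the origin itself.
    """

    def __init__(self, fetch, wait_timeout):
        self._fetch = fetch
        self._wait_timeout = wait_timeout
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Flight for fetches in progress

    def get(self, key):
        with self._lock:
            flight = self._inflight.get(key)
            is_leader = flight is None
            if is_leader:
                flight = self._inflight[key] = _Flight()

        if is_leader:
            try:
                flight.result = self._fetch(key)
                return flight.result
            except Exception as exc:
                flight.error = exc
                raise
            finally:
                # Unregister first, then wake the followers.
                with self._lock:
                    del self._inflight[key]
                flight.done.set()

        # Follower: wait for the leader, but only up to wait_timeout.
        if flight.done.wait(self._wait_timeout) and flight.error is None:
            return flight.result
        # The leader was too slow or failed: hit the origin ourselves.
        return self._fetch(key)
```

With a generous `wait_timeout`, N concurrent calls to `get("k")` result in one call to `fetch`. With a short one, each follower that times out makes its own origin request, which is the price of not letting one slow connection stall everyone.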
Some implementations might avoid fanning in too much - maintaining a smaller pool of connections rather than trying to get to “1”, but that’s ultimately a trade-off at each layer of the onion, as they can still add up.
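A back-of-the-envelope way to see how the pools add up, as a Python sketch. The model and every number in it are assumptions for illustration: each layer has some number of nodes, each node allows up to `pool_size` concurrent upstream fetches per object, and in the worst case every node sees the same object at the same time.

```python
def worst_case_origin_requests(clients, layers):
    """Upper bound on concurrent origin fetches for one object.

    `layers` is a list of (nodes, pool_size) pairs, ordered from the edge
    inwards. Each layer can pass on at most nodes * pool_size requests,
    and never more than it received.
    """
    inflight = clients
    for nodes, pool_size in layers:
        inflight = min(inflight, nodes * pool_size)
    return inflight


# Hypothetical topology: 200 edge nodes in front of 10 mid-tier nodes.
print(worst_case_origin_requests(100_000, [(200, 1), (10, 1)]))  # 10
print(worst_case_origin_requests(100_000, [(200, 4), (10, 4)]))  # 40
```

Going from “1” to a pool of 4 at each layer quadruples the worst case at the origin in this toy model, in exchange for followers being less exposed to one slow upstream connection.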
(I worked at both Cloudflare and Google, and it was a common topic: request coalescing is a big deal for large customers)
I think the nginx that members of the public can get from their package manager does not have this feature, and will force each client other than the first to either wait for the entire body to be downloaded or wait for a timeout and hit the origin in a non-cacheable request.