The 24GB card mentioned is almost certainly an RTX Titan, which is $3,000 each. Just the card. Second, training frameworks like Megatron can distribute a model across multiple GPUs in the same computer as if they were on different machines, but a naive trainer is greatly helped by NVLink to actually pool the memory and keep throughput up, which means V100s, which are $5,000 each. (Also, people use Linux for ML.)
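To see why pooled memory matters, here's a rough sketch of naive model parallelism in PyTorch (not Megatron's actual implementation; the layer sizes and device fallback are illustrative). Every forward pass ships activations across the device boundary, which is exactly the traffic NVLink accelerates relative to PCIe:

```python
import torch
import torch.nn as nn

# Illustrative only: split a model across two devices so neither GPU
# has to hold the whole thing. Falls back to CPU when fewer than two
# GPUs are present, so the sketch still runs anywhere.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(512, 10).to(dev1)

    def forward(self, x):
        h = self.part1(x.to(dev0))
        # Activations hop between devices here every step; without
        # NVLink this copy goes over PCIe and becomes the bottleneck.
        return self.part2(h.to(dev1))

model = SplitModel()
out = model(torch.randn(4, 512))
print(tuple(out.shape))  # (4, 10)
```

This is the "as if they were on different machines" case: it works, but interconnect bandwidth dictates how badly the split hurts.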
Most universities have access to supercomputers, including GPU clusters. But that's not the point: not every NLP problem requires experimenting with 175B-parameter models.
Academic researchers shouldn't try to compete with Google or OpenAI in scaling up models. They should try to come up with new approaches. Our brains have been evolving under tight constraints (size, energy, noise, etc.). Maybe a good academic problem to solve is "how can I do what GPT-3 does if I only have an 8-GPU workstation?" This might lead to all kinds of breakthroughs.