Just as adding code to the training data of text models helps them develop reasoning capabilities, adding more languages seems to help in other areas too. What is needed is more good-quality data to train on...
We also see humans get worse at specific things when they learn too much in general. There is a limit to how many concepts we can learn and how well we can master each one. To be most effective, we have to specialize in the right things while continuing to acquire generalist knowledge. It’s a balancing act.
These architectures are less capable than brains in many ways, so we should expect them to have similar trade-offs. An efficient one should work fine on English, mathematical notation, and a programming language, maybe with samples of other languages that illustrate unique concepts. I’m also curious how many languages or concepts you can add to a given architecture before its effectiveness starts dropping.
It's not the amount that is wrong, it's how the model is trained. The model is trained for zero-shot and few-shot tasks, so it is not surprising that it performs well when you ask for exactly that.