Amiga 1000 was released in mid-1985, although I think few units were shipped before 1986. Amiga 500 was 1987 - same year as VGA, its little brother MCGA, and the Mac II.
Not sure when VGA would have been considered mainstream... 1989 maybe?
Mac LC was 1990, so probably before that.
VGA and Mac color were better for most practical things. Square pixels and far fewer resolution/flicker/color tradeoffs.
> Amiga 1000 was released in mid-1985, although I think few units were shipped before 1986.
It was announced on July 26th, 1985 at Lincoln Center with Andy Warhol painting Blondie (Debbie Harry) live on-stage (a demo which was re-created at the Computer History Museum this past summer as part of the Amiga 040th celebration). But you're right, it wasn't commonly available until the late fall. I managed to get mine at the end of November.
Considering that people expect literally the same thing, I can understand how even small regional differences can seem extreme. Like not finding any beef on the menu in India, or any bacon in the Middle East.
I've been telling analysts/investors for a long time that dense architectures aren't "worse" than sparse MoEs and to continue to anticipate the see-saw of releases on those two sub-architectures. Glad to continuously be vindicated on this one.
For those who don't believe me: go take a look at the logprobs of a MoE model and a dense model and let me know if you notice anything. Researchers sure did.
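If it helps to see the mechanics: per-token logprobs are just a log-softmax over the model's next-token logits. Here's a minimal pure-Python sketch with made-up logits (not real model outputs - just to show how you'd compare how peaked two distributions are):

```python
import math

def logprobs(logits):
    """Numerically stable log-softmax: logits -> log-probabilities."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def entropy(lps):
    """Shannon entropy (nats) of a distribution given as logprobs."""
    return -sum(math.exp(lp) * lp for lp in lps)

# Hypothetical next-token logits: one sharply peaked distribution,
# one much flatter one (illustrative numbers only).
peaked = logprobs([9.0, 2.0, 1.0, 0.5])
flat   = logprobs([2.0, 1.8, 1.5, 1.2])

print(entropy(peaked))  # low: probability mass piled on one token
print(entropy(flat))    # high: mass spread across tokens
```

In practice you'd pull the real logits from the model's forward pass and run the same comparison over many prompts.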
Dense is (much) worse in terms of training budget. At inference time, dense is somewhat more intelligent per bit of VRAM, but much slower, so for a given compute budget it's still usually worse in terms of intelligence-per-dollar even ignoring training cost. If you're willing to spend more you're typically better off training and running a larger sparse model rather than training and running a dense one.
Dense is nice for local model users because they only need to serve a single user and VRAM is expensive. For the people training and serving the models, though, dense is really tough to justify. You'll see small dense models released to capitalize on marketing hype from local model fans but that's about it. No one will ever train another big dense model: Llama 3.1 405B was the last of its kind.
MoE isn't inherently better, but I do think it's still an underexplored space. When your sparse model can do 5 runs on the same prompt in the time a dense model takes to generate one, that opens up all sorts of interesting possibilities.
No, it's not our fault. Re our 4 uploads: the first 3 were due to llama.cpp fixing bugs, which was out of our control (we're llama.cpp contributors, but not the main devs). We could have waited, but it's best to update when multiple (10-20) bugs are fixed.
The 4th was Google themselves improving Gemma's chat template for tool calling.
https://github.com/ggml-org/llama.cpp/issues/21255 was another issue: CUDA 13.2 was broken. That was NVIDIA's CUDA compiler itself breaking - fully out of our hands - but we provided a solution for it.
See, that's the thing. The users don't care about, or even know about, the whole server/client split. They just want to change the storage folder. A good GUI lets you do that without the whole envvar dance.