I think the main limitation, right now, is hardware. For GPUs, the main limit is the VRAM available on consumer models. CPUs have plenty of memory but lack the bandwidth and vector compute power for LLMs. This is why I think the Strix Halo is so exciting: it combines bandwidth and compute power with a lot of memory. It's not quite where it needs to be to replace a dedicated GPU, but in a few iterations it could be.
I'm interested in other opinions. I'm no expert on this stuff.
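To put rough numbers on the VRAM limit mentioned above, here is a back-of-envelope sketch in Python. It only counts model weights (parameters times bits per weight); the 20% overhead for KV cache and runtime buffers is a guess of mine, and the 70B model size and 24 GB card are illustrative assumptions, not figures from this thread.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float, memory_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Rough check whether a model fits in a given amount of memory.

    overhead_fraction is an assumed allowance for KV cache, activations,
    and runtime buffers; real overhead depends on context length and software.
    """
    needed_gb = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed_gb <= memory_gb


if __name__ == "__main__":
    # Hypothetical 70B-parameter model at 16-bit and 4-bit weights.
    print(weight_memory_gb(70, 16))  # weights at 16 bits per parameter
    print(weight_memory_gb(70, 4))   # weights at 4 bits per parameter
    print(fits(70, 4, 24))           # illustrative 24 GB consumer GPU
    print(fits(70, 4, 128))          # 128 GB shared-memory machine
```

Under these assumptions, a 4-bit 70B model needs about 35 GB for weights alone, which is why it overflows a 24 GB card but sits comfortably in 128 GB of shared memory.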
How does the shared memory model for GPUs on Apple Silicon factor into this? These machines are technically consumer grade and not very expensive, but they can offer a huge amount of memory: since all the memory is shared between CPU and GPU, even a mid-tier machine can easily have 100 GB of GPU memory.
If you squint, the M4 is the same as the Strix Halo. The M4 has roughly
* double the bandwidth;
* half the compute; and
* double the price for comparable memory (128GB)
compared to the Strix Halo.
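To make the trade-off in that list concrete, here is a rough sketch using only the relative figures above, with the Strix Halo normalized to 1.0. It leans on two common rules of thumb that I am assuming here: token generation (decode) is mostly memory-bandwidth-bound, and prompt processing (prefill) is mostly compute-bound. The `Chip` class and `relative_speeds` function are hypothetical names for illustration, and real results will depend on the model and software stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    bandwidth: float  # memory bandwidth, relative to Strix Halo = 1.0
    compute: float    # compute throughput, relative to Strix Halo = 1.0
    price: float      # price at comparable memory (128GB), relative to Strix Halo = 1.0


# Ratios taken from the list above; they are rough, not measurements.
STRIX_HALO = Chip("Strix Halo", bandwidth=1.0, compute=1.0, price=1.0)
M4 = Chip("M4", bandwidth=2.0, compute=0.5, price=2.0)


def relative_speeds(chip: Chip) -> dict:
    """Estimate relative LLM speeds under the two assumptions in the lead-in."""
    decode = chip.bandwidth  # assumption: decode scales with memory bandwidth
    prefill = chip.compute   # assumption: prefill scales with compute
    return {
        "decode": decode,
        "prefill": prefill,
        "decode_per_price": decode / chip.price,
        "prefill_per_price": prefill / chip.price,
    }


if __name__ == "__main__":
    for chip in (STRIX_HALO, M4):
        print(chip.name, relative_speeds(chip))
```

Under these assumptions the M4 would generate tokens about twice as fast but process long prompts about half as fast, and per unit of price it roughly ties on generation and loses on prompt processing.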
I'm more interested in the AMD chips because of cost and because, while I have an Apple laptop, I do most of my work on a Linux desktop, so a killer AMD chip works better for me. If you don't mind paying the Apple tax, then a Mac is a viable option. I'm not sure about the software side of LLMs on Apple Silicon, but I can't imagine it's unusable.
I am also very interested in AMD's Strix Halo for running LLMs locally. For that I have a Framework Desktop on order (batch 1!).
Alex Ziskind on YouTube does videos comparing Strix Halo, M4 Mac mini and MacBook Pro, Nvidia 5090, etc., including power consumption. The only downside is that you have to pull the numbers out of the videos; there are no tables or anything. Here is the recent video testing Strix Halo and a Mac mini: https://www.youtube.com/watch?v=B7GDr-VFuEo
I don't know, what's worse about people running LLMs locally compared to running any other software locally?
There is nothing fundamentally new in having freedom at the edges of society. Yes, it can lead to horrible situations, like someone killing their neighbors with a shiny new tool that anyone can wield single-handedly. But that's far less of a concern than having the powerful new tool stay under the fully concentrated control of the greediest humans out there, who will gladly escalate any hindrance to genocide whenever something doesn't fit their perspective.
If anyone has suggestions of people they respect who are thinking about this space, I'd love to hear more ideas and thoughts on the developments.