This is another one of those "I used this for 5 minutes and here's what I found" posts that adds nothing useful.
Check out the host-LLMs-at-home crowd. One project to look at is llama.cpp. Model compression (quantization) is one of the first techniques that made it practical to run models on low-capacity hardware.
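For anyone curious what quantization actually does: here's a toy sketch of the core idea (storing weights as int8 plus a scale factor instead of float32, roughly 4x smaller at the cost of some precision). This is illustrative only, not llama.cpp's actual implementation, which uses more sophisticated block-wise schemes.

```python
def quantize(weights):
    """Symmetric int8 quantization: each weight w is stored as
    an integer q in [-127, 127] such that w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero on all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.98, 0.40, 0.03]
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

The rounding error per weight is bounded by half the scale, which is why aggressive quantization (4-bit and below) needs cleverer tricks than this to keep output quality acceptable.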