This is another one of those "I used this for 5 minutes and here's what I found" posts that adds nothing useful.
Check out the host-LLMs-at-home crowd. One project to look at is llama.cpp. Model compression (quantization) is one of the first techniques that made it practical to run models on low-capacity hardware.
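For anyone curious what quantization actually does: here's a toy sketch of the core idea (storing weights as int8 plus a scale factor instead of float32, roughly 4x smaller at the cost of some precision). This is illustrative only, not llama.cpp's actual implementation, which uses more sophisticated block-wise schemes.

```python
def quantize(weights):
    """Symmetric int8 quantization: each weight w is stored as
    an integer q in [-127, 127] such that w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero on all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.98, 0.40, 0.03]
q, scale = quantize(weights)
restored = dequantize(q, scale)
```

The rounding error per weight is bounded by half the scale, which is why aggressive quantization (4-bit and below) needs cleverer tricks than this to keep output quality acceptable.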