
A few hundred on the planet? Are you kidding me? We're asking enterprises to run LLMs on-premise (I'm intentionally discounting the cloud scenario, where the traffic rates are much higher). That's way more than a few hundred, and sorry to break it to you, but Ollama is just not going to cut it.


No need to be angry about this. Tech folks should be discussing this collectively and collaboratively. There's space for everything, from local models running on smartphones all the way up to OpenAI-style industrialized models. Back when social networks were first coming out, I used to read lots of comments about deploying and running distributed systems. I remember reading early incident reports about hotspotting, consistent hashing, TTL problems and such. We need to foster more of that kind of conversation for LLMs. Sadly, right now Xitter seems to be the best place for that.


Not angry. Having a discussion :-). It just amazes me that the HN crowd is more than happy to try out a model on their machine, call it a day, and not see the bigger picture. Let's ignore perf concerns for a moment. Say I want to run it on a shared server on the enterprise network so that any application can make use of it. Each application might want to use a model of its choosing. Ollama will load and unload models as each new request for a different model arrives (see the sketch below). Not sure if folks here are realizing this :-)
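
For anyone who wants to see this for themselves, here is a minimal sketch against Ollama's /api/generate endpoint (assuming a default install on localhost:11434 and that both models have already been pulled; the model names are just illustrative stand-ins for "two apps, two models"):

    # Two apps sharing one Ollama server, each wanting a different model.
    # If both models don't fit in memory (or the server keeps only one
    # resident), alternating traffic degenerates into load/unload churn.
    import time
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default endpoint
    MODELS = ["llama3", "mistral"]  # illustrative; any two pulled models

    for i in range(6):
        model = MODELS[i % 2]  # alternate models, as two apps would
        start = time.monotonic()
        resp = requests.post(OLLAMA_URL, json={
            "model": model,
            "prompt": "ping",
            "stream": False,
        }, timeout=300)
        resp.raise_for_status()
        # The first request after a swap pays the (re)load cost; watch
        # the latency spike on every alternation.
        print(f"{model}: {time.monotonic() - start:.1f}s")

(Newer Ollama releases do have settings like OLLAMA_MAX_LOADED_MODELS to keep several models resident when they fit in memory, but that only raises the ceiling; it doesn't solve capacity planning for a shared enterprise server.)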


> Not sure if folks here are realizing this :-)

I’m not sure you’re capable of understanding that your needs and requirements are just that, yours.



