Hacker News | mchusma's comments

Yes let’s go make deafness optional for adults and eradicate childhood deafness. Zero moral ambiguity.

Google owned 14ish percent of Anthropic before this investment, so presumably this could bring it up to as much as 25%?

I work with a lot of full-time devs, and it is very hard to go beyond the $200 Max plan. If you use API credits (and I think the enterprise plan effectively forces you to), you can definitely incur costs like that, particularly if you're not using prompt caching and the like.

But I and others in my company have very heavy usage, and even with parallel agentic processes we only rarely exhaust the $200-a-month plan.

And what do I mean by "hard"? I mean it takes deliberate effort to max it out. I'm sure there are use cases where it's easy, but in general I find most devs can't even max out the $100-a-month plan, because they haven't figured out how to leverage it to that degree yet.

(Again, if someone is using the API instead of subscription, I wouldn't be surprised to see $2,000 bills.)
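To make the subscription-vs-API gap concrete, here's a back-of-the-envelope cost model. All prices, token volumes, and the cache discount are invented placeholders for the arithmetic, not actual Anthropic rates:

```python
# Illustrative cost model for heavy agentic API usage.
# Every number here is a made-up placeholder, not a real price.

INPUT_PRICE = 3.00 / 1_000_000    # $/input token (placeholder)
OUTPUT_PRICE = 15.00 / 1_000_000  # $/output token (placeholder)
CACHED_READ_DISCOUNT = 0.1        # cached input reads billed at 10% (placeholder)

def monthly_cost(input_tokens, output_tokens, cache_hit_rate=0.0):
    """Monthly bill given token volumes and the share of input tokens
    served from the prompt cache."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (uncached * INPUT_PRICE
            + cached * INPUT_PRICE * CACHED_READ_DISCOUNT
            + output_tokens * OUTPUT_PRICE)

# A dev running parallel agents: 500M input, 20M output tokens/month.
no_cache = monthly_cost(500_000_000, 20_000_000)
with_cache = monthly_cost(500_000_000, 20_000_000, cache_hit_rate=0.9)
print(f"no caching:    ${no_cache:,.0f}")
print(f"90% cache hit: ${with_cache:,.0f}")
```

Under these placeholder numbers, skipping prompt caching roughly triples the bill, which is how API users end up with four-figure invoices that subscription users never see.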


Business/Enterprise accounts are billed at $20/seat plus API prices, not subscription prices. You can give seats a monthly dollar quota or let them go unlimited, but they're not subsidized the way the Team plan is. And Team can't get a 20x plan, from what I can tell.

At first, I was more excited about the Flash model, but I'm now more excited about the Pro model in many ways. I feel like the Pro model, run through Unsloth and with some fine-tuning, is going to be enough for many vertical SaaS applications.

Where previously I was wary of under-providing intelligence, I'm now excited about the idea of building these fairly large, capable models into my application. The idea is that for sub-agents, a fine-tuned model should reasonably be expected to perform as well as Opus on a specific subtask, and my applications have many such subtasks.

In other words, we can run a general-purpose intelligent model, Sonnet or Opus, orchestrating a fleet of, let's say, 30 to 50 of these fine-tuned sub-agents. By doing that, I can get very low pricing compared with using Opus or Sonnet for everything.
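The orchestration pattern being described can be sketched as a dispatch table. Everything here is hypothetical: the model calls are stubbed with plain functions, and the routing table stands in for fine-tuned specialists:

```python
# Hypothetical orchestrator: a strong general model produces a plan, then
# each subtask is dispatched to a small fine-tuned specialist. All model
# calls are stubbed out as plain functions.

from typing import Callable

def summarizer(task: str) -> str:  # stand-in for a fine-tuned small model
    return f"summary({task})"

def classifier(task: str) -> str:
    return f"label({task})"

def extractor(task: str) -> str:
    return f"fields({task})"

# Routing table: subtask kind -> fine-tuned sub-agent.
SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "summarize": summarizer,
    "classify": classifier,
    "extract": extractor,
}

def orchestrate(plan: list[tuple[str, str]]) -> list[str]:
    """The 'Opus-level' planner would produce `plan`; here we just
    execute it by dispatching each (kind, task) pair to a specialist."""
    return [SUB_AGENTS[kind](task) for kind, task in plan]

results = orchestrate([
    ("classify", "ticket #1"),
    ("extract", "invoice.pdf"),
    ("summarize", "thread"),
])
print(results)
```

The cost argument rests on the expensive model only running once per plan, while the cheap specialists absorb the bulk of the token volume.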


> The idea that for basically sub-agents, we can fine-tune them, should reasonably expect to perform as well as Opus for a specific subtask of which my applications have many [...] we can run a general-purpose intelligent model, Sonnet or Opus, orchestrating a fleet of, let's say, 30 to 50 of these sub-agents that have been fine-tuned

I've heard so many people saying this for the last year, and I've even tried it myself, but I've never seen a successful application of it, nor succeeded myself, whether with SOTA models that are smart but slow or local models that are dumb but fast (even with beefy hardware).

What makes you believe this is possible in the first place? Every "swarm of agents" implementation I've seen has only been able to produce the lowest-quality code, most of the time vastly bloated. But surely you must have seen something working in practice that you could share with the rest of us?


I guess it depends on the task. Opus already spawns Sonnet/Haiku for simple tasks with a good success rate.

I think "an agent sometimes spawning a weaker agent to do a safe edit" is vastly different from the imagined "general-purpose intelligent model orchestrating a fleet of 50 sub-agents".

I honestly think health data should be public by default to any health researcher. We should do whatever we can to solve disease and live forever. Privacy be damned, I want life.

For comparison, on OpenRouter DeepSeek v4 Flash is slightly cheaper than Gemma 4 31b and more expensive than Gemma 4 26b, but it does support prompt caching, which means for some applications it will be the cheapest. Excited to see how it compares with Gemma 4.
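The claim that caching support can make a nominally pricier model the cheapest one is just breakeven arithmetic. Here's a sketch with invented per-token prices and an invented cache discount:

```python
# Two hypothetical models: A is nominally cheaper per input token, B costs
# more but supports prompt caching, billing cached reads at a discount.
# All numbers are invented for illustration.

PRICE_A = 0.40 / 1_000_000       # $/input token, no caching (invented)
PRICE_B = 0.50 / 1_000_000       # $/input token, caching supported (invented)
CACHED_FRACTION_OF_PRICE = 0.10  # cached reads cost 10% of full price (invented)

def cost_b(tokens, hit_rate):
    """Effective cost on model B given the share of input tokens cached."""
    cached = tokens * hit_rate
    return (tokens - cached) * PRICE_B + cached * PRICE_B * CACHED_FRACTION_OF_PRICE

# Breakeven hit rate h solves: PRICE_A = PRICE_B * (1 - h + h * 0.10)
h = (1 - PRICE_A / PRICE_B) / (1 - CACHED_FRACTION_OF_PRICE)
print(f"B undercuts A once the cache hit rate exceeds {h:.0%}")
```

With these placeholder numbers the breakeven is around a 22% hit rate, which agentic workloads with long shared system prompts clear easily.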

I wonder why there aren't more open-weights models with support for prompt caching on OpenRouter.

It is tricky to build good infrastructure for prompt caching.
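One reason it's tricky: the cache key has to capture an exact byte-for-byte prompt prefix, and reuse only pays off when requests actually share prefixes. Here's a toy sketch of the bookkeeping; real serving stacks cache attention KV blocks on the GPU and must also handle eviction, tokenization boundaries, and multi-tenant routing, none of which this toy models:

```python
# Toy prompt-prefix cache: map a hash of the shared prefix to a cache entry.

import hashlib

class PrefixCache:
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prefix: str) -> str:
        # Any byte difference in the prefix produces a different key,
        # which is why unstable system prompts defeat caching.
        return hashlib.sha256(prefix.encode()).hexdigest()

    def lookup(self, prefix: str):
        """Return the cached entry on a hit; on a miss, 'prefill' and store."""
        key = self._key(prefix)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = f"kv-state-for-{key[:8]}"  # placeholder for KV tensors
        return None

cache = PrefixCache()
system_prompt = "You are a helpful assistant. <long tool definitions...>"
cache.lookup(system_prompt)          # miss: prefill and store
cache.lookup(system_prompt)          # hit: prefill skipped
cache.lookup(system_prompt + " v2")  # miss: the prefix changed
print(cache.hits, cache.misses)      # prints "1 2"
```

The hard part in production isn't the lookup; it's keeping the cached KV state resident near the right GPUs and invalidating it correctly, which is why few open-weights providers offer it.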

It's as simple as telling your Claude Code to implement prompt caching!

I appreciate this; it makes me trust it more than benchmarks.

Kind of? But I really care about price, speed, and quality. If it used 10x the tokens at 1/10th the price per token with the same latency, I would be neutral on it.

Kimi 2.6, for example, seems to throw more tokens at problems to improve performance (for better or worse).


I’m pretty convinced that this is what Kimi 2.6 does: it mostly just thinks more.

I like both GLM and Kimi 2.6, but honestly, for me they didn't have quite the cost advantage I would like, partly because they use more tokens, so they end up at maybe Sonnet-level intelligence at Haiku-level cost. Good, but not quite as extreme as some people make them out to be. For my use cases, running the much cheaper Gemma 4 for things where I don't need max intelligence, and running Sonnet or Opus for things where I need the intelligence and can't make the trade-off, has generally worked well, and it just doesn't seem worth it to cost-cut a little bit. Plus, when you combine prompt caching and sub-agents using Gemma 4, the costs to run Sonnet or even Opus are not that extreme.

For coding, the $200/month plan from Anthropic is such a good value that it's not even worth considering anything else, except for uptime issues.

But competition is great. I hope to see Anthropic put out a competitor in the 1/3-to-1/5-of-Haiku pricing range, and bump Haiku's performance closer to Sonnet level to close the gap here.

