I think the difference is that with LLMs, in a lot of cases you do see some diminishing returns.
I won't deny that the latest Claude models are fantastic at just one shotting loads of problems. But we have an internal proxy to a load of models running on Vertex AI and I accidentally started using Opus/Sonnet 4 instead of 4.6. I genuinely didn't know until I checked my configuration.
AI models will get to this point where for 99% of problems, something like Gemma is gonna work great for people. Pair it up with an agentic harness on the device that lets it open apps and click buttons and we're done.
I still can't fathom that we're in 2026 in the AI boom and I still can't ask Gemini to turn shuffle mode on in Spotify. I don't think model intelligence is as much of an issue as people think it is.
100% agree here. The actual practical bottleneck is harness and agentic abilities for most tasks.
It's the biggest thing that stuck out to me using local AI with open source projects vs Claude's client. The model itself is good enough I think - Gemma 4 would be fine if it could be used with something as capable as Claude.
And that's gonna stay locked down, unfortunately, especially on mobile and in cars - it needs access to APIs to do that stuff, and not just regular APIs built for traditional invocation.
The same way that websites are getting llm.txts I think APIs will also evolve.
GPT 3.5 was intelligent enough to understand that command and turn it into a correctly shaped JSON object; the platforms just don't have tight enough integration to take advantage of the intelligence.
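To make that concrete, here's a rough sketch of the kind of structured output even an older model can emit reliably. The tool name and fields are invented for illustration; this is not Spotify's or Gemini's actual API:

    # Hypothetical example - "spotify.set_shuffle" and its fields are made up
    # to show the shape of a tool call, not any real platform API.
    import json

    user_request = "turn shuffle mode on in Spotify"

    # The JSON the model would be asked to produce for that request:
    tool_call = {
        "tool": "spotify.set_shuffle",
        "arguments": {"enabled": True},
    }

    print(json.dumps(tool_call, indent=2))

The model side of this has worked for years; the missing piece is a platform that actually exposes and executes the call.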
I think security is the issue: AI is good at circumventing it. For example, AI can read paywalled articles you cannot. Do you really want AI to have ‘free range’?
I mean, to me the difference between Opus and Sonnet is as clear as night and day, and so is the difference between Opus and the best GPT model. Opus 4.6 just seems much more reliable: I ask it to do something, and it actually happens.
It depends what you're asking it though. Sure, in a software development environment the difference between those two models is noticeable.
But think about the general user. They're using the free Gemini or ChatGPT. They're not using the latest and greatest. And they're happy using it.
And I am willing to bet that a lot of paying users would be served perfectly fine by the free models.
If a capable model is able to live on device and solve 99% of people's problems, then why would the average person ever need to pay for ChatGPT or Gemini?
But there are other tasks too, like research, where dates, little details, connections and reasoning all matter: background research activities or tool usage outside of software development. That's where I'm finding LLMs most useful in my life.
Even Opus makes mistakes with dates or misreads news in its chronological context, and it's even worse with smaller, less capable models.
My experience is very different from yours. Codex and CC yield very different results, both because of the harness differences and the model differences, but neither is noticeably better than the other.
Personally, I like Codex better just because I don't have to mess with any sort of planning mode. If I imply that it shouldn't change code yet, it doesn't. CC is too impatient to get started.
I guess yes, that's a harness difference, and you can also configure CC as a harness to behave very differently. But even with the same harness and guidance, to me there's still a difference between Opus 4.6 and e.g. GPT 5.4 (which GPT model do you use?). I've been using Claude Code, Codex and OpenCode as harnesses at present, but for serious long-running implementation I feel like I can only really rely on CC + Opus 4.6.
I come from Cursor before having adopted the TUI tools. Opus was nothing short of pathetic in their environment compared to the -codex models. I would only use it for investigations and planning because it was faster.
Like you've said, though, that could just be a harness issue.
I have the opposite experience. Codex gets to work much faster than Claude Code. Also I've never seen the need to use planning mode for Claude. If it thinks it needs a plan it will make one automatically.
"XYZ Corp" won't allow their developers to write their desktop app in Rust because they want to consume only 16MB RAM, then another implementation for mobile with Swift and/or Kotlin, when they can release good enough solution with React + Electron consuming 4GB RAM and reuse components with React Native.
Strangely enough, AI could turn this on its head. You can have your cake and eat it too, because you can tell Claude/Codex/whatever to build you a full-featured Swift version for iOS and Kotlin for Android and whatever you want on Windows and Mac. There's still QA for the different builds, but you already have to QA each platform separately anyway if you really care that they all work, so in theory that doesn't change.
Of course, it's never that simple in reality; you need developers who know each platform for that to work, because you must run the builds and tell the AI what it's doing wrong and iterate. Currently, you can probably get away with churning out Electron slop and waiting for users to complain about problems instead of QAing every platform. Sad!
> The simple fact is that a 16 GB RAM stick costs much less than the development time to make the app run on less.
The costs are borne by different people: development by the company, RAM sticks by the customer.
A company is potentially (silently?) adding to the cost of the product/service that the customer has to bear, by needing more RAM (or having the same amount but not being able to do as much with it).
Yep, and since companies care about TCO, they reward the software with the lower TCO, which happens to be the one that uses more RAM but is cheaper to produce.
Some software has millions or even billions of users. The cost of 16 GB multiplied by millions or billions would pay for a lot of refactoring.
That said, I think it’s more of a collective action problem. The person who could pay for the refactor to operate in 640 K is not the same person who has to pay for the 16 GB. And yes, the 16 GB is cheap enough in comparison to other costs that the latter group doesn’t necessarily notice that they are subsidizing inefficient development.
I think stavros means amortization on an individual level - if all software is bloated and requires 16GB to run then my expense for a 16GB stick is not caused by a single piece of software, but everything I use.
Not that I agree of course :) I’m talking more of the net negative of everyone needing to buy 16gb sticks so developers can YOLO vibe-coded unoptimized garbage. But at least I think the former explanation is what stavros meant :)
People get hung up on bad optimization. If you are working at a sufficiently large scale, yes, thinking about bytes might be a good use of your time.
But most likely, it's not. At a system level we don't want people to do that. It's a waste of resources. Making a virtue out of it is bad, unless you care more about bytes than humans.
These bytes are human lives. The bytes and the CPU cycles translate to software that takes longer to run, that is more frustrating, that makes people accomplish less in longer time than they could, or should. Take too much, and you prevent them from using other software in parallel, compounding the problem. Or you're forcing them to upgrade hardware early, taking away money they could better spend in different areas of their lives. All this scales with the number of users, so for most software with any user base, not caring about bytes and cycles is wasting much more people-hours than is saving in dev time.
Creating people able to do these optimizations also costs human life, which is then not spent on other things, like building the unoptimized version of another product.
We're not talking about writing assembly by hand here. If your software has a million daily users and wastes a minute of their day, that's about 9 work-years of labour wasted every single day.
In a 5-year lifecycle that's about 10,000 years of human labour wasted. Yes, I had to quadruple-check this myself.
Does it take 10,000 work-years of effort, per project, to train its developers to write reasonably performant code?
Of course not all of this would translate into actual productivity gains but it doesn't have to.
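For what it's worth, here's the back-of-envelope behind those numbers, assuming roughly 2,000 hours per work-year and waste accruing on ~250 working days per year (assumptions of mine, not stated above):

    # Rough check of the "9 work-years per day / 10,000 over 5 years" figures.
    # Assumptions: ~2000 hours per work-year, ~250 working days per year.
    users = 1_000_000
    wasted_minutes_per_user_per_day = 1

    wasted_hours_per_day = users * wasted_minutes_per_user_per_day / 60   # ~16,667 hours
    work_years_per_day = wasted_hours_per_day / 2000                      # ~8.3 -> "about 9"
    work_years_in_5_years = work_years_per_day * 250 * 5                  # ~10,400 -> "about 10,000"

    print(round(work_years_per_day, 1), round(work_years_in_5_years))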
The one we're in where "software" doesn't just mean an app that someone downloads from a website or an app store. Software includes lots of server side components, etc, etc.
I once noticed my name in the Chromium OS credits due to a patch I had submitted to a library that's on every Chromebook. 1 million would be a small number for Chromebooks alone.
I'm not talking about the median piece of software with 2 users and 0.1 developers (I made that up).
The ones that stick out are actively maintained, widely used, and well funded. It doesn't have to be a million active users, but they should be the first to get their act together.
Look at the whole history of computing. How many times has the pendulum swung from thin to fat clients and back?
I don't think it's even mildly controversial to say that there will be an inflection point where local models get Good Enough and this iteration of the pendulum shall swing to fat clients again.
Assuming improvements in LLMs follow a sigmoid curve, even if the cloud models are always slightly ahead in terms of raw performance it won't make much of a difference to most people, most of the time.
The local models have their own advantages (privacy, no as-a-service dependency) that, for many people and orgs, will offset a small performance advantage. And, of course, you can always fall back on the cloud models should you hit something particularly chewy.
(All IMO - we're all just guessing. For example, good marketing or an as-yet-undiscovered network effect of cloud LLMs might distort this landscape).
My thinkpad is nearly 10 years old, I upgraded it to 32GB of ram and have replaced the battery a couple of times, but it's absolutely fine apart from that.
If AI that was leading edge in 2023 can run on a 2026 laptop, then presumably AI that is leading edge in 2026 will run on a 2029 laptop. And given that 2023's AI was world-changing, that capability is already on today's laptops.
Either AI grows exponentially in which case it doesn't matter as all work will be done by AI by 2035, or it plateaus in say 2032 in which case by 2035 those models will run on a typical laptop.
What used to be a very exciting pioneering space company is now a cash cow that directly funds and enables whimsies of an unhinged billionaire that keeps doing shit aimed at destabilizing lives of many, including mine. And I am not even in the US!
I think the sentiment is more widespread than you think. SpaceX has always succeeded despite Elon’s clown work. I for one would be absolutely humiliated to work for someone like him.
Isaacson isn’t an objective biographer. He needs access to continue pumping out more work, and he won’t get access if he doesn’t show his subject in a positive light.
I don’t doubt these anecdotes that show the guy’s intuition was correct in a few cases. I’d be interested to know when it was wrong.
For example, he had an intuition that his engineers could build full self driving in 12 months 10 years ago. And every 12 months he’d show up promising that it would be ready for real for real in 12 months. Nah bro, he was just lying to pump up the stock price. Yeah, maybe. Or his intuition was just wrong.
He’s exceptionally good at taking credit for work done by others. SpaceX is the classic example, but him playing Path of Exile claiming to have reached the maximum level possible when actually he was taking credit for someone else playing on his behalf is another good example.
> (Hint: Musk was right and his engineers were wrong. Both times.)
This is a really useless statistic. How many times was Musk wrong and his engineers were right? We don't know, we only have those two hand-picked examples from a less than neutral source.
Not even an overall error rate is helpful: it's fine to make a huge number of incorrect guesses when your goal is R&D and the payoff from the few correct guesses covers the expense of the incorrect ones.
For example, engineers also warned the launch pad wasn't suitable for the first test flight. Those engineers were correct; it just didn't matter as much as it would have for, e.g., a Saturn V or Space Shuttle launch, because Starship was (and still is) a test flight of a shockingly cheap unmanned vehicle.
That said, if the people currently raising alarm over Kessler cascade risks are correct, that would be an example of something essentially unsurvivable for Starlink and similar constellations.
Musk gambles when he thinks the expected return is positive; you can do that while winning hardly any bets, let alone losing hardly any. What you have to avoid fooling yourself about are the odds, the costs, and the rewards, and I do think Musk is fooling himself about political costs if nothing else.
I see no reason to believe the extraordinary progress we've seen recently will stop or even slow down. Personally, I've benefited so much from AI that it feels almost alien to hear people downplaying it. Given the excitement in the field and the sheer number of talented individuals actively pushing it forward, I'm quite optimistic that progress will continue, if not accelerate.
If LLM's are bumpers on a bowling lane, HN is a forum of pro bowlers.
Bumpers are not gonna make you a pro bowler. You aren't going to be hitting tons of strikes. Most pro bowlers won't notice any help from bumpers, except in some edge cases.
If you are an average joe however, and you need to knock over pins with some level of consistency, then those bumpers are a total revolution.
I hear you. I feel constantly bewildered by comments like "LLMs haven't really changed since GPT3.5." I mean, really? It went from an exciting novelty to a core pillar of my daily work, and it's allowed me and my entire (granted, quite senior) org to be incredibly more productive and creative with our solutions.
And then I stumble across a comment where some LLM hallucinated a library, which apparently means AI is clearly useless.
I agree. Below are a few errors. I also asked ChatGPT to check the summaries and it found all the errors (and even made up a few more that weren't actual errors, just points not expressed with perfect clarity).
Spoilers ahead!
First novel: The Trisolarans did not contact earth first. It was the other way round.
Second novel: Calling the conflict between humans and Trisolarans a "complex strategic game" is a bit of a stretch. Also, the "water drops" do not disrupt ecosystems. I am not sure whether "face-bearers" is an accurate translation. I've only read the English version.
Third novel: Luo Ji does not hold the key to the survival of the Trisolarans, and there were no "micro-black holes" racing towards earth. The Trisolarans were also not shown colonizing other worlds.
I am also not sure whether Luo Ji faced his "personal struggle and psychological turmoil" in this novel or in an earlier one. He was certainly sure of his role at the end; even the Trisolarans judged him at over a 92% deterrence rate.
Yeah describing Luo Ji as having "struggles with the ethical implications of his mission" is the biggest whopper.
He's like God's perfect sociopath. He wobbles between total indifference to his mission and interplanetary murder-suicide, and the only things that seem to really get to him are a stomachache and being ghosted by his wife.
And this example does not even illustrate the long context understanding well, since smaller Qwen2.5 models can already recall parts of the Three Body Problem trilogy without pasting the three books into the context window.
And multiple summaries of each book (in multiple languages) are almost definitely in the training set. I'm more confused how it made such inaccurate, poorly structured summaries given that and the original text.
Although, I just tried with normal Qwen 2.5 72B and Coder 32B and they only did a little better.
Seems a very difficult problem to produce a response based only on the text given and not on past training. An LLM that can do that would seem to be considerably more advanced than what we have today.
Though I would say humans would have difficulty too -- say, having read The Three Body problem before, then reading a slightly modified version (without being aware of the modifications), and having to recall specific details.
This problem is poorly defined; what would it mean to produce a response JUST based on the text given? Should it also forgo all logic skills and intuition gained in training because it is not in the text given? Where in the N dimensional semantic space do we draw a line (or rather, a surface) between general, universal understanding and specific knowledge about the subject at hand?
That said, once you have defined what is required, I believe you will have solved the problem.
For a lot of applications we have pretty much reached that point already. 2TB NVMe SSD are around the $100-$150 price point these days. Unless they are actively trying, the average desktop user is never going to fill that up. There are only so many holiday pictures you can take, after all.
I think the size of audio files is a great example of why storage needs won't infinitely grow. Although we have orders of magnitude more storage space these days, audio files haven't really gotten any bigger since the CD era. If anything, they have gotten smaller as better compression algorithms were invented.
The thing is, human hearing only has so much resolution. Sure, we could be sampling audio at 64-bits with a 1MHz sample rate these days if we wanted to, but there's just no reason to. Similarly with pictures: if you can't see any pixels standing a foot away from a poster-sized print, why bother increasing the resolution any more?
The big consumer-range data hogs are 1) video torrents, and 2) games. Both of them have a natural upper bound due to human perception. They might still grow by an order of magnitude or two, but it won't be much more before it just becomes pointless.
Enterprise is a bit of a different story, of course - especially now that AI is rapidly increasing the value of data.
I don't think holding human perception up as the standard for video games holds up. For audio and video it makes sense, but we are a long way off for video games.
For video games there is a huge "storage waste" factor, one that I think matters more than the human perception limit. If you look at modern games, you could probably throw away a double-digit percentage of most games' install size if a capable team had the time to optimize for disk space. It's simply not done because it has little advantage. I think this wastage factor will scale with the complexity of video games regardless of graphical fidelity.
> 2TB NVMe SSD are around the $100-$150 price point these days. Unless they are actively trying, the average desktop user is never going to fill that up. There are only so many holiday pictures you can take, after all.
4k family videos would like to have a word with you.
Big brother potential aside, I could imagine a future in which everyone wears a body camera everywhere. A digital record of your entire life. Given increasingly good AI extraction, you can then have searchable transcripts of your conversations, query your life for the last time you saw movie X (interesting IP questions here), re-experience moments with grandma, whatever. There was a Black Mirror episode incorporating this concept.
4K video multiplied by a lifetime adds up to more storage than is available to consumers today.
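A rough sketch of that math, assuming a fairly modest ~25 Mbps 4K bitrate, 16 waking hours a day and an 80-year span (all assumed figures, not from the comment above):

    # Back-of-envelope for recording a whole life in 4K.
    bitrate_mbps = 25
    gb_per_hour = bitrate_mbps / 8 * 3600 / 1000        # ~11.25 GB per hour
    gb_per_year = gb_per_hour * 16 * 365                # ~66,000 GB (~66 TB) per year
    pb_lifetime = gb_per_year * 80 / 1_000_000          # ~5.3 PB over 80 years

    print(round(gb_per_hour, 1), round(gb_per_year), round(pb_lifetime, 1))

Around 5 petabytes, which is well beyond any consumer drive today.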
If you find such a concept interesting, I would recommend the movie The Final Cut (2004) with Robin Williams. Not a particularly good movie, but it does have this as an interesting premise.
> There are only so many holiday pictures you can take, after all.
You miss the point, sadly.
Yeah storage is getting larger and cheaper but modern cameras/phones/etc take photos that are way larger than they used to do.
My first digital camera in like 2001 or so could only take either 20 "large" (640x480) bitmap photos or 80 "small" pictures, taking a few tens of kilobytes each at most.
My 2022 iPhone SE takes high quality pictures that are easily in the 7-8MB range (I just checked).
So yeah, disks keep getting larger, but so does the media.
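For scale, here's a rough comparison using my own assumed averages (a few tens of KB per photo then, ~8 MB per photo now, a 2 TB drive): each photo has grown by a factor of a few hundred, yet a modern drive still holds an enormous number of them.

    # Rough comparison of photo sizes vs. drive sizes (assumed averages, not measured).
    photo_2001_mb = 0.03    # "a few tens of kilobytes"
    photo_2022_mb = 8.0     # modern phone photo
    drive_tb = 2

    growth_factor = photo_2022_mb / photo_2001_mb               # ~270x larger per photo
    photos_per_drive = drive_tb * 1_000_000 / photo_2022_mb     # ~250,000 photos on 2 TB

    print(round(growth_factor), round(photos_per_drive))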
And image sizes aren't really bound by human perception in any real way. Yes, screens typically range from 2-8MP, and probably won't go much beyond 32MP. But more resolution, more dynamic range, and more color fidelity are incredibly useful for editing pictures; and extra resolution can always be used for digital zoom.
Photos are, however, constrained by physics. There are only so many photons captured in a certain area in a given time, so whether we're talking about the tiny lenses and sensors of smartphones or the much larger versions on dedicated cameras, there's a very real limit (and phones are pretty close to the limits imposed by their sensor size).
I spend so much storage on just storing things that are readily available on the internet, either to speed up local access or because one day they might not be as readily available. I don't think we will "solve" storage for good unless we also "solve" bandwidth.
I think we will always fill up the storage available as we crank up the fidelity of our recordings (simulations). Video is an approximation of reality, and we can always ramp up the resolution, ramp up the dimensions, always striving to simulate the universe at the same fidelity as the universe itself, to the point that the map becomes the territory itself. In other words, until it's no longer approximating, no longer compressing. There will never be enough storage to store our simulations of our reality as long as we're forced to use that same reality to construct the storage.
I'm already at that point, effectively. I spent a few decades worrying about fitting data into hard drives... most recently with my digital photos. I've stopped worrying.
My first computer stored data on audio cassette tapes. When I started college, the new hot machine in the server room was a DEC VAX 11/780 running VMS.
For nostalgia purposes, I have a virtual VAX 11/780 running VMS in my phone, it only takes about 3 GB. I don't do much else with the phone, so I have plenty of room left over.
I can imagine ways to fill up a petabyte personally... but not much past that.
Only if we reach a peak fidelity level. Since storage will never be infinite, unless we reach a level of media quality that is good enough for everything, it will always be an arms race.
I think even if we got storage that could store enough information to be indistinguishable from reality itself, we would still want to save variations, clips, duplicates, intermediates... I don't know that there is a peak fidelity.
The thing is, there are only 24 hours in a day. There is a hard upper limit to the amount of content you can consume. You're not going to be downloading 1000 hours worth of content every single day, just because it is physically impossible to use it.
My 500 GB HDD has been way more than adequate for the past 10+ years, and that includes having a Windows 8 partition that I haven't booted into in years.
Oddly, I don't see anything about pricing for Workers AI on the Workers pricing page[0] but their Workers AI blog post from Sept 2023[1] says the pricing is per 1k "neurons":
> Users will be able to choose from two ways to run Workers AI:
> Fast Twitch Neurons (FTN) - running at nearest user location at $0.125 / 1k neurons
> Neurons are a way to measure AI output that always scales down to zero (if you get no usage, you will be charged for 0 neurons).
Here's the key detail:
> To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings.
Productionizing AI models is a pain; this makes it easy. Say you were building a D&D app and wanted to generate character art: this would make it very easy to get started. AWS has similar offerings (e.g. SageMaker), but it's not on the edge.