Hacker News | segh's comments

This is an argument against all technological progress.

Nope. Just technological progress in the context of a neo-feudal system.

Being average is just a stage LLMs pass through as AI makes its way towards 'expert' and 'superhuman' levels.

LLMs are trained to predict tokens on highly mediocre code, though. How will they exceed their training data?

Probably the same way other models learned to surpass human ability while being bootstrapped from human-level data - using reinforcement learning.

The question is, do we have good enough feedback loops for that, and if not, are we going to find them? I would bet they will be found for a lot of use cases.


Because you ask it to improve things, and it produces slightly better-than-average results: the average person can find things wrong with something and fix it, too. Then you feed that improved result back in and train a new model where the average is better.
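The loop described above can be sketched as a toy simulation. This is purely illustrative, not a claim about how training actually works: `improve` is a made-up stand-in for "ask the model to find and fix what's wrong", and the scores are arbitrary numbers.

```python
import random

random.seed(0)

def improve(score, population_mean):
    # Stand-in for "ask the model to find and fix what's wrong":
    # each sample ends up at least slightly above the current average.
    return max(score, population_mean + random.uniform(0.0, 1.0))

# Generation 0: mediocre "code quality" scores centred around 50.
population = [random.gauss(50, 10) for _ in range(1000)]

for generation in range(5):
    mean = sum(population) / len(population)
    # Feed the improved results back in as the next "training set".
    population = [improve(s, mean) for s in population]

final_mean = sum(population) / len(population)
assert final_mean > 50  # the average drifts upward each generation
```

The mean rises monotonically because every sample in a new generation is at least as good as the previous generation's average, which is the mechanism the comment is gesturing at.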

/end extreme over-optimism.


Humans can decide to write above-average code by putting in more effort: writing comprehensive tests, refactoring iteratively, doing profile-informed optimization, and so on.

I think you can have LLMs do that too, and then generate synthetic training data for "high-effort code".


Well, state-of-the-art LLMs sure can't consistently produce high-quality code outside of small greenfield projects or tiny demos, a domain that was always easy even for humans, since there are very few constraints to consider and the context is very small.

Part of the problem is that better code is almost always less code. Where a skilled programmer will introduce a surgical 1-3 LOC diff, an incompetent programmer will introduce 100 LOC. So you'll almost always have a case where the bad code outnumbers the good.


Current LLMs do tend to explode complexity if left to their own devices, but I don't think that's an inherent limitation. Mediocre programmers can write good code if they try hard enough and spend enough time on it.

That's because humans have "understanding" they can use to assess quality. Without understanding, "trying harder" just means spending more "effort" distilling an average result, at best over a larger sample size.

Who are you to question our faith? /s

I can do long division manually but I still reach for a calculator.


Do you also spend a lot of time maximising calculator utilisation in other places? Maybe trying to write letters with it or composing music with it?


Wouldn't you get much better results trying to maximize utilization of some sort of LLM? For many people, optimizing for LLMs would yield faster and better results than optimizing for any standard word processor or music-composition tool.


> but I still reach for a calculator

Does the calculator give you a slightly different answer each time, even with the same inputs?
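The contrast being drawn is between a pure deterministic function and stochastic sampling. A minimal sketch, where `sample_next_token` is a hypothetical stand-in for an LLM's sampling step (not a real API):

```python
import math
import random

def calculator_divide(a, b):
    # A calculator is a pure function: same inputs, same output, every time.
    return a // b, a % b

def sample_next_token(logits, temperature=1.0):
    # Hypothetical stand-in for an LLM's sampling step: at temperature > 0,
    # identical inputs can produce different outputs across calls.
    weights = [math.exp(l / temperature) for l in logits]
    return random.choices(range(len(logits)), weights=weights)[0]

assert calculator_divide(355, 113) == calculator_divide(355, 113)
# sample_next_token([1.0, 1.1, 0.9]) may return 0, 1, or 2 on different calls
```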


People skills :/


ngmi


This is an experiment to see the current limit of AI capabilities. The end result isn't useful, but the fact is established that in Feb 2026, you can spend $20k on AI to get an inefficient but working C compiler.


Of course it's impressive. I am just pointing out that these experiments with the million-line browser and now this C compiler seem to greatly extrapolate conclusions. The researchers claim they prove you can scale agents horizontally for economic benefit. But the products both of these built are of questionable technical quality, and it isn't clear to me they are a stable enough foundation to build on top of. But everyone in the hype crowd just assumes this is true. At least this researcher has sort of promised to pursue this project, whereas Wilson already pretty much gave up on his browser; I haven't seen a commit in that repo for weeks. Given that, I am not going to immediately assume these agents truly achieved anything of economic value relative to what a smaller set of agents could have achieved.


> inefficient but working

FWIW, an inefficient but working product is pretty much the definition of a startup MVP. People are getting hung up on the fact that it doesn't beat gcc and clang, and generalizing to the idea that such a thing can't possibly be useful.

But clearly it can, and is. This builds and boots Linux. A putative MVP might launch someone's dreams. For $20k!

The reflexive Luddism is kinda scary, actually. We're beyond the "will it work" phase, and the disruption is happening in front of us. I was a Luddite 10 months ago. I was wrong.


> FWIW, an inefficient but working product is pretty much the definition of a startup MVP

It depends on what kind of start-up we're talking about.

A compiler start-up probably should show some kind of efficiency gain even in an MVP. As in: we're insanely efficient in this part of the work, we're still missing all the other functionality, but we have a clear path to implementing the rest.

This is more like: it's inefficient, and the code is such a mess that I have no idea how to improve on it.

As per the blog, improvements were attempted, but that only started a game of whack-a-mole with new problems.

If, on the other hand, you're talking about Claude Teams for writing code as an MVP: the outcome is more like proof that the approach doesn't work and you need humans in the loop.


You are projecting and over-reacting. My response is measured against the insane hype this is getting beyond what was demonstrated. I never said it wasn't impressive.

I'm not hung up on anything. Clearly the project isn't stable, because it can't be modified without regression. It can be an MVP, but if it needs someone to rewrite it, or to spend many man-months just to grok the code before adding to it, then it's conceivable it isn't an economic win in the long run. Also, they haven't compared this to what a smaller set of agents could accomplish on the same task, so I am still not fully sold on the economic viability of horizontally scaling agents at this time (well, at least not on the task that was tested).


> The end result isn't useful

Then, as your parent comment asked, is there value in it? $20K, which is more than the yearly minimum wage in several countries in Europe, was spent recreating a worse version of something we already have, just to see if it was possible, using a system which increases inequality and makes climate change—which is causing people to die—worse.


Far, far more people use ChatGPT than Claude.ai.


I started skimming and instantly thought AI, then came to the comments to see if it was just me.


School incentives are not really aligned around maximizing the learning rate for every student. (E.g., that is why there is/was debate around teaching phonics.)


Cool experiment! My intuition suggests you would get a better result if you let the LLM generate tokens for a while before giving an answer. Another experiment idea: see what kinds of instructions lead to better randomness. (And, to extend this, whether those instructions help humans generate random numbers better too.)
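One way to score such an experiment: measure how close each strategy's output digits are to uniform using a chi-squared statistic (lower means more uniform). A sketch with made-up illustrative data, not real model output:

```python
from collections import Counter

def uniformity_chi2(digits, k=10):
    # Chi-squared statistic against a uniform distribution over k values.
    # Lower means the digits look closer to "true" uniform randomness.
    counts = Counter(digits)
    expected = len(digits) / k
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(k))

# Hypothetical outputs from two prompting strategies (illustrative only):
# one biased toward 7, as LLMs often are, and one roughly uniform.
biased = [7] * 40 + list(range(10)) * 6
uniform_ish = list(range(10)) * 10

assert uniformity_chi2(biased) > uniformity_chi2(uniform_ish)
```

The same scoring function would apply unchanged to human-generated digits, which would let the extension idea (do the instructions help humans too?) use an identical metric.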


ChatGPT's tone is slowly taking over the entire internet

