Hacker News — RALaBarge's comments

Yeah, then you can version-lock changes to one thing post-evaluation, or, even easier as noted above, download the stdlib and host it yourself.

I think we are stuck with LLMs. They are already in a place where they can find these issues in the first place, and they can access RSS feeds. You could cron an agent to check whether you are pwned as frequently as you want, at almost literally zero cost. When you do ingest libraries, keep a list of which versions you pulled in; that helps as well.
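The cron'd check described above can be sketched in a few lines: parse your pinned-versions list and build a vulnerability query per library. This is a minimal sketch, not anyone's actual tool; the pip-style `name==version` lockfile and the PyPI ecosystem are assumptions, though the OSV.dev `/v1/query` endpoint referenced in the comment is real.

```python
# Sketch: turn a pinned lockfile into OSV.dev vulnerability queries.
# Lockfile format (name==version) and PyPI ecosystem are assumptions;
# adapt both to whatever you actually ingest.
import json

def parse_lockfile(text):
    """Return (name, version) pairs from pip-style 'name==version' lines."""
    pins = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins.append((name.strip(), version.strip()))
    return pins

def osv_query_payload(name, version, ecosystem="PyPI"):
    """Body for POST https://api.osv.dev/v1/query; an empty 'vulns' reply means clean."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}

lock = "requests==2.31.0\n# internal mirror\nflask==3.0.0\n"
payloads = [osv_query_payload(n, v) for n, v in parse_lockfile(lock)]
print(json.dumps(payloads[0]))
```

Wire it to a daily cron entry that POSTs each payload and alerts on a non-empty `vulns` field, and you have the "am I pwned yet" check for roughly the cost of the API calls.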

It’s all about tooling: if the AI can fetch data, it can do something rad with it. Use something like an AI harness with an MCP server and other tooling, then use the harness to improve the harness and the tools. I made this for my own learning: GitHub.com/ralabarge/beigebox

Jingoism: It's such a rush!

Check out github.com/ralabarge/beigebox -- an OSS AI harness. It started as a way to save all of my data, but it has agentic features and an MCP server. Point it at any endpoint (or use any front end with it as well; it's transparent middleware).

So far what I am finding is that you just get the basics working and then use the tool and inference to improve the tool.
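The "save all of my data, transparent middleware" idea above can be sketched as a wrapper that records every request/response pair before passing the response through untouched. Everything here (the `log_calls` name, the JSONL log) is a hypothetical illustration under those assumptions, not beigebox's actual implementation.

```python
# Hypothetical sketch of transparent middleware: wrap any call_model
# callable so each request/response pair is appended to a JSONL log.
# Names (log_calls, calls.jsonl) are illustrative, not from beigebox.
import json
import time

def log_calls(call_model, log_path="calls.jsonl"):
    def wrapped(prompt, **kwargs):
        response = call_model(prompt, **kwargs)  # pass through untouched
        record = {"ts": time.time(), "prompt": prompt, "response": response}
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
        return response
    return wrapped

# Usage with a stand-in model; a real endpoint client drops in unchanged,
# which is what makes the middleware "transparent" to any front end.
fake_model = lambda prompt: prompt.upper()
logged = log_calls(fake_model, log_path="demo_calls.jsonl")
print(logged("hello"))
```

Because the wrapper never alters the response, the caller can't tell it's there, and the JSONL file becomes the saved-data corpus the comment describes.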


We totally agree.

That's what I've been heads down, HUNGRY, working on, looking for investors and founding engineers pst: https://heymanniceidea.com (disclaimer: I am not associated with heymanniceidea.com)


YMMV, but Grok 4.1 Fast can usually find, via static analysis, a few things that other models don't seem to catch with the same prompt.

I can say without a shadow of a doubt: yes.


I am like, "Yeah ok, use the Arcee Trinity models!" and it's like, "You got it boss, 3 Opus agents in parallel, got it!"


The more I work with AIs (I build AI harnessing tools), the more I see similarities with the common attention failures that humans make: I forgot this one thing and it fucks everything up; or you just told me, but I have too much in my mind as context and I forget that piece; or even, in the case of Claude last night, insisting to me while I am ordering it around that it cannot SSH into another server, yet around the fifth time I come back with a traceback I find it SSHing into said server and just fixing it!

All of these things humans do, and I don't think we can attribute it directly to language itself. It's attention and context, and we both have the same issues.


Right, but when humans are writing the code, they have learned to focus on putting downward pressure on the complexity of the system to help mitigate this effect. I don't get the sense that agents have gotten there yet.


Big-business LLMs even have the opposite incentive: to churn as many tokens as possible.


At least tokens are a rough proxy for 'thinking'... I wouldn't mind if it burned 100k tokens to output a one-line change that fixes a bug.

The problem is maximizing code generated per token spent. This model of "efficiency" is fundamentally broken.


>...I see similarities between the common attention failures that humans make. I forgot this one thing and it fucks everything up, or you just told me but I have too much in my mind as context that I forget that piece

Or you're working in a trendy, modern open-plan office and between the noise from the salespeople nearby talking loudly to customers on their speakerphones, some coworkers talking about their medical issues, and the guy right next to you talking loudly to himself in a different language, you're unable to concentrate at all on your programming task.



