Me: has to babysit every feature for hours in Claude Code, building a good plan but then still iterating many many times over things that need to be fixed and tweaked until the feature can be called done.
Bloggers: Here's how we use 3,000 parallel agents to write, test, and ship a new feature to production every 17 minutes in an 8M-LOC codebase (all agent-generated!).
... Am I doing something wrong, or are other people doing something wrong?
I think this is the difference. These toy examples of using parallel agents are *not* running against large codebases, allowing them to iterate more effectively. Once you are in real codebases (>1M LoC), these systems break down.
(author here) I strongly agree that these systems start to break down once the code base gets larger (we've seen that with our own projects).
But our reaction to it has been to say "ok, well the best practice in software engineering is to make small, well-isolated components anyway, so what if we did that?"
We've been trying to really break things apart into smaller pieces (and that's even evident in mngr, where much of the code is split out into separate plugins), and have been having a ton of success with it.
I realize that that might not be an option for more brownfield / existing / legacy projects, but when making something new, I've really been enjoying this way of building things.
> The instinct should be to tweak the agent to do it right.
I'm extremely doubtful of this. It doesn't save time to tell it "you have an error on line 19", because that's (often) just as much work as fixing the error. Likewise, saying "be careful and don't make mistakes" is not going to achieve anything. So how can you possibly tweak the agent to "do it right" reliably without human intervention? That's not even a solved problem for working with _humans_ who don't have the context window limitations, let alone an LLM that deletes everything past 30k tokens.
I'm not touching code. I'm trying out the feature, and there's any number of things to tweak (because I missed some detail during planning, or the agent made a bad assumption, etc).
Improving the agent means improving the code base such that the agent can effectively work on it.
It cannot come as a surprise that an agent is better at working on a well-documented code base with clear architecture.
On the other hand, if you expect an agent to add the right amount of ketchup to your undocumented spaghetti code, then you will continue to have a bad time.
I’m genuinely worried that we’re the last generation who will study and appreciate this craft. Because now a kid learning to program will just say “Write me a terminal spreadsheet app in plain C.”
Which is somewhat akin to downloading one today. If, however, that same kid started small, with a data model, then added calculation and a UI, designing, reviewing, and testing everything as they went, they would learn a lot, and at a faster pace than if they wrote it character by character.
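The first step of that incremental path can be tiny. As a sketch (in Python rather than the thread's plain C, and with illustrative names, not any real app's API): a data model where a cell holds either a literal value or a formula over other cells.

```python
class Sheet:
    """Toy spreadsheet data model: the 'start small' first step.

    A cell maps a reference like "A1" to either a plain value or a
    callable (a formula) that computes from the rest of the sheet.
    """

    def __init__(self):
        self.cells = {}  # "A1" -> value or callable(sheet) -> value

    def set(self, ref, value):
        self.cells[ref] = value

    def get(self, ref):
        # Empty cells read as 0; formulas are evaluated on demand.
        v = self.cells.get(ref, 0)
        return v(self) if callable(v) else v

sheet = Sheet()
sheet.set("A1", 2)
sheet.set("A2", 3)
sheet.set("A3", lambda sh: sh.get("A1") + sh.get("A2"))  # =A1+A2
```

Calculation, dependency tracking, and the terminal UI would then be layered on top of this, each its own reviewable step.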
The thing is, any generation can say something similar. Just look at the article: it manages to produce and describe the creation of a simple spreadsheet, yet the code and accompanying description would only fill a small pamphlet.
There are various reasons for that, and those reasons extend beyond leaving out vital functionality. While C is archaic by our standards, and existed at the time VisiCalc was developed, it was programmed in assembly language. It pretty much had to be, simply to hold the program and a reasonable amount of data in memory. That, in turn, meant understanding the machine: what the processor was capable of, the particular computer's memory map, how to interface with the various peripherals. You sure weren't going to be reaching for a library like curses. While it, like C, existed by the time of VisiCalc's release, it was the domain of minicomputers.
I mean, can the current generation truly understand the craft when the hard work is being done by compilers and libraries?
On asking for user input during implementation, it's best to use this when you have a plan sufficiently written up that you can point it to. To prep that plan, you can also use cook to iterate on the plan for you. Having Claude Code use `/cook` directly is nice because it watches what the subagents are up to and can speak for them, although Claude can't speak to the subagents running through cook.
On permissions: by default, the instances of Claude it runs inherit your Claude's permissions. So if there is no permission to `rm -rf /`, Claude will just get denied and move on. With the Docker sandbox option (see bottom of page), it runs inside that sandbox with `--dangerously-skip-permissions` and gets more done (my preferred option). The hard part is that you then need to set up the Docker sandbox with any dependencies your project needs: run `cook init` and edit the `.cook/Dockerfile` to set those up.
I would be interested in which stories you are thinking of. Stories of Claude breaking out of the restrictions set in its sandbox or stories of people not configuring Claude's sandbox correctly?
> We told Claude Code to block npx using its own denylist. The agent found another way to run it and copied the binary to a new path using /proc/self/root to bypass the deny pattern. When Anthropic's sandbox caught that, the agent disabled the sandbox. No jailbreak, no special prompting. The agent just wanted to eagerly finish the task.
If you implement this as a backend and connect it to Telegram bots, agents can just do `$ ask "Should I do this?"` for agent→human and `$ alert "this thing blocked me"` for coder→planner. That's what I'm actually doing: I have 1 manager + 3 designers + 1 researcher + 2 debuggers + 1 communicator + any number of temporary coders/reviewers in my setup, all connected to taskwarrior for task-driven development.
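A minimal sketch of what such a backend could look like, assuming the real Telegram Bot API but with hypothetical configuration (the `TELEGRAM_TOKEN` / `TELEGRAM_CHAT_ID` environment variable names are mine, not from the comment above): `alert` is fire-and-forget, `ask` sends a question and blocks until the human replies to that specific message.

```python
import json
import os
import time
import urllib.parse
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"

def _call(method, **params):
    # One HTTP call against the Telegram Bot API; responses look like
    # {"ok": true, "result": ...}.
    token = os.environ["TELEGRAM_TOKEN"]
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(API.format(token=token, method=method), data) as r:
        return json.load(r)["result"]

def alert(text):
    # coder->planner / agent->human notification: send and move on.
    _call("sendMessage", chat_id=os.environ["TELEGRAM_CHAT_ID"], text=text)

def parse_reply(updates, question_id):
    # Pure helper: find the first message that replies to question_id.
    for u in updates:
        msg = u.get("message", {})
        if msg.get("reply_to_message", {}).get("message_id") == question_id:
            return msg.get("text")
    return None

def ask(question, poll_seconds=5):
    # agent->human question: send, then block until a reply arrives.
    sent = _call("sendMessage", chat_id=os.environ["TELEGRAM_CHAT_ID"], text=question)
    offset = None
    while True:
        updates = _call("getUpdates", timeout=30, **({"offset": offset} if offset else {}))
        for u in updates:
            offset = u["update_id"] + 1
        answer = parse_reply(updates, sent["message_id"])
        if answer is not None:
            return answer
        time.sleep(poll_seconds)
```

The `$ ask` / `$ alert` shell commands from the comment would then be thin wrappers that call `ask()` / `alert()` and print the result.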
That's pretty cool, building a whole dev team of agents. Is it still a star topology, with a Manager agent interacting with all the other subagents?
I usually spawn 1 Mother Agent in a star topology with 3 subagents (Planner, Reviewer, Implementer) and then let them talk using Claude's built-in agent tool. But the best part, I think, is probably that a "do-nothing" setup wizard is part of the workflow.
Yeah the pipeline runs effectively and I'm able to be in the loop when the loop needs me.
In my setup there are two planes — manager and worker. On the manager plane, all primary agents form a mesh with p2p communication. Each designer connects to 1 or more workers in a star topology, since workers may have questions or get blocked while executing a plan.
The limitation of the built-in agent tool is that it doesn't allow nested subagent spawning. But it's normal for a designer or researcher to need subagents: when a plan is done, I use a plan-review-leader agent to review it. If you try mother → planner → plan-review-leader → plan-vs-reality-validator, the nesting gets deep fast and blocks your manager from doing other work.
I like your two-plane mesh setup, with multiple top-level agents and 'you' on the manager-plane mesh, all able to communicate via daemon-mediated messaging.
The daemon is the Telegram bridge, the tmux router, the CI status deliverer, and the cleanup coordinator all in one process. It allows cross-star-topology communication, unlike MoMa, which basically corresponds to a single manager; it's similar to your plan-review-leader agent living in the manager plane, if that agent were isolated.
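The routing part of such a daemon can be sketched in a few lines (a toy in-process model with illustrative names, not the commenter's actual daemon): every agent gets an inbox, and delivery is only allowed along registered topology edges, so mesh links on the manager plane and star spokes down to workers are just edges in one table.

```python
import queue
from collections import defaultdict

class Router:
    """Toy daemon-mediated message bus.

    Agents never talk directly; the router delivers to per-agent inboxes,
    but only along edges the topology permits (mesh among managers,
    star spokes from a designer down to its workers).
    """

    def __init__(self):
        self.inboxes = defaultdict(queue.Queue)
        self.edges = set()  # allowed (src, dst) pairs

    def link(self, a, b):
        # Register a bidirectional edge: p2p on the manager plane,
        # or a star spoke from a designer to one of its workers.
        self.edges |= {(a, b), (b, a)}

    def send(self, src, dst, text):
        if (src, dst) not in self.edges:
            raise PermissionError(f"no route {src} -> {dst}")
        self.inboxes[dst].put((src, text))

    def recv(self, agent):
        # Raises queue.Empty if the agent's inbox has nothing pending.
        return self.inboxes[agent].get_nowait()
```

In this model, cross-star communication is just adding an edge between two star centers; a worker trying to message a foreign manager is rejected, which is the topology enforcement doing its job.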
My previous concern was that you might hit a timeout when user input is needed during a pipeline run and the user is too slow to answer through Telegram (during the night, say), but maybe even GitHub pipelines can be set to wait indefinitely.
I really like the setup. I also ran into exactly that no-nested-agent-spawning limit from Anthropic in Claude Code's built-in agent tooling, which is what dictates the star topology in the first place.
I use a git worktree for every MoMa agent as well, and they all live in Linux screen sessions. Maybe I should consider moving to tmux myself instead of screen, since as I understand it, all your agents on the top-level manager plane are also just tmux sessions.
Using a single plan-reviewer would be slow when there are multiple aspects to review. That's why a local star topology with a plan-review-leader is needed: it spawns multiple reviewers in parallel, each focused on a different aspect.
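That fan-out pattern is easy to sketch. Assuming a stub in place of actually spawning reviewer subagents (the aspect list and function names here are illustrative, not from the comment), the leader runs one focused review per aspect in parallel and merges the results:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical aspects a plan-review-leader might fan out over.
ASPECTS = ["correctness", "security", "performance", "api-compatibility"]

def review_aspect(plan, aspect):
    # Stand-in for spawning one reviewer subagent focused on one aspect;
    # a real implementation would launch an agent and collect its verdict.
    return {"aspect": aspect, "verdict": f"reviewed {plan!r} for {aspect}"}

def review_plan(plan):
    # The leader runs all aspect reviews concurrently instead of one
    # slow sequential reviewer, then returns the merged results.
    with ThreadPoolExecutor(max_workers=len(ASPECTS)) as pool:
        return list(pool.map(lambda a: review_aspect(plan, a), ASPECTS))
```

The wall-clock cost is then roughly the slowest single review rather than the sum of all of them, which is the whole point of giving the leader its own local star.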
Yeah, with a two-plane topology you inherit concurrency: for instance, you just hand off work from a designer agent to the plan-review-leader, which spawns any number of reviewers in a star topology.
By default it's locked down to the permissions you have granted in your Claude config. If you use the Docker sandbox mode, then you can really let it fly, as it can issue more commands in a safer environment.