Me: has to babysit every feature for hours in Claude Code, building a good plan but then still iterating many many times over things that need to be fixed and tweaked until the feature can be called done.
Bloggers: Here's how we use 3,000 parallel agents to write, test, and ship a new feature to production every 17 minutes in an 8M-LOC codebase (all agent-generated!).
... Am I doing something wrong, or are other people doing something wrong?
I think this is the difference. These toy examples of using parallel agents are *not* running against large codebases, allowing them to iterate more effectively. Once you are in real codebases (>1M LoC), these systems break down.
(author here) I strongly agree that these systems start to break down once the code base gets larger (we've seen that with our own projects).
But our reaction to it has been to say "ok, well the best practice in software engineering is to make small, well-isolated components anyway, so what if we did that?"
We've been trying to really break things apart into smaller pieces (and that's even evident in mngr, where much of the code is split out into separate plugins), and have been having a ton of success with it.
I realize that that might not be an option for more brownfield / existing / legacy projects, but when making something new, I've really been enjoying this way of building things.
> The instinct should be to tweak the agent to do it right.
I'm extremely doubtful of this. It doesn't save time to tell it "you have an error on line 19", because that's (often) just as much work as fixing the error. Likewise, saying "be careful and don't make mistakes" is not going to achieve anything. So how can you possibly tweak the agent to "do it right" reliably without human intervention? That's not even a solved problem for working with _humans_ who don't have the context window limitations, let alone an LLM that deletes everything past 30k tokens.
I'm not touching code. I'm trying out the feature, and there's any number of things to tweak (because I missed some detail during planning, or the agent made a bad assumption, etc).
Improving the agent means improving the code base such that the agent can effectively work on it.
It cannot come as a surprise that an agent is better at working on a well-documented code base with clear architecture.
On the other hand, if you expect an agent to add the right amount of ketchup to your undocumented spaghetti code, then you will continue to have a bad time.
I’m genuinely worried that we’re the last generation who will study and appreciate this craft. Because now a kid learning to program will just say “Write me a terminal spreadsheet app in plain C.”
Which is somewhat akin to downloading one today. If, however, that same kid started small, with a data model, then added calculation and a UI, designing, reviewing, and testing everything as they went, they would learn a lot, and at a faster pace than if they wrote it character by character.
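The first step of that incremental path can be tiny. As a sketch (in Python rather than the thread's plain C, and with illustrative names, not any real app's API): a data model where a cell holds either a literal value or a formula over other cells.

```python
class Sheet:
    """Toy spreadsheet data model: the 'start small' first step.

    A cell maps a reference like "A1" to either a plain value or a
    callable (a formula) that computes from the rest of the sheet.
    """

    def __init__(self):
        self.cells = {}  # "A1" -> value or callable(sheet) -> value

    def set(self, ref, value):
        self.cells[ref] = value

    def get(self, ref):
        # Empty cells read as 0; formulas are evaluated on demand.
        v = self.cells.get(ref, 0)
        return v(self) if callable(v) else v

sheet = Sheet()
sheet.set("A1", 2)
sheet.set("A2", 3)
sheet.set("A3", lambda sh: sh.get("A1") + sh.get("A2"))  # =A1+A2
```

Calculation, dependency tracking, and the terminal UI would then be layered on top of this, each its own reviewable step.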
The thing is, any generation can say something similar. Just look at the article: it manages to produce and describe the creation of a simple spreadsheet, yet the code and accompanying description would only fill a small pamphlet.
There are various reasons for that, and those reasons extend beyond leaving out vital functionality. While C is archaic by our standards, and existed at the time VisiCalc was developed, it was programmed in assembly language. It pretty much had to be, simply to hold the program and a reasonable amount of data in memory. That, in turn, meant understanding the machine: what the processor was capable of, the particular computer's memory map, how to interface with the various peripherals. You sure weren't going to be reaching for a library like curses. While it, like C, existed by the time of VisiCalc's release, it was the domain of minicomputers.
I mean, can the current generation truly understand the craft when the hard work is being done by compilers and libraries?
On asking for user input during implementation, it's best to use this when you have a plan sufficiently written up that you can point it to. To prep that plan, you can also use cook to iterate on the plan for you. Having Claude Code use `/cook` directly is nice because it watches what the subagents are up to and can speak for them, although Claude can't speak to the subagents running through cook.
On permissions: by default, the instances of Claude it runs inherit your Claude's permissions. So if there is no permission to `rm -rf /`, Claude will just get denied and move on. With the Docker sandbox option (see bottom of page), it runs inside that sandbox with `--dangerously-skip-permissions` and gets more done (my preferred option). The hard part is that you then need to set up the Docker sandbox with any dependencies your project needs: run `cook init` and edit the `.cook/Dockerfile` to set those up.
I would be interested in which stories you are thinking of. Stories of Claude breaking out of the restrictions set in its sandbox or stories of people not configuring Claude's sandbox correctly?
> We told Claude Code to block npx using its own denylist. The agent found another way to run it and copied the binary to a new path using /proc/self/root to bypass the deny pattern. When Anthropic's sandbox caught that, the agent disabled the sandbox. No jailbreak, no special prompting. The agent just wanted to eagerly finish the task.
If you implement this as a backend and connect it to Telegram bots, agents can just do `$ ask "Should I do this?"` for agent→human and `$ alert "this thing blocked me"` for coder→planner. That's what I'm actually doing: I have 1 manager + 3 designers + 1 researcher + 2 debuggers + 1 communicator + any number of temporary coders/reviewers in my setup, all connected to taskwarrior for task-driven development.
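A minimal sketch of what such a backend could look like, assuming the real Telegram Bot API but with hypothetical configuration (the `TELEGRAM_TOKEN` / `TELEGRAM_CHAT_ID` environment variable names are mine, not from the comment above): `alert` is fire-and-forget, `ask` sends a question and blocks until the human replies to that specific message.

```python
import json
import os
import time
import urllib.parse
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"

def _call(method, **params):
    # One HTTP call against the Telegram Bot API; responses look like
    # {"ok": true, "result": ...}.
    token = os.environ["TELEGRAM_TOKEN"]
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(API.format(token=token, method=method), data) as r:
        return json.load(r)["result"]

def alert(text):
    # coder->planner / agent->human notification: send and move on.
    _call("sendMessage", chat_id=os.environ["TELEGRAM_CHAT_ID"], text=text)

def parse_reply(updates, question_id):
    # Pure helper: find the first message that replies to question_id.
    for u in updates:
        msg = u.get("message", {})
        if msg.get("reply_to_message", {}).get("message_id") == question_id:
            return msg.get("text")
    return None

def ask(question, poll_seconds=5):
    # agent->human question: send, then block until a reply arrives.
    sent = _call("sendMessage", chat_id=os.environ["TELEGRAM_CHAT_ID"], text=question)
    offset = None
    while True:
        updates = _call("getUpdates", timeout=30, **({"offset": offset} if offset else {}))
        for u in updates:
            offset = u["update_id"] + 1
        answer = parse_reply(updates, sent["message_id"])
        if answer is not None:
            return answer
        time.sleep(poll_seconds)
```

The `$ ask` / `$ alert` shell commands from the comment would then be thin wrappers that call `ask()` / `alert()` and print the result.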
That's pretty cool, building a whole dev team of agents. Is it still a star topology, with a Manager agent interacting with all the other subagents?
I usually spawn 1 Mother Agent in a star topology with 3 subagents (Planner, Reviewer, Implementer) and then let them talk using Claude's built-in agent tool. But the best part, I think, is probably that a "do-nothing" setup wizard is part of the workflow.
Yeah the pipeline runs effectively and I'm able to be in the loop when the loop needs me.
In my setup there are two planes — manager and worker. On the manager plane, all primary agents form a mesh with p2p communication. Each designer connects to 1 or more workers in a star topology, since workers may have questions or get blocked while executing a plan.
The limitation of the built-in agent tool is that it doesn't allow nested subagent spawning. But it's normal for a designer or researcher to need subagents: when a plan is done, I use a plan-review-leader agent to review it. If you try mother → planner → plan-review-leader → plan-vs-reality-validator, the nesting gets deep fast and blocks your manager from doing other work.
I like your two-plane mesh setup, with multiple top-level agents and 'you' on the manager-plane mesh, all able to communicate via daemon-mediated messaging.
The daemon is the Telegram bridge, the tmux router, the CI status deliverer, and the cleanup coordinator all in one process. It allows cross-star-topology communication, unlike MoMa, which basically corresponds to a single manager; it's similar to your plan-review-leader agent living in the manager plane, if that agent were isolated.
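The routing part of such a daemon can be sketched in a few lines (a toy in-process model with illustrative names, not the commenter's actual daemon): every agent gets an inbox, and delivery is only allowed along registered topology edges, so mesh links on the manager plane and star spokes down to workers are just edges in one table.

```python
import queue
from collections import defaultdict

class Router:
    """Toy daemon-mediated message bus.

    Agents never talk directly; the router delivers to per-agent inboxes,
    but only along edges the topology permits (mesh among managers,
    star spokes from a designer down to its workers).
    """

    def __init__(self):
        self.inboxes = defaultdict(queue.Queue)
        self.edges = set()  # allowed (src, dst) pairs

    def link(self, a, b):
        # Register a bidirectional edge: p2p on the manager plane,
        # or a star spoke from a designer to one of its workers.
        self.edges |= {(a, b), (b, a)}

    def send(self, src, dst, text):
        if (src, dst) not in self.edges:
            raise PermissionError(f"no route {src} -> {dst}")
        self.inboxes[dst].put((src, text))

    def recv(self, agent):
        # Raises queue.Empty if the agent's inbox has nothing pending.
        return self.inboxes[agent].get_nowait()
```

In this model, cross-star communication is just adding an edge between two star centers; a worker trying to message a foreign manager is rejected, which is the topology enforcement doing its job.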
My previous concern was that you might hit a timeout when user input is needed during a pipeline run and the user is too slow to answer through Telegram (during the night, say), but maybe even GitHub pipelines can be set to wait indefinitely.
I really like the setup. I also ran into exactly that no-nested-agent-spawning limit from Anthropic in Claude Code's built-in agent tooling, which is what dictates the star topology in the first place.
I use a git worktree for every MoMa agent as well, and they all live in Linux screen sessions. Maybe I should consider moving to tmux myself instead of screen, since as I understand it, all your agents on the top-level manager plane are also just tmux sessions.
Using a single plan-reviewer would be slow when there are multiple aspects to review. That's why a local star topology with a plan-review-leader is needed: it spawns multiple reviewers in parallel, each focused on a different aspect.
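That fan-out pattern is easy to sketch. Assuming a stub in place of actually spawning reviewer subagents (the aspect list and function names here are illustrative, not from the comment), the leader runs one focused review per aspect in parallel and merges the results:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical aspects a plan-review-leader might fan out over.
ASPECTS = ["correctness", "security", "performance", "api-compatibility"]

def review_aspect(plan, aspect):
    # Stand-in for spawning one reviewer subagent focused on one aspect;
    # a real implementation would launch an agent and collect its verdict.
    return {"aspect": aspect, "verdict": f"reviewed {plan!r} for {aspect}"}

def review_plan(plan):
    # The leader runs all aspect reviews concurrently instead of one
    # slow sequential reviewer, then returns the merged results.
    with ThreadPoolExecutor(max_workers=len(ASPECTS)) as pool:
        return list(pool.map(lambda a: review_aspect(plan, a), ASPECTS))
```

The wall-clock cost is then roughly the slowest single review rather than the sum of all of them, which is the whole point of giving the leader its own local star.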
Yeah, with a two-plane topology you inherit concurrency: for instance, you just hand off work from a designer agent to the plan-review-leader, which spawns any number of reviewers in a star topology.
By default it's locked down to the permissions you have granted in your Claude config. If you use the Docker sandbox mode, then you can really let it fly, as it can issue more commands in a safer environment.