That's a hobbyist's rule. In industry, every PR should strive to deliver maximum value to the company, which is sometimes achieved by doing as little as possible so you can work down other objectives.
That's a largely meaningless response. Enforcing a decent, maintainable architecture is potentially of great value to the company. Unfortunately that's a subjective call. I'm sure you've lived through codebases that don't really admit bug fixes or feature enhancements - things the company knows it cares about.
As professionals who are invested in the long-term success of the company, it is our responsibility to raise concerns about the future and try to negotiate a good compromise between the short-term and long-term goals.
Leaving the codebase cleaner than you found it IS creating maximum value for the company, because large changes are almost never walking the knife's edge between "making the codebase better" and "making the codebase worse". Your codebase either gets better over time, or it gets worse. If it's getting 0.01% worse with every PR, that tech debt compounds: after n PRs the codebase is 1.0001^n times worse, which grows faster than you'd think.
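A quick back-of-the-envelope sketch of that compounding (the only number taken from the comment above is the 0.01%-per-PR figure; everything else is illustrative):

```python
# Illustrative sketch: tech debt that compounds multiplicatively per PR.
def debt_multiplier(per_pr_decay: float, n_prs: int) -> float:
    """How much 'worse' the codebase is after n_prs merges, each one
    making it per_pr_decay fractionally worse (0.0001 == 0.01%)."""
    return (1 + per_pr_decay) ** n_prs

# 0.01% worse per PR, compounded over 10,000 PRs:
print(f"{debt_multiplier(0.0001, 10_000):.2f}x worse")  # ~2.72x, i.e. about e
```

Because the growth is exponential rather than linear, 10,000 tiny 0.01% regressions don't add up to a 100% regression - they multiply out to roughly 2.7x.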
Having a maintainable codebase is of MASSIVE LONG-TERM value to a company - far too many orgs are paralyzed by mountains of tech debt.
Doing the minimal work possible is fine for one-off hotfixes or tweaks or small features, but your argument assumes "maximum value to the company" is measured in the span of a sprint, and it's not.
Cleaning up the codebase incrementally does deliver value to the company as long as you understand "cleaning up" as "making it easier and faster to contribute to, change, or debug in the future" rather than something adjacent to a form of performance art.
This is an excellent point - LLMs are autoregressive next-token predictors, and output token quality is a function of input token quality
Consider that if the only code you get out of the autoregressive token prediction machine is slop, that says more about the quality of your code than about the quality of the autoregressive token prediction machine.
If you find it works for you, then that's great! This post mostly reflects our learnings from getting it to solve hard problems in complex brownfield codebases, where auto generation is almost never sufficient.
Yep, it is opinionated about how to get coding agents to solve hard problems in complex brownfield codebases, which is what we are focused on at humanlayer :)
I imagine it’s highly correlated with parameter count, but the research is a few months old and frontier model architectures are pretty opaque, so it's hard to draw too many conclusions about newer models that aren’t in the study beyond what I wrote in the post