That's a hobbyist's rule. In industry, every PR should strive to deliver maximum value to the company, which is sometimes achieved by doing as little as possible so you can work down other objectives.
That's a largely meaningless response. Enforcing a decent, maintainable architecture is potentially of great value to the company. Unfortunately that's a subjective call. I'm sure you've lived through codebases that don't really admit bug fixes or feature enhancements - things the company knows it cares about.
As professionals who are invested in the long-term success of the company, it is our responsibility to raise concerns about the future and try to negotiate a good compromise between the short-term and long-term goals.
Leaving the codebase cleaner than you found it IS creating maximum value for the company, because large changes are almost never walking the knife's edge between "making the codebase better" and "making the codebase worse". Your codebase either gets better over time, or it gets worse. If it's getting 0.01% worse with every PR, that tech debt compounds: after n PRs the codebase is 1.0001^n times worse, which grows faster than you'd think.
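A quick back-of-the-envelope sketch of that compounding (the only number taken from the comment above is the 0.01%-per-PR figure; everything else is illustrative):

```python
# Illustrative sketch: tech debt that compounds multiplicatively per PR.
def debt_multiplier(per_pr_decay: float, n_prs: int) -> float:
    """How much 'worse' the codebase is after n_prs merges, each one
    making it per_pr_decay fractionally worse (0.0001 == 0.01%)."""
    return (1 + per_pr_decay) ** n_prs

# 0.01% worse per PR, compounded over 10,000 PRs:
print(f"{debt_multiplier(0.0001, 10_000):.2f}x worse")  # ~2.72x, i.e. about e
```

Because the growth is exponential rather than linear, 10,000 tiny 0.01% regressions don't add up to a 100% regression - they multiply out to roughly 2.7x.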
Having a maintainable codebase is of MASSIVE LONG-TERM value to a company - far too many orgs are paralyzed by mountains of tech debt.
Doing the minimal work possible is fine for one-off hotfixes or tweaks or small features, but your argument assumes "maximum value to the company" is measured in the span of a sprint, and it's not.
Cleaning up the codebase incrementally does deliver value to the company as long as you understand "cleaning up" as "making it easier and faster to contribute to, change, or debug in the future" rather than something adjacent to a form of performance art.
This is an excellent point - LLMs are autoregressive next-token predictors, and output token quality is a function of input token quality
Consider that if the only code you get out of the autoregressive token prediction machine is slop, that says more about the quality of your code than about the quality of the autoregressive token prediction machine.
If you find it works for you, then that's great! This post mostly reflects our learnings from getting it to solve hard problems in complex brownfield codebases, where auto generation is almost never sufficient.
Yep, it is opinionated about how to get coding agents to solve hard problems in complex brownfield codebases, which is what we are focused on at humanlayer :)
I imagine it’s highly correlated with parameter count, but the research is a few months old and frontier model architectures are pretty opaque, so it's hard to draw too many conclusions about newer models that aren’t in the study beyond what I wrote in the post