
So where do these binaries get built and how does the system know which binaries to rebuild for a given change? If developers are building binaries and committing them directly, doesn’t that open up security or even correctness issues? How does this approach satisfy compliance concerns (how can the CTO or a manager sign off on the changes that went into the binary if it’s just something a random developer committed?)? How does this scale to tens of deployments per day? These are hard monorepo problems, and they keep being handwaved away.


Suppose the binaries in question are build tools or similar: then this is good, because they never get rebuilt. The paperwork is done, the binaries get committed to version control, and everybody that builds the code then builds the code with the approved binaries. Everybody is happy.

Suppose the binaries are build byproducts, and people just check this stuff in, like, whatever. Well, if somebody needs to sign off on the output, that's a problem - so that person then doesn't use what's in the repo, but instead builds the output from scratch, from the source code, hopefully with known build tools (see above!), and signs off on whatever comes out.

But, day to day, for your average build, which is going to be run on your own PC and nowhere else, nobody need sign off on anything. If you link with some random object file that was built on a colleague's machine, say, then that's probably absolutely fine - and even if it isn't, it's still probably fine enough to be getting on with for now. If you work for the sort of company that's worried about this stuff, there's a QA department, so any issues arising are not going to get very far.

Overall, this stuff sorts itself out over time. Things that are problems end up having procedures introduced to ensure that they stop happening. And things that are non-problems just... continue to happen.


>So where do these binaries get built and how does the system know which binaries to rebuild for a given change?

For simple things, if the code in a directory changes, then the CI system rebuilds that directory. You can have the CI system either validate that the binary matches or commit the binary itself. For more complicated things, you'll have a build system such as Bazel that figures out what changed.
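
A minimal sketch of that per-directory rule, assuming a layout where each top-level directory is independently buildable with its own build.sh (both are assumptions for illustration, not a standard):

    import subprocess

    def changed_dirs(base: str = "HEAD~1") -> set[str]:
        # Ask git which files the last commit touched; keep the top-level dirs.
        out = subprocess.run(
            ["git", "diff", "--name-only", base, "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        return {p.split("/", 1)[0] for p in out.splitlines() if "/" in p}

    for d in sorted(changed_dirs()):
        print(f"rebuilding {d}")
        subprocess.run(["./build.sh"], cwd=d, check=True)  # hypothetical per-dir build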


(Sorry for being terse—on mobile). Validate the binary matches what? If the compiler has to compile the artifact to verify the artifact provided by the developer, why bother having the developer commit the artifact? The CI system could just do it. Never mind that having a bit-for-bit reproducible build is incredibly difficult. Anyway, such simple cases where a whole app lives under a single directory are vanishingly rare.


>The CI system could just do it.

Depends if you want to wait for the CI system to upload or not. Also if you want CI to have commit permissions.

>Never mind that having a bit-for-bit reproducible build is incredibly difficult.

Debian is at something like 90% reproducible packages once they fix two outstanding things. Most languages will have settings and best practices at this point that will give reproducible builds.

>Anyway, such simple cases where a whole app lives under a single directory are vanishingly rare.

Then use Bazel once you get past that stage.

Look, to be blunt, it seems like you're trying to nitpick whatever anyone says while ignoring large parts of the answers. Fact is, many people at small and large companies use monorepos successfully. They work for those people; you can keep trying to argue that they don't, or you can try to learn why they do.


> Depends if you want to wait for the CI system to upload or not. Also if you want CI to have commit permissions.

I guess you could deploy first and verify automatically later. Hadn’t thought of that.

> Debian is at something like 90% reproducible packages once they fix two outstanding things. Most languages will have settings and best practices at this point that will give reproducible builds.

Nevertheless, getting (and keeping) bit-for-bit reproducibility is a ton of work, especially for software that changes every day, and the benefits aren't compelling for many projects.
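
For what it's worth, the cheapest smoke test here is just building twice from a clean tree and comparing digests; something like the following, where ./build.sh, its --clean flag, and out/app are placeholder names:

    import hashlib, subprocess

    def sha256(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    digests = []
    for _ in range(2):
        subprocess.run(["./build.sh", "--clean"], check=True)  # placeholder build step
        digests.append(sha256("out/app"))                      # placeholder artifact path

    if digests[0] != digests[1]:
        raise SystemExit(f"not reproducible: {digests[0]} != {digests[1]}")
    print("bit-for-bit identical across two builds")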

> Then use Bazel once you get past that stage.

This seems to be the answer, but it’s not very satisfying since Bazel’s support for many popular languages (e.g., Python) is lacking and there are lots of rough edges to iron out.

> Look, to be blunt, it seems like you're trying to nitpick whatever anyone says while ignoring large parts of the answers. Fact is, many people at small and large companies use monorepos successfully. They work for those people; you can keep trying to argue that they don't, or you can try to learn why they do.

I never understand why people get defensive about things like this. I'm not attacking monorepos. I manage a monorepo at my small company, and I've run into lots of issues trying to make it work. I'm here trying to understand why so many people rave about monorepos yet often don't have good answers to questions like "how do you manage rebuilds?". You see this as "nitpicking", but the distinction between "just git diff a directory!" and "use something like Bazel" is important.


It's really not any different from depending on an exact version in some dependency manager. Instead of just the dependency config, you check in the binary itself. When a dev needs a newer version of a dependency, they can pull it down and check it in. You wouldn't check in random nameless binaries, just hard copies of things you would have linked to from a dependency repository.
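
Roughly, something like this, with the manifest format and the tools/protoc entry invented for illustration:

    import hashlib

    # Invented manifest: maps a checked-in binary's path to its pinned digest.
    MANIFEST = {
        "tools/protoc": {"version": "25.1", "sha256": "<real digest goes here>"},
    }

    def verify(path: str, expected: str) -> None:
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            raise SystemExit(f"{path}: checked-in binary does not match manifest")

    for path, meta in MANIFEST.items():
        verify(path, meta["sha256"])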

This doesn't work well for dependencies where you're expected to be using the latest version of something that changes 10 times a day.

The rest of your questions are fairly irrelevant, as they would be answered the same way as in the dependency-repo case: i.e., use official binaries.

...but this is closer to multi-repo than monorepo. If you're in a monorepo you might as well use the source.


> So where do these binaries get built and how does the system know which binaries to rebuild for a given change?

By the CI. All major CI/CD tools support rules like: build binary x whenever a file under x-src/* changes; commit binary x when the ref matches /v[0-9.]+/; don't allow developers to manually push to those refs/paths; (run a script to) bump y's pinned version of x whenever binary x changes; merge the bump if all tests still pass; etc.
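
A rough sketch of that last bump rule, with the DEPENDENTS mapping and lockfile layout invented for illustration (real CI systems express this as path-triggered pipelines rather than one script):

    import hashlib, json, pathlib

    DEPENDENTS = {"bin/x": ["services/y/deps.lock", "services/z/deps.lock"]}

    def bump(binary: str) -> None:
        digest = hashlib.sha256(pathlib.Path(binary).read_bytes()).hexdigest()
        for lock in DEPENDENTS[binary]:
            pins = json.loads(pathlib.Path(lock).read_text())
            pins[binary] = digest  # repin the dependent to the new build
            pathlib.Path(lock).write_text(json.dumps(pins, indent=2))
            # CI would now run the dependent's tests and merge only if they pass.

    bump("bin/x")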


The problem is that dependency graphs aren't strictly hierarchical, so it doesn't suffice to say "rebuild whenever something under this directory changes".
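
Concretely, you need reverse edges: a change to one node invalidates everything that transitively depends on it, which is essentially what a tool like Bazel computes. A toy sketch:

    from collections import deque

    deps = {             # target -> what it depends on (toy graph)
        "app": ["lib_a", "lib_b"],
        "lib_a": ["core"],
        "lib_b": ["core"],
        "core": [],
    }

    def rebuild_set(changed: str) -> set[str]:
        # Invert the edges, then walk everything reachable from the change.
        rdeps: dict[str, list[str]] = {t: [] for t in deps}
        for target, uses in deps.items():
            for u in uses:
                rdeps[u].append(target)
        dirty, queue = {changed}, deque([changed])
        while queue:
            for dependent in rdeps[queue.popleft()]:
                if dependent not in dirty:
                    dirty.add(dependent)
                    queue.append(dependent)
        return dirty

    print(rebuild_set("core"))  # {'core', 'lib_a', 'lib_b', 'app'}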


Not sure how people do this in practice, but in principle it seems rather straightforward.

A compiler is just a program that takes some input and creates some output. Both the compiler and the input can have a cryptographically secure hash. Putting both in a sealed box, like a Docker image, with its own hash gives you a program that takes no input and produces some output.

If the box changes, run it on a trusted machine and save the output together with a signed declaration of which box version produced it.
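
In other words, key the cache by a hash over compiler-plus-input; a minimal sketch (file names are illustrative):

    import hashlib

    def sha256_file(path: str) -> str:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def box_key(compiler_image: str, source: str) -> str:
        # Hash of (hash of compiler, hash of input): identifies the sealed box.
        combined = sha256_file(compiler_image) + sha256_file(source)
        return hashlib.sha256(combined.encode()).hexdigest()

    # Same key => the signed, cached output is still valid.
    # New key  => rebuild on a trusted machine and sign the result again.
    print(box_key("toolchain.tar", "main.c"))  # illustrative file names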


Docker makes this drastically easier (you need the exact same versions of all libraries and the compiler), but there are still compile-time things that are unique per compile. Debian has been working hard to get hashes of binaries to be useful, but the work is far from trivial.

(See also: trusting trust)
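
(One of the big per-compile things is embedded timestamps; the reproducible-builds.org convention for pinning those is the SOURCE_DATE_EPOCH environment variable, e.g. set to the commit's own timestamp. A sketch, with ./build.sh a placeholder:)

    import os, subprocess

    # Pin embedded timestamps to the commit's own committer time.
    commit_time = subprocess.run(
        ["git", "log", "-1", "--format=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

    env = dict(os.environ, SOURCE_DATE_EPOCH=commit_time)
    subprocess.run(["./build.sh"], env=env, check=True)  # placeholder build step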


I see this as inherited technical debt, though, not a flaw in principle. It would be nice if we could solve this at the foundational level instead of forcing every dev organization to struggle with it on their own.

Edit: I think we're getting there, though. With all the efforts going on around containers, WebAssembly, blockchains, IPFS, and so forth, it's getting closer.



