Misadventures in process containment

sgentle · on Jan 14, 2019

Is it just me or is the cloud era of developer tooling fundamentally weird? It seems like there used to be this tacit assumption that a good tool is one that starts small with you and grows as your needs grow. How do I log in to a remote machine? ssh. How do I log in to hundreds of remote machines? ssh. How do I reverse tunnel my local webserver through a firewall that only allows HTTPs traffic? ssh.

Of course, there were always systems that asked a lot from you before they gave anything back. I think everyone's first experience of SQL was probably "what?" followed by "no, really, what?" Kneel at the altar of the cartesian product, my child, that you may be reborn into the kingdom of relational algebra. But, hey, if apostasy is your bag there's still nothing stopping you from putting all your data in one big table. If you're good enough at it you might even snag a job at Google.

But the Dockerverse is really something else. No shade on the technology – it's really cool stuff – but boy does it love to tell you how big and complex your problems are. You want to build images on one machine and run them on another? Sounds like what you really need is a Secure Container Registry! Want to have services with dependencies? Forget init^H^H^H^Hsystemd, try Docker Compose! Just kidding, I mean Docker Swarm! Just kidding, I mean Kubernetes! Actually you kind of want all of them maybe!

I don't think this is a weakness of the software itself; you can find simple ways to use containers if you try hard enough. But that difficulty really speaks to the motivations of the cloud services industry. Using Docker for simple problems is asking a car salesman for whatever will get you from A to B. Hey, buddy, you don't know what you're missing. Everyone thinks they want something simple until they see our amazing upgrade options. And what about safety? Think about what would happen to your family if you were caught in a tragic Byzantine Fault.

And, yes, absolutely, if you have Big Cloud Problems, the cost of finding the right cloud-native service discovery mesh is near-zero when amortised over thousands of servers worth of existential dread. But I just can't help but feel like I'm preparing for cloud scale like I'm preparing for a zombie apocalypse. It's an interesting problem space, and a great excuse to hang out on the tactical flashlight subreddit, but I'm not sure these tomorrow solutions actually help me with my today problems. The people selling the gear sure are doing well, though.

lkrubner · on Jan 14, 2019

And you have to go double-or-nothing at each stage, which is the real risk. Docker didn't solve everything? Try Kubernetes! But wait, Kubernetes didn't solve everything? Maybe try it with Ranch! Or give up and go to Mesos?

I've worked with startups that have invested more and more money in an attempt to get the assumed benefits of containers. The double-or-nothing aspect is what had me worried when I wrote "Docker is the dangerous gamble which we will regret":

http://www.smashcompany.com/technology/docker-is-a-dangerous...

JNRowe · on Jan 14, 2019

Great read about containers aside, I think the most interesting point for me is that `redo` is getting some love. 150 commits in the past couple of months after a six year hiatus, including some great documentation improvements.

--

Anyone familiar with the mentioned `bupdate` tool and zsync¹? I'd love to read a comparison, as I've only used the latter.

1. http://zsync.moria.org.uk/

apenwarr · on Jan 14, 2019

I'm the author of the article in question. I didn't know about zsync! Thanks, I'll link to it from the footnote.

Flipping quickly through the zsync docs, it looks very well done. I'm not sure if the way they adapted rsync is better or worse than the bup/bupdate way of doing it; it looks like they take more time to do the initial encoding of the index file, but that's not very important since you only do it once, especially if it saves bytes.

They also have a (complex and potentially error-prone) way to look into .gz files and sync them extremely efficiently even without gzip --rsyncable. That's really cool, but risky, and of course only works with gzip, not other compressors. Not sure if that's a good idea or a bad idea, but nobody forces you to use it.

tl;dr zsync has actual documentation and an actual release, so you should probably use it instead of bupdate.

(People should feel free to ask me any questions about redo, bupdate, the redo container builder, etc in the comments here if you like.)

voltagex_ · on Jan 14, 2019

I'm always interested in tools that can download only changed chunks, or repair damaged files without having to re-download the whole lot. Habit, after being on extremely slow connections for about 15 years.

See also:

* http://atterer.org/jigdo

* https://multipar.eu/

* I've abused webseed support in torrent clients to do similar things - a downloader based on the torrent chunk hashing format would be quite interesting

* rsync can do it, maybe

* http://xdelta.org/ - have used this to patch Raspberry Pi images

JNRowe · on Jan 14, 2019

> They also have a (complex and potentially error-prone) way to look into .gz files and sync them extremely efficiently even without gzip --rsyncable.

FWIW, the last release of `zsync` predates `gzip --rsyncable` by about six years¹.

> People should feel free to ask me any questions

"Thanks" isn't a question, but thanks for creating `redo` and the great articles. I always seem to come out of them having learnt something new, and often end up rabbit holed in the interesting side topics that are raised too.

1. http://git.savannah.gnu.org/cgit/gzip.git/tree/NEWS#n74

apenwarr · on Jan 14, 2019

You're welcome :)

The zsync docs clearly talk about "gzip --rsync", which I guess must have been an earlier version of "gzip --rsyncable".

zimbatm · on Jan 14, 2019

Balena seems to have a developed a docker-native solution; not sure how it's implemented though.

https://www.balena.io/docs/learn/deploy/delta/

zimbatm · on Jan 14, 2019

The only good thing about the Dockerfile format is that most developers can write one easily. Here are a list of a few issues:

* Dependencies are best represented as a tree. Dockerfile forces to linearize that tree.

* It's not possible to compose two or more Dockerfile together.

* If `COPY` or `ADD` instructions are being used, all the files are being send by the client to the daemon, including timestamp and UID. This breaks caching badly as two different users would not produce different images.

* In general Dockefile are not bit-reproducible; two developers building the same Dockerfile will get a different output.

The underlying v2 image format has content-addressable layers which is great. It means that in theory it's possible to upgrade the base OS layer without rebuilding the rest.

PS: Nix's dockerTool.buildLayeredImages fixes all those issues and can take advantage of the CAS format.

0xbadcafebee · on Jan 14, 2019

Wow, I had no idea about gzip --rsyncable. I haven't used rsync in years, so I guess it never became necessary. (Also, I stopped using gzip years ago...)

ryanpetrich · on Jan 14, 2019

Odd that apt-get was mentioned as a tool that could use incremental downloads: it has supported them since 2006.

apenwarr · on Jan 14, 2019

As far as I know, it does this by having a base file and a bunch of delta files alongside it on the server. Applying these deltas is time consuming, you have to choose how many levels of delta to keep around, and you end up downloading a lot of redundant content (ie. information about old packages, only to replace it with the delta content).

bupdate avoids that by just letting you post only the latest file (and .fidx), then the client figures out which blocks it doesn't yet have.

voltagex_ · on Jan 14, 2019

are you talking about pdiff for list files, or debdelta (http://debdelta.debian.net/html/index.html)?

I've never seen debdelta used in the wild, but I know Fedora's delta RPMs work quite well.

je42 · on Jan 16, 2019

mmh. i wonder why he doesn't mention multi-staged docker builds. they are clearly more limited than redo, but for small + simple dependency trees they are pretty okish.