Linux Container Internals

kragniz · on June 5, 2017

Writing the basics of a container runtime is easier than it sounds. Last summer I was curious how they work and wrote something simple in Python that can run Docker images:

https://github.com/kragniz/omochabako/blob/master/omochabako

https://asciinema.org/a/77296?speed=2&autoplay=true

I learned a lot doing this, and I'd recommend it to anyone who's interested in containers.

pooktrain · on June 5, 2017

Thanks for sharing!

When you set out to do this, had you studied docker's source code at all? Or did you just have a basic understanding of containers? Other than the link from OP, are there any resources you'd recommend to get one to the point where you have enough understanding of the concepts without having to "cheat" and look at the docker implementation?

I want to do this too, but it's not as much fun if you need to go to the source due to not understanding the fundamentals.

kragniz · on June 6, 2017

I started with a basic understanding of the parts involved, but not so much of how they fit together. Most of the necessary information came from the LWN series of articles posted in another comment: https://lwn.net/Articles/531114/

The actual namespace stuff was easy; the harder part was pivoting the root fs and figuring out all the things to mount. At some point I looked at the source for systemd-nspawn, but I forget exactly what for.

bogomipz · on June 6, 2017

Thanks for the links. Was your starting point the Docker/Golang source?

kragniz · on June 6, 2017

I avoided looking at docker/runc as a starting point since that would spoil the fun of trying to figure it out from first principles. I looked at systemd-nspawn when I was stuck adding mounts.

bogomipz · on June 6, 2017

Thanks, any other references you would recommend?

tyingq · on June 5, 2017

A great resource to understand how Linux containers work is "Linux containers in 500 lines of code": https://blog.lizzie.io/linux-containers-in-500-loc.html

Or, if you just want to skip to the code: https://blog.lizzie.io/linux-containers-in-500-loc/contained...

liaoyw · on June 6, 2017

bocker (https://github.com/p8952/bocker) is also very good for understanding containers.

corbet · on June 5, 2017

If you want more information on how Linux namespaces work, there's an extensive series of articles on LWN at https://lwn.net/Articles/531114/

dankohn1 · on June 6, 2017

For a very high level overview, I really like this essay:

You Could Have Invented Container Runtimes: An Explanatory Fantasy

https://medium.com/@gtrevorjay/you-could-have-invented-conta...

jbb67 · on June 5, 2017

Which language is the sample code written in? Looks.... awful.

deathanatos · on June 6, 2017

There are some places where this example could be better written. For example, this:

  let oldrootfs = String::from(format!("{}/.oldrootfs", rootfs.clone()));

can be reduced to either of:

  let oldrootfs = format!("{}/.oldrootfs", rootfs);
  let oldrootfs = rootfs.clone() + "/.oldrootfs";

You could also do something like

  Path::new(&rootfs).join(".oldrootfs")

but I'm not entirely sure how to get that to a pointer for the FFI stuff. It seems like the smartest way would be to go through OsStr, and if one wanted to use Path instead (which seems like the appropriate type), then sys_pivot_root should probably be changed to accept them instead.
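A minimal sketch of the OsStr route mentioned above, assuming a Unix target (it relies on `std::os::unix::ffi::OsStrExt`). The helper names `path_to_cstring` and `oldrootfs_cstring` are hypothetical and not from the original post; the idea is only that a `Path` can be turned into a NUL-terminated `CString`, whose `as_ptr()` can then be handed to the FFI call.

```rust
use std::ffi::CString;
use std::os::unix::ffi::OsStrExt; // Unix-only: exposes the raw bytes of an OsStr
use std::path::Path;

/// Hypothetical helper: convert a Path into a NUL-terminated C string.
/// Returns None if the path contains an interior NUL byte, which C cannot represent.
fn path_to_cstring(path: &Path) -> Option<CString> {
    CString::new(path.as_os_str().as_bytes()).ok()
}

/// Hypothetical helper: build "<rootfs>/.oldrootfs" as a C string.
fn oldrootfs_cstring(rootfs: &Path) -> Option<CString> {
    path_to_cstring(&rootfs.join(".oldrootfs"))
}
```

The returned `CString` has to stay alive for as long as the pointer from `as_ptr()` is in use, so it is usually bound to a local variable before the call.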

A lot of the complexity here is that you're interfacing with C, which is inherently unsafe, and cdecl is very simple in what can be passed. (And stuff like POSIX file paths are just hard to statically type around, because they're not text strings.) Normally, you'd write some wrappers (which the original author is well on the way to), and the rest of the code should look much simpler.

Similarly here:

  create_dir(oldrootfs.clone());

The clone isn't needed; you can simply borrow oldrootfs:

  create_dir(&oldrootfs);

If you change rootfs in both pivot_root and sys_pivot_root to a &str, you can then call it as just

  pivot_root("/")

which is simpler than

  pivot_root(String::from("/"))

(I generally find that taking &str is simpler than String, if you're not going to modify the String object.)
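As a small illustration of the `&str` point, here is a sketch in which `oldrootfs_path` is a hypothetical stand-in for the post's helper: it only builds the path string and makes no system call. Both a string literal and a borrowed `String` can be passed without allocating or cloning at the call site.

```rust
/// Hypothetical stand-in for the post's helper: it only formats the
/// ".oldrootfs" path and performs no system call.
fn oldrootfs_path(rootfs: &str) -> String {
    format!("{}/.oldrootfs", rootfs)
}

/// Calls the helper once with a literal and once with a borrowed String.
fn call_both_ways() -> (String, String) {
    let owned = String::from("/tmp/rootfs");
    // A literal is already a &str; an owned String is borrowed with `&`
    // and coerces to &str automatically.
    (oldrootfs_path("/tmp/rootfs"), oldrootfs_path(&owned))
}
```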

archrabbit · on June 6, 2017

Sure, there are definitely a lot of possible improvements for the code examples, like avoiding `clone` on heap-allocated strings, etc. I had some really odd issues when passing statically allocated strings to the `libc` functions (the arguments from previous calls ended up concatenated in later invocations; you can use `strace` to observe that behavior).
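One possible explanation for that symptom (an assumption; the thread does not confirm it) is that Rust string literals are not NUL-terminated, so a C function given `"proc".as_ptr()` keeps reading into whatever bytes happen to follow. The sketch below uses only the standard library; `as_c_bytes` and `static_cstr` are hypothetical names chosen for illustration.

```rust
use std::ffi::{CStr, CString};

/// Returns the bytes a C function would see once `s` is properly terminated.
/// Panics if `s` contains an interior NUL byte.
fn as_c_bytes(s: &str) -> Vec<u8> {
    let c = CString::new(s).expect("no interior NUL");
    c.as_bytes_with_nul().to_vec()
}

/// A statically allocated C string: the trailing NUL is written explicitly.
fn static_cstr() -> &'static CStr {
    CStr::from_bytes_with_nul(b"proc\0").expect("literal ends with exactly one NUL")
}
```

With either form, `as_ptr()` yields a pointer that C can treat as a terminated string, whereas a plain `&str` carries its length separately and has no terminator.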

mhh__ · on June 5, 2017

Rust?

simcop2387 · on June 5, 2017

Looking at the blog's source, yes rust (and the code looks like rust also).

https://github.com/rabbitstack/rabbitstack.github.io/blob/ma...

archrabbit · on June 5, 2017

It's Rust. Btw, did you see erlang or clojure? ;p

striking · on June 5, 2017

What's the point of writing your program in Rust if it's almost entirely wrapped in `unsafe{}`?

You'd be better off just writing a C program. It could even be clearer to a wider audience what exactly you're doing.

archrabbit · on June 5, 2017

I could probably wrap just the system call invocations in the `unsafe` block. I'm learning Rust and I wanted to give the post some freshness. There are already a plethora of examples in C.

tiles · on June 6, 2017

That's a bizarre argument; the post is about writing an abstraction over another interface, and it's clearly meant to be extended. The abstraction can be written in a safer language than C. Seems like there's an obvious upside.

deathanatos · on June 6, 2017

The unsafe block inside pivot_root is much larger than it needs to be. It only needs to encompass the mount() call in that function. It's a pretty trivial change to have it only wrap the if. (It could wrap even less, but that would either look uglier or need a variable binding the result, which honestly wouldn't be that bad either.)
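A sketch of the "variable binding the result" variant, under stated assumptions: `fake_mount` is a hypothetical stand-in for the raw mount() call (it touches nothing and only mimics the 0 / -1 return convention), and `mount_checked` is an invented wrapper name. Only the raw call sits inside `unsafe`; argument checking and error handling stay in safe code.

```rust
/// Hypothetical stand-in for a raw system call such as mount(2). It does no
/// real work and only mimics the "0 on success, -1 on error" convention.
unsafe fn fake_mount(target: *const u8) -> i32 {
    if target.is_null() { -1 } else { 0 }
}

/// Safe wrapper: only the raw call is inside `unsafe`.
fn mount_checked(target: &[u8]) -> Result<(), String> {
    // Safe code: validate the argument before crossing the FFI boundary.
    if target.last() != Some(&0) {
        return Err(String::from("target must be NUL-terminated"));
    }
    // The unsafe block covers just the call; its result is bound to a variable.
    let rc = unsafe { fake_mount(target.as_ptr()) };
    // Safe code again: translate the C-style return code into a Result.
    if rc == 0 {
        Ok(())
    } else {
        Err(format!("mount failed with return code {}", rc))
    }
}
```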

gtirloni · on June 6, 2017

With Rust you can at least have some certainty about where problems could happen. The risky code gets boxed inside unsafe, which affords it extra scrutiny. You don't have to worry much about everything else surrounding your code.

Disclaimer: only went through basic tutorials and I don't program in Rust daily.