Thoughts on Rust bloat (2019)

raphlinus · on March 29, 2021

This was written a year and a half ago, but I think holds up pretty well. Some thoughts.

First, one lesson is that the exact wording in headlines matters. Many people interpreted the essay to say "Rust has a bloat problem," which is not what I was trying to say. Bloat is unfortunately a super common problem in the modern world, and Rust is not immune. But Rust also gives the end developer a lot of control, and is capable of producing wicked-lean programs (at some cost) if that's the goal. The point of the essay was to give practical advice on how to reduce compile times and executable size. From that perspective, publishing the essay was a pretty big success, as it immediately provoked improvements in the gfx-hal project.

Serialization continues to be a problem. Now, something like half the compile time and executable size for Runebender is serialization for the XML-based UFO file format for representing fonts. I think a better solution to serialization might make a good project for a smart, motivated person, but it's pretty far away from current academic fads, and there's also this thing that a lot of people have tried and we're still not in a great place. What about serialization makes it so intractable? Could some kind of language support make it better?

Also, from followups in UI threads, the expectation for what constitutes "bloat" is very different for different people. The calculator example in Druid is now about 1.5M (release build, Windows), which I think is acceptable for most people, but I'm also aware that a simple calculator like that could be built in a couple hundred k or so without too much problem, and I'm sure much less if you start using demoscene style techniques. But I also don't want to focus on that too much - I don't want tiny calculator demos, I want rich applications that work in all languages (including input methods) and also play nicely with accessibility tools such as screen readers. That can't be, and shouldn't be, a tiny little executable.

wtetzner · on March 29, 2021

> What about serialization makes it so intractable? Could some kind of language support make it better?

I've thought about serialization a lot over the years, and I think a big part of the problem is that we try to map our programming language data types directly to serialized types (e.g. mapping JSON objects to Rust structs). That can be useful, but I think often the serialized representation is sufficiently different from the desired data type that it's not good enough. So we end up with one set of data types for serialization/deserialization, and a bunch of code that maps between those and the desired representation.

I wonder if a system similar to Scheme's `syntax-rules` (or Rust's `macro_rules!`) for mapping the serialized data into the program's data types (and vice-versa) would be a nice mechanism.

zackbrown · on March 29, 2021

> I think a big part of the problem is that we try to map our programming language data types directly to serialized types

Much of the time, in-program data is relational with a possibly-cyclic graph describing relationships, ownership, etc. Systems that work with RDBMSes do a fine job of de/serializing data for relational stores, and ORMs are a powerful and delightful way to do this.

The problem with JSON and XML for these kinds of data is these formats are inherently non-relational. They are simple trees of data, not arbitrary graphs. In practice, e.g. in every full-stack web codebase I've ever seen, JSON blobs are hacked together e.g. by doing a makeshift front-end "join by ID" when needed — or adjusting a back-end method to perform the join and return — wait for it — a flattened JSON blob.

It seems like we need a "JSON" or "XML" where relationships are a first-class citizen. To my knowledge, the closest we've come to this is when the data broker and format are tightly coupled, e.g. with SOAP or GraphQL. Can this be achieved strictly declaratively, in a format that's better than a SQL data dump? Maybe!

> I wonder if a system similar to Scheme's `syntax-rules` (or Rust's `macro_rules!`) for mapping the serialized data into the program's data types (and vice-versa) would be a nice mechanism.

Does `serde`[1] look like it's close to your imagined mark? I haven't tried it personally, but it looks akin to Java's `Serializable` or C#'s `ISerializable`— all reasonably elegant ways of handling the messy binding logic of serialization over arbitrary formats.

[1] https://serde.rs/

wtetzner · on March 30, 2021

> and ORMs are a powerful and delightful way to do this

I think we'll end up disagreeing there.

However, I think this type of model could maybe work better for "static" serialization/deserialization of file formats. But even so, I think the biggest part that's missing is a convenient way to describe the mapping between the structure of the serialized data and the internal representation. ORMs typically give you a single method of doing that mapping, which can work when you have control over the serialized representation. But it doesn't work very well when you're serializing to an existing file format, for example.

> Does `serde`[1] look like it's close to your imagined mark?

No, serde is exactly what I was talking about in terms of the serialized data matching your internal data structures. I think it's a necessary piece, but I think the part that's missing is a nice way to map that data to the representation your program want's to work on.

mwcampbell · on March 29, 2021

> I want rich applications that work in all languages (including input methods) and also play nicely with accessibility tools such as screen readers. That can't be, and shouldn't be, a tiny little executable.

Apple historically did a pretty good job of solving this by including a rich GUI framework in the OS. Even now, from what I've heard, that approach seems to work pretty well for Mac-only apps. But then, in your xi retrospective, you said the "native" GUI wasn't all that great even on Mac. And now, except for the indie Mac crowd, developers are generally expected to target multiple platforms. So now, I guess applications have to carry with them something that was formerly expected to be part of the OS.

jcranmer · on March 29, 2021

> What about serialization makes it so intractable? Could some kind of language support make it better?

I think the main problem with serialization is that it is used to mean several different things which at first glance could all be handled by one system but experience instead shows are just too different to be handled in that manner.

One kind of serialization is handling something that is mostly "bit blitting." I've got a struct in memory, it needs to be copied to disk to be read back into memory. Maybe you need to do some normalization for things like endianness or charset conversion.

A different kind of serialization is when you're trying to read/write a standard-specified binary layout. This sounds like it's the same thing as before, but the difference here is that the structs are not readily blittable. You might have cases like x86's definitions, where fields are splayed out in different locations (zmm0 is spread out across three locations in the XSAVE struct). Or you might have the ... interesting features of ASN.1's DER format. Handling these kinds of cases requires at the same time as a more blittable struct requires adding lots of customization points.

A third situation is where you're reading in something like JSON or XML: now you have symbolic field names that can occur in any order, and your serialization code needs to account for unexpected orders. Customization points really kick in here, because now you also usually want to have default values and optional values and maybe conditionally required (this field is required only if some other field has a particular value).

On top of these, you also start to conflate orthogonal issues of validation and conversion. Are the structs in memory meant to be close to on-disk representations and clients responsible for handling data representation issues (consider the way the object crate handles endianness in ELF files)? Where, and when, does validation of parsing data happen? How much of this is automatable from the serialization framework itself (e.g., the value must have values in such-and-such range)? What about supporting multiple versions of save data--where is the ground truth about how to upconvert (or maybe even downconvert!) old versions to the current library version?

This is a situation where you can build something pretty tight for a single use case, but feature scope quickly comes in when it comes to extending it for multiple use cases, to the point that it's too unwieldy if you're not using any advanced features and the advanced features are too advanced to use.

As for what language support makes it better, the obvious answer is reflection. Having written my own auto-derive-serialization crate, what I mostly need is a) given this struct, give me a list of fields, their names, types, and attributes, b) let me do specialization on types, and c) let me turn that into writing some pretty unconstrained code. Rust's current macro system mostly falls down on the type system, although the need to include a hefty parsing quote to reparse everything to get the field lists is pretty poor (not to mention actually handling the corner cases in Rust's type system is challenging, although serialization code is likely to screw up here anyways). There's probably room to specify a MVP to make implementing serialization easier that wouldn't need to go through syn.

freeopinion · on March 29, 2021

This is a great place to explore a recent line of thought I have had. I'd love to hear your thoughts on sqlite as a file format. Just hypothetically, if Runebender ditched xml and used sqlite files instead, what would that do to serialization? What other bloat implications would it have? What value would you lose by going away from xml?

raphlinus · on March 29, 2021

It's something we thought about, but I think it's somewhat orthogonal to the question of serialization - you still need code to go back and forth between the representation in the database and the native data structures.

See https://github.com/simoncozens/babelfont/issues/10 for a deeper discussion of issues around a file format for a font editor. One of the biggest current pain points is dealing with merges in version control (usually git). I think there are ways of dealing with that in an sqlite-based file format, but I worry it would require some custom tooling, especially to make sure invariants aren't violated. Another interesting example is LibrePCB (see https://www.youtube.com/watch?v=vu-h5y6tK34) who devised their own text based format specifically to make version control more friendly.

freeopinion · on March 29, 2021

Focusing strictly on compile time and executable size, I'm curious whether sqlite-as-a-file-format would help or hurt. Of course you still have to move from file to native structures. I just wonder how all the overhead of sqlite compares to whatever xml libs you are using. It seems that both xml and sql tables could be modeled very closely to your native structure. So the question is which approach compiles more efficiently.

pmarin · on March 29, 2021

Alternative text base format was very common in the past because the standard binary format used to be machine dependent for speed (almost a direct dump of the structures in memory).

raphlinus · on March 29, 2021

Right. Dumping internal memory representations is one point in the spectrum, with extremely appealing properties for compile time, code size, and run time (basically no cost), but other serious problems, including portability (32 bits or 64 bits for pointers) and security. Many of the zero-copy serialization formats (Flatbuffers, Cap'n Proto, FIDL) are inspired by this and to some extent have the property that a "plain ol' data" struct might be serialized by a memcpy of the C representation, but try to improve in the other dimensions.

That said, I don't think dealing with the byte representation is the real problem. Projects like simdjson show that you can convert in and out of JSON very fast. The challenge to bloat specifically is getting those converted to the data types of the application. Many serialization approaches, including serde, generate a goodly amount of code for each data type, and this adds up.

swsieber · on March 29, 2021

Did you ever take a look at miniserde? It's pretty much just an experiment (e.g. no support for enums), but it seems like the general approach it takes would cut down the amount of code per type.

[0] https://github.com/dtolnay/miniserde

raphlinus · on March 29, 2021

Yes. I think the characterization in TFA is still valid.

swsieber · on March 29, 2021

Oh, doh. It's been a while since I read this and forgot you mentioned it there.

eximius · on March 30, 2021

Could you wrap the crate mentioned into wasm like watt does for proc macros?

Obviously not ideal...

kzrdude · on March 29, 2021

Minor, but rand was mentioned in the original post and it has improved a lot.

dralley · on March 29, 2021

>Now, something like half the compile time and executable size for Runebender is serialization for the XML-based UFO file format for representing fonts.

Using which library? xml-rs? quick-xml? serde + quick-xml?

raphlinus · on March 29, 2021

A large offender is the fontinfo[1] struct, which is serialized with serde and quick-xml. Other parts (the glyphs etc) are done with quick-xml and handwritten code.

[1]: https://github.com/linebender/norad/blob/master/src/fontinfo...

jerf · on March 29, 2021

The article does mention this a bit, but I would suggest to people to be a bit more careful about accusing everything of executable-size-bloat. We have the mentioned Unicode tables. Can you cut them? Sure, but now you've got a binary that only works in English and very-English like languages for no reason. We use 64-bit stuff for good reason, because we routinely blow out 32-bit limits, but that's 8 bytes for all sorts of things that used to be 4, pervasively throughout the code. We have localization support. We have some sort of async runtime or memory managing runtime or other thing like that. We have all sorts of things like this. If you sit down and take an inventory of what's going on, the minimum size for some sort of "serious" executable is still going to be quite a lot.

(Could you move some of this stuff out of the executable? Yeah, but that really only addresses the immediately visible executable size and comes with tradeoffs of its own... it doesn't mean the executable is "smaller", it just means it's split into pieces.)

It's a bit of a pipe dream now to expect some sort of "serious" executable (one that actually has all these concerns) to fit into half-a-megabyte anymore, just because Commodore 64 programs used to fit into a hundred kilobytes.

flohofwoe · on March 29, 2021

100 KBytes is quite bloaty for a C64 program and wouldn't even fit into RAM in one piece ;)

But anyway... wouldn't the better and "obvious" solution be to move foundational things like the above mentioned UNICODE tables into operating systems (if they're not already provided) instead of shipping them with every application?

After all, what's the point of operating systems if they don't provide commonly needed services (and I wonder if such solutions had even been considered for said Rust libraries).

PS: One situation where binary size matters a lot is WASM running in browsers. It's not just download bandwidth, but also the time it takes to compile the executable. We're now back at WASM binaries stuttering on startup for a few seconds because the JIT needs to "warm up". For a very short time, when WASM was AOT compiled, this wasn't an issue. But it became an issue as people became careless about binary size and just-in-time compilation had to be reintroduced to cope with massively big WASM binaries.

gpm · on March 29, 2021

> what's the point of operating systems if they don't provide commonly needed services

To abstract over hardware, including to allow running multiple programs simultaneously on the same hardware.

I'd argue that there's pretty widespread agreement in how we actually deploy modern systems when given the choice, that this is all we want the operating system to do too. There's all these operating systems that try to do more (e.g. approximately every linux distribution ever), and yet modern software primarily runs in some sort of sandbox who doesn't allow any of that to seep into the environment. On servers that sandbox looks like "containers" or less frequently "jails". On clients that sandbox looks like "a browser" or "electron".

Empirically the idea of shipping commonly used dependencies with the hardware abstraction layer, seems to have been the wrong idea. Even if you disagree with that, certainly it's not the case that it's the only point of shipping the hardware abstraction layer (operating system) in the first place.

flohofwoe · on March 29, 2021

That's a fairly "mainframe era" definition of an operating system though ;) Let me reformulate: If an "operating system" or any other sort of application runtime (like a web browser) needs to handle UNICODE strings anyway in its UI layer, doesn't it make sense to expose an UNICODE API to the applications because all the code and data exists anyway in the OS.

This is one of my main gripes with web browsers: They have excellent text rendering, but there are no APIs which expose the text rendering capabilities to APIs like WebGL.

gpm · on March 29, 2021

Now you're getting into nuanced answers anyways

Does it make sense to expose them. Yes, probably, if there's no cost to doing so (which is approximately what you said when you says "needs to handle unicode strings anyways").

Does it make sense to use the exposed dependency? That depends

- What is the cost of including an extra copy (high in wasm, low in server side containers)

- [Edit: Added] What is the cost of using the system copy (typically low, but non 0 performance impact)

- What set of symbols is more likely to be outdated?

- How many times are you going to run into issues where you are missing the dependency in the runtime?

- How expensive is it going to be to implement a different solution for every different runtime that exposes it in a different way?

- How likely is the runtimes API ever to change in a backwards incompatible way?

Most of the time, for things that are not downloaded just when the user tries to run them (i.e. websites), I think it doesn't make sense to use the exposed dependency, even if it exists.

flohofwoe · on March 29, 2021

Agreed. As always, the only correct answer is "it depends".

But even considering those questions is probably much more effort than most application developers put into selecting their dependencies, unfortunately.

There's also the question of scale.

Currently, whether your calculator.exe is 50 KB or 5 MB doesn't matter much, trimming down the size towards 50 KB is a nice exercise and deserves praise, but is probably not even noticeable by users.

50 MBytes for a calculator deserves raised eyebrows, something must be wrong which should be investigated. 500 MB would be unacceptable IMHO. Even if users wouldn't complain, I would be deeply ashamed to produce such a turd of a program ;)

setpatchaddress · on March 29, 2021

Yes, it depends in a very absolute sense, but for most cases, efficient use of memory is a hugely important factor in a large software system for performance. It’s worth doing an abstraction layer. ICU has a stable API, to the best of my knowledge.

throwaway894345 · on March 29, 2021

> After all, what's the point of operating systems if they don't provide commonly needed services

My answer is "managing access to shared resources"--the primary role of an OS is to allow multiple programs to run on the same hardware without needing to be aware of each other. I specifically don't think the OS should be a kitchen sink for common utilities / services. I would actually prefer fewer things managed by the OS (microkernel if not unikernel). In my utopia, the line between "OS" and "Hypervisor" would be very thin/blurry.

If we really want to be space-conscious, I think the answer is dynamic linkage, although I think people have forgotten the horror that is dependency hell. Nix conceptually provides an answer for dependency hell, but in my experience there remains quite a chasm between what Nix promises or aspires towards and what it actually delivers. For the time being, I have to think that static binaries are overwhelmingly the most economical solution when you account for time spent mucking with dependencies.

jerf · on March 29, 2021

"100 KBytes is quite bloaty for a C64 program and wouldn't even fit into RAM in one piece ;)"

I was thinking of the numerous games that involving loading from floppy disks quite routinely. There's a ton of C64 programs over what fit in RAM at once. Cutting it down to what fits in RAM itself just seemed a bit harsh since even the C64 back in the day wasn't actually that limited.

Also, possibly you wrote your reply after I edited my comment about the ability to move pieces out, but I've seen people still whine about the "bloat" programs have from the sum of their linked libraries. My point is more that a modern program really does get into the dozens of megabytes fairly quickly on modern systems, for what are generally agreed to be good reasons... until we all start talking about "bloat", after which all sense is thrown out the window.

If you honestly sit down and look at how many numbers it takes to be a modern program, it gets large fast.

jandrese · on March 29, 2021

Does this mean you need to ship a new Rust binary for every application you build when the Unicode consortium updates the tables?

PopsiclePete · on March 29, 2021

If your application needs support for new emojis or some dead language, then yes.

Jasper_ · on March 29, 2021

Operating systems like Windows and Mac OS X do ship those tables, but there is no Rust crate to use them directly. It's easier to include them once rather than use a different API per operating system.

setpatchaddress · on March 29, 2021

Unicode tables are gigantic. If it’s possible to build an abstraction layer, it should be done.

raphlinus · on March 29, 2021

The ICU4X project[꧁] is working on some of those goals, and they're also using Rust as a primary implementation language.

[꧁]: https://github.com/unicode-org/icu4x

setr · on March 29, 2021

Unrelated but wow that’s a nice flourish character

꧁ TEXT ꧂

andoriyu · on March 29, 2021

You know, I agree that shipping the same damm Unicode table over and over is stupid.

However, if they were part of the OS sooner rather than later, certain OS vendors would have found a way to make them incompatible or distribute them separately and make the user explicitly install them. (if you play in Windows, how many times and how many versions of VC++ redistributable you had to install?)

I remember there were projects where I opted-in into writing my own parser (with Pest or nom) just to avoid depending on regex crate. Which is usually the biggest dependency in my projects.

agumonkey · on March 29, 2021

Reminds me of the articles about text processing in haskell vs C. haskell wasn't as lean but it was more general (full unicode support). There was no mystery in the size difference.

kvark · on March 29, 2021

We need more downstream projects like this putting pressure on the library authors to cut down their dependency trees as well as generic usage.

Perhaps, we need better tools for this. Take `cargo-bloat` for example: if library A makes a generic type from library B to monomorphize, would the build time reflected in `A`? It would be good to have something telling us: this API has a build complexity metric of 1.23 - something we could feature and/or track on CI.

throwaway894345 · on March 29, 2021

I think "better tools" would be a good way to go. In general I think we need to make these things that we care about (performance, binary size, compilation time, etc) more visible and quantifiable and to allow us to easily root cause them. I think we need to make profiling these things dead simple and fast--with respect to performance, the time to get a flamegraph should be nearly zero, and the flamegraph's UI should be optimized (e.g., the ability to "peek" at a function's definition directly from the flamegraph). We need similarly easy ways to trace compilation time and code bloat to their causes in the code, perhaps even with automated suggestions for improvements.

pjmlp · on March 29, 2021

This is mostly an issue because cargo doesn't handle binary libraries.

wyldfire · on March 29, 2021

Previous HN discussion: https://news.ycombinator.com/item?id=20761449

fmakunbound · on March 29, 2021

> proc macros. The support crates for these (syn and quote) take maybe 10s to compile

Common Lisp user here. What's up with Rust where macros take 10s to compile? I'm keen on seeing macros in other languages, but that's kinda brutal from a usability perspective.

IshKebab · on March 29, 2021

It takes 10s to compile the crate that defines the macros, it doesn't take 10s to evaluate them.

And the reason they're so slow to compile is because they're compiled to native code, and you aren't given a parser, just a list of tokens, so basically all of them depend on some extra parsing crates.

fmakunbound · on April 3, 2021

> It takes 10s to compile the crate that defines the macros

Brutal.

> they're so slow to compile is because they're compiled to native code

I guess that applies just to Rust

diragon · on March 29, 2021

Compiling Common Lisp took 10 seconds or more as well back when it was last used. ;-)

slmjkdbtl · on March 29, 2021

    "but I’ve found that the rust-objc macros produce quite bloated code, on the order of 1.5k per method invocation"

this is horror story, is dynamic ObjC even a valid approach over ObjC-in-C FFI? I hear it's quite slow / bloated in other languages' bindings too

comex · on March 29, 2021

Either there's something wrong with their measurement or the issue has been fixed. I tried modifying rust-objc's examples/example.rs to add 1000 method invocations, each with a unique selector:

    let hash123: usize = msg_send![*obj, hash123];

Comparing the built version with and without the addition, on x86-64 macOS, each message send adds, on average, 52 bytes of code (suboptimal but not terribly unreasonable) and 8 bytes to store the selector name itself (including nul terminator). That's it. Plus 113 bytes for the symbol table entry, but that goes away if you strip the binary.

It's possible that the bloat comes from something else, perhaps class declarations.

raphlinus · on March 29, 2021

I think it got better, but unfortunately I never filed a bug documenting the bloat, so it's hard to track down what changed, or if I was simply wrong. My recollection is that most of the bloat was a per-invocation copy of the panic code, including formatting of the error message.

Class declarations are still heavy, cargo-bloat on the current druid-shell library is showing about 900 bytes per add_method, which can add up but is unlikely to be a major source of bloat.

baby · on March 29, 2021

> I’m about to accept a PR that will increase druid’s compile time about 3x and its executable size almost 2x

I looked at the PR and it is a mere 700 LOC. I agree that compile time are an awful thing in Rust, but there are big glaring issues as well: LOC expansion and dependencies expansion.

richardwhiuk · on March 29, 2021

This change:

  [dependencies]
  fluent-bundle = "0.7"
  fluent-locale = "0.7"
  fluent-syntax = "0.9"
  unic-langid = "0.4.1"

and the corresponding change in Cargo.lock is the reason.

I'm not sure about how you are expecting dependencies to be handled?

baby · on March 30, 2021

Yeah, that's the issue. Dependencies growth is absurd in Rust.

> I'm not sure about how you are expecting dependencies to be handled?

I'll give you my point of view: a better standard library.

richardwhiuk · on April 1, 2021

That just mean every binary needs to ship everything, even stuff they don't use.