
Hackers exploit any avenue (and usually come in through the basement!), regardless of how skeptical we might be that they will. They don't need the details; they'll figure them out. Give them a scrap and they'll get the rest. It's a different way of thinking that we're not used to, and don't understand unless we're exposed to it first-hand, e.g. through red-teaming.

For example, another way to think of this is that you have a buffer of initialized memory, containing a view onto some piece of data, from which you serve a subset to the user, but you get the format of the subset wrong, so that parts of the view leak through. That's a bleed.

Depending on the context, the bleed alone may be enough, or it may be less severe, but the slightest semantic gap can be chained and built up into something major. Even if exploitation takes six chained hoops to jump through, that's a low bar for a determined attacker.
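A minimal sketch of such a bleed, in safe Rust with hypothetical names (no real API or codebase is implied): the memory is fully initialized, the slice is in bounds, and yet a format confusion serves more than intended.

```rust
/// Hypothetical sketch: serve a sub-record out of a larger buffer.
/// The bug is trusting a declared length taken from the format
/// instead of the record's actual length.
fn serve_record(buffer: &[u8], offset: usize, declared_len: usize) -> &[u8] {
    // Fully initialized, bounds-checked, no unsafe: this slice is
    // "memory safe", but it can still bleed adjacent data.
    &buffer[offset..offset + declared_len]
}

fn main() {
    // One buffer holds both public and secret data, all initialized.
    let buffer = b"PUBLIC_RECORD....SECRET_TOKEN_123";
    // The public record is 17 bytes, but a format confusion declares 33:
    // the secret bleeds into the response.
    let response = serve_record(buffer, 0, 33);
    println!("{}", String::from_utf8_lossy(response));
}
```

No bounds check fails and no uninitialized memory is read; the leak is purely semantic, which is what makes this class so hard to rule out mechanically.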



> For example, another way to think of this is that you have a buffer of initialized memory (no unsafe), containing a view onto some piece of data, from which you serve a subset to the user, but you get the format of the subset wrong, so that parts of the view leak through. That's a bleed.

If there's full initialization then this is just a logic error, no? Apart from some kind of capability typing over ranges of bytes (not very ergonomic), this would be a very difficult subtype of "bleed" to statically describe, much less prevent.


Yes, exactly. That's what I was driving at. It's just a logic error that leaks sensitive information, by virtue of serving the wrong information. File formats in particular can make this difficult to get right. For example, the ZIP file format (where I have at least some experience with bleeds) has at least 9 different places where a bleed might happen, and this can depend on things like whether files are added incrementally to the archive, the type of string encoding used for file names in the archive, etc.
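To make the string-encoding case concrete, here is a hypothetical sketch (not real ZIP code, and all names invented): a length-prefixed name field whose declared byte length is computed under one encoding while the bytes were written under another, so the gap is filled with stale bytes from a reused scratch buffer.

```rust
// Hypothetical sketch (not real ZIP code): the declared length is
// counted in UTF-16 code units * 2, but the bytes are written as UTF-8.
// The difference is served from whatever the reused buffer held before.
fn write_name_record<'a>(scratch: &'a mut [u8], name: &str) -> &'a [u8] {
    let declared_len = name.encode_utf16().count() * 2;
    let bytes = name.as_bytes();
    scratch[..bytes.len()].copy_from_slice(bytes);
    // scratch[bytes.len()..declared_len] still holds stale data from a
    // previous entry, and it is served to the reader as part of the name.
    &scratch[..declared_len]
}

fn main() {
    let mut scratch = [0u8; 64];
    // A previous archive entry left data in the reused scratch buffer.
    scratch[..12].copy_from_slice(b"OLD_PASSWORD");
    let record = write_name_record(&mut scratch, "café");
    // "café" is 5 bytes of UTF-8, but 4 UTF-16 units * 2 = 8 declared
    // bytes, so 3 stale bytes leak alongside the name.
    println!("{:?}", record);
}
```

Everything here is initialized and bounds-checked, yet the encoding mismatch bleeds a previous entry's bytes into the output.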


Makes sense! My colleagues work on some research[1] that's intended to be the counterpart to this: identifying which subset of a format parser is actually activated by a corpus of inputs, and automatically generating a subset parser that only accepts those inputs.

I think you mentioned WUFFS before the edit; I find that approach very promising!

[1]: https://www.darpa.mil/program/safe-documents
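The subset-parser idea above can be sketched as follows (a toy model with invented names, nothing to do with the actual SafeDocs tooling): record which format features a corpus actually exercises, then build an acceptor that rejects anything outside that envelope.

```rust
use std::collections::HashSet;

// Toy model: pretend each input's leading tag character selects a
// parser feature (a "chapter" of the format grammar).
fn features_of(input: &str) -> HashSet<char> {
    input.chars().take(1).collect()
}

// Build a subset acceptor from a corpus: only features actually
// observed in the corpus are allowed through.
fn subset_acceptor(corpus: &[&str]) -> impl Fn(&str) -> bool {
    let allowed: HashSet<char> =
        corpus.iter().flat_map(|s| features_of(s)).collect();
    move |input: &str| features_of(input).is_subset(&allowed)
}

fn main() {
    let accept = subset_acceptor(&["A:hello", "B:world"]);
    assert!(accept("A:another")); // feature seen in corpus: accept
    assert!(!accept("Z:exotic")); // feature never exercised: reject
    println!("subset acceptor ok");
}
```

The real research operates on actual parser code paths rather than a single tag character, but the shrink-the-accepted-language principle is the same.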


Thanks! Yes, I did mention WUFFS before the edit, but then figured I could make it a bit more detailed. WUFFS is great.

The SafeDocs program and approach look incredible. Installing tools like this at border gateways for SMTP servers, or as a front line of defense before vulnerable AV engine parsers (as Pure is intended to be used), could make a massive dent in malware and zero days.


Thanks for the explanation. I would consider that type of logic error more or less impossible to defend against at the language level, but I can see how analysis tools can be helpful.


Thanks for the question.

"I would consider that type of logic error more or less impossible to defend at the language level, but I can see how analysis tools can be helpful."

This is how I see it too, as a logic error that exposes sensitive memory in a way that can be as unsafe as a UAF. It's a great leveler of languages, and why I get excited about enabling checked arithmetic by default in safe builds, explicit control flow, and minimizing complexity, abstractions, and dependencies; all of this helps to reduce the probability of semantic gaps and leaks.
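As an illustration of why checked arithmetic by default matters (a sketch in Rust, not the commenter's codebase): a silently wrapped addition can let a corrupt offset pass a later bounds check and serve the wrong bytes, while a checked addition turns the overflow into an explicit, handleable failure.

```rust
// Sketch: computing where a record ends inside a buffer. With plain `+`
// in an overflow-unchecked build, `offset + len` can wrap around, pass
// a later `end <= buffer.len()` check, and serve the wrong bytes.
// checked_add surfaces the overflow instead of wrapping.
fn record_end(offset: u32, len: u32) -> Option<u32> {
    offset.checked_add(len)
}

fn main() {
    assert_eq!(record_end(10, 20), Some(30));
    assert_eq!(record_end(u32::MAX, 1), None); // overflow caught, not wrapped
    println!("checked arithmetic ok");
}
```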



