
Hackers exploit any avenue (and usually come in through the basement!), regardless of how skeptical we might be that they will. They don't need the details; they'll figure them out. Give them a scrap and they'll get the rest. It's a different way of thinking that we're not used to, and don't understand unless we're exposed to it first-hand, e.g. through red-teaming.

For example, another way to think of this is that you have a buffer of initialized memory, containing a view onto some piece of data, from which you serve a subset to the user, but you get the format of the subset wrong, so that parts of the view leak through. That's a bleed.

Depending on the context, the bleed alone may be enough, or it may be less severe, but the slightest semantic gap can be chained and built up into something major. Even if exploitation takes six chained hoops to jump through, that's a low bar for a determined attacker.
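A minimal sketch of such a bleed, in safe Rust with hypothetical names (no real API or codebase is implied): the memory is fully initialized, the slice is in bounds, and yet a format confusion serves more than intended.

```rust
/// Hypothetical sketch: serve a sub-record out of a larger buffer.
/// The bug is trusting a declared length taken from the format
/// instead of the record's actual length.
fn serve_record(buffer: &[u8], offset: usize, declared_len: usize) -> &[u8] {
    // Fully initialized, bounds-checked, no unsafe: this slice is
    // "memory safe", but it can still bleed adjacent data.
    &buffer[offset..offset + declared_len]
}

fn main() {
    // One buffer holds both public and secret data, all initialized.
    let buffer = b"PUBLIC_RECORD....SECRET_TOKEN_123";
    // The public record is 17 bytes, but a format confusion declares 33:
    // the secret bleeds into the response.
    let response = serve_record(buffer, 0, 33);
    println!("{}", String::from_utf8_lossy(response));
}
```

No bounds check fails and no uninitialized memory is read; the leak is purely semantic, which is what makes this class so hard to rule out mechanically.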



> For example, another way to think of this is that you have a buffer of initialized memory (no unsafe), containing a view onto some piece of data, from which you serve a subset to the user, but you get the format of the subset wrong, so that parts of the view leak through. That's a bleed.

If there's full initialization then this is just a logic error, no? Apart from some kind of capability typing over ranges of bytes (not very ergonomic), this would be a very difficult subtype of "bleed" to statically describe, much less prevent.


Yes, exactly. That's what I was driving at. It's just a logic error that leaks sensitive information, by virtue of serving the wrong information. File formats in particular can make this difficult to get right. For example, the ZIP file format (where I have at least some experience with bleeds) has at least 9 different places where a bleed might happen, and this can depend on things like whether files are added incrementally to the archive, the type of string encoding used for file names in the archive, etc.
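To make the string-encoding case concrete, here is a hypothetical sketch (not real ZIP code, and all names invented): a length-prefixed name field whose declared byte length is computed under one encoding while the bytes were written under another, so the gap is filled with stale bytes from a reused scratch buffer.

```rust
// Hypothetical sketch (not real ZIP code): the declared length is
// counted in UTF-16 code units * 2, but the bytes are written as UTF-8.
// The difference is served from whatever the reused buffer held before.
fn write_name_record<'a>(scratch: &'a mut [u8], name: &str) -> &'a [u8] {
    let declared_len = name.encode_utf16().count() * 2;
    let bytes = name.as_bytes();
    scratch[..bytes.len()].copy_from_slice(bytes);
    // scratch[bytes.len()..declared_len] still holds stale data from a
    // previous entry, and it is served to the reader as part of the name.
    &scratch[..declared_len]
}

fn main() {
    let mut scratch = [0u8; 64];
    // A previous archive entry left data in the reused scratch buffer.
    scratch[..12].copy_from_slice(b"OLD_PASSWORD");
    let record = write_name_record(&mut scratch, "café");
    // "café" is 5 bytes of UTF-8, but 4 UTF-16 units * 2 = 8 declared
    // bytes, so 3 stale bytes leak alongside the name.
    println!("{:?}", record);
}
```

Everything here is initialized and bounds-checked, yet the encoding mismatch bleeds a previous entry's bytes into the output.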


Makes sense! My colleagues work on some research[1] that's intended to be the counterpart to this: identifying which subset of a format parser is actually activated by a corpus of inputs, and automatically generating a subset parser that only accepts those inputs.

I think you mentioned WUFFS before the edit; I find that approach very promising!

[1]: https://www.darpa.mil/program/safe-documents
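The subset-parser idea above can be sketched as follows (a toy model with invented names, nothing to do with the actual SafeDocs tooling): record which format features a corpus actually exercises, then build an acceptor that rejects anything outside that envelope.

```rust
use std::collections::HashSet;

// Toy model: pretend each input's leading tag character selects a
// parser feature (a "chapter" of the format grammar).
fn features_of(input: &str) -> HashSet<char> {
    input.chars().take(1).collect()
}

// Build a subset acceptor from a corpus: only features actually
// observed in the corpus are allowed through.
fn subset_acceptor(corpus: &[&str]) -> impl Fn(&str) -> bool {
    let allowed: HashSet<char> =
        corpus.iter().flat_map(|s| features_of(s)).collect();
    move |input: &str| features_of(input).is_subset(&allowed)
}

fn main() {
    let accept = subset_acceptor(&["A:hello", "B:world"]);
    assert!(accept("A:another")); // feature seen in corpus: accept
    assert!(!accept("Z:exotic")); // feature never exercised: reject
    println!("subset acceptor ok");
}
```

The real research operates on actual parser code paths rather than a single tag character, but the shrink-the-accepted-language principle is the same.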


Thanks! Yes, I did mention WUFFS before the edit, but then figured I could make it a bit more detailed. WUFFS is great.

The SafeDocs program and approach look incredible. Installing tools like this at border gateways for SMTP servers, or as a front line of defense before vulnerable AV engine parsers (as Pure is intended to be used), could make a massive dent in malware and zero days.


Thanks for the explanation. I would consider that type of logic error more or less impossible to defend against at the language level, but I can see how analysis tools can be helpful.


Thanks for the question.

"I would consider that type of logic error more or less impossible to defend at the language level, but I can see how analysis tools can be helpful."

This is how I see it too, as a logic error that exposes sensitive memory in a way that can be as unsafe as a UAF. It's a great leveler of languages, and why I get excited about enabling checked arithmetic by default in safe builds, explicit control flow, and minimizing complexity, abstractions, and dependencies; all of this helps to reduce the probability of semantic gaps and leaks.
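As an illustration of why checked arithmetic by default matters (a sketch in Rust, not the commenter's codebase): a silently wrapped addition can let a corrupt offset pass a later bounds check and serve the wrong bytes, while a checked addition turns the overflow into an explicit, handleable failure.

```rust
// Sketch: computing where a record ends inside a buffer. With plain `+`
// in an overflow-unchecked build, `offset + len` can wrap around, pass
// a later `end <= buffer.len()` check, and serve the wrong bytes.
// checked_add surfaces the overflow instead of wrapping.
fn record_end(offset: u32, len: u32) -> Option<u32> {
    offset.checked_add(len)
}

fn main() {
    assert_eq!(record_end(10, 20), Some(30));
    assert_eq!(record_end(u32::MAX, 1), None); // overflow caught, not wrapped
    println!("checked arithmetic ok");
}
```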



