Very interesting. AFAIK the kernel explicitly gives consume semantics to read_once (and in fact it is not just a compiler barrier on alpha), so technically lowering it to a relaxed operation is wrong.
Does rust have or need the equivalent of std::memory_order_consume? Famously this was deemed unimplementable in C++.
right, so I would expect that the equivalent of READ_ONCE is converted to an acquire in rust, even if slightly pessimal.
But the article says that the suggestion is to convert them to relaxed loads. Is the expectation to YOLO it and hope that the compiler doesn't break control and data dependencies?
It is cheaper on ARM and POWER. But I'm not sure it is always safe. The standard has very complex rules for consume to make sure that the compiler didn't break the dependencies.
edit: and those rules where so complex that compilers decided where not implementable or not worth it.
The rules were there to explain what optimizations remained possible. Here no optimization is possible at the compiler level, and only the processor retains freedom because we know it won't use it.
It is nasty, but it's very similar to how Linux does it (volatile read + __asm__("") compiler barrier).
This is still unsound (in both C and Rust), because the compiler can break data dependencies by e.g. replacing a value with a different value known to be equal to it. A compiler barrier doesn't prevent this. (Neither would a hardware barrier, but with a hardware barrier it doesn't matter if data dependencies are broken.) The difficulty of ensuring the compiler will never break data dependencies is why compilers never properly implemented consume. Yet at the same time, this kind of optimization is actually very rare in non-pathological code, which is why Linux has been able to get away with assuming it won't happen.
In principle a compiler could convert the data dependency into to a control dependency (for example, after PGO after checking against the most likely value), and those are fairly fragile.
I guess in practice mainstream compilers do not do it and relaxed+signal fence works for now, but the fact that compilers have been reluctant to use it to implement consume means that they are reluctant to commit to it.
In any case I think you work on GCC, so you probably know the details better than me.
edit: it seems that ARM specifically does not respect control dependencies. But I might misreading the MM.
It's a persistent misunderstanding that release-consume is about Alpha. It's not; in fact, Alpha is one of the few architectures where release-consume doesn't help.
In a TSO architecture like x86 or SPARC, every "regular" memory load/store is effectively a release/acquire by default. Using release/consume or relaxed provides no extra speedup on these architectures. In weak memory models, you need to add in acquire barriers to get release/acquire architectures. But also, most weak memory models have a basic rule that a data-dependent load has an implicit ordering dependency on the values that computed it (most notably, loading *p has an implicit dependency on p).
The goal of release/consume is to be able to avoid having an acquire fence if you have only those dependencies--to promote a hardware data dependency semantic rule to a language-level semantic rule. For Alpha's ultra-weak model, you still need the acquire fence in this mode, it doesn't help Alpha one whit. Unfortunately, for various reasons, no one has been able to work out a language-level semantics for consume that compilers are willing to implement (preserving data dependencies through optimizations is a lot more difficult than it appears), so all compilers have remapped consume to acquire, making it useless.
consume is trivial on alpha, it is the same as acquire (always needs a #LoadLoad). It is also the same as acquire (and relaxed) on x86 and SPARC (a plain load, #LoadLoad is always implied).
The only place where consume matters is on relaxed but not too relaxed architectures like ARM and POWER, where consume relies on the implicit #LoadLoad of controls and data dependencies.
Indeed. On the other hand recently ARM has added explicit load acquires primitives which are relatively cheap, so converting a consume to an acquire is not a big loss (and Linus considered doing it for the kernel a while ago just to avoid having to think too hard about compiler optimizations).
Sure, but not all compilers are created equal and are going to go to the same lengths of analysis to discover optimization opportunities, or to have the same quality of code generation for that matter.
It might be interesting to compare LLVM generated code (at same/maximum optimization level) for Rust vs C, which would remove optimizer LOE as a factor and more isolate difficulties/opportunities caused by the respective languages.
The Rust version of this is "turn .iter() into .par_iter()."
It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.
> to this I sigh in chrome.
I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.
Concurrency is easy by default. The hard part is when you are trying to be clever.
You write concurrent code in Rust pretty much in the same way as you would write it in OpenMP, but with some extra syntax. Rust catches some mistakes automatically, but it also forces you to do some extra work. For example, you often have to wrap shared data in Arc when you convert single-threaded code to use multiple threads. And some common patterns are not easily available due to the limited ownership model. For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
> For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
This would be a good candidate for a specialised container that internally used unsafe. Well, thread id at least; since the user of an API doesn't provide it, you could mark the API safe, since you wouldn't have to worry about incorrect inputs.
Loop iteration would be an input to the API, so you'd mark the API unsafe.
The biggest difference between the UK and other constitutional countries is that parliament power is pretty much absolute and it is not bound by any document or pre-existing law.
In theory at least. In practice the courts have hinted that there are limits even for the parliament, and if it were to overstep some unwritten rules, it would cause a constitutional crisis.
It could do the "what I have for you today ..."routine though!
reply