
Well no, as it doesn't rely on force feedback.

It could do the "what I have for you today ..." routine though!


Very interesting. AFAIK the kernel explicitly gives consume semantics to READ_ONCE (and in fact on Alpha it is not just a compiler barrier), so technically lowering it to a relaxed operation is wrong.

Does Rust have or need the equivalent of std::memory_order_consume? Famously, this was deemed unimplementable in C++.


It wasn’t implemented for the same reason. Rust uses C++20 ordering.

Right, so I would expect that the equivalent of READ_ONCE is converted to an acquire in Rust, even if slightly pessimal.

But the article says that the suggestion is to convert them to relaxed loads. Is the expectation to YOLO it and hope that the compiler doesn't break control and data dependencies?


There is a yolo way that actually works, which would be to change it to a relaxed load followed by an acquire signal fence.
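
Something like this, I think (a Rust sketch; compiler_fence is Rust's equivalent of a signal fence, and the function name is made up):

  use std::sync::atomic::{compiler_fence, AtomicPtr, Ordering};

  // READ_ONCE-style load: a relaxed atomic load plus a compiler-only
  // acquire fence. No CPU barrier is emitted; the hardware ordering is
  // supposed to come from the p -> *p data dependency itself.
  fn load_consume(slot: &AtomicPtr<u64>) -> *mut u64 {
      let p = slot.load(Ordering::Relaxed);
      compiler_fence(Ordering::Acquire);
      p
  }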

Is that any better than just using an acquire load?

It is cheaper on ARM and POWER. But I'm not sure it is always safe. The standard has very complex rules for consume to make sure that the compiler doesn't break the dependencies.

edit: and those rules were so complex that compiler writers decided they were not implementable or not worth it.


The rules were there to explain what optimizations remained possible. Here no optimization is possible at the compiler level; only the processor retains any freedom, and we know it won't use it.

It is nasty, but it's very similar to how Linux does it (volatile read + __asm__("") compiler barrier).


This is still unsound (in both C and Rust), because the compiler can break data dependencies by e.g. replacing a value with a different value known to be equal to it. A compiler barrier doesn't prevent this. (Neither would a hardware barrier, but with a hardware barrier it doesn't matter if data dependencies are broken.) The difficulty of ensuring the compiler will never break data dependencies is why compilers never properly implemented consume. Yet at the same time, this kind of optimization is actually very rare in non-pathological code, which is why Linux has been able to get away with assuming it won't happen.
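
Concretely, the hazard is something like this hypothetical Rust shape (names invented; assumes `slot` was already published):

  use std::sync::atomic::{AtomicPtr, Ordering};

  fn reader(slot: &AtomicPtr<u64>) -> u64 {
      let p = slot.load(Ordering::Relaxed);
      // If the optimizer can prove the only pointer ever stored in
      // `slot` is some known `&KNOWN`, it may legally replace the
      // dependent load below with the constant value of KNOWN,
      // erasing the p -> *p dependency the ordering relied on.
      unsafe { *p }
  }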

In principle a compiler could convert the data dependency into a control dependency (for example, after PGO, by checking against the most likely value), and those are fairly fragile.

I guess in practice mainstream compilers do not do it and relaxed + signal fence works for now, but the fact that compilers declined to use this approach to implement consume suggests they are reluctant to commit to it.

In any case I think you work on GCC, so you probably know the details better than me.

edit: it seems that ARM specifically does not respect control dependencies. But I might be misreading the memory model.


C++20 actually [changed the semantics of consume](https://devblogs.microsoft.com/oldnewthing/20230427-00/?p=10...), but Rust doesn't include it. And last I remember compilers still treat it as acquire, so it's not worth the bytes it's stored in.

In the current drafts of C++ (I don't know which version it landed in), memory_order::consume is fully dead and listed as deprecated in the standard.

Does anything care about Alpha? The platform hasn't been sold in 20 years.

It's a persistent misunderstanding that release-consume is about Alpha. It's not; in fact, Alpha is one of the few architectures where release-consume doesn't help.

In a TSO architecture like x86 or SPARC, every "regular" memory load/store is effectively a release/acquire by default. Using release/consume or relaxed provides no extra speedup on these architectures. In weak memory models, you need to add in acquire barriers to get release/acquire semantics. But also, most weak memory models have a basic rule that a data-dependent load has an implicit ordering dependency on the values that computed it (most notably, loading *p has an implicit dependency on p).

The goal of release/consume is to be able to avoid having an acquire fence if you have only those dependencies--to promote a hardware data dependency semantic rule to a language-level semantic rule. For Alpha's ultra-weak model, you still need the acquire fence in this mode, it doesn't help Alpha one whit. Unfortunately, for various reasons, no one has been able to work out a language-level semantics for consume that compilers are willing to implement (preserving data dependencies through optimizations is a lot more difficult than it appears), so all compilers have remapped consume to acquire, making it useless.
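
For reference, the pattern in question, sketched in Rust (which has no consume, so Acquire stands in where consume would go):

  use std::ptr;
  use std::sync::atomic::{AtomicPtr, Ordering};

  static SLOT: AtomicPtr<u64> = AtomicPtr::new(ptr::null_mut());

  fn publish(v: u64) {
      let p = Box::into_raw(Box::new(v));
      // Release: the write of *p is ordered before the pointer store.
      SLOT.store(p, Ordering::Release);
  }

  fn read() -> Option<u64> {
      // Acquire today; consume would instead lean on the data
      // dependency from `p` to `*p`, which ARM and POWER already
      // order in hardware, making the barrier free.
      let p = SLOT.load(Ordering::Acquire);
      if p.is_null() { None } else { Some(unsafe { *p }) }
  }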


consume is trivial on Alpha: it is the same as acquire (always needs a #LoadLoad). It is also the same as acquire (and relaxed) on x86 and SPARC (a plain load; #LoadLoad is always implied).

The only place where consume matters is on relaxed but not too relaxed architectures like ARM and POWER, where consume relies on the implicit #LoadLoad of control and data dependencies.


Also, on Alpha there are only store-store and full memory barriers. Acquire is very expensive.

Indeed. On the other hand, ARM has recently added explicit load-acquire primitives which are relatively cheap, so converting a consume to an acquire is not a big loss (and Linus considered doing it for the kernel a while ago, just to avoid having to think too hard about compiler optimizations).

Yeah! It is not like America is going to come to arrest their leaders!

Don't think so, this is the "Garante della Privacy" (the Italian privacy authority); two different institutions.

Different — but driven by the same mindset, the same nonsense, and a system run by recycled old-guard politicians.

Language design still has a huge impact on which optimizations are practically implementable.

The Mythical Sufficiently Smart Compiler is, in fact, still mythical.


Sure, but not all compilers are created equal: they don't all go to the same lengths of analysis to discover optimization opportunities, or have the same quality of code generation for that matter.

It might be interesting to compare LLVM-generated code (at the same/maximum optimization level) for Rust vs C, which would remove optimizer level of effort as a factor and better isolate the difficulties/opportunities caused by the respective languages.


Then again, often

  #pragma omp for 
is a very low mental-overhead way to speed up code.

Depends on the code.

OpenMP does nothing to prevent data races, and anything beyond simple for loops quickly becomes difficult to reason about.


No.

It is easy to divide the loop body into computation and shared-state updates; the latter can be done under #pragma omp critical (label).


Yes! gcc/omp in general solved a lot of the problems that are conveniently left out of the article.

Then we have the anecdotal "They failed Firefox layout in C++ twice then did it in Rust" < to this I sigh in chrome.


The Rust version of this is "turn .iter() into .par_iter()."

It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.
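
Roughly (a sketch assuming the rayon crate; it only applies when iterations are independent):

  use rayon::prelude::*;

  fn sum_of_squares(input: &[i64]) -> i64 {
      input.par_iter()       // was: input.iter()
           .map(|&x| x * x)
           .sum()
  }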

> to this I sigh in chrome.

I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.


And the C++ version is adding std::execution::par_unseq as a parameter to the standard algorithm call.

This has the same drawbacks as "#pragma omp for".

The hard part isn't splitting loop iterations between threads, but doing so _safely_.

Proving that an arbitrary loop's iterations can be split in a memory-safe way is an NP-hard problem in C and C++, but it is the default behavior in Rust.


Well, if you are accessing global data with ranges, you are doing it wrong.

Naturally nothing in C++ prevents someone from doing that, which is why PVS, Sonar, and co. exist.

Just like some things aren't prevented by Rust itself but rather by clippy.


Concurrency is easy by default. The hard part is when you are trying to be clever.

You write concurrent code in Rust pretty much in the same way as you would write it in OpenMP, but with some extra syntax. Rust catches some mistakes automatically, but it also forces you to do some extra work. For example, you often have to wrap shared data in Arc when you convert single-threaded code to use multiple threads. And some common patterns are not easily available due to the limited ownership model. For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
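
For example, a minimal sketch of the Arc conversion (hypothetical names):

  use std::sync::Arc;
  use std::thread;

  fn read_from_threads(data: Vec<u64>) {
      // Single-threaded code could just borrow `data`; once threads
      // are involved the shared state gets wrapped in an Arc.
      let shared = Arc::new(data);
      let handles: Vec<_> = (0..4)
          .map(|_| {
              let shared = Arc::clone(&shared);
              thread::spawn(move || println!("len = {}", shared.len()))
          })
          .collect();
      for h in handles {
          h.join().unwrap();
      }
  }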


> For example, you can't get mutable references to items in a shared container by thread id or loop iteration.

This would be a good candidate for a specialised container that internally uses unsafe. Well, for thread id at least: since the user of the API doesn't provide it, you could mark the API safe, as you wouldn't have to worry about incorrect inputs.

Loop iteration would be an input to the API, so you'd mark the API unsafe.
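
A hypothetical sketch of such a container (invented names; soundness rests entirely on the disjoint-index invariant):

  use std::cell::UnsafeCell;

  // Hypothetical per-slot container: sound only if each index is
  // touched by at most one thread at a time.
  struct Slots<T> {
      inner: Vec<UnsafeCell<T>>,
  }

  // SAFETY: access is partitioned by index, per the invariant above.
  unsafe impl<T: Send> Sync for Slots<T> {}

  impl<T> Slots<T> {
      // The index comes from the caller (e.g. a loop iteration), so
      // the API has to be unsafe: the compiler can't check disjointness.
      unsafe fn get_mut(&self, index: usize) -> &mut T {
          unsafe { &mut *self.inner[index].get() }
      }
  }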


There’s split_at_mut to avoid writing unsafe yourself in this case.
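
Something like this, for instance (a sketch; split_at_mut hands out provably disjoint halves, so no unsafe is needed):

  use std::thread;

  fn bump_halves(data: &mut [u32]) {
      let mid = data.len() / 2;
      let (left, right) = data.split_at_mut(mid);
      thread::scope(|s| {
          s.spawn(move || for x in left { *x += 1 });
          s.spawn(move || for x in right { *x += 1 });
      });
  }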

AFAIK it does all styling and layout on the main thread and offloads drawing instructions to other threads (CompositorTileWorker), and it works fine?

That does sound like Chrome has either failed to make styling multithreaded in C++ or hasn't attempted it, while it was achieved in Rust?

per capita?

Constraints breed creativity.

For the art I suppose.

The biggest difference between the UK and other constitutional countries is that Parliament's power is pretty much absolute: it is not bound by any document or pre-existing law.

In theory at least. In practice the courts have hinted that there are limits even for Parliament, and if it were to overstep some unwritten rules, it would cause a constitutional crisis.


> if it were to overstep some unwritten rules

What rules are those?


Boris Johnson asking the Queen to prorogue parliament during Brexit debates is a solid recent example.
