I interviewed at Google last year and they said something similarly magnanimous: that they rejected people who wouldn't have been successful at Google and that the rejects actually thanked them for the wisdom. My eyes rolled all the way back in my head. I cancelled the rest of my loop and went to a different FAANG. When I sent the cancellation email I thanked the recruiter for sharing his wisdom.
Every layer thinks they're the most important, most highly specialized, most highly skilled layer. Every layer is wrong because every layer is built on top of the abstractions of the layer beneath. Take it all the way down to the physics and the math and you'll notice that even the set theorists assume some axioms (no one knows what the logicians are doing...)
Yes, it's called being XY'd, and it's infuriating when some condescending person decides they know so much better than you that even your question is wrong.
What resonated with me in this article was that there's something about the question itself that seems off. And that's even more the case if you answer a lot of people's questions on similar topics. This guy's post is obviously coming from that standpoint; he's an expert on the software he's supporting people to use.
But if you genuinely find that you're often being guided to Y when your actual need is X then perhaps you need to think about how you approach it. For example, are you including enough context of why you're asking the question in the first place?
After becoming familiar with XY I have learned to specify "yes this is really the problem I'm trying to solve". Invariably you still get people asking "are you sure".
If you are sure, you're generally not going to answer "Yea, I'm sure", you'll answer "Yea, I'm sure because A, B, C"
In fact, if you're asking on a forum about how to solve X, be sure to add "because A, B, C" or you're just wasting everyone else's fucking time. The more details you put up front, the more likely you are to get the answer you're looking for in the first place, instead of wasting everyone else's time exploring the problem space.
> you're just wasting everyone else's fucking time
you sure you're not the one wasting people's time by demanding they convince you of something you don't have any need to be convinced of - like did you know that just answering a question at face value is a completely plausible option?
Funny term to use as it has a very particular meaning of something that cannot/should not be interpreted because it is well defined.
This is a common flaw of the programmer type, especially when they go about writing code and documentation. That is, they assume that what they are writing can only be interpreted in one way.
Then they toss it out into the real world and it turns out every human has a different interpretation, the runtime operates differently based on the phase of the moon, and even the compiler seems to be making its own best guess based on undefined behavior.
The value of an int is typically well defined, but once you start stacking up these well-defined bits, subjectivity quickly takes over.
> 3. Figurative Usage ("Taking at face value"): When used in everyday language as a phrase, "taking something at face value" means accepting a statement, situation, or person exactly as they appear or are presented, without digging deeper for hidden meanings, motives, or questioning their authenticity.
...
not sure what version of the English language you're using, but in colloquial English "taking something at face value" means having a good-faith interpretation.
In this context your question is likely a request for help, yet you're unwilling to take the time to include the relevant information in that question.
The fact that you're complaining about this, rather than modifying your own behaviour in a way that improves people's chances of helping you, seems quite extraordinary.
What if they don't, in fact, have more experience than me?
Edit: this is why tech people are insufferable socially. In any other walk of life assuming you know more than someone is a manifestly obvious faux pas.
> Then solve the problem yourself? Why are you asking someone who knows less than you?
you seem to completely misunderstand XY: it's not someone giving you the right solution to your problem which you aren't capable of arriving at yourself. it's someone telling you the problem you're having isn't the one you should be solving. it happens very frequently that some arrogant person is 100% certain X problem isn't possible, or isn't really happening, or isn't really the source of issues and they try to gaslight you into believing you've made a mistake in your reasoning and you should solve problem Y instead. you know... kind of like how you're doing right now...
This is a hypothetical, not an assumption. I’m interested in your response to the hypothetical.
As a side note, experience isn’t a unidimensional value that is directly comparable. You can have more experience than someone else in one dimension, and the other person can have more experience than you in a different dimension. I’d never argue with my mother about how to perform a blood draw.
> experience isn’t a unidimensional value that is directly comparable. You can have more experience than someone else in one dimension, and the other person can have more experience than you in a different dimension.
When two such people communicate, it's rarely clear who knows more. Typically, I know more about the task I'm working on, and he knows more about X, and that's why I'm asking him about X. Sure, if he wants to know what I'm working on - happy to engage, provided he doesn't withhold the information I'm asking for.
I already implicitly responded: this should be handled like in any other walk of life - a few probing questions, maybe a preceding dialogue, etc. Admittedly tech people probably don't know how to handle this outside of tech either <shrug>
I didn't know that, thanks for sharing. It makes sense, but then it also makes me wonder why none of the deep learning libraries (Torch, Jax/NNX, Eigen etc...) make this information available. Instead, ML people all have their own schemes for tracking shape information, like commenting '# (b, n, t)' on every line, or suffixing shapes to variable names - and in my experience it's a common source of bugs.
I think we're talking at cross purposes. The reason I'm excited about the Pyrefly work is that it leverages the type system to infer array shapes statically, which makes it simpler to reason about them when you're writing the code and catch bugs. The fact that people have developed these janky approaches to shape tracking suggests that there's a gap to be filled.
Jax and Torch don't do that statically. They obviously have to do it at runtime, but that doesn't address this particular issue. I mention Eigen because array shape hinting is generally useful for any linalg library, not purely for ML applications.
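To make the distinction concrete, here is a minimal pure-Python sketch of what "inferring shapes from operations" means, as opposed to annotating each line with a comment like `# (b, n, t)`. The helper `matmul_shape` is hypothetical, not part of Pyrefly, Jax, or Torch, and it applies a deliberately simplified matmul rule (identical batch dimensions, no broadcasting) to shape tuples whose entries are either concrete sizes or symbolic names. The point is only that a mismatch can be detected from shapes alone, without allocating or running anything.

```python
from typing import Tuple, Union

# A dimension is either a concrete size (e.g. 64) or a symbolic name (e.g. "b").
Dim = Union[int, str]
Shape = Tuple[Dim, ...]


def matmul_shape(a: Shape, b: Shape) -> Shape:
    """Infer the shape of `a @ b` from the operand shapes alone (hypothetical helper).

    Simplified rule, assumed for illustration: both operands have rank >= 2,
    leading batch dimensions must match exactly, and there is no broadcasting.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("both operands need rank >= 2")
    if a[:-2] != b[:-2]:
        raise ValueError(f"batch dims differ: {a[:-2]} vs {b[:-2]}")
    if a[-1] != b[-2]:
        raise ValueError(f"contraction dims differ: {a[-1]} vs {b[-2]}")
    # Result keeps the batch dims, then the rows of `a` and the columns of `b`.
    return tuple(a[:-2]) + (a[-2], b[-1])


# (b, n, d) @ (b, d, t) -> (b, n, t): the shape a "# (b, n, t)" comment records by hand.
print(matmul_shape(("b", "n", "d"), ("b", "d", "t")))  # ('b', 'n', 't')

# Forgetting to transpose the second operand is caught from shapes alone.
try:
    matmul_shape(("b", "n", "d"), ("b", "t", "d"))
except ValueError as err:
    print(err)  # contraction dims differ: d vs t
```

A static checker would apply rules like this across a whole program as you edit; a runtime check can only report the same mismatch once the code actually executes.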
> Jax and Torch don't do that statically. They obviously have to do it at runtime, but that doesn't address this particular issue.
You don't understand what you're talking about.
Jax is explicitly mentioned in your pyrefly link as having a parallel (but slightly weaker) system. In addition Jax is built on stablehlo which uses shape dialect, which is part of the compiler (and therefore statically known).
Torch has a symbolic shape inference system that I helped build:
> The fact that people have developed these janky approaches to shape tracking suggests that there's a gap to be filled.
I have already said it: the fact that people do not know how to use the tools does not mean the tools are lacking - it means the users are unsophisticated. Let me put it this way: almost everyone that is employed to work with these tools is aware of these features and therefore eschews those kinds of comment strings.
Every single comment you've posted was unhelpful and demeaning and you've constantly had to backtrack every single time you've had to respond. This is extremely poor form and I see zero goodwill here.
ainch doesn't care about the inference engine knowing the shapes. This is obvious. He has been talking about developer ergonomics and you've basically said "only idiots don't know where the ergonomics features are", while linking to a file which explicitly states that it is an experimental/private API that is only needed in niche situations, which is basically doubling down on having poor ergonomics.
If you can't understand the problem as a developer working on those features and you had to link to the source you've written rather than the docs, you're basically admitting that you're the problem.
This is the whiniest response I've ever seen on hn. Congrats.
> you had to link to the source you've written rather than the docs, you're basically admitting that you're the problem.
God forbid i expect people using complex tools to know how they actually work.
Edit: there are two categories of devs out there - those that expect code to be a fully perfected product delivered to them on a silver platter (only requiring docs, which must be perfect btw), and those which understand all code is merely a suggestion. For the latter a code pointer is more than enough. I leave you to infer which are more productive.
Thanks for the reply - you're clearly much more experienced with the internals here, but I believe we're still talking at cross-purposes. I believe you're talking about compilers like jax.jit or torch.compile performing symbolic shape inference. I'm talking about the ergonomics of tracking shape information while writing Python code that calls these libraries. I don't use Torch much, so I'll just comment on the Jax side.
> Jax is explicitly mentioned in your pyrefly link as having a parallel (but slightly weaker) system
Jaxtyping is limited to runtime-only checks (which might as well be assert statements), and doesn't infer shapes based on operations. I'm interested in Pyrefly because I've run into the limitations of Jaxtyping in my own usage.
> Jax is built on stablehlo which uses shape dialect, which is part of the compiler (and therefore statically known).
It's true that JAX does shape inference when it compiles down to HLO, but that isn't available to the Python typing system. The Pyrefly development is addressing that, so you get static analysis before even running anything, or without having to add eval_shape calls all over your codebase. I think that's helpful, and will catch bugs. When I say Jax does inference at runtime, I mean that you have to run the code for the JIT compiler to kick in; you don't get feedback as you edit.
> the fact that people do not know how to use the tools does not mean the tools are lacking... almost everyone that is employed to work with these tools is aware of these features and therefore eschews those kinds of comment strings.
The examples I took are from Andrej Karpathy and Noam Shazeer - maybe the disconnect is that they're more on the research side. Perhaps only unsophisticated users rely on these hacks - but as one such user I'm very excited that Pyrefly is addressing a problem I have. I suspect part of the misunderstanding that's evolved here is that these tools serve audiences with different needs.
Actually there are more if you count the ones which are not at the cutting edge but your point still stands, most high-end silicon companies only do design.
> Yes, it is possible to complete a PhD in 3-4 years, but it's not really good for your career.
this is such a "trust me bro it's good for you" con.
i graduated in 3.5 years and went directly to FAANG where i make 2x the highest paid TT at the T10 school i graduated from. do you really have the gall to tell me that it wasn't good for my career to accelerate my PhD and thereby minimize its cost (i.e., opportunity cost).
> A PhD is more like an apprenticeship
the vast majority of advisors have no skills other than how to hack the pub game. they literally have zero clue about the research. the remainder are the "exceptions that prove the rule".
No you don't understand, on Apple Silicon my CPU has comparable memory bandwidth to a $400 Pascal-era GPU. With the unified memory architecture, that means my iGPU gets 2016-levels of DDR transfer speed with none of the upsides of CUDA. It's the most cutting-edge hardware ever put in a personal computer, without a doubt.
> Nvidia didn't ship a 256gb system at sub-500gb/s transfer rate
DGX Spark has 128 GB and only 273 GB/s BW. Are we lucky that NVIDIA did ship something even worse than what you specified? I'm confused.
People have been complaining [1] about how little VRAM NVIDIA ships with their GPUs for decades. Their whole game has been "oh, you want more VRAM? Buy more or pay us 50x for server grade with 10x as much VRAM. The more you buy, the more you save."
Apple did everyone a solid by shipping something way out of that distribution. We now know more than we did before! We know that a 284B parameter model with 13B active params (or 35B with 3B active, or 671B with 37B active) can outperform a 2T model and draw a fraction as much power. How can you think that's a bad thing?
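The bandwidth figures matter because single-stream decoding is typically memory-bound: each generated token has to read roughly all of the active weights once, so tokens/s is capped at about bandwidth divided by the bytes of active weights. Below is a back-of-the-envelope sketch, not a benchmark. It assumes 4-bit weights (0.5 bytes per parameter), ignores KV-cache traffic, compute limits, and software overhead, and borrows the 273 GB/s DGX Spark figure from earlier in the thread. `decode_tokens_per_sec_upper_bound` is a hypothetical helper, not any library's API.

```python
def decode_tokens_per_sec_upper_bound(
    bandwidth_gb_s: float, active_params_billion: float, bytes_per_param: float
) -> float:
    """Rough ceiling on single-stream decode speed for a memory-bound model.

    Assumes each generated token streams every active weight from memory once;
    ignores KV-cache reads, compute limits, and software overhead.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 273   # DGX Spark figure quoted in the thread
BYTES_PER_PARAM = 0.5  # assumption: 4-bit quantized weights

for label, active_b in [
    ("35B total, 3B active", 3),
    ("284B total, 13B active", 13),
    ("671B total, 37B active", 37),
    ("284B total, hypothetically dense", 284),
]:
    bound = decode_tokens_per_sec_upper_bound(BANDWIDTH_GB_S, active_b, BYTES_PER_PARAM)
    print(f"{label}: <= {bound:.1f} tokens/s")
```

Under these assumptions the ceilings come out to roughly 182, 42, 14.8, and 1.9 tokens/s. The gap between the 13B-active row and the hypothetically dense 284B row is the point: on bandwidth-bound hardware, decode speed tracks active parameters, while total parameters still have to fit in memory.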
You could point out that Apple didn't invent the idea of MoE. Everyone knows that. But other than Macs, there simply were no machines with >100GB VRAM directly coupled to ~50 TFLOP/s of compute until the DGX Spark last Dec. If you wanted to run a model with more than 32 GB of weights, you had to either pay up for dozens of GPUs idling at hundreds of watts or really pay up for some $50,000 server GPUs idling at... also 100-200W each.
I feel lucky to have a $3k machine on my shelf that can run DS4-Flash with 1M context at 20t/s while drawing ~150W and making very little noise. The best part? It idles at 30W with DS4 loaded, dropping to 6W after a reboot. There isn't a single GPU on the market that can match that in the same shoebox volume.
The DGX Spark is also a niche, arbitrarily limited machine that will not displace serious datacenter workloads. It's targeted directly at the homelab LARPers and arguably a waste of money versus similarly priced GPU clusters. A 256 GB Spark at LPDDR5X transfer rates would be a genuine travesty.
You can try to weasel out any sort of edge justification you want - these are not industry-grade machines. They are slow, expensive, bandwidth-constrained SOCs that don't hold a candle to either datacenter GPUs or even decade-old gaming GPUs. It's worth criticizing when Apple does it, and also worth criticism when Nvidia does it. The only difference being that Nvidia has natural datacenter buy-in, while Apple can't even justify their own hardware in the face of TPU inference costs: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...
The quoted writing is AI slop, and OP is reacting to the fact that they did not write even the introductory text themselves (or at least bother to edit out clear AI/slop indicators)