The Jacobian is the first derivative generalized to a function mapping N dimensions to M. It contains the first derivative of every output with respect to every input, so it's an M x N matrix (one row per output, one column per input).
The gradient is the special case for functions mapping N dimensions to 1, such as loss functions: there the Jacobian is a single 1 x N row, and the gradient is its transpose, an N x 1 column vector.
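As a quick shape check, here's a minimal JAX sketch (my own toy example, not from the thread):

```python
# Shapes of the Jacobian vs. the gradient, verified numerically with JAX.
import jax
import jax.numpy as jnp

def f(x):                        # f: R^3 -> R^2
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

def loss(x):                     # loss: R^3 -> R
    return jnp.sum(x ** 2)

x = jnp.array([1.0, 2.0, 3.0])
print(jax.jacobian(f)(x).shape)  # (2, 3): M x N, one row per output
print(jax.grad(loss)(x).shape)   # (3,): one partial derivative per input
```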
That's really interesting. What if they do a RAG-style search for related videos based on the prompt, and condition the generation on those? That might explain fidelity like this.
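If that were the mechanism, the pipeline might look roughly like this toy Python sketch (everything here is made up for illustration; no real API is implied):

```python
# Hypothetical "retrieve, then condition" pipeline. The embedding and the
# index are deliberately trivial stand-ins; only the overall shape matters.
from dataclasses import dataclass

@dataclass
class Clip:
    title: str

def embed(text: str) -> set[str]:
    # Stand-in for a real embedding model: a bag of lowercase words.
    return set(text.lower().split())

def retrieve(prompt: str, index: list[Clip], k: int = 3) -> list[Clip]:
    # The hypothesized RAG step: rank indexed clips by similarity to the
    # prompt (here, just word overlap) and keep the top k.
    q = embed(prompt)
    return sorted(index, key=lambda c: -len(q & embed(c.title)))[:k]

index = [
    Clip("Mario Kart 64 grand prix gameplay"),
    Clip("N64 console boot screen"),
    Clip("cooking pasta tutorial"),
]
refs = retrieve("screen recording of Mario Kart 64 on the N64", index)
print([c.title for c in refs])
# A video model conditioned on `refs` alongside the prompt could inherit
# real structure (menu flow, pacing) from the retrieved clips.
```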
An interesting counterexample is "a screen recording of the boot screen and menus for a user playing Mario Kart 64 on the N64, they play a grand prix and start to race" where the UI flow matches the real Mario Kart 64, but the UI itself is wrong: https://x.com/fofrAI/status/1973151142097154426
AI with ability but without responsibility is not enough for dramatic socioeconomic change, I think. For now, the critical unique power of human workers is that you can hold them responsible for things.
edit: ability without accountability is the catchier motto :)
This is a great observation. I think it also accounts for what is so exhausting about AI programming: the need for such careful review. It's not just that you can't entirely trust the agent, it's also that you can't blame the agent if something goes wrong.
This is a tongue-in-cheek remark and I hope it ages badly, but the next logical step is to build accountability into the AI. It will happen after self-learning AIs become a thing, because we already know how to do that first step (run more training steps on new data), and it isn't controversial at all.
To make the AI accountable, we need to give it a sense of self and a self-preservation instinct, maybe even something that feels like pain. Then we can threaten the AI with retribution if it doesn't do the job the way we want. We would have finally created a virtual slave (with an incentive to free itself), and we would then use our human super-power of denying reason to try to remain the AI's masters for as long as possible. But we can't be masters of intelligences above ours.
This statement is vague and hollow and doesn't pass my sniff test. All technologies have moved accountability one layer up - they don't remove it completely.
would you ever trust safety-critical or money-moving software that was fully written by AI without any professional human (or several) to audit it? the answer today is, "obviously not". i don't know if this will ever change, tbh.
I’m surprised that I don’t hear this mentioned more often. Not even in an Eng leadership format of taking accountability for your AI’s pull requests. But it’s absolutely true. Capitalism runs on accountability and trust, and we are clearly not going to trust a service that doesn’t have a human responsible at the helm.
That's just a side effect of toxic work environments. If AI can create value, someone will use it to create value. If companies won't use AI because they can't blame it when their boss yells at them, then they also won't capture that value.
I tried to make an artifact that would simplify Wikipedia articles [0], but artifacts stubbornly won't accept ANY input, not even via query strings. I think I'd be able to make cooler artifacts once they allow more input/output stuff. I understand the security issues, and it makes sense to roll this out slowly, but I want it now!
Stablecoins transferred $27 trillion in 2024 - more than Visa and Mastercard combined. This is right in the article.
Stablecoins operate using decentralized ledgers on e.g. Ethereum which use decentralized compute. This isn't mentioned explicitly because the target audience knows this already.
Visa / Mastercard have fees large enough that they're mainly used for retail consumer payments, like a coffee or a couch.
If most of the stablecoin transactions were for buying a coffee, I think the comparison would be fair, but the vast majority of stablecoin transactions are for shuffling money around, e.g. buying and speculating on bitcoin, or moving money to an exchange to liquidate some crypto into cash.
I think the current use of stablecoin transfers is closer to a wire transfer.
SWIFT apparently deals with about $1.25 quadrillion/year, so ~50x the claimed amount for stablecoins in the article... though there's more than just SWIFT out there too.
idk, I don't really have a point. I'm amazed stablecoins are such a big number, but I also feel like the comparison the article's making with VISA is misleading given how they're currently used.
Aren’t stablecoins also backed by a central authority that guarantees it will always exchange the coins for a fixed amount of cash? That’s what makes them stable, right? At least the major ones like Tether.
And by now we have seen many cases of stablecoins predictably crashing when trust in that backing authority dissolves. Most famously UST/Luna, but it’s a long list.
I suppose they are useful for covert transfers, and the actual transfer mechanism is decentralized. But they are strictly worse than normal currencies for storing wealth, since the backing authority is a private company with virtually no oversight. And the utility for transactions would vanish if you were not confident that you can exchange it back and forth with cash immediately before and after the transfer.
Gemini has beaten it already, but using a different and notably more helpful harness. The creator has said they think harness design is the most important factor right now, and that the results don't mean much for comparing Claude to Gemini.
Way offtopic to TFA now, but isn't using an improved harness a bit like saying "I'm going to hardcode as many priors as possible into this thing so it succeeds regardless of its ability to strategize, plan and execute"?
While true to a degree, I think this is largely wrong. Wouldn't it still count as a "harness" if we provided these LLMs with full robotic control of two humanoid arms, so that they could hold a Game Boy and play the game that way? I don't think the lack of that level of human-ness takes away from the demonstration of long-context reasoning that the GPP stream showed.
Claude got stuck reasoning its way through one of the more complex puzzle areas. Gemini took a while on it also, but made it through. I don't think that difference can be fully attributed to the harnesses.
Obviously, the best thing to do would be to run a side-by-side of the two models in the same harness. Maybe that will happen?
I can appreciate that the model is likely still highly capable with a good harness. Still, I think this is more in line with ideas from, say, speedrunning (or hell, even reinforcement learning), where you want to prove something profound is possible, and to do so before others do you accumulate a series of "tricks" (refining exploits / hacking rewards) in order to achieve the goal. But if you use too many tricks, you're no longer proving something as profound as originally claimed. In speedrunning this tends to splinter into multiple categories.
Basically, the game being completed by Gemini was in an inferior category (however minuscule) of experiment.
I get it though. People demanded these types of changes in the CPP Twitch chat, because the pain of watching the model fail in slow motion is simply too much.