Hacker News | mh-'s comments

I still haven't seen any statistically sound data supporting that this is happening on the API (per-token pricing.)

If you've got something to share I'd love to see it.


There's an interesting analysis here: https://github.com/anthropics/claude-code/issues/42796

>The most striking row is user prompts: 5,608 in February vs 5,701 in March. The human put in the same effort. But the model consumed 80x more API requests and 64x more output tokens to produce demonstrably worse results.


If I'm understanding the thread correctly: I have a git alias for `git commit --amend --no-edit`, for exactly this workflow, when I'm hacking on something locally and want to just keep amending a commit. I only ever do this if it's HEAD, though.
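Setting that up is a one-liner; here's a sketch in a throwaway repo (the alias name "amend" is my own choice, not from the thread):

```shell
# Demo of the keep-amending-HEAD workflow in a temporary repo.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name "You"
git config alias.amend 'commit --amend --no-edit'

echo one > file.txt
git add file.txt
git commit -qm "wip"

# Keep folding new work into the same HEAD commit:
echo two > file.txt
git add file.txt
git amend                      # rewrites HEAD in place, no editor prompt
git rev-list --count HEAD      # history stays one commit deep
```

With `git config --global` instead of per-repo config, the alias is available in every repo.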

Yes, one way to think about jj in a sort of low-level way is that every jj command does the equivalent of that, every time.

(You can also set up watchman and have that happen on every file change...)


A lot of people are making the mistake of noticing that local models have been 12-24 months behind SotA ones for a good portion of the last couple years, and then drawing a dotted line assuming that continues to hold.

It simply.. doesn't. The SotA models are enormous now, and there's no free lunch on compression/quantization here.
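To make the compression point concrete, here's a toy sketch of symmetric int8 quantization (plain numpy, nothing model-specific): even a 4x shrink from float32 leaves irreducible rounding error, and a real model compounds that across billions of weights.

```python
# Toy symmetric int8 quantization: 4 bytes/weight -> 1 byte/weight,
# at the cost of nonzero reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)

scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 1 byte each
w_hat = q.astype(np.float32) * scale                         # dequantize

err = np.abs(w - w_hat).max()
print(f"max abs error: {err:.5f}")   # nonzero: the compression isn't free
```

Real quantization schemes (per-channel scales, 4-bit group quantization, etc.) shrink the error but never eliminate it.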

Opus 4.6 capabilities are not coming to your (even 64-128 GB) laptop or phone in the popular architecture that current LLMs use.

Now, that doesn't mean that a much narrower-scoped model with very impressive results can't be delivered. But that narrower model won't have the same breadth of knowledge, and TBD if it's possible to get the quality/outcomes seen with these models without that broad "world" knowledge.

It also doesn't preclude a new architecture or other breakthrough. I'm simply stating it doesn't happen with the current way of building these.

edit: forgot to mention the notion of ASIC-style models on a chip. I haven't been following this closely, but last I saw the power requirements are too steep for a mobile device.


Don’t underestimate the march of technology. Just look at your phone: it has more FLOPS than existed in the entire world 40 years ago.

And I think it's very likely that, with improved methods, you could get Opus 4.6-level performance on a wristwatch in a few years.

You needed a supercomputer to win at chess, until you didn't.

Current local models' natural-language performance is much better than anything running on a supercomputer cluster just a few years ago.


Yeah, but that's the current state of the art after decades of aggressive optimization. There's no foreseeable future where we'll be able to cram several orders of magnitude more RAM into a phone.

We already cram several orders of magnitude more flash storage into phones than RAM (e.g. my phone has 16 GB RAM but 1 TB storage). Even now, with some smart coding, if you don't need all that data at the same time for random access at sub-millisecond speed, it's hard to tell the difference.
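A toy sketch of that idea using numpy's memmap (the file name is illustrative): the array lives on flash/disk, and only the pages you actually touch get pulled into RAM.

```python
# Memory-map a large weight file: disk-backed, demand-paged access.
import numpy as np

# Create a disk-backed array (stand-in for a big weights file).
weights = np.memmap("weights.bin", dtype=np.float32, mode="w+",
                    shape=(1_000_000,))
weights[:] = 0.0
weights[42] = 1.5        # write goes through the page cache to disk
weights.flush()

# Reopen read-only: nothing is loaded until a slice is touched.
view = np.memmap("weights.bin", dtype=np.float32, mode="r",
                 shape=(1_000_000,))
print(view[42])          # only the touched page is read from disk
```

This is essentially what llama.cpp-style runners do with mmap'd model files, which is why models somewhat larger than RAM can still limp along.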

Agreed. Apple sells an iPhone Pro Max with 2 TB of storage.

But it doesn't have that many more FLOPS than it did a couple of years ago.

There's been plenty of free lunch shrinking models thus far with regards to capability vs parameter count.

Contradicting that trend takes more than "It simply.. doesn't."

There's plenty of room for RAM sizes to double, along with bus speeds. They idled for a long time because there was limited need for more.


The gap between SOTA models and open/local models continues to shrink as SOTA sees diminishing returns on scaling (and scaling seems to be the main way they're "improving"), whereas local models are making real jumps. I'm actually more optimistic that local models will catch up completely than that SOTA will take any great leaps forward.

Would the model even need that breadth of knowledge? Humans just look things up in books or on Wikipedia, which you can store on a plain old HDD, not VRAM. All books ever written fit into about 60 TB if you OCR them, and the useful information in them probably into a lot less; that's well within the range of consumer technology.
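A sketch of that look-it-up approach using sqlite's FTS5 full-text index (the table and corpus contents here are illustrative): the corpus sits on disk, and a query pulls back only the matching rows.

```python
# On-disk full-text lookup: knowledge in storage, not in model weights.
import sqlite3

con = sqlite3.connect("corpus.db")
con.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")
con.execute("DELETE FROM docs")
con.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("chess", "Deep Blue defeated Garry Kasparov in 1997."),
        ("llms", "Large language models compress broad world knowledge."),
    ],
)
con.commit()

# Only matching rows come off disk; the corpus itself can be huge.
rows = con.execute(
    "SELECT title FROM docs WHERE docs MATCH ?", ("kasparov",)
).fetchall()
print(rows)   # expect the "chess" document
```

This is the retrieval half of RAG in miniature: a small model plus a big, cheap index instead of memorizing everything in parameters.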

Pretty sure there’s at least a couple of orders of magnitude to be had in purely algorithmic areas of LLM inference; maybe training, too, though I’m less confident there. Rationale: meat computers run on 20 W, though pretraining took a billion years or so.

I've worked in the space. That sort of stuff is true (incredibly), but it's also not consumers' problem.

No, there's approximately just as much technical and interesting content on Twitter as there used to be. Lots of people left, lots of different people joined.

It's just that this content is outnumbered some 100,000:1 now instead of the mere 1000:1 it used to be (ratios made up, but directionally correct.)

From my point of view, HN is trending in that same direction. It's just that the ratios aren't nearly as dramatic.


It's the ratio that counts the most. You seem to be implying TwiX is getting an increasingly bad ratio. That would imply, to me, an increasingly limited lifespan for encouraging quality.

I would mind far less if the political comments were only the political posts. I just avoid clicking into those.

It's when I click into an interesting topic, and it's been steered into an off-topic retread of every other thread about US politics. The upvote/downvote system simply no longer works to squelch it as it once did, because there are enough people here who believe "everything is political" and therefore it's always "on-topic".

That is their prerogative, but it has dramatically lessened my enjoyment and engagement on this platform in the last 5 years. And it's gone into overdrive in the last 6 months.


HN is my top candidate for a solution like this, too. Because there's a ton of high quality content here, increasingly buried beneath a small number of sentiments and topics I don't care to see rehashed constantly.

I'd like to see it, too, but for the opposite[1] reason: Others can use this curation (which only affects their own view of HN) instead of flagging (which affects my view and everyone else's too).

1: https://news.ycombinator.com/item?id=47744253


I use the flag functionality as per the guidelines:

> Off-Topic: Most stories about politics, or crime, or sports, or celebrities, unless they're evidence of some interesting new phenomenon. If they'd cover it on TV news, it's probably off-topic.

> If a story is spam or off-topic, flag it. Don't feed egregious comments by replying; flag them instead. If you flag, please don't also comment that you did.

Flagging is a way to shape what types of content take up the finite amount of attention available on HN. If everyone used it (only) in the way the guidelines ask, the front page would look very different on a given day.

https://news.ycombinator.com/newsguidelines.html


You should figure out how to fix the way it appears in the MAS listing; it's going to cost you a lot of downloads among a savvy audience. I always check that IAP section on free apps before I bother downloading.

I get why the previous subscription option would still appear, but I'm not sure why the one-time option wouldn't be appearing. Maybe not enough transactions on it yet?


I feel the need to say it shouldn't be this way, to avoid an onslaught of replies, but:

It would be dramatically easier to discover and exploit vulnerabilities/glitches in their multiplayer experience, which is their cash cow.


On the other hand, maybe the community could submit bug fixes for loading times.

https://news.ycombinator.com/item?id=26296339


I may be misremembering a drunken conversation with a developer, but IIRC the root cause was the choice of cross-platform APIs available in the early 2010s, and the JSON file was tiny when introduced.

The problem was not in delivering JSON. There were better ways, but it was good enough.

The failure is that loading times had been a complaint for years, and nobody involved lifted a finger. It would be impossible to use the platform without feeling the pain.
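For what it's worth, the widely circulated third-party analysis of those loading times pointed at a parser that re-scanned the whole buffer once per token, turning an O(n) parse into O(n^2). A pure-Python stand-in for that pitfall (not the actual game code):

```python
# Quadratic vs. linear scanning: the classic strlen-in-a-loop mistake.
import time

def strlen(s):                 # stand-in for C's strlen: walks the string
    n = 0
    for _ in s:
        n += 1
    return n

def parse_quadratic(s):        # re-measures the whole buffer every step
    count = 0
    for i in range(len(s)):
        if i < strlen(s) and s[i] == ',':
            count += 1
    return count

def parse_linear(s):           # measure once, reuse
    count = 0
    n = len(s)
    for i in range(n):
        if s[i] == ',':
            count += 1
    return count

data = ("x" * 9 + ",") * 300   # 3,000-char toy "JSON"

t0 = time.perf_counter(); parse_quadratic(data); tq = time.perf_counter() - t0
t0 = time.perf_counter(); parse_linear(data);    tl = time.perf_counter() - t0
print(tq > tl)                 # the rescanning version is measurably slower
```

Same answer, wildly different cost curve; on a multi-megabyte file the quadratic version accounts for minutes of load time.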


I don't really disagree.

The software was released on 7 platforms, not counting multiple Windows versions. I don't know the risks, which platforms a change impacts today, or the test effort involved. I expect "it's still functioning as expected" was the default.


That was the state of play in 2015 as well. In the absence of a claim from the group otherwise, I wouldn't be surprised if they simply couldn't get it to stop (on a technical level.)

Way back when, it was a pretty common screwup to accidentally saturate the nodes you were packeting from. So then your C&C couldn't get them to respond, either. Oops.

