GLM 5.1 was the model that made me feel like the Chinese models had truly caught up. I cancelled my Claude Max subscription and genuinely have not missed it at all.

Some people seem to agree and some don't, but I think that indicates it now comes down to your specific domain and usage patterns, rather than the SOTA models being objectively better like they clearly used to be.


It seems like people can't even agree which SOTA model is best at any given moment anymore, so yeah I think it's just subjective at this point.

Perhaps not even necessarily subjective, just performance is highly task-dependent and even variable within tasks. People get objectively different experiences, and assume one or another is better, but it's basically random.

Unless you're looking at something like a pass@100 benchmark, the benchmarks are heavily confounded by the likelihood of the model retrieving a "golden path" within its capabilities. This is on top of uncertainties like how well your task within a domain maps to the relevant test sets, as well as factors like context fullness and context complexity (a heavy list of relevant, complex instructions can weigh on capabilities in different ways than, e.g., a history where prior unrelated tasks are still in context).

The best tests are your own standardized tests built around your personal tasks, designed so the best models can't saturate them (aim for under a 70% pass rate in the best case).

All this is to say that most people are not doing the latter and their vibes are heavily confounded to the point of being mostly meaningless.
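
For anyone who wants to do the latter, here's a minimal sketch of what such a personal harness could look like. The case format and the example checker are invented for illustration, and model_call stands in for whatever client you use:

    import json

    def run_suite(model_call, cases):
        """model_call(prompt) -> str; cases: list of {"prompt": str, "check": fn}."""
        passed = sum(1 for c in cases if c["check"](model_call(c["prompt"])))
        return passed / len(cases)  # track this per model over time

    def check_ok_json(out):
        # Example personal check: valid JSON with the field I care about.
        try:
            return json.loads(out).get("status") == "ok"
        except (ValueError, AttributeError):
            return False

    cases = [{"prompt": 'Return {"status": "ok"} as JSON, nothing else.',
              "check": check_ok_json}]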


Pass@100 is such a weird critique angle, and it's surprisingly mainstream; guess what, no one cares if the correct answer is in the top 100, it needs to be the top 1. A model with a better answer at top 1 is a better model, full stop.
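
For anyone unfamiliar, the distinction shows up directly in the standard unbiased pass@k estimator (n samples drawn, c of them correct); a quick sketch:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k: probability that at least one of k draws
        from the n samples is correct."""
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(100, 1, 1))    # 0.01 -- almost never right on the first try
    print(pass_at_k(100, 1, 100))  # 1.0  -- "the answer is in there somewhere"

A model can score a perfect pass@100 while being nearly useless at top 1, which is exactly the disagreement here.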

This. Plus if you want to even attempt measuring real 'intelligence' you want to run a neuro-symbolic, de-lexicalized benchmark (e.g. DL-ReasonSuite, SoLT, GSM-Symbolic) - which none of the providers releasing new models showcase.

>just performance is highly task-dependent and even variable within tasks. People get objectively different experiences, and assume one or another is better, but it's basically random.

You are right that this is not exactly subjectivity, but I think for most people it feels like it. We don't have good benchmarks (imo), we read a lot about other people's experiences, and we have our own. I think certain models are going to be objectively better at certain tasks; it's just that our ability to know which ones is currently impaired.


The SOTA model war is the new console war.

But more seriously, I can't help but be amused by how emotionally invested people are getting in their AI brand of choice.


AI is a complete commodity.

One model can replace another at any given moment.

It's NOT a winner-takes-all industry,

and hence none of the lofty valuations make sense.

The AI bubble burst will be epic and make us all poorer. Yay.


Staying power is probably the most important factor, which is why I'm thinking Google eventually takes the crown.

They might be converging somewhat; the ultimate limiting factor is training data. Once they do converge, I think the competition will shift to memory and compute efficiency, with the winner being the smallest maximally capable model.

And the subjectivity is bidirectional.

People judge models on their outputs, but how you like to prompt has a tremendous impact on those outputs and explains why people have wildly different experiences with the same model.


I had one occasion where GLM 5.1 did about 95% of the implementation that I needed but couldn't progress from there. And Codex (free quota) solved the remaining 5% on the spot. I'm super happy with both. I won't touch anything Anthropic with a ten-foot pole.

>GLM 5.1 was the model that made me feel like the Chinese models had truly caught up. I cancelled my Claude Max subscription and genuinely have not missed it at all.

GLM 5.1 is pretty good but there are some "buts".

They hiked the prices twice this year. I subscribed to the pro coding plan just before the last hike. At the start of the year they had only a 5-hour quota and no weekly quota, and I hit the weekly quota hard. I can't upgrade the subscription to get a higher weekly quota because they jacked up the prices so much recently.

My $30 subscription now costs $72; previously it was $15. Max was $49, then $80, and is now $160.


I used GLM 5.1 and it was bad; I have no clue why people claim it is good.

What hardware do you run it on? Trying to weigh the cost of a subscription + API vs. new HW...

I feel like it's Sonnet level for implementation, but not matching up to Opus for planning.

But I agree it's close enough that it's worth using heavily. I've not cancelled my Claude Max subscription, but I've added a z.ai subscription...


My combo is Codex and a basic Claude subscription for planning the hard tasks (if any), and opencode with GLM 5.1 (z.ai coding plan) for the actual coding.

opencode is awesome; I don't miss Claude or Codex CLI at all, and the z.ai plan is way more generous in comparison.

I was lucky to subscribe to the z.ai coding plan Pro when it cost $30/month; I was surprised that it now costs $70/month.

In case anyone wants to subscribe to z.ai with a 10% discount [1], here are the credit campaign rules [2]:

- [1] https://z.ai/subscribe?ic=MW6H74HAZ0

- [2] https://docs.z.ai/devpack/credit-campaign-rules


The value in Claude Code is its harness. I've tried the desktop app and found it was absolutely terrible in comparison. Like, the very nature of it being a separate codebase is already enough to completely throw off its performance compared to the CLI. Nuts.

> The value in Claude Code is its harness

If this was the case then Anthropic would be in a very bad spot.

It's not, which is why people got so mad about being forced to use it rather than better third party harnesses.

Pi is better than CC as a harness in almost every respect.


Anthropic limiting Claude subs to Claude code is what pushed me away in the end because I wanted to keep using Pi.

Just sign up for an AWS account and use the Anthropic models through Bedrock, which Pi can use.

API costs are really high compared to subs.

Then you aren't the target market.

Why use tricks to support a company that is hostile to your use case?

What advantage are you saying this has compared to just directly going through the Anthropic provider? They are the same price.

Can you enumerate why?

- Claude Code has repeatedly had enormous token wastage bugs. Its agent interactions are also inefficient. These are the cause of many of the reports of "single prompt blew through 5-hour quota" even though it's a reasonable prompt.

- It still lacks support for industry standards such as AGENTS.md

- Extremely limited customization

- Lots of bugs including often making it impossible to view pre-compaction messages inside Claude Code.

- Obvious one: can't easily switch between Claude and non-Claude models

- Resource usage

More than anything, I haven't found a single thing that Pi does worse. All of it is just straight up better or the same.


I thought the desktop app used the cli app in the background?

Hmm

Will try it out. Thanks for sharing!


What is your workflow? Do you use Cursor or another tool for code gen?

I use Opencode, both directly and through Discord via a little bridge called Kimaki.

https://github.com/remorses/kimaki


I've found GLM 5.1 to be extremely good, as long as you keep the context under 100k or so. But I do agree; the actual service is rough. It's often unusable during peak hours.

I imagine they're trying to slow down the acquisition of new customers because they're so overloaded. But yeah at double the price it doesn't seem worth it anymore.


I can't find an announcement on this, but here's the archived subscription page from a couple days ago: https://web.archive.org/web/20260410092340/https://z.ai/subs...

The Max plan has doubled, and the Lite and Pro plans have more than doubled. No change in usage limits.


Even if this is largely due to a change in how PCs in China are being counted, it's still amazing to watch Linux usage continue to climb like this.

It's really the only opposing force to Microsoft's enshittification of Windows.


Linux’s ecosystem has also improved significantly over the past two years, especially in China. Due to the influence of “Xinchuang” (that is, domestically produced Linux rebranded under another shell), many Chinese desktop applications have been reworked in the past couple of years, switching from Windows-specific tech stacks to cross-platform ones—mostly Electron, basically browser wrappers—and now support the Linux platform. The commonly used software is basically all there.

In addition, the development of LLMs has greatly lowered the barrier to using the Linux command line. Problems that used to take a full day to solve can now be handled easily by anyone who can write a prompt—just ask, copy, and paste. This has even made Windows’ command line unfriendly by comparison, despite its own major improvements in recent years, turning it into a significant drawback.


This feels like an existential threat to HN, and to the general concept of anonymous online discourse. Trust in the platform is foundational, and without it the whole thing falls down.

Requiring proof of identity is the only solution I can think of, despite how unappealing it is. And even then, you'll still have people handing their account over to an LLM.

I really struggle to imagine a way around it. It could be that the future is just smaller, closed groups of people you know or know indirectly.


> Requiring proof of identity is the only solution I can think of, despite how unappealing it is

Same. I agree that it is unappealing but it can be done in a way that respects anonymity.

I built this and talk about it here: https://blog.picheta.me/post/the-future-of-social-media-is-h...

I think we’re on the precipice of this being a requirement to have any faith you’re talking to another human. As a side effect it also helps avoid state actors from influencing others.


> I think we’re on the precipice of this being a requirement to have any faith you’re talking to another human.

Except that it doesn't prove you're talking to a human - it just increases the hurdles for bot operators (buy or steal verified accounts).


It adds enough of a barrier to be worth it. In the way I have implemented it, you can only have one account per ID (for example passport). Yes, you can buy fake passports, but it's prohibitively expensive. Read my blog post for more info.


This is not a technical issue - it's a societal one. Do we want online ID verification? Are the trade-offs worth it? Do we want to make the internet a place that requires an ID everywhere for age verification or to prove that you're human? What would the implications be?

Regarding your implementation: Most people don't have a passport, so it's a non-starter - but again, this topic is not a technical issue.


I think that it is a technical issue to a certain extent. Governments could make it very easy to prove humanity (and age) in a secure manner that doesn't leak your personal details to the third party that wants to perform the verification.

I don't see that as "requiring ID".

I think the real question is how much do we care that our online spaces are composed of not just AI bots, but also sock puppet accounts controlled by various people (from governments, rich people, all the way to harassers that use alt accounts) wanting to trick us.


You're still arguing from a technical perspective while not addressing the societal issues that online ID verification leads to. Do we as society really want an internet that resembles a gated community where you can only enter with an ID? What about the people we exclude? Should we abandon the free internet just because of bots and sock puppet accounts? What about other ways to address the issue?


I mean, Reddit accounts are valued based on the identity they have built. It's not far-fetched to imagine uninterested users making and selling a single account each.


There's lots of alternatives. Others have mentioned invites and proof of work, and I'll mention a third alternative: a voucher system.

E.g. I make a new Hacker News account and say "just ask Wikipedia, they will vouch for my new account". Then Wikipedia checks whether any of its accounts vouch for this new Hacker News account. If a user with enough reputation on Wikipedia (e.g. your friends, or one of your own Wikipedia accounts) vouches for it, then Wikipedia tells Hacker News "yes, that account is legit".

Hacker News knows the minimum amount possible about the new account. And while Wikipedia knows something, it knows WAY LESS than a full ID check. People can have multiple Wikipedia accounts.

And it's a two-way street; Wikipedia could ask Hacker News about new accounts. Both sites would benefit from the collaboration.

Karma could actually become meaningful/useful for reputation checks.

The only unfortunate aspect is I'm not aware of any software tooling for such a system.
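
For what it's worth, the handshake itself would be tiny. A minimal sketch, where the /api/vouch-check endpoint and every name are hypothetical:

    import requests

    def verify_new_account(account_id, voucher_site, min_reputation=50):
        """Ask voucher_site whether anyone reputable vouches for account_id."""
        resp = requests.post(
            f"https://{voucher_site}/api/vouch-check",  # hypothetical endpoint
            json={"account": account_id, "min_reputation": min_reputation},
            timeout=10,
        )
        # The answer is just yes/no; no identity details cross the wire.
        return resp.ok and resp.json().get("vouched") is True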


Removing anonymity is not a solution, just a different problem.


I don't feel like using HN anymore; I hope they just add invites. Last time I said this, someone replied that it would just be the same as some other site then, but it's not... HN is HN... this situation is really bumming me out.


Looks like http://lobste.rs is it. I haven't been invited, and I'm not really sure I should be, but I'm having a very nice time just reading.


Another option instead of using identity is to use proof of work or hashcash such that anyone who thinks a comment is valuable can use some hash rate to upvote it. It doesn't matter how the content was generated, only that someone thought it was important, and you can independently verify this by checking how much hash effort went into hashing for that comment. This also does not require any identity either.
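
A minimal sketch of that scheme, with SHA-256 standing in for whatever hash you'd pick and an invented difficulty:

    import hashlib, itertools

    def mine_upvote(comment_id: str, difficulty_bits: int = 20) -> int:
        """Burn hash effort on a comment; the nonce is the 'upvote'."""
        target = 2 ** (256 - difficulty_bits)
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{comment_id}:{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    def verify_upvote(comment_id: str, nonce: int, difficulty_bits: int = 20) -> bool:
        # One hash verifies work that took ~2**difficulty_bits hashes to produce.
        digest = hashlib.sha256(f"{comment_id}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)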


Advertisers are more willing to spend money to promote content than an individual is to do the same...


Having multiple different distribution channels can solve that problem. Advertisers cannot monopolize all distribution channels simultaneously because of the costs involved (it would be like someone trying to buy the whole economy).


Using a real identity doesn't fix that problem either though: advertisers just pay real people in India to do ID checks.


I don't think that's true at all.

One of the things HN does is not let you interact in certain ways until you've earned sufficient karma. This is a basic proof-of-work. If your bot can't average a positive karma, then it'll never get certain privileges.

Not to say the system is perfectly tuned for bots, because it's not. The point is that proof of identity is not the only option.
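
Mechanically it's just thresholds; a sketch with made-up numbers (not HN's actual values):

    # Hypothetical karma gates; the exact thresholds aren't the point.
    PRIVILEGES = {"flag": 30, "create_poll": 200, "downvote": 500}

    def allowed(karma: int, action: str) -> bool:
        return karma >= PRIVILEGES.get(action, 0)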


HN is almost entirely about the comments. Voting is useful as a tool for loosely sorting content, but otherwise HN could easily do without it. Some of the most valuable comments come from people with barely any karma. And that’s why HN is great! The restrictions on voting and flagging for new users could be removed without impacting the quality of HN. I can’t imagine any scenario in which HN’s current system could survive the same slopification that is happening on Reddit.

HN is doing okay at the moment because nobody is yet publishing ebooks and videos on how to astroturf HN to launch your SaaS. Unfortunately, Reddit hasn’t escaped that fate.


They get the privilege of immediately polluting the website with LLM-generated comments.

Many of them sound and look completely normal and have others on here interacting with them. They don't use em dashes, sometimes they'll use all lowercase text, sometimes the owner of the bot will come out and start commenting to throw you off.

All examples I've witnessed here.

HN should immediately start implementing at least some basic bot detection methods without requiring us to email them every time. I've discovered multiple bots making detailed comments within 30 seconds of each other in different threads, something a normal human wouldn't be able to do. That should at least flag the account for review. Obviously they'll get smarter and stop doing that soon, but it would help in the short term.

I'd say it's not an issue but everything I described above has happened in less than a month and every day now I'm discovering bots here.
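
The 30-second check described above is cheap to approximate server-side. A naive sketch with invented thresholds:

    from collections import defaultdict

    WINDOW_S = 30     # two detailed comments within 30s...
    MIN_CHARS = 400   # ...in *different* threads is hard for a human

    recent = defaultdict(list)  # account -> [(timestamp, thread_id)]

    def flag_for_review(account, thread_id, body, now):
        hit = len(body) >= MIN_CHARS and any(
            t != thread_id and now - ts <= WINDOW_S for ts, t in recent[account]
        )
        recent[account].append((now, thread_id))
        return hit  # queue for human review, don't auto-ban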


I do agree that bots are or will be an existential risk for every online forum. But I also think that an attempt to fix it that takes away anonymity is a cure that's worse than the disease.

My best understanding is yes -- there are signals that somebody is a bot (like how quickly they post), but if HN bans based on those signals then whoever made the bot will keep tweaking the code.

I feel like I rarely see bots in the top 5 comments of any article I read, or otherwise causing major disruption.

I think we just need to get creative about ways a platform can prove somebody is an invested human without tying it back to any personally identifiable information.


Prevent pasting comments. Implement a naive check on the time spent typing the comment, and shadowban posts that don't pass it. Add a one-minute wait and a captcha for posting.

That'd drastically reduce the amount of low effort posts, both human-written and generated.
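
A sketch of that naive timing check (the threshold is invented, and a client-reported timestamp would obviously need server-side sanity checks):

    MIN_SECONDS_PER_CHAR = 0.05  # ~20 chars/sec is already very fast typing

    def looks_pasted(comment: str, opened_at: float, submitted_at: float) -> bool:
        """True if the comment appeared faster than plausible typing speed."""
        elapsed = submitted_at - opened_at
        return elapsed < len(comment) * MIN_SECONDS_PER_CHAR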


Preventing pasting would drastically reduce how often people cite their sources; no one wants to hand copy a long url.


I've been working on this tool to address this same issue in other communities: https://www.ityped.it/

It's certainly not perfect, but similar to what you mention.

p: https://www.ityped.it/p/WIiTYfdxQ5ww


Have the output of the LLM sent to a headless browser that "types" and submits the comment as necessary, with some randomness added for authenticity.

Or, since this would need to be done in javascript, just block or rewrite the javascript and fake the output in the sent request.

Simplistic solutions like this stopped being meaningful decades ago.


> And even then, you'll still have people handing their account over to an LLM.

Exactly. So what's proof of identity good for?


Invitation only is a reasonably successful alternative for niche communities, especially with the ability to banish an invite "tree".

My conspiracy theory: campaign money from the last few elections (I think "Correct the Record" [1] was the first "disclosed" push) resulted in a bunch of bot accounts being made/bought all across social media. These are being lightly used to maintain some reasonably realistic usage statistics, and are "activated" to respond to key political topics/times. This is on top of spam accounts pushing products and, of course, the probably higher-than-average number of bot accounts made for fun by HN users.

[1] https://en.wikipedia.org/wiki/Correct_the_Record


Invitation tree. lobste.rs already has it; works great.


The side effect of trying to enforce this kind of sensitivity is that you make certain things taboo to talk about. And this is a good example of something that should be easy for someone to talk or even joke about because it makes dipping into that conversation much easier.


Is there a name for this? I think about this all the time. I've always had a theory that some offensive words may actually persist longer solely because we essentially calcify their definitions and never allow them to evolve into new, less offensive meanings.


Douglas Crockford nearly got cancelled because he described JavaScript as "promiscuous". People not knowing what a word means, combined with a sense of urgency about sensitivity, can be a dangerous combination.


This is well researched. See the Werther Effect. Casual, trivial, glamorized, or humorous framing behaves like contagion exposure.


The Werther Effect seems to be all about media reporting? All the reputable sources I could easily find suggest that talking about suicide casually does not inspire it.


How about Rogue-like Linux?


Ironman Linux.


Ultimate Ironman Linux: you can't save anything to the disk.


Pretty much every software engineer I've talked to sees it more or less like you do, with some amount of variance on exactly where you draw the line of "this is where the value prop of an LLM falls off". I think we're just awash in corporate propaganda and the output of social networks, and "it's good for certain things, mixed for others" is just not very memetic.


I wish this were true. My experience is co-workers who pay lip service to treating the LLM like a baby junior dev, only to near-vibe-code every feature and entire projects without spending so much as 10 minutes thinking on their own first.


I played with it extensively for three days. I think there are a few things it does that people are finding interesting:

1. It has a lot of files that it loads into its context for each conversation, and it consistently updates them. Plus it stores and can reference each conversation. So there's a sense of continuity over time.

2. It connects to messaging services and other accounts of yours, so again it feels continuous. You can use it on your desktop and then pick up your phone and send it an iMessage.

3. It hooks into a lot of things, so it feels like it has more agency. You could send it a voice message over discord and say "hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it"

It feels more like a smart assistant that's always around than an app you open to ask questions to.

However, it's worth stressing how terrible the software actually is. Not a single thing I attempted to do worked correctly, important issues (like the Discord integration having huge message delays and sometimes dropping messages) get closed because "sorry, we have too many issues", and I really got the impression that the whole thing is just a vibe-coded pile of garbage. I don't like to be that critical of an open source project like this, but considering the level of hype and the dramatic claims that humans shouldn't be writing code anymore, I think it's worth being clear about.

Ended up deleting it and setting up something much simpler. I installed a little Discord relay called kimaki, and that lets me interact with instances of opencode over Discord when I want to. I also spent some time setting up persistent files and made sure the LLM can update them, although in this case only when I ask it to. That's covered enough of what I liked from OpenClaw to satisfy me.
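
The persistent-files piece is only a few lines. A rough sketch of that kind of setup, where the path and model_call are stand-ins:

    from pathlib import Path

    MEMORY = Path("~/.agent/memory.md").expanduser()

    def chat(model_call, user_msg: str) -> str:
        notes = MEMORY.read_text() if MEMORY.exists() else ""
        return model_call(f"Persistent notes:\n{notes}\n\nUser: {user_msg}")

    def remember(note: str) -> None:
        # Only called when I explicitly ask the LLM to update its notes.
        MEMORY.parent.mkdir(parents=True, exist_ok=True)
        with MEMORY.open("a") as f:
            f.write(f"- {note}\n")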


> You could send it a voice message over discord and say "hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it"

if one of my friends sent me an obviously AI-written email, I think that I would cease to be friends with them...


> You could send it a voice message over discord and say "hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it"

Ah, so it's a device for irritating Steve, got it.


> “hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it”

Isn’t the “what he thinks about it” part the hardest? Like, that’s what I want to phrase myself - the part of the conversation I’d like to get their opinion on and what exactly my actual request is. Or are people really doing the meme of sending AI text back and forth to each other with none the wiser?


In the context of business communication, yeah, a lot of people are doing that. Which, to be honest, I don't think is the worst thing ever. Most corporate communication is some basic information padded out with feigned personal interest and rehearsed politeness, so it's hardly a huge loss.

For personal communication between friends it would be horrible. Authenticity has to be one of the things I value most about the people I know. Didn't mean to imply from that example that I did or would communicate that way.


I like how Xcode installs a bunch of gigantic, multi-gigabyte artifacts for iOS runtimes or whatever, fills up the hard drive, can't update because it's out of space, and then tells me I'm not allowed to delete them because of SIP.


The dozens and dozens of simulators it installs without asking... which kill your system's audio capabilities for some reason: https://discussions.apple.com/thread/256140785

But the best part is what it DOESN'T install when you think you've updated. You get on a plane and settle in for some work, only to be prompted to download and install a bunch of required crap you weren't told about. OH WELL, says Apple, your time is FREE!


I've been using it for the past couple days. Like most AI products right now, it is both incredible and incredibly stupid.

Virtually everything I've tried (starting with just getting it running) was broken in some way. Most of those things I was able to use an LLM to resolve, which is cool, but also why doesn't it just work to begin with?

I still haven't gotten it to successfully create a cron job. Also, messages keep getting lost between the web GUI and Discord. Trying to enable the Matrix integration broke the whole thing. It seems to be able to recall past sessions, but only sometimes.

I've been using OpenCode with various models, oftentimes running several instances in tmux that I can connect to and switch between over SSH. It feels like the hype around OpenClaw is mostly from bringing the multi-instance agentic experience to non-developers, and from providing some nice hooks to integrate with email, Twitter, etc. But given that I have a nice setup running opencode in little firejail-isolated containers, I'll probably drop OpenClaw. Way too janky, and I can't get over the thought of "if this is so amazing, why doesn't it work?"

