what are you on about? most people i know have 4-year-old phones which are just fine and only want the battery changed. my phone was 6 years old this year and it hurt that I had to replace it because of the battery; otherwise it was a perfectly fine phone.
nah, I believe most people here who immediately brag about codex are openai employees doing part of their job. otherwise I couldn't possibly fathom why anyone would use codex. In my company 80% is claude and 15% gemini; you can barely see openai on the graph. and we have >5k programmers using ai every day.
Currently GPT just works much better for me, and so does Gemini, but it's more expensive right now. Going through Opencode stats, their claim is that Gemini is the current best model, followed by GPT 5.4 on their benchmarks, but the difference is slim.
My personal experience is best with GPT, but that could be down to the specific kind of work I use it for, which is heavy on maths and cpp (and some LISP).
Can you explain where you're seeing that? From what I see, the first two graphs have OpenAI models above Claude models (including Mythos) on the Technical Non-Expert and the Practitioner evals. Mythos now beats Codex 5.3 on the Expert eval, and Opus was already on top for the Apprentice one, although now Mythos leads there.
So, even including Mythos, OpenAI still has 2 models on top for the 4 evals listed.
> From what I see, the first two graphs have OpenAI models above Claude
That's just in that final graph, and that graph is perhaps the least instructive - they talk about ranges of outcomes but they don't show whether all of the models besides Mythos / Opus 4.6 overlap.
Take a look at all three graphs together and it's clear Anthropic are doing better in this arena.
Yes. I know. That was exactly what I said in my first comment.
On individual tasks Claude and GPT are comparable (as shown in the first two graphs), but on multi-step problems that require more autonomy Mythos is far better (as shown in the third graph).
This is the exact wording from my original comment
> So with that said, I think the graph under the "Cyber range results" is the important one. The ones at the top show that, yes, Mythos isn't too much better than any of the existing models on well constrained problems, but when the models are given ambiguous challenges that require multiple steps it's much, much better than anything on the market.
> On individual tasks Claude and GPT are comparable
That is not what the first graphs show - the Anthropic models cluster at 'better' positions on the graph, and I imagine you could show that the values are significantly different.
An impostor is an impostor, no matter what the media makes of them. Tbh, it's ok that the plates break over his head, since he has done so many bad things previously; he deserves it.
let's not forget that these major LLMs are all the children of corporate hyper-piracy en masse; none of them are ethical even in origin, unless you're talking about the pre-product company-charter kind of ethics, like google's.
Last I heard, claude was the model powering maven when it bombed that school. Most aren't up to date on that because anthropic launders its culpability through palantir. Anthropic is better at optics, not ethics.
No matter what you say, you know the truth yourself: the DoW wanted to go over anthropic's red lines and they said no, while openai said yes. This is as clear as day to everyone, and you are just lying to yourself to believe something else.
You use the term piracy, which potentially hints at your biases.
American IP laws aren't universal, and last I checked they aren't popular in Silicon Valley either.
The institutions surrounding IP piracy enforcement are an American strong-arm attempt to own the unownable, using Russell conjugations to make the flagrant attempt seem just.