what are you on about? most people i know have 4-year-old phones which are just fine and only want the battery changed. my phone was 6 years old this year and it hurt that I had to replace it because of the battery; otherwise it was a perfectly fine phone.
nah, I believe most people here who immediately brag about codex are openai employees doing part of their job. otherwise I couldn't possibly fathom why anyone would use codex. In my company 80% is claude and 15% gemini; you can barely see openai on the graph. and we have >5k programmers using ai every day.
Currently GPT just works much better for me, and so does Gemini, but it's more expensive right now. Going through Opencode stats, their claim is that Gemini is the current best model, followed by GPT 5.4 on their benchmarks, but the difference is slim.
My personal experience is best with GPT, but that could be down to the specific kind of work I use it for, which is heavy on maths and cpp (and some LISP).
Can you explain where you're seeing that? From what I see, the first two graphs have OpenAI models above Claude models (including Mythos) on the Technical Non-Expert and the Practitioner evals. Mythos now beats Codex 5.3 on the Expert eval, and Opus was already on top for the Apprentice one, although now Mythos leads there.
So, even including Mythos, OpenAI still has 2 models on top for the 4 evals listed.
> From what I see, the first two graphs have OpenAI models above Claude
That's just in that final graph, and that graph is perhaps the least instructive - they talk about ranges of outcomes but they don't show whether all of the models besides Mythos / Opus 4.6 overlap.
Take a look at all three graphs together and it's clear Anthropic are doing better in this arena.
Yes. I know. That was exactly what I said in my first comment.
On individual tasks Claude and GPT are comparable (as shown in the first two graphs), but on multi-step problems that require more autonomy Mythos is far better (as shown in the third graph).
This is the exact wording from my original comment
> So with that said, I think the graph under the "Cyber range results" is the important one. The ones at the top show that, yes, Mythos isn't too much better than any of the existing models on well constrained problems, but when the models are given ambiguous challenges that require multiple steps it's much, much better than anything on the market.
> On individual tasks Claude and GPT are comparable
That is not what the first graphs show - the Anthropic models cluster at 'better' positions on the graph, and I imagine you could show that the values are significantly different.
An impostor is an impostor, no matter what the media makes of them. Tbh, it's ok that the plates break over his head, since he has done so many bad things previously; he deserves it.
let's not forget that these major LLMs are all the children of corporate hyper-piracy en masse; none of them are ethical even in origin, unless you're talking about the pre-product company-charter kind of ethics, like google's.
Last I heard, claude was the model powering maven when it bombed that school. Most aren't up to date on that because anthropic launders its culpability through palantir. Anthropic is better at optics, not ethics.
No matter what you say, you know the truth yourself: the DoW wanted to go over anthropic's red lines and they said no, while openai said yes. This is as clear as day to everyone, and you are just lying to yourself to believe something else.
You use the term piracy, which potentially hints at your biases.
American IP laws aren't universal, and last I checked they aren't popular in Silicon Valley either.
The institutions surrounding IP piracy enforcement are an American strong-arm attempt to own the unownable, using Russell conjugations to make the flagrant attempt seem just.