Hacker News | KronisLV's comments

> This is key. In my experience, asking an LLM why it did something is usually pointless. In a subsequent round, it generally can't meaningfully introspect on its prior internal state, so it's just referring to the session transcript and extrapolating a plausible sounding answer based on its training data of how LLMs typically work.

Yep, I've gotten used to treating the model output as a finished, self-contained thing.

If the output needs explaining, the model will be good at that; if it has an issue, the model will be good at fixing it (and possibly patching the instructions to prevent it in the future). I'm not getting the actual reason why things happened a certain way, but then again, it's just a token prediction machine. If there's something wrong with my prompt that's not immediately obvious and perhaps doesn't matter that much, I can just run a few sub-agents in a review role, look for consensus on any problems found, and have the model fix them.
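The consensus step is easy to script. A minimal sketch, assuming a hypothetical `ask_reviewer()` callable that wraps whatever sub-agent API you use and returns a set of normalized issue descriptions:

```python
from collections import Counter

def consensus_issues(ask_reviewer, diff, n_reviewers=3, threshold=2):
    """Run n independent review passes and keep only the issues
    flagged by at least `threshold` of the reviewers."""
    counts = Counter()
    for _ in range(n_reviewers):
        # each pass returns a set of normalized issue descriptions
        counts.update(set(ask_reviewer(diff)))
    return {issue for issue, seen in counts.items() if seen >= threshold}

# Stubbed reviewers standing in for sub-agent calls: two passes flag a
# real issue, only one flags a style nit, so the nit is filtered out.
reviews = iter([
    {"missing null check"},
    {"missing null check", "style nit"},
    {"missing null check"},
])
print(consensus_issues(lambda d: next(reviews), diff="..."))
# → {'missing null check'}
```

The threshold is the knob: at 1 you get a noisy union of every reviewer's opinion, at `n_reviewers` you only keep unanimous findings.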


> There's not a lot of churn in Unity

Deprecating entire languages (probably the right call long term, oddly enough Godot is keeping GDScript around).

Render pipelines (URP and HDRP to be merged, built in likewise being deprecated).

Most of the things around DOTS.

Most of the things around networking.

The whole package management system (I mean, it’s a nice idea, still counts as churn).

Also multiple approaches to UI and input.

I would say that a lot of the new stuff is built on good ideas but I sometimes wish they’d slow down a bit and ship actually thought out and finished features.

And most of the above already had alternatives previously; this isn't even getting into wholly new things like Sentis. If you're working on an old project it will thankfully mostly keep working, but if you need to keep up with the current mechanisms and practices, there's a lot of churn.

Maybe not as much as in Godot, but to be honest that engine is going through a lot of evolution, so instability and a feature explosion are to be expected (even if terrain is still a plugin there; then again, Unity's own terrain implementation is oddly dated and abandoned, its water system felt like a tech demo, and more often than not the asset store is expected to do the heavy lifting). Unity, meanwhile, seems to try to appeal to everyone and capture headlines. Just look at the marketing around DOTS (and how long it took to become usable).


A lot of the things you mention have been in development for the better part of 10 years already and still haven't reached a stable, mature, production-ready state. Unity has also kept deprecated stuff around for much longer than it should have, which sounds great on paper for backward compatibility, but it just means they're lugging years of technical debt along, and it's slowed them down immensely.

Looking at the past year of Unity updates, since 6.1 or so, it seems that most of the focus is now going to refactoring major parts of the engine to facilitate backporting HDRP's feature set to URP. It's all good work and high time they did some cleanup and committed to a single standardized render pipeline, but it's not exactly moving the needle forward very much yet.


> (probably the right call long term, oddly enough Godot is keeping GDScript around).

Even in Unity 4, it was like "You can write scripts in C#, or you can also write them in Unityscript or Boo, I suppose..."

In comparison, for Godot, GDScript is very often the happy path. Things like web publishing work more seamlessly with GDScript than with C# (and miles more than with GDNative/GDExtension). So I don't think it's nearly as likely to get deprecated as Unity's scripting languages were.


> oddly enough Godot is keeping GDScript around

The Godot community is practically all GDScript. If they removed GDScript it'd be more akin to Unity removing C# while keeping Boo.


GDScript is great, what do you mean? And I am a language snob who would be quite happy writing games in Lisp rather than Python-likes.

> GDScript is great, what do you mean?

They do have a pretty nice page talking a bit more about GDScript: https://docs.godotengine.org/en/stable/about/faq.html#what-i...

The obvious additional benefit of this approach is that you no longer couple everything so tightly to a single scripting language, so in some ways it's a bit easier to introduce bindings for other languages. The downside, however, is that it leads to fragmentation in the community; duplication of effort (suddenly you need double the docs); more difficulty and friction in development, with new mechanisms having to account for the limitations of both languages; and if you dare to write your own language, then suddenly you have to support both it and any dev tools for it. For what was and still is a community project without strong financial backing (just look at how much money Unity burns on useless stuff, and how much more the people behind Godot could do with that money), that's quite the gamble.

Maybe if they focused on the core engine more, then deltaV: Rings of Saturn wouldn't have odd performance issues for a 2D game and wouldn't get confused by multi-monitor setups. Maybe Road To Vostok wouldn't crash when loading into the Village map on an Intel Arc GPU, and wouldn't start on the wrong monitor, same as deltaV. Maybe even demos like "Realistic Jungle Demo" on itch.io wouldn't render all-white textures on said Intel Arc GPU. Maybe we'd be two or three whole releases of features ahead by now, if all of the hours spent on GDScript to date had been spent elsewhere.

On the other hand, there's no guarantee that any of those devs would have fixed the other issues if their efforts had been redirected. Similarly, without GDScript the community would probably be smaller: it's easy to prototype with, and simpler and more approachable for folks who don't know C# yet, even if it's unnecessary for folks who like tools like Rider/Visual Studio or are coming from Unity or other engines scripted in C# or C++. I'm pretty split on it overall.


> SQL is not hard enough to require an LLM to think about for you

As someone who's seen queries that are hundreds of lines long, involving a bunch of CTEs and nested SELECTs, upwards of a dozen joined tables, and OTLT and EAV patterns all over the place (especially the kind of polymorphic links where you get a "type" code rather than a "table_name", so you also need to look at the app code to understand them), I'd say that SQL can be too hard for people to reason about well.
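As a toy illustration of why those polymorphic links hurt (all table and column names here are made up), sketched with Python's built-in sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, total REAL);
    -- polymorphic link: 'type' is an app-level code, not a table name,
    -- so the schema alone can't tell you which table a comment points at
    CREATE TABLE comments (id INTEGER PRIMARY KEY, type TEXT,
                           ref_id INTEGER, body TEXT);
    INSERT INTO invoices VALUES (1, 99.0);
    INSERT INTO orders   VALUES (1, 12.5);
    INSERT INTO comments VALUES (1, 'INV', 1, 'late payment'),
                                (2, 'ORD', 1, 'rush order');
""")
# The INV->invoices / ORD->orders mapping lives in app code; every query
# has to re-encode it, here with a pair of conditional LEFT JOINs:
rows = con.execute("""
    SELECT c.body,
           COALESCE(i.total, o.total) AS total
    FROM comments c
    LEFT JOIN invoices i ON c.type = 'INV' AND c.ref_id = i.id
    LEFT JOIN orders   o ON c.type = 'ORD' AND c.ref_id = o.id
    ORDER BY c.id
""").fetchall()
print(rows)
# → [('late payment', 99.0), ('rush order', 12.5)]
```

With two target tables it's merely ugly; with a dozen, plus foreign keys the database can't enforce, the query becomes something you can only verify against the application source.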

Bonus points for having to manually keep something like 5 Oracle packages' contents in your working memory, because that's where the other devs on the 10-year-old project stored some of the logic, while the remainder is sort-of-dynamic codegen in the app.

Same as with most app code, it shouldn't be like that, but you sometimes inherit stuff that is really badly developed, and the cognitive load (due to both inherent and accidental complexity) will increase until people just miss things and no longer have the full picture.


> It was easy to get comfortable with using the best model at the highest setting for everything for a while, but as the models continue to scale and reasoning token budgets grow, that's no longer a safe default unless you have unlimited budgets.

For a while I used Cerebras Code for 50 USD a month, with them running a GLM model and giving you millions of tokens per day. It did a lot of heavy lifting in a software migration I was doing at the time (and made it DOABLE in the first place), BUT there were about 10 different places where the migration got fucked up and had to be fixed manually: files left over after refactoring (what's worse, basically duplicated ones), some constants and routes that were dead code, some development pages that weren't removed when they were superseded by others, and so on.

I would say that Claude Code, throwing Opus at most problems (and letting it use Sonnet or Haiku for sub-agents on simple, well-specified tasks), is actually way better, simply because it fucks things up less often and review iterations at least catch when things are going wrong like that. Worse models (pretty much every one that I can afford to run locally, even ones that need around ~80 GB of VRAM, in the context of an org wanting to self-host) will be confidently wrong and plant time bombs in your codebases that you won't even be aware of if you don't pay close attention to everything, even when the task was rote bullshit that any model worth its salt should have resolved with zero issues.

My fear is that models that would let me truly be as productive as I want with any degree of confidence might be Mythos tier and the economics of that just wouldn't work out.


I have this exact same fear as an IC.

I wonder if Engineering Managers have this same fear, or whether they're used to distributing complex tasks to senior engineers and gambling by giving seemingly less risky tasks to juniors who may leave ticking time bombs in their code. Is that just the nature of code, whether written by agents or humans?


Yes, that definitely happens as an EM. You want your Senior/Staff engineers to architect out the new high-risk functionality into a doc for review. Then that Staff engineer either implements or has a junior/senior under their wing helping implement some of the scaffolding.

In this [common] paradigm the Staff Engineer acts as an architect/programmer and project manager in one. The EM should be there to guide and unblock.


Yes, that is absolutely a dynamic in managing an engineering team, and I'd argue that knowing the right person to give a particular task to, and how much detail they're going to need to get it done, is what separates good engineering managers from bad ones.

The GLM-4.7 model isn't that great. I was on their $200/month plan for a while. It was really hard to keep up with how fast it works. Going back to Claude seems like everything takes forever. GLM got much better in 5.1 but Cerebras still doesn't offer that yet (it's a bit heavier). I have a year of Z.ai that I got as a bargain and I use GLM-5.1 for some open source stuff but I am a bit nervous about sending data into their API.

The new one is quite a bit heavier!

GLM 4.7 is 358B parameters: https://huggingface.co/zai-org/GLM-4.7

GLM 5.1 is 754B parameters: https://huggingface.co/zai-org/GLM-5.1

That said, 5.1 is indeed a bunch better and I could definitely see myself using it for some tasks! Sadly all of the stuff I can actually run locally is still trash (I appreciate the effort behind Qwen 3.6, Gemma 4 and Mistral Small 4 though, alongside others).
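For a rough sense of why that parameter jump matters for self-hosting, here's a weight-only memory estimate (back-of-the-envelope only; it ignores KV cache, activations, and any MoE offloading tricks):

```python
def approx_weight_gb(params_billions, bits_per_weight):
    """Very rough weight-only footprint in GB: params × bits / 8."""
    return params_billions * bits_per_weight / 8

for name, params in [("GLM 4.7", 358), ("GLM 5.1", 754)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{approx_weight_gb(params, bits):.0f} GB")
# GLM 4.7 @ 16-bit: ~716 GB ... GLM 5.1 @ 4-bit: ~377 GB
```

Even at an aggressive 4-bit quantization, 4.7's weights alone come to roughly 179 GB, well past the ~80 GB class of local setups mentioned above, and 5.1 is about double that again.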


Good points. I was speaking from a position of using an LLM in a pair programming style where I'm interactive with each request.

For handing work off to an LLM in large chunks, picking the best model available is the only way to go right now.


I'm on the 100 USD plan with Anthropic. I hit the 5 hour limits about 75% of the time during working hours, but almost never the weekly ones; by the time they reset I've usually used up between 50% and 75% of the quota. There are periods of more intense usage ofc, but this is the approximate situation I'm in (also it doesn't work on tasks while I'm asleep, because I occasionally like having a look at WIP stuff and intervening if needed).

The Anthropic 20 USD plan would more or less be a non-starter for agentic development, at least for the projects that I work on, even while only working on a single codebase or task at a time (I usually do 1-3 at a time).

I would be absolutely bankrupt if I had to pay per-token. That said, I do mostly just throw Opus at everything (though it sometimes picks Sonnet/Haiku for sub-agents on specific tasks, which is okay), so probably not a 100% optimal approach, but I've wasted too much time and effort in the past on sub-optimal (non-SOTA) models anyways. I wonder which is closer to the actual cost and how much subsidizing is going on.


The $200 openai plan feels like 10x the limit as the $100 claude plan.

But Opus is both smarter and faster than GPT, so I can get a lot more done during the Claude limits.


for now... right now you are getting 2x usage as a promo

Concur, re the ratio of weekly vs hourly limits: I hit the hourly one much more often than weekly.

I wonder how this one compares to Qwen3 Coder Next (the 80B A3B model), since you'd think that even though it's older, it having more parameters would make it more useful for agentic and development use cases: https://huggingface.co/collections/Qwen/qwen3-coder-next

Random test site for the consumer side: https://test-ipv6.com/

0/10 in Latvia with a local ISP, fun times.


You can try out a bunch of models on OpenRouter and see what works for you. Paying per token might be too expensive long term, but definitely a good way to figure out which models you like, and then look at providers.

The other big ones would be OpenAI with Codex and Google with their Gemini and their CLI or Antigravity. Or various IDE plugins or something like OpenCode on the tooling side. GitHub Copilot is pretty cheap and gives you basically unlimited autocomplete and generous monthly quotas that let you try out the most popular models. Also GLM 5.1 is pretty decent if you want to look at other subscriptions. Cerebras Code gave you a lot of tokens but their service wasn’t super stable last I tried and they also don’t give you the latest models.

Personally I just stick with Claude and the 100 USD Max subscription cause it still works really well, even the latest update today to the desktop app made it better (was slow and buggy a month ago, has been gradually getting better) and the Chrome plugin lets me get fully autonomous loops working.


> we have a custom .yaml spec for data pipelines in our product and the agent follows it as well as anything in the training data.

Doesn't this end up being way more expensive? You don't pay for model parameter activations but for the tokens in/out, meaning that anything not in the training data (and therefore not in the model) costs you context. I could make Opus use a new language I came up with if I wanted to, and it'd do an okay job with enough information... but it'd be more expensive and wasteful than just telling it to write the same algorithms in Python, and possibly a bit more error-prone. Same with frameworks and libraries.
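The effect is easy to put rough numbers on; the per-million-token prices and token counts below are made up purely for illustration:

```python
def request_cost_usd(context_tokens, output_tokens,
                     usd_per_m_in=15.0, usd_per_m_out=75.0):
    """Cost of one request at hypothetical per-million-token rates."""
    return (context_tokens * usd_per_m_in
            + output_tokens * usd_per_m_out) / 1_000_000

# A custom spec means shipping its documentation in-context on every
# request, here assumed to add 20k input tokens:
baseline  = request_cost_usd(context_tokens=2_000, output_tokens=1_000)
with_spec = request_cost_usd(context_tokens=22_000, output_tokens=1_000)
print(f"${baseline:.3f} vs ${with_spec:.3f} per request")
# → $0.105 vs $0.405 per request
```

The spec tokens recur on every single request (modulo prompt caching), whereas knowledge baked into the weights is effectively free at inference time.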


I've used Garage with some success. The garage.toml configuration file could be a bit more user-friendly https://garagehq.deuxfleurs.fr/documentation/reference-manua... but with some tweaks there I could get it working nicely for both HDD and SSD use cases, e.g. storing a bunch of Sentinel-2 satellite tiles alongside thumbnails and some metadata.

SeaweedFS and RustFS both look nice though, last I checked Zenko was kinda abandoned.

Also, it's pretty simple when I can launch a container of whatever I need (although bootstrapping, e.g. creating buckets and access keys, SHOULD probably be doable with env variables, the same way you can initialize a MySQL/MariaDB/PostgreSQL instance in containers) and don't have to worry too much about installing or running stuff directly on my system. As for unsupported features: I just don't do the things that aren't supported.


I’ve recently switched from Minio and Localstack to Garage. For my needs (local testing) Garage seems to be fine. It’s a bit more heavyweight and capable than I need now, but I like that it may give me the option of having an on-premises alternative to S3-compatible stores hosted in the cloud. The bootstrapping is a pain in the ass (having to assign nodes to storage and gateway roles, applying the new roles, etc). It would be great to be able to bootstrap at least a simple config using environment variables. However, now that I have figured out the quirks of bootstrapping, it just works (so far; again, I’m not doing anything complicated).
