Hacker News | zihotki's comments

I wonder if this benchmark brings any value. Models are already quite capable and reach high scores on it.

Check out the "The JSON-pass vs Value-Accuracy gap" section in the blog post. That was an eye opener.

While most models were great at producing JSON that matched the schema, they were pretty bad at producing accurate values.

In the graph you'll see an almost 20-30% drop between the JSON schema pass rate and the value accuracy.
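To make the gap concrete, here is a minimal Python sketch of how the two metrics can diverge on the same outputs. It assumes a hypothetical extraction task with a hand-written "schema" (field names mapped to expected types) and a single ground-truth record; the fields and data are illustrative, not taken from the blog's benchmark.

```python
import json

# Hypothetical schema: required field names mapped to expected Python types.
SCHEMA = {"name": str, "amount": float, "currency": str}

def passes_schema(raw: str) -> bool:
    """True if raw parses as JSON and every required field has the expected type."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(
        isinstance(obj.get(key), typ) for key, typ in SCHEMA.items()
    )

def value_accuracy(raw: str, truth: dict) -> float:
    """Fraction of schema fields whose value equals the ground truth (0.0 if unparseable)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict):
        return 0.0
    hits = sum(obj.get(key) == truth[key] for key in SCHEMA)
    return hits / len(SCHEMA)

truth = {"name": "ACME", "amount": 12.5, "currency": "EUR"}
outputs = [
    '{"name": "ACME", "amount": 12.5, "currency": "EUR"}',   # valid and correct
    '{"name": "ACME", "amount": 125.0, "currency": "USD"}',  # valid but wrong values
]

schema_pass = sum(passes_schema(o) for o in outputs) / len(outputs)
accuracy = sum(value_accuracy(o, truth) for o in outputs) / len(outputs)
print(f"schema pass: {schema_pass:.0%}, value accuracy: {accuracy:.0%}")  # schema pass: 100%, value accuracy: 67%
```

Both outputs pass the structural check, but the second gets only one of three values right, which is exactly the kind of gap a schema-only score hides.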


Looks like Microsoft has run out of compute and can't scale it fast enough to serve both Copilot users and Azure AI Foundry's needs, given that the customer base is growing there as well.

No worries, chats soon will catch up with the ads!

No numbers/measurements/benchmarks, and you dare call it "a working" one? Any real proof that this 'works'?

Productivity per dollar doesn't increase because, at maturity levels 1 and 2, the costs of inference and the extra load on the team (more and larger PRs) eat up all the gains. Only at level 3 can one see an actual productivity impact. Most companies are between levels 1 and 2, which is where only costs are rising.

Levels: 0 - no AI, 1 - AI enabled (copilots), 2 - AI assisted (autonomous agent pipelines not on your PC), 3 - AI measured.
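The cost argument is simple arithmetic: productivity per dollar is output divided by total cost, and AI tooling adds inference spend plus review overhead to the denominator. The sketch below is a toy model; the `Team` fields and every number in it are hypothetical, chosen only to show how a raw output gain can be cancelled, not measurements of any real team.

```python
from dataclasses import dataclass

@dataclass
class Team:
    # All fields are hypothetical, illustrative monthly inputs.
    output: float          # units of delivered work
    salary_cost: float     # dollars spent on the team
    inference_cost: float  # dollars spent on AI inference
    review_cost: float     # dollars of extra effort reviewing more/larger PRs

def productivity_per_dollar(team: Team) -> float:
    """Delivered work divided by everything it cost to produce it."""
    total = team.salary_cost + team.inference_cost + team.review_cost
    return team.output / total

baseline = Team(output=100, salary_cost=100_000, inference_cost=0, review_cost=0)
# Hypothetical "level 1-2" team: 20% more output, but the extra costs eat the gain.
assisted = Team(output=120, salary_cost=100_000, inference_cost=8_000, review_cost=12_000)

print(productivity_per_dollar(baseline))  # 0.001
print(productivity_per_dollar(assisted))  # 0.001
```

With these made-up inputs the 20% output gain is exactly offset by 20% higher total cost, so productivity per dollar stays flat.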


Did they take into account the aging and depreciation of the vehicle battery, which is crazy expensive? It makes negative sense to use V2H with the cycle-limited batteries in current cars. These batteries are optimized for charging speed and power density.

There are much cheaper and better-suited home batteries built on other chemistries. They are bigger and heavier, and that's fine for a house as long as they last 10+ years.


Most of these batteries are on full warranty for 8-10 years. You should definitely make full use of it during that period.

Read the fine print: there is usually a limit on charging cycles. So a battery can be out of warranty at only 3 years old if it has reached the charging-cycle limit.

It’s not. For my Ford it is 8 years or 100,000 miles, whichever comes first. It’s not about cycles.
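"Whichever comes first" means coverage ends as soon as either limit is reached. A minimal Python sketch, using the 8 years / 100,000 miles from this comment as defaults; treating the exact anniversary date or exactly 100,000 miles as already expired is an assumption, and real warranty terms vary by manufacturer.

```python
from datetime import date

def under_warranty(start: date, today: date, miles: int,
                   max_years: int = 8, max_miles: int = 100_000) -> bool:
    """Coverage ends at whichever limit is hit first: age or mileage."""
    # Anniversary max_years after start; a Feb 29 start falls back to Feb 28.
    try:
        end = start.replace(year=start.year + max_years)
    except ValueError:
        end = start.replace(year=start.year + max_years, day=28)
    # Assumption: reaching either limit exactly counts as expired.
    return today < end and miles < max_miles

print(under_warranty(date(2020, 1, 1), date(2025, 1, 1), 40_000))   # True
print(under_warranty(date(2020, 1, 1), date(2025, 1, 1), 100_500))  # False: mileage limit hit first
```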

How are cycles counted if the battery is not drained fully?

Does that warranty still apply if the battery is used for applications other than its core function of powering the car?

The car battery warranty is often for X years or Y cycles, whichever comes first.

Yeah, I guessed so. Using it as a home battery will incur a lot more cycles, I suppose. But if the battery is large enough that a day of powering a home drains only, e.g., 10% of it, how does that factor into the cycle count? Is that somewhere in the small print, maybe?
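One common way to reason about partial discharges is "equivalent full cycles": add up the energy drawn and divide by the usable capacity, so ten 10% discharges count roughly as one full cycle. Whether a particular warranty counts cycles this way is an assumption here; the fine print decides. The 75 kWh capacity and daily draw below are hypothetical.

```python
def equivalent_full_cycles(discharges_kwh: list[float], capacity_kwh: float) -> float:
    """Total energy throughput expressed as a number of full-capacity discharges."""
    if capacity_kwh <= 0:
        raise ValueError("capacity must be positive")
    return sum(discharges_kwh) / capacity_kwh

# Hypothetical: a 75 kWh pack drained 10% (7.5 kWh) every day for a year.
daily = [7.5] * 365
print(equivalent_full_cycles(daily, 75.0))  # 36.5
```

This only counts throughput; it ignores that wear also depends on depth of discharge, temperature, and charge rate.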

I would look at your warranty, mine is 8 years or 100,000 miles. It doesn’t have a cycles stipulation.

In my experience, for coding it makes no sense to use any quantization below Q6_K. More heavily quantized models make more mistakes; that can still be fine for text processing, but not for coding.


I don't think most people realize that. Quality of tokens beats quantity of tokens. I always tell folks to go as high a quant as they can, and only go lower if they just don't have the memory capacity.


What do you mean by that? I’m not sure I understood what you said.


AI models like Gemma 4 are available in different quant "sizes"; think of it as an image available at various compression levels.

The best-quality image is the largest: it looks the best, but it takes up the most memory when loading and uses much of your system's resources.

At the other end of the spectrum is a smaller, much more compressed version of that same image. It loads quickly and uses fewer resources, but it lacks the detail and clarity of the original.

AI models are similar in that fashion, and the parent poster is suggesting you use the largest version of the AI model your system can support, even if it runs a little slower than you like.
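To see why fewer bits mean less "detail", here is a minimal Python sketch that quantizes a block of random stand-in weights with plain symmetric round-to-nearest quantization at 4, 6, and 8 bits and measures the reconstruction error. This is a simplified substitute, not llama.cpp's actual K-quant formats (Q4_K_M, Q6_K), which use block-wise scales and other refinements; it only illustrates the trend.

```python
import random

def quantize_dequantize(weights: list[float], bits: int) -> list[float]:
    """Symmetric round-to-nearest quantization with one scale for the whole block."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 31 for 6-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

def rms_error(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equally long lists."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in "model weights"
for bits in (4, 6, 8):
    err = rms_error(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit RMS error: {err:.4f}")
```

The error shrinks as the bit width grows, because the rounding step (`scale`) gets smaller.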


Thank you!


Better to go for a less-quantized model, even if it's slower, than for a faster, more heavily quantized one.


Thank you!


On mobile the Q4 vs Q6 tradeoff flips. Gemma 4 E2B at Q4_K_M barely fits in RAM on a 6GB Android, so Q6 isn't on the table. In practice the Q4 hit shows up in tool-call reliability more than general reasoning, which is usually fine for a constrained skill surface.
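The "barely fits" reasoning is mostly arithmetic: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below ignores the KV cache, activations, and runtime overhead, and the 5-billion-parameter count and bits-per-weight values in the loop are hypothetical placeholders, not Gemma 4 E2B's actual figures; the model card and the GGUF file size give the real ones.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

# Hypothetical 5-billion-parameter model at two illustrative average bit widths.
for label, bpw in (("~4.5 bpw", 4.5), ("~6.5 bpw", 6.5)):
    print(label, round(weight_memory_gb(5e9, bpw), 2), "GB")
```

On a 6 GB phone that also has to hold the OS and other apps, a gap of over a gigabyte between the two widths is often the difference between loading and not loading.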


For those looking for good homelab servers: better look at refurbished/used mini PCs based on 5th gen of Intel, like i5 11500T (HP ProDesk 400 G5 Mini for example), or Ryzen. You'll get better thermals, a better CPU, and more expansion slots for less money than a NUC.

On top of that, resellers often offer RAM and NVMe upgrades as well. A WD Red OEM 1 TB for less than 100 dollars sounds like a bargain.


> 5th gen of Intel, like i5 11500T

That's an 11th gen Intel Core CPU, not 5th.


Misconception? That's a playbook, not a misconception.


This is false: you can make many plastics without fossil sources (PLA, bio-PET, bio-ABS, etc.). The only challenge is cost and scale; it's cheaper and easier to use existing processes.


And how exactly do you think all of those agricultural products are produced? They require an insane amount of diesel fuel and nitrogen fertilizers…


> challenge is cost

So you can’t.

