Looks like Microsoft has run out of compute and can't scale it fast enough to serve Copilot users and Azure AI Foundry, given that the customer base there is growing as well.
Productivity per dollar doesn't increase because at maturity levels 1 and 2 the costs of inference and the extra team load (PR quantity and size) eat up all the gains. Only at level 3 do you see an actual productivity impact. Most companies are between levels 1 and 2, which is where only the costs are rising.
Levels: 0 - no AI, 1 - AI enabled (copilots), 2 - AI assisted (autonomous agent pipelines, not on your PC), 3 - AI measured.
Did they take into account the aging and depreciation of the vehicle battery, which is extremely expensive? It makes no economic sense to use V2H with the limited-cycle batteries in current cars. Those batteries are optimized for charging speed and power density.
There are much cheaper batteries better suited to houses, built with other chemistries; they are bigger and heavier, and that's fine for a house as long as they last 10+ years.
Read the fine print - there is usually a limit on charging cycles, so a battery can be out of warranty even if it's only 3 years old but has hit its cycle limit.
Yeah, I guessed so. Using it as a home battery will incur a lot more cycles, I suppose. Although if the battery is large enough that a day of powering the home only drains it by, e.g., 10%, how does that factor into the cycle count? Is that somewhere in the small print, maybe?
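One common convention (an assumption here; warranties vary, and some count any charge event rather than energy throughput) is to count shallow discharges as "equivalent full cycles" proportional to depth of discharge. A quick sketch of that arithmetic:

```python
# Equivalent-full-cycle estimate for shallow daily discharge.
# Assumption: the warranty counts energy throughput, so a 10% daily
# discharge counts as 0.1 of a full cycle. Real warranty terms may differ.

def equivalent_full_cycles(daily_depth_of_discharge: float, days: int) -> float:
    """Sum shallow discharges as fractions of a full cycle."""
    return daily_depth_of_discharge * days

# 10% per day over 10 years is only ~365 equivalent full cycles,
# well under a typical (hypothetical) 1500-cycle warranty cap.
print(round(equivalent_full_cycles(0.10, 365 * 10), 1))
```

So under a throughput-based count, shallow cycling is relatively gentle; the worry is warranties that count cycles per charge event regardless of depth.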
From my experience, for coding it makes no sense to use any quantization worse than Q6_K. More heavily quantized models make more mistakes; for text processing that can still be fine, but for coding it isn't.
I don't think most people realize that. Quality of tokens beats quantity of tokens. I always tell folks to go with as high a quant as you can, and only go lower if you simply don't have the memory capacity.
AI models like gemma4 are available in different quant "sizes"; think of it as an image available at various compression levels.
The best-looking image is the largest: it takes up the most memory when loading and uses up much of your system's resources.
At the other end of the spectrum is a smaller, much more compressed version of the same image. It loads quickly and uses fewer resources, but lacks the detail and clarity of the original.
AI models are similar in that fashion, and the parent poster is suggesting you use the largest version of the model your system can support, even if it runs a little slower than you'd like.
On mobile the Q4 vs Q6 tradeoff flips. Gemma 4 E2B at Q4_K_M barely fits in RAM on a 6GB Android, so Q6 isn't on the table. In practice the Q4 hit shows up in tool-call reliability more than general reasoning, which is usually fine for a constrained skill surface.
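A back-of-envelope way to see why Q6 doesn't fit: weight size is roughly parameter count times bits per weight. The bits-per-weight figures below are approximate averages for llama.cpp-style K-quants (an assumption; real GGUF files also carry metadata and some higher-precision tensors):

```python
# Rough memory estimate for quantized model weights.
# Assumed average bits per weight: ~4.8 for Q4_K_M, ~6.6 for Q6_K
# (approximations; actual file sizes vary by architecture).

def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (decimal)."""
    return params_billions * bits_per_weight / 8

for name, bpw in [("Q4_K_M", 4.8), ("Q6_K", 6.6)]:
    print(name, round(weight_size_gb(8.0, bpw), 1), "GB")
```

For a hypothetical 8B model that's roughly 4.8 GB vs 6.6 GB of weights alone, before KV cache and the OS's own RAM use, which is why a 6 GB phone forces the Q4 choice.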
For those looking for a good homelab server: look at refurbished/used mini-PCs based on 11th-gen Intel, like the i5-11500T (HP ProDesk 400 G5 Mini, for example), or a Ryzen. You'll get better thermals, a better CPU, and more expansion slots for less than a NUC costs.
On top of that, resellers often have RAM and NVMe upgrades available. A WD Red OEM 1TB for less than 100 dollars sounds like a bargain.
This is false: you can make many plastics without fossil sources (PLA, bio-PET, bio-ABS, etc.). The only challenge is cost and scale; it's cheaper and easier to use existing processes.