As an Ops (DevOps/Sysadmin/SRE-ish) person here: excellent article.
However, as always, the problem is more political than technical, and those are the hardest problems to solve; another service with more cost won't fix it, IMO. That said, there's plenty of money to be made in attempting to solve it, so go get that bag. :)
At the end of the day, it comes back to the DevOps mentality, and that never caught on at most companies. Devs don't care, the project manager wants us to stop blocking feature velocity, and we're understaffed because we're a "massive wasteful cost center".
100% accurate. It is very much political. I'd also add that the problem is perpetuated by a disconnect between the engineers who produce the data and those who are responsible for paying for it. This is somewhat intentional, and vendors exploit it.
Tero doesn't just tell you how much is waste. It breaks down exactly what's wrong, attributes it to each service, and makes it possible for teams to finally own their data quality (and cost).
One thing I'm hoping catches on: now that we can put a number on waste, it can become an SLO, just like any other metric teams are responsible for. Data quality becomes something that heals itself.
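To make that concrete, here's a minimal sketch of what a waste SLO check could look like. The metric inputs, the 15% budget, and the service names are all made up for illustration, not anything Tero actually exposes:

```python
# Hypothetical waste-SLO check: flag a service when its telemetry waste
# ratio exceeds its budget. All names and numbers here are illustrative.
WASTE_SLO = 0.15  # target: at most 15% of ingested bytes are waste

def check_waste_slo(service: str, wasted_bytes: float, total_bytes: float) -> bool:
    """Return True if the service is within its waste budget."""
    if total_bytes == 0:
        return True  # nothing ingested, nothing wasted
    waste_ratio = wasted_bytes / total_bytes
    if waste_ratio > WASTE_SLO:
        print(f"[SLO breach] {service}: {waste_ratio:.1%} waste (budget {WASTE_SLO:.0%})")
        return False
    return True

# Example: a service ingesting 100 GB/day, 22 GB of which is flagged as waste.
check_waste_slo("checkout-api", wasted_bytes=22e9, total_bytes=100e9)
```

Once waste is a number per service, it plugs into the same review loops as latency or error-rate SLOs.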
You'd be shocked how much of the total comes from obviously-safe waste (redundant attributes, health checks, debug logs left in production) before you even get to the nuanced stuff.
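As a rough illustration of how mechanical that first pass can be, here's a sketch of a filter over log records. The record shape (a plain dict), the attribute keys, and the rules are assumptions for the sketch, not any vendor's actual API:

```python
# Illustrative first-pass filter for "obviously safe" waste.
REDUNDANT_ATTRS = {"host.name.duplicate", "k8s.pod.name.copy"}  # hypothetical keys

def is_obvious_waste(record: dict) -> bool:
    # Health-check noise: high volume, near-zero debugging value.
    if record.get("http.route") in ("/healthz", "/readyz", "/ping"):
        return True
    # Debug logs left enabled in production.
    if record.get("severity") == "DEBUG" and record.get("env") == "production":
        return True
    return False

def strip_redundant_attrs(record: dict) -> dict:
    # Drop attributes that duplicate information carried elsewhere.
    return {k: v for k, v in record.items() if k not in REDUNDANT_ATTRS}

records = [
    {"http.route": "/healthz", "severity": "INFO", "env": "production"},
    {"http.route": "/checkout", "severity": "DEBUG", "env": "production"},
    {"http.route": "/checkout", "severity": "ERROR", "env": "production"},
]
kept = [strip_redundant_attrs(r) for r in records if not is_obvious_waste(r)]
print(kept)  # only the ERROR record survives
```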
But think about this: if you had a service that was too expensive and you wanted to optimize the data, who would you ask? Probably the engineer who wrote the code, added the instrumentation, or whoever understands the service best. There's reasoning going on in their mind: failure scenarios, critical observability points, where the service sits in the dependency graph, what actually helps debug a 3am incident.
That reasoning can be captured. That's what I'm most excited about with Tero. Waste is just the most fundamental way to prove it. Each time someone tells us what's waste or not, the understanding gets stronger. Over time, Tero uses that same understanding to help engineers root cause, understand their systems, and more.
What I'm asking is: what are the concerns here other than, literally, the cost? I'm interested in this area, and I keep seeing people say that observability companies are overcharging their customers.
We're currently discussing the cost of _storage_, and you can bet the providers are already deduplicating it. You just don't get those savings - they go to increased margins.
I'm not going to re-quote the article or the other threads here on why reducing storage just for the sake of cost isn't the answer.
The first step to solving this is correct cost attribution. Once you have that, it's easy to go to org leads and tell them their logs are costing them $X and that you can save them 40% by applying these suggestions. They'll be happy to accept your help at that point. But if the costs all land on the Ops team, why would the product teams care about cost optimizations that just take development time away from them?
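For the attribution step, even something this simple gets you surprisingly far. The per-GB rate, the service-to-team mapping, and the volumes below are made-up numbers for illustration:

```python
# Toy cost attribution: roll up ingest volume per owning team and price it.
from collections import defaultdict

PRICE_PER_GB = 0.50  # hypothetical blended ingest+storage rate
OWNERS = {"checkout-api": "payments", "search-svc": "discovery"}

# (service, GB ingested this month) - illustrative data
ingest_gb = [("checkout-api", 820.0), ("search-svc", 1430.0), ("checkout-api", 95.0)]

costs = defaultdict(float)
for service, gb in ingest_gb:
    team = OWNERS.get(service, "unattributed")
    costs[team] += gb * PRICE_PER_GB

for team, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{team}: ${cost:,.2f}/month")
```

The "unattributed" bucket is the point: once it shows up on a report, someone is motivated to claim (or kill) that data.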