
As a point of reference, I routinely do fast-twitch analytics on tens of TB on a single, fractional VM. Getting the data in is essentially wire speed. You won't do that on Spark or similar, but people in the analytics world consistently underestimate what their hardware is capable of by something like two orders of magnitude.

That said, most open source tools have terrible performance and efficiency on large, fast hardware. This contributes to the intuition that you need to throw hardware at the problem even for relatively small problems.

In 2024, "big data" doesn't really start until you are in the petabyte range.



> "most open source tools have terrible performance and efficiency on large, fast hardware."

What do you use?



