
Parquet is a columnar storage format, which is usually much faster for queries that touch only a few columns. With protobuf you typically deserialize row by row, and that has a performance cost: you have to parse the whole message and can't pull out just the field you want.

So, if you want to query a giant collection of protobufs, you end up reading and deserializing every record. With Parquet, you get much closer to reading only what you need.
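
To make the difference concrete, here is a toy sketch using only the Python standard library. It is not Parquet or protobuf: JSON blobs stand in for serialized messages, and the field names (`id`, `name`, `score`) and the 1,000-record dataset are made up for illustration. It only shows why a per-column layout lets a single-field query read far fewer bytes.

```python
import json

# Hypothetical toy data: 1,000 records with three fields.
records = [{"id": i, "name": f"user{i}", "score": i % 7} for i in range(1000)]

# Row-oriented layout (stand-in for a stream of serialized messages):
# each record is serialized as one blob containing all of its fields.
row_store = [json.dumps(r).encode() for r in records]

# Column-oriented layout (stand-in for a columnar file):
# each field is stored in its own contiguous chunk.
column_store = {
    field: json.dumps([r[field] for r in records]).encode()
    for field in ("id", "name", "score")
}


def sum_scores_rows(rows):
    """Sum 'score' by deserializing every whole record."""
    total, bytes_read = 0, 0
    for blob in rows:
        bytes_read += len(blob)
        total += json.loads(blob)["score"]
    return total, bytes_read


def sum_scores_columns(columns):
    """Sum 'score' by reading only the 'score' chunk."""
    blob = columns["score"]
    return sum(json.loads(blob)), len(blob)


row_total, row_bytes = sum_scores_rows(row_store)
col_total, col_bytes = sum_scores_columns(column_store)

print("same answer:", row_total == col_total)
print("bytes read, row layout:   ", row_bytes)
print("bytes read, column layout:", col_bytes)
```

Both functions return the same sum, but the row version has to touch every byte of every record, while the column version touches only the `score` chunk. Real Parquet adds row groups, encodings, and compression on top of this, so treat the byte counts as illustrative only.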



Thank you.



