
I would suggest looking into Delta Lake (https://delta.io/). It's built on top of Parquet, but has several advantages over plain Parquet:

- transactions - you don't get garbage in your table if a write fails

- supports update/delete/merge operations (in some implementations)

- metadata allows faster discovery of data, for example when you have a lot of partitions in cloud storage

- metadata also enables features like data skipping, where you can filter out files that don't contain the necessary data

- time travel - you can go back to previous versions of the table
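
To make the points above a bit more concrete, here is a toy model in plain Python of how a commit log over data files could give you transactions, data skipping and time travel. To be clear, this is not the Delta Lake API or its on-disk format: `FileEntry`, `ToyTableLog`, `files_for_range` and the file names are all made up for illustration, and I'm assuming a single integer column with per-file min/max statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical metadata for one data file: its path plus min/max of one column."""
    path: str
    min_value: int
    max_value: int


class ToyTableLog:
    """Toy commit log (not Delta Lake's real format).

    Each commit records which files were added and which were removed.
    A file only becomes visible once its commit is appended, so a write
    that fails before committing leaves no trace in the table.
    """

    def __init__(self):
        self.commits = []  # list of (added FileEntry tuple, removed path set)

    def commit(self, add=(), remove=()):
        """Append one commit and return its version number."""
        self.commits.append((tuple(add), frozenset(remove)))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` (latest if None) to get the live files."""
        if version is None:
            version = len(self.commits) - 1
        live = {}
        for added, removed in self.commits[: version + 1]:
            for path in removed:
                live.pop(path, None)
            for entry in added:
                live[entry.path] = entry
        return list(live.values())

    def files_for_range(self, lo, hi, version=None):
        """Data skipping: keep only files whose [min, max] overlaps [lo, hi]."""
        return [
            f.path
            for f in self.snapshot(version)
            if f.max_value >= lo and f.min_value <= hi
        ]


log = ToyTableLog()
# Version 0: two files are written and committed together.
v0 = log.commit(add=[
    FileEntry("part-0.parquet", 1, 100),
    FileEntry("part-1.parquet", 101, 200),
])
# Version 1: part-0 is rewritten as part-2 (think of an update or merge).
v1 = log.commit(add=[FileEntry("part-2.parquet", 50, 150)], remove=["part-0.parquet"])

# Latest version: both live files overlap the range 120..130.
print(log.files_for_range(120, 130))
# Time travel to version 0: part-0 (1..100) is skipped, only part-1 is read.
print(log.files_for_range(120, 130, version=v0))
```

The point of the sketch is that readers only consult the log: they never list the storage directory, they can skip files based on the recorded statistics, and they can replay the log to any earlier version.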


