Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I save 2 billion rows of timeseries data every year. I use a regular btree index for "hot data" that is less than 6 months old and BRIN index for older data. You can do this by writing a functional index.

You also have to spend some time to tune the query cost settings to avoid sequential scans if you're only gonna work with a subset of the data. Another optimization could be implementing table inheritance so you have a table for every year. If you work with data sets for a specific year you would get a big performance boost with sequential scans.

PostgreSQL's biggest weakness at the moment is aggregating data by using several cpu workers/cores. This is coming in PostgreSQL 9.6

Oh and I run PostgreSQL on ZFS with LZ4 compression,



> Oh and I run PostgreSQL on ZFS with LZ4 compression

I'm generally a big advocate of ZFS, but I heard that COW file systems (ZFS and btrfs) are generally not good choice for a database workload.

How does it perform for you?


It performs as well as any other file system, but ZFS shines when you are IO bound as compression really helps when you do sequential scans.

I have also tested ZFS with Microsoft SQL Server by exporting a ZVOL over iSCSI(FreeBSD) over 10G ethernet. But without compression as it has no benefit on 4k blocks. Performance was similar to what you would get with the same drives striped on Windows Server 2012. The big win here is of course ZFS's data checksumming. Not sure about snapshot as backup though, I need to figure out how to talk to the Windows SQL Writer Service so it can tell SQL Server to flush and lock so I can take a consistent snapshot. Microsoft really needs to improve their documentation, because this would be really helpful for several enterprises when it comes to backup speed.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: