When you write to a file, you generally don't write to physical storage. Instead the writes get buffered in memory and written to physical storage in batches. This substantially improves performance but creates a risk: If there is some sort of outage before the data is flushed to disk, you might lose data.
In order to address that risk, you can explicitly force data to be written to disk by calling fsync. Databases generally do this to ensure durability and only signal success after fsync succeeded and the data is safely stored.
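As a concrete sketch of what that looks like in code (Python here; the file path and record are just illustrative):

```python
import os

# Append a record "durably": flush Python's user-space buffer,
# then force the OS page cache out to the storage device.
def durable_append(path, record):
    with open(path, "a") as f:
        f.write(record + "\n")
        f.flush()              # user-space buffer -> OS page cache
        os.fsync(f.fileno())   # OS page cache -> physical storage

durable_append("/tmp/wal.log", "txn-42 committed")
```

Without the `os.fsync` call, the data may sit in the OS page cache for a while after `flush()` returns, which is exactly the window where a power loss can drop it.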
So ClickHouse not calling fsync implies that it might lose data in case of a power outage or a similar event.
Most ClickHouse installations run replication for availability and read scaling. If you do get corrupted data for some reason, you can read it back from another replica. That's much more efficient than trying to fsync transactions, especially on HDD. The performance penalty for fsyncs can be substantial and most users seem to be pleased with the trade-off to get more speed.
This would obviously be a poor trade-off for handling financial transactions or storing complex objects that depend on referential integrity to function correctly. But people don't use ClickHouse to solve those problems. It's mostly append-only datasets for analytic applications.
Is this really that important, though, since all servers are fed from uninterruptible power supplies and most data centers have multiple power sources?
It’s a significant deviation from what I would expect from a disk-oriented database. So I would definitely expect it to be well documented, along with the reasoning behind it: why the developers believe it is a reasonable (or even safe) choice, what assumptions went into it (such as the availability of an uninterruptible power supply), etc.
Additionally, keep in mind that many people probably use network-attached storage such as EBS, in which case fsync involves the network. An outage doesn’t just mean a power outage; it could also be a network issue.
The implication is that ClickHouse can't easily support transactional queries. That's why it's an OLAP database, not an OLTP one (Online Analytical Processing vs. Online Transaction Processing).
> Mongodb also did not use fsync and was ridiculed for it, yet no one mentions this about clickhouse.
MongoDB claimed to be a replacement for RDBMS-es (which includes OLTP). ClickHouse is explicit about being OLAP-only. MongoDB also hid the fact that they weren't doing fsync, especially when showing off "benchmarks" against OLTP RDBMS-es, while ClickHouse has not tried to show themselves as a replacement for OLTP RDBMS-es.
> Clickhouse can easily add fsync, they just choose not to do it.
For good reason. It's not a simple matter of choosing one of two options. The choice has consequences: performance.
I can't find any evidence showing that OLAP means it is okay to lose data from unexpected shutdowns. How can you have correct analytics without a complete set of data?
> For good reason. It's not a simple matter of choosing one of two options. The choice has consequences: performance.
It is a simple matter, though. They can choose to sacrifice performance for data durability, and I suspect performance would not be impacted very much since ClickHouse acts like an append log. It just seems that Yandex doesn't care much about durability, since they are mostly using the database to store people's web traffic. They wouldn't care if some of that data is lost, so they don't use fsync.
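To illustrate the "append log" point: because writes are sequential appends, fsync cost can be amortised over a batch rather than paid per record. This is only a sketch of that general technique (the batch size and file path are made-up assumptions, not anything ClickHouse actually does):

```python
import os

# Amortise fsync cost over a batch of appended records: on a crash,
# at most one batch of records is lost, instead of paying one
# fsync (and its full device-latency) per record.
def append_batch(path, records, batch=1000):
    with open(path, "a") as f:
        for i, rec in enumerate(records, 1):
            f.write(rec + "\n")
            if i % batch == 0:
                f.flush()
                os.fsync(f.fileno())
        f.flush()
        os.fsync(f.fileno())   # make the tail of the final batch durable too

append_batch("/tmp/events.log", [f"event-{n}" for n in range(2500)])
```

With `batch=1000`, the 2500 appends above cost three fsyncs instead of 2500, at the price of a bounded loss window.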
> I can't find any evidence showing that OLAP means it is okay to ...
OLAP also doesn't mean "be the source of truth of the data". You can have a separate source of truth of the "complete set of data" outside of your OLAP engine and load (and reload) data into your OLAP engine any time you're not sure if you have the "complete set of data" in it.
The important difference lies in how often one finds themselves in that situation. In OLAP, the vast majority of time is spent querying (i.e., reading) data rather than loading (i.e., writing) data and waiting for it to be durably saved (i.e., fsync-ed). Because of this imbalance, it makes sense to optimise for one scenario and handle the other sub-optimally.
> They wouldn't care if some of that data is lost so they don't use fsync.
Or, they can still care about data correctness and simply re-load data they suspect is or may not be consistent in the rare case of an improper shutdown. It's not like they use ClickHouse as their primary data store.
The top commercial high-performance time-series databases, used by banks to make decisions about your money (and which ClickHouse can usually best), also don't use fsync. You can literally quit the software and watch your transaction data get written out 5 seconds later.