Many of the NoSQL databases essentially take us back to the 60s, before Codd came up with the relational model[1]. These ideas tend to come back once in a while[2][3], but so far nothing has proven better than the relational model.
NoSQL still makes sense in many cases (generally when your specific use case does not need all the guarantees of ACID); in return you can get higher performance or horizontal scalability.
MongoDB aims to be a generic database, but rides the wave of NoSQL "coolness". It was created by people who had no experience with databases and are learning as they go[4]. As they started adding things that are essential for a database, they realized it's not that simple.
Currently MongoDB is outperformed by Postgres. In fact, there is an application called ToroDB, which sits on top of Postgres and emulates the MongoDB protocol, and even that outperforms Mongo. Mongo also doesn't scale well horizontally, so essentially you don't get any significant advantage.
[4] For example, they started with mmap'ed memory regions to store the data and did not initially use fsync() to make sure data was actually saved on disk. In a way it reminds me of MySQL several years ago. It's much better now than it used to be, but it has a lot of warts left over from the past.
MongoDB is really, really bad. I've never come across another product that was so horrible and yet so widely used.
MongoDB: for when you don't need consistency, availability, or partition tolerance.
There are some really good NoSQL products out there. I seriously think RethinkDB is on par with Postgres. I've also used Cassandra and BerkeleyDB and they're both decent. But unless some core part of your business logic is pathological to implement in SQL (like Reddit's comment trees) you should go with Postgres.
Neither HAVING nor GROUP BY is part of the relational model, though. Really, aggregation isn't part of the traditional relational model at all. People just tend to associate it with RDBMSes.
I hate hearing absolutist dogma like this. Some things are faster in Postgres, some things are faster in Mongo. We are migrating our analytics db from Mongo to PG but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres. Mongo's aggregation framework is an immature PITA but it performs this little trick well enough that we're pretty much stuck keeping Mongo around unless we want to use MSSQL or Oracle or something else with a better (and much more expensive) query planner.
We still love (and prefer) Postgres, but it is not a Pareto improvement. There are always tradeoffs, and this kind of fanboyism just speaks to inexperience.
> There's no reason for SELECT COUNT(DISTINCT x) to perform badly in Postgres, as long as you have an appropriate table design and indexes.
Meh. Postgres' planner doesn't know how to generate a skip-scan/loose index scan for DISTINCT. You can write it yourself, but it's a bit painful:
https://wiki.postgresql.org/wiki/Loose_indexscan
If you have low cardinality, that can be a huge efficiency difference.
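For reference, the wiki's recursive-CTE emulation looks roughly like the sketch below; it walks the btree one distinct value at a time via min(). (Untested here; it reuses the price_history/calc names from the plans that follow.)

  -- Sketch of the loose-indexscan emulation from the wiki page above.
  -- Each step fetches the next distinct value with one btree descent.
  WITH RECURSIVE t AS (
      SELECT min(calc) AS calc FROM price_history
      UNION ALL
      SELECT (SELECT min(calc) FROM price_history WHERE calc > t.calc)
      FROM t
      WHERE t.calc IS NOT NULL
  )
  SELECT count(calc) AS distinct_calcs FROM t;  -- count() ignores the final NULL row

Compare what the stock planner does without it: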
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10713.04..11381.06 rows=10 width=5) (actual time=1010.383..1073.263 rows=11 loops=1)
   Group Key: calc
   ->  Sort  (cost=10713.04..10935.68 rows=89056 width=5) (actual time=1010.321..1049.189 rows=89041 loops=1)
         Sort Key: calc
         Sort Method: external merge  Disk: 1392kB
         ->  Seq Scan on price_history  (cost=0.00..3391.56 rows=89056 width=5) (actual time=0.007..20.516 rows=89041 loops=1)
 Planning time: 0.074 ms
 Execution time: 1076.521 ms
(8 rows)
With index:

EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;

                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.29..2804.82 rows=10 width=5) (actual time=0.117..47.381 rows=11 loops=1)
   Group Key: calc
   ->  Index Only Scan using price_history_calc_idx on price_history  (cost=0.29..2359.52 rows=89041 width=5) (actual time=0.054..18.579 rows=89041 loops=1)
         Heap Fetches: 83
 Planning time: 0.208 ms
 Execution time: 47.416 ms
(6 rows)
Actually, that is called an index-only scan, and it happens when all the data you need is contained in the index. Which means that if you need an aggregate, you could try indexing everything it touches. Most aggregates only reference a few columns of a row, so such an index is usually not a problem.
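To make that concrete: the faster plan above relies on an index covering every column the aggregate touches (here just calc), presumably created with something like this, so the planner never has to visit the heap:

  -- The index behind the Index Only Scan in the second plan above:
  CREATE INDEX price_history_calc_idx ON price_history (calc);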
I didn't say an index couldn't be used at all, just that it can't actually make the query fast. This will fetch all duplicates of a value from the index before moving on to the next value. If you have a couple thousand or more rows per to-be-counted value, that'll make the query rather slow.
> We are migrating our analytics db from Mongo to PG but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres.
An alternative is PipelineDB, which is built on Postgres (and drop-in compatible, supposedly) and should provide an efficient implementation for those kinds of queries: https://www.pipelinedb.com/
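As I remember it from their docs, the query is expressed as a continuous view over a stream, so the aggregate is maintained incrementally instead of recomputed per query. A sketch (the stream, view, and column names here are made up):

  -- Hypothetical stream and continuous view in PipelineDB's DDL:
  CREATE STREAM analytics_stream (x integer);

  CREATE CONTINUOUS VIEW distinct_x AS
      SELECT COUNT(DISTINCT x) FROM analytics_stream;

  -- Rows inserted into the stream update the aggregate as they arrive:
  INSERT INTO analytics_stream (x) VALUES (1), (2), (2);
  SELECT * FROM distinct_x;  -- reads the continuously maintained count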
> It was created by people who had no experience with databases and are learning as they go[4]. As they started adding things that are essential for a database, they realized it's not that simple.
This sounds exactly like what MySQL development looked like to me early in its rise to popularity.
I conjecture that it's rare that someone just needs a document DB.
The main problem someone is thinking about may be solved by a document DB, but then they also have 1000 other problems they didn't think about that are a horrible fit for a document DB.
And due to the nature of databases, where data has greater value when combined, using a new database system for every application isn't a great option.
SQL is great for business data, and passable-to-good for everything else. So the only way to beat it is by being way better at some specific thing, and finding users where that specific thing is so important that it's OK if the data is on an island.
Can't speak for the original poster, but at this point Postgres is a very capable and performant document-store database as well, thanks to the JSONB type and its associated indexing and (in 9.5) modification functions. It's also more mature, has generally better durability than MongoDB, and makes it easy to fall back/extend into relational queries on your document-store data if need be.
IMHO, unless you have a situation such that Mongo's horizontal scaling is actually required, it's better to use Postgres even if you're doing document-store stuff.
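To illustrate, here is a minimal sketch of Postgres as a document store. The table, index, and document shape are invented for the example; jsonb_set is the 9.5 modification function mentioned above.

  CREATE TABLE docs (
      id  bigserial PRIMARY KEY,
      doc jsonb NOT NULL
  );

  -- A GIN index accelerates containment queries over arbitrary keys:
  CREATE INDEX docs_doc_idx ON docs USING GIN (doc);

  INSERT INTO docs (doc)
  VALUES ('{"type": "signup", "user": {"name": "ada", "plan": "pro"}}');

  -- Containment query (@>), answered with the help of the GIN index:
  SELECT doc #>> '{user,name}' FROM docs WHERE doc @> '{"type": "signup"}';

  -- In-place modification with jsonb_set (new in 9.5):
  UPDATE docs
  SET doc = jsonb_set(doc, '{user,plan}', '"free"')
  WHERE doc @> '{"type": "signup"}';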
Because MongoDB is chasing a moving target. Hype, built-in sharding, and built-in automatic failover are what it has now. The PostgreSQL project was strong when Mongo was conceived and has been accelerating. PostgreSQL has already gobbled up and improved on other MongoDB features like BSON, and is where you go when you grow out of MongoDB's more primitive document model. Easy sharding and failover will fall too, years before MongoDB catches up with other must-have features like transactions (you will need them eventually, as sketched below) or a half-decent query language, so you can actually maintain your code rather than having to plan your own queries and encode them as a sequence of JSON documents.
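The transactions point looks mundane right up until you need it. The canonical case (table and column names are illustrative):

  BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
  COMMIT;  -- either both updates become visible or neither does

Emulating that atomicity across multiple documents by hand is exactly the kind of code you don't want to be maintaining.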
The only time I would recommend Mongo is for storing geospatial data, as it has several built-ins that make it much easier to work with. Even then I find it a lot more convenient to keep that data in Postgres and replicate it to Mongo.
Apologies, when I was talking about horizontal scaling I meant compared to other NoSQL databases[1].
The biggest selling point of NoSQL databases really is increased performance and horizontal scalability, gained by dropping some of the guarantees that an RDBMS provides. With MongoDB you get neither advantage.
Generally, Postgres can't scale out easily (unless you only do reads), but it outperforms MongoDB outright. You can get some marginal improvement by running multiple MongoDB nodes, but a single Postgres instance will still outperform them.
Also, with benchmarks like this it is often overlooked that in a relational database you can often store data in a smarter way and get an extra boost of performance.
For example, at my previous job we had 3 MongoDB instances used for mapping IP addresses to zip codes. Those databases were 12GB each and ran on beefy AWS instances, because they wanted to make sure all the data could fit in RAM.
I did a POC and put the same data in Postgres; it came out to just 600MB of data, and all queries were sub-millisecond on the smallest instance.
How come? Well, in Postgres I stored ranges of IPs using the ip4r extension and put a GiST index on them, which works with ranges. Mongo did not understand IP addresses, so what they essentially did was expand every possible IPv4 address, store each one as a 64-bit integer, and then put an index on it that was probably bigger than the data itself.
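Roughly, the Postgres side looked like this. (A sketch from memory; the table and column names are made up, and it assumes the ip4r extension is installed.)

  CREATE EXTENSION ip4r;

  CREATE TABLE ip_to_zip (
      ip_range ip4r NOT NULL,  -- a range of IPv4 addresses
      zip      text NOT NULL
  );

  -- GiST index supporting containment searches over the ranges:
  CREATE INDEX ip_to_zip_range_idx ON ip_to_zip USING GIST (ip_range);

  INSERT INTO ip_to_zip VALUES ('192.0.2.0/24', '90210');

  -- ">>=" is ip4r's contains-or-equals operator; ip4 is its single-address type:
  SELECT zip FROM ip_to_zip WHERE ip_range >>= '192.0.2.17'::ip4;

One row per range instead of one row per address is where the 12GB-to-600MB collapse comes from.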
Hm, fast bulk writes are an area where PostgreSQL generally handily beats MongoDB. I don't see why it would be different for geodata. MongoDB beats PostgreSQL at some things, but not bulk load.
Why? Especially after point #1 and assuming the document-store was a good fit for the data model.