Postgres has been my DB of choice for nearly a decade. The
only times I wind up working with another db are because:
(1) it is a better technical fit for a very specific problem
(2) there is already a legacy db in place
I have been voted down at a couple of startups that wanted to run a "MEAN" stack; invariably, all of those startups either moved off MongoDB or shut down.
The only time I will advocate for anything other than Postgres is when Wordpress is involved. If the data model is simple enough then MySQL is more than up for the task, and it avoids an additional database dependency.
Thankfully all the ORMs that are worth using support MySQL and Postgres, so using both is very doable.
### Useful Postgres (or SQL in general) tools/libraries:
I have been using it a lot lately, and do like it.
It isn't a popular opinion on HN, but I will still recommend Oracle or SQL Server in terms of tooling, cluster scaling, server-side programming, and DB drivers.
Then again, we work with customers whose Oracle and SQL Server licenses costs aren't an issue.
Have an upvote; I'm not sure why you're being downvoted.
If you take the issues of open-source and licensing out of the equation (both important issues in their own right, but not related to the point at hand) then Oracle and SQL Server are both ridiculously good.
I personally try to avoid them (due to the cost, and the lock-in) but they are astoundingly performant and featureful RDBMSs with a huge amount of support and documentation behind them.
Downvotes originate from open-source fanatics. Odd that it results in downvotes, though; my first thought is also a fairly retaliatory "commercial offerings, why?!", but I'd never downvote for it. *tips hat*
I would also like to mention Objection[1]. It is very minimalistic and doesn't get in your way. Most of the time you are writing queries with Knex[2], or you can write raw SQL if you feel like it. I tried a lot of ORMs for Node and this is the one I liked the most. The guy working on it is also very responsive, and the documentation is superb.
We've been using Objection over Bookshelf recently and we really like it. One thing that caused us to choose Objection over Bookshelf is that Bookshelf doesn't support composite keys.
The only issue that I have with this top-most comment is that it presents PG as a silver bullet. But, there are a lot of different types of databases for a reason.
For instance, at my current startup, we employ at least 7 different databases (including PG). And, I don't say that to brag - each has a specific use for the problem at hand.
You have to consider the needs and trade offs of your specific project. And, if you can, try to isolate your storage behind some interface. Because often your needs will change.
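To make the "isolate your storage behind some interface" point concrete, here is a minimal sketch (all names hypothetical): application code depends only on a repository interface, so if your needs change later, swapping the backing database touches one class instead of the whole codebase.

```python
from abc import ABC, abstractmethod

class UserRepository(ABC):
    """Hypothetical storage interface the rest of the app depends on."""
    @abstractmethod
    def get(self, user_id): ...
    @abstractmethod
    def save(self, user): ...

class InMemoryUserRepository(UserRepository):
    """Toy backend; a Postgres- or Mongo-backed class would drop in here."""
    def __init__(self):
        self._rows = {}
    def get(self, user_id):
        return self._rows.get(user_id)
    def save(self, user):
        self._rows[user["id"]] = user

repo = InMemoryUserRepository()
repo.save({"id": 1, "name": "Ada"})
print(repo.get(1)["name"])  # Ada
```

The point isn't the toy in-memory class; it's that callers never import a driver directly, so the storage decision stays reversible.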
(I say this with 20+ years of experience and over a dozen commercially successful products under my belt.)
Not disagreeing necessarily, but having many different databases has a cost.
Data has more value when combined, so a wise database choice also depends on what you already have. You can combine data from different systems (e.g. FDWs, which postgres has great support for), but something is usually lost along the way.
I think postgres is a good default, because it is good for a lot of things. But sure, if you make an informed decision otherwise, there's nothing wrong with that.
It is true that you should not choose blindly, but a majority of use cases do require a relational database. You use a specialized database when you have a specialized use case.
The reason why Postgres is so popular here is because if you want to have an open source solution, Postgres is the best available. I'm sure there are better proprietary solutions, but even though it is free, PG is pretty darn close to them.
Many NoSQL databases essentially take us back to the '60s, before Codd came up with the relational model[1]. These ideas tend to come back once in a while[2][3], but so far nothing has proven better than the relational model.
NoSQL still makes sense in many cases (generally when your specific use case does not need all the guarantees of ACID); in return you can get higher performance or horizontal scalability.
MongoDB aims to be a generic database but rides the wave of NoSQL "coolness". It was created by people who had no experience with databases and are learning as they go[4]. As they started adding things that are essential for a database, they realized it's not that simple.
Currently MongoDB is outperformed by Postgres. In fact, there is an application called ToroDB, built on Postgres, that provides protocol-level emulation of MongoDB, and even that outperforms Mongo. Mongo also doesn't scale well horizontally, so essentially you don't get any significant advantage.
[4] For example, they started with mmap'ed memory regions to store the data and did not initially use fsync() to make sure data was saved on disk. In a way it reminds me of MySQL several years ago. It's much better now than it was in the past, but it has a lot of warts left over from the past.
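The fsync() point is easy to demonstrate: a plain write() only hands data to the OS page cache, and it is the fsync() call that forces it to stable storage before the write is acknowledged. A minimal illustration (not MongoDB's actual code path):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"committed record\n")
    os.fsync(fd)  # without this, a crash or power loss can drop "saved" data
finally:
    os.close(fd)

# Read the record back to confirm it reached the file.
with open(path, "rb") as f:
    data = f.read()
os.unlink(path)
print(data)
```

A database that skips that fsync() step can acknowledge writes faster, which is exactly the durability-for-speed trade being criticized here.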
MongoDB is really, really bad. I've never come across another product that was so horrible and yet so widely used.
MongoDB: for when you don't need consistency, availability, or partition tolerance.
There are some really good NoSQL products out there. I seriously think RethinkDB is on par with Postgres. I've also used Cassandra and BerkeleyDB and they're both decent. But unless some core part of your business logic is pathological to implement in SQL (like Reddit's comment trees) you should go with Postgres.
Neither HAVING nor GROUP BY are part of the relational model, though. Really, aggregation isn't part of the traditional relational model. People just tend to associate it with RDBMSes.
I hate hearing absolutist dogma like this. Some things are faster in Postgres, some things are faster in Mongo. We are migrating our analytics db from Mongo to PG, but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres. Mongo's aggregation framework is an immature PITA, but it performs this little trick well enough that we're pretty much stuck keeping Mongo around unless we want to use MSSQL or Oracle or something else with a better (and much more expensive) query planner.
We still love (and prefer) Postgres, but it is not a Pareto improvement. There are always tradeoffs, and this kind of fanboyism just speaks to inexperience.
> There's no reason for SELECT COUNT(DISTINCT x) to perform badly in Postgres, as long as you have an appropriate table design and indexes.
Meh. Postgres' planner doesn't know how to generate a skip-scan/loose index scan for DISTINCT. You can write it yourself, but it's a bit painful:
https://wiki.postgresql.org/wiki/Loose_indexscan
If you have a low cardinality that can be a huge efficiency difference.
Without index:

EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10713.04..11381.06 rows=10 width=5) (actual time=1010.383..1073.263 rows=11 loops=1)
   Group Key: calc
   ->  Sort  (cost=10713.04..10935.68 rows=89056 width=5) (actual time=1010.321..1049.189 rows=89041 loops=1)
         Sort Key: calc
         Sort Method: external merge  Disk: 1392kB
         ->  Seq Scan on price_history  (cost=0.00..3391.56 rows=89056 width=5) (actual time=0.007..20.516 rows=89041 loops=1)
 Planning time: 0.074 ms
 Execution time: 1076.521 ms
(8 rows)
With Index:
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.29..2804.82 rows=10 width=5) (actual time=0.117..47.381 rows=11 loops=1)
   Group Key: calc
   ->  Index Only Scan using price_history_calc_idx on price_history  (cost=0.29..2359.52 rows=89041 width=5) (actual time=0.054..18.579 rows=89041 loops=1)
         Heap Fetches: 83
 Planning time: 0.208 ms
 Execution time: 47.416 ms
(6 rows)
Actually, that is called an index-only scan, and it happens when everything you need is inside the index. Which means if you need an aggregate, you could try to index everything it touches. Usually an aggregate only reads a few columns of a row, so an index is usually not a problem.
I didn't say an index couldn't be used at all, just not to actually make the query fast. This will read all duplicates of a value from the index before going on to the next value. If you have a couple thousand or more of each counted value, that'll make the query rather slow.
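For reference, the recursive-CTE emulation from the wiki page looks roughly like this. The sketch below runs it on SQLite purely so it is runnable anywhere; the same query works on Postgres, and the table/column names mirror the EXPLAIN example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE price_history (calc TEXT)")
# Low cardinality: 3 distinct values, 1000 duplicates each.
cur.executemany("INSERT INTO price_history VALUES (?)",
                [(c,) for c in "ABC" for _ in range(1000)])
cur.execute("CREATE INDEX price_history_calc_idx ON price_history (calc)")

# Loose index scan: hop from one distinct value to the next via the index
# instead of reading every duplicate row.
cur.execute("""
    WITH RECURSIVE t(calc) AS (
        SELECT MIN(calc) FROM price_history
        UNION ALL
        SELECT (SELECT MIN(calc) FROM price_history WHERE calc > t.calc)
        FROM t WHERE t.calc IS NOT NULL
    )
    SELECT COUNT(calc) FROM t
""")
distinct_count = cur.fetchone()[0]
print(distinct_count)  # 3
```

Each recursion step does one index probe per distinct value, so the cost scales with the number of distinct values rather than the number of rows, which is exactly where a plain COUNT(DISTINCT x) falls down.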
> We are migrating our analytics db from Mongo to PG but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres.
An alternative is PipelineDB, which is built on Postgres (and drop-in compatible, supposedly) and should provide an efficient implementation for those kinds of queries: https://www.pipelinedb.com/
> It was created by people who had no experience in databases and are learning as they go[4]. As they started adding things that are essential for database they realized it's not that simple.
This sounds exactly like what MySQL development looked like to me early in its rise to popularity.
I conjecture that it's rare that someone just needs a document DB.
The main problem someone is thinking about may be solved by a document DB, but then they also have 1000 other problems they didn't think about that are a horrible fit for a document DB.
And due to the nature of databases, where data has greater value when combined, using a new database system for every application isn't a great option.
SQL is great for business data, and passable-to-good for everything else. So the only way to beat it is by being way better at some specific thing, and finding users where that specific thing is so important that it's OK if the data is on an island.
Can't speak for the original poster, but at this point Postgres is a very capable and performant document-store database as well with the addition of the JSONB type and associated indexing and modification (in 9.5) functions. It's also more mature, has generally better durability than MongoDB, and it is easy to fall back/extend into relational queries on your document-store data if need be.
IMHO, unless you have a situation such that Mongo's horizontal scaling is actually required, it's better to use Postgres even if you're doing document-store stuff.
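As an illustration of that hybrid relational/document approach: the sketch below uses SQLite's JSON functions purely so the example is runnable here; on Postgres you would use a jsonb column with the -> / @> operators and a GIN index instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
cur.executemany(
    "INSERT INTO events (doc) VALUES (?)",
    [(json.dumps({"type": "click", "page": "/home"}),),
     (json.dumps({"type": "purchase", "amount": 42}),)],
)
# Schemaless documents, but queried with plain SQL -- and nothing stops you
# from joining this table against ordinary relational tables later.
cur.execute("""
    SELECT json_extract(doc, '$.amount') FROM events
    WHERE json_extract(doc, '$.type') = 'purchase'
""")
amount = cur.fetchone()[0]
print(amount)  # 42
```

That last point is the "fall back/extend into relational queries" advantage: the documents live inside an ordinary SQL database rather than on their own island.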
Because MongoDB is chasing a moving target. Hype, built-in sharding, and built-in automatic failover are what it has now. The PostgreSQL project was already strong when Mongo was conceived and has been accelerating. PostgreSQL has already gobbled up and improved on other MongoDB features like BSON, and it is where you go when you grow out of MongoDB's more primitive document model. Easy sharding and failover will fall too, years before MongoDB catches up with other must-have features like transactions (you will need them eventually) or a half-decent query language, so you can actually maintain your code rather than having to plan your own queries and encode them as a sequence of JSON documents.
The only time I would recommend Mongo is for storing geo-spatial data, as it has several built ins that make it much easier to work with. Even then I find it a lot more convenient to keep that data in Postgres and replicate it to Mongo.
Apologies, when I was talking about horizontal scaling I meant compared to other NoSQL databases[1].
The biggest selling point of NoSQL databases really is increased performance and horizontal scalability, obtained by removing some guarantees that an RDBMS provides. With MongoDB you get neither of those advantages.
Generally you can't scale Postgres easily unless you only do reads, but Postgres outperforms MongoDB anyway. You can get some marginal improvement by running multiple MongoDB nodes, but a single Postgres instance will still outperform it.
Also, with benchmarks like this it is often overlooked that in a relational database you can often store data in a smarter way and get an extra boost of performance.
For example, at my previous job we had 3 MongoDB instances that were used for mapping IP addresses to zip codes. Those databases were 12 GB each. They ran on beefy AWS instances because they wanted to make sure all the data could fit in RAM.
I did a POC and put the same data in Postgres, and it came to just 600 MB of data. All queries were sub-millisecond, on the smallest instance.
How come? Well, in Postgres I stored ranges of IPs using the ip4r extension and put a GiST index on them, which works with ranges. Mongo did not understand IP addresses, so what they essentially did was compute every possible IPv4 address and store each one as a 64-bit integer, then they placed an index on it which was probably bigger than the data itself.
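The range trick is easy to sketch outside the database too. The toy lookup below (hypothetical sample data) keeps one row per CIDR block and binary-searches it; that is the same "store ranges, not every address" idea that made the Postgres version so small, with ip4r plus a GiST index doing it properly server-side.

```python
import ipaddress
from bisect import bisect_right

# One (start, end, zip) row per block, instead of one row per IP address.
blocks = [("10.0.0.0/8", "00501"), ("192.168.0.0/16", "10001"),
          ("203.0.113.0/24", "94105")]
ranges = sorted(
    (int(ipaddress.ip_network(net)[0]),   # first address in the block
     int(ipaddress.ip_network(net)[-1]),  # last address in the block
     zipcode)
    for net, zipcode in blocks
)
starts = [r[0] for r in ranges]

def zip_for(ip):
    """Find the block containing ip via binary search (blocks don't overlap)."""
    n = int(ipaddress.ip_address(ip))
    i = bisect_right(starts, n) - 1
    if i >= 0 and ranges[i][0] <= n <= ranges[i][1]:
        return ranges[i][2]
    return None

print(zip_for("192.168.4.7"))  # 10001
print(zip_for("8.8.8.8"))      # None
```

Three rows cover what a per-address encoding would need millions of rows (and a far larger index) to represent.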
Hm, fast bulk write is an area where PostgreSQL generally handily beats MongoDB. I do not see why it would be different for geodata. MongoDB beats PostgreSQL for some things, but not bulk load.
How would you handle replicating a DB to mobile devices? This is the reason why I've been using CouchDB, but if Postgres or a plugin offered something comparable I'd have gone for it for sure.
Out of curiosity, how often do your detached DBs update old records? The reason I ask is nostalgia. A company I worked with used MySQL as an embedded DB (laptops under their control). Yesterday, I thought about how absurd this was. The DB only inserted new records and read static records. An embedded SQL database would have worked great here. Would that be true for you too?
Honestly that's not something I would handle at the db layer. I would build a service that is responsible for keeping client db's in sync and abstract that away from my database entirely.
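A minimal sketch of that idea (all names hypothetical): the sync layer tracks a per-record version, pulls server changes the client hasn't seen yet, and stays ignorant of which database sits underneath each side.

```python
class Store:
    """Toy versioned store standing in for either side's real database."""
    def __init__(self):
        self.rows = {}   # key -> (value, version)
        self.clock = 0   # monotonically increasing version counter

    def put(self, key, value):
        self.clock += 1
        self.rows[key] = (value, self.clock)

    def changes_since(self, version):
        return {k: v for k, v in self.rows.items() if v[1] > version}

def pull(client, server, last_seen):
    """Apply server changes newer than last_seen; return the new cursor."""
    for key, row in server.changes_since(last_seen).items():
        client.rows[key] = row
    return server.clock

server, client = Store(), Store()
server.put("greeting", "hello")
cursor = pull(client, server, 0)
print(client.rows["greeting"][0], cursor)  # hello 1
```

A real service would also push local writes and resolve conflicts, but the shape is the same: the sync protocol lives in its own layer, not in the database engine.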
Interesting. Tbh. I'm not looking for solutions at the moment, I just know that back when we decided on the technology, CouchDB was pretty much the only good player in town. I was just curious what's out there today. RethinkDB is being mentioned a lot, I'll have to check it out some time.
Related is another ORM supporting Postgres and MySQL. It builds the models by itself by loading the required definitions from the database. https://www.npmjs.com/package/related
Wordpress does not support anything other than MySQL, which makes it somewhat unique as far as CMSes go. There seem to be a few Drupal addons that don't mix with Postgres, but other than that Drupal should be fine, same with all the others.
Generally, if you can, you should consider supporting Postgres in your framework. It's a much saner and more robust database from an Ops point of view (replication doesn't fail as often) and more flexible from a Dev view (performance is generally more predictable, and there's a much wider feature set with document storage, PostGIS, etc.).
I like PostgreSQL, but it's far from superior to MySQL from an ops monitoring point of view. MySQL exposes a lot more information and run-time stats than PostgreSQL does. Simple things in MySQL are also impossible in PostgreSQL. How do you simply and effectively guarantee that your PostgreSQL replicas are not lagged, are indeed connected to the master, and are successfully writing down new logs in a streaming hot-standby setup? You can, but it requires that all of your replication accounts also have superuser access on the master. Fun stuff, that.
Which is IMO preferable to MySQL servers randomly losing sync for no reason every few weeks. I don't care how many tuning knobs MySQL has if none of them makes it work as smoothly as Postgres.
I think he meant that MySQL is the only supported DB for Wordpress, and while you might be able to get it running on another DB after jumping through some hoops, it's probably not worth the effort.
### Useful Postgres (or SQL in general) tools/libraries:
- Bookshelf ORM: http://bookshelfjs.org/
- PostgREST, automated REST API: https://github.com/begriffs/postgrest
- Sqitch, git-style migration management: http://sqitch.org/