
Postgres has been my DB of choice for nearly a decade. The only times I wind up working with another db are because:

(1) it is a better technical fit for a very specific problem

(2) there is already a legacy db in place

I have been voted down at a couple of startups that wanted to run a "MEAN" stack; invariably, all of those startups either moved off MongoDB or shut down.

The only time I will advocate for anything other than Postgres is when Wordpress is involved. If the data model is simple enough then MySQL is more than up for the task, and it avoids an additional database dependency.

Thankfully all the ORMs that are worth using support MySQL and Postgres, so using both is very doable.

### Useful Postgres (or SQL in general) tools/libraries:

Bookshelf ORM http://bookshelfjs.org/

PostgREST automated REST API https://github.com/begriffs/postgrest

Sqitch git style migration management http://sqitch.org/



I have been using it a lot lately, and do like it.

It isn't a popular opinion on HN, but I will still advise for Oracle or SQL Server in terms of tooling, cluster scaling, server side programming and DB drivers.

Then again, we work with customers whose Oracle and SQL Server licenses costs aren't an issue.


Have an upvote, I'm not sure why you're being down-voted.

If you take the issues of open-source and licensing out of the equation (both important issues in their own right, but not related to the point at hand) then Oracle and SQL Server are both ridiculously good.

I personally try to avoid them (due to the cost, and the lock-in) but they are astoundingly performant and featureful RDBMSs with a huge amount of support and documentation behind them.


Thanks for the upvote.

I guess some took it personally, although I mentioned I do like Postgres.

I just happen to like the other ones even more, since I was lucky to be able to use them in a few projects.


Downvotes originate from open source fanatics. Odd that it results in downvotes, though; my first thought is also a fairly retaliatory "commercial offerings, why?!", but I'd never downvote for it. *tips hat*


1) Coming to a discussion on Postgres and saying Oracle is better is clearly trolling

2) "server side programming" - since you have the source code with Postgresql and there are plugins for most major programming languages, I don't buy this.

3) If you'd spent the same on EnterpriseDB or CitusDB, I can guarantee you would get similar polish and support. People don't.


Can you explain how the comment was trolling? It seemed to be a legitimate comparison.


As others have mentioned, with Citus Data open sourcing CitusDB, Postgres is now capable of fairly easily handling Mongo-sized workloads.

Something else you might be interested in:

https://www.linkedin.com/pulse/mongodb-32-now-powered-postgr...

I have always been a Postgres fan, but now I find it very difficult to imagine a problem MongoDB would be better suited for.


Also check Postgraphql: https://github.com/calebmer/postgraphql

Inspired by PostgREST linked above, it automatically builds a GraphQL API by reflecting on the Postgres schema.


I was looking at that the other day, really exciting project.


There is also Sequelize [1], which has more activity.

1 - http://docs.sequelizejs.com/en/latest/


I would also like to mention Objection[1]. It is very minimalistic and doesn't get in your way. Most of the time you are writing queries with Knex[2], or you can write raw SQL if you feel like it. I tried a lot of ORMs for Node and this is the one I liked the most. The guy working on it is also very responsive, and it has superb documentation.

1 - http://github.com/Vincit/objection.js

2 - http://knexjs.org


We've been using Objection over Bookshelf recently and we really like it. One thing that caused us to choose Objection is that Bookshelf doesn't support composite keys:

https://github.com/tgriesser/bookshelf/issues/720


I haven't worked with Sequelize personally, but a friend has been recently and curses the day it was born - he wishes he'd used bookshelf.


Can you ask your friend about the exact details? From what I've seen it seemed OK.


The only issue that I have with this top-most comment is that it presents PG as a silver bullet. But, there are a lot of different types of databases for a reason.

For instance, at my current startup, we employ at least 7 different databases (including PG). And, I don't say that to brag - each has a specific use for the problem at hand.

You have to consider the needs and trade offs of your specific project. And, if you can, try to isolate your storage behind some interface. Because often your needs will change.

(I say this with 20+ years of experience and over a dozen commercially successful products under my belt)


Not disagreeing necessarily, but having many different databases has a cost.

Data has more value when combined, so a wise database choice also depends on what you already have. You can combine data from different systems (e.g. FDWs, which postgres has great support for), but something is usually lost along the way.

I think postgres is a good default, because it is good for a lot of things. But sure, if you make an informed decision otherwise, there's nothing wrong with that.


I think it's more because Postgresql is a good starting point. You can start with it and gradually introduce special/niche databases where required.

It doesn't matter whether you have your needs clearly defined, Postgresql is a jack of all trades and a master of many.


I agree with you 100%.

An abstracted interface for your datalayer is a must have.

Many startups and projects begin with a single db and grow into new dbs as the business requirements change.

Obviously if you know your data model well enough you can foresee a lot of these requirements and pick the right tool for the job.

However, you will usually need to pick a db to start with and hope that it will accommodate as many of those unknowns as possible.

I believe that Postgres is the best choice in that scenario, as it is extremely mature and has an incredible amount of flexibility.


Totally agree. In 95% of the cases, you won't go wrong starting with PG.


Funny. I just said the same thing before reading your comment.

Somehow that makes me feel better that at least, I'm on the right path.


It is true that you should not choose blindly, but a majority of use cases do require a relational database. You use a specialized database when you have a specialized use case.

The reason why Postgres is so popular here is because if you want to have an open source solution, Postgres is the best available. I'm sure there are better proprietary solutions, but even though it is free, PG is pretty darn close to them.


> invariably all of those startups moved from MongoDB

Why? Especially after point #1 and assuming the document-store was a good fit for the data model.


Because NoSQL is hype.

Many NoSQL databases essentially take us back to the '60s, before Codd came up with the relational model.[1] These ideas tend to come back once in a while[2][3], but so far nothing has beaten the relational model.

NoSQL still makes sense in many cases (generally when your specific use case does not need all the guarantees of ACID); in return you can get higher performance or horizontal scalability.

MongoDB aims to be a generic database but rides the wave of NoSQL "coolness". It was created by people who had no experience with databases and were learning as they went[4]. As they started adding things that are essential for a database, they realized it's not that simple.

Currently MongoDB is outperformed by Postgres. In fact, there is an application called ToroDB, built on Postgres, that emulates the MongoDB protocol, and even that outperforms Mongo. Mongo also doesn't scale well horizontally, so essentially you don't get any significant advantage.

[1] https://en.wikipedia.org/wiki/Hierarchical_database_model

[2] https://en.wikipedia.org/wiki/Object_database

[3] https://en.wikipedia.org/wiki/XML_database

[4] For example, they started with mmap'ed memory regions to store the data, and did not initially use fsync() to make sure data was saved to disk. In a way it reminds me of MySQL several years ago. It's much better now than it was in the past, but it has a lot of warts left over from the past.


MongoDB is really, really bad. I've never come across another product that was so horrible and yet so widely used.

MongoDB: for when you don't need consistency, availability, or partition tolerance.

There are some really good NoSQL products out there. I seriously think RethinkDB is on par with Postgres. I've also used Cassandra and BerkeleyDB and they're both decent. But unless some core part of your business logic is pathological to implement in SQL (like Reddit's comment trees) you should go with Postgres.


No no no, you don't understand at all. Mongo DB is webscale. /slashdot-comment


Right, absolutely true. Mongo is just such a bad piece of software.


> so far nothing is better than relational model

my favorite quote: "the relational model is rooted deep in set theory, and math is hard to fool"

it was in a post on how all NoSQL databases end up having to reinvent HAVING, GROUP BY and JOIN clauses in their APIs, because that's what apps do with data


A recent candidate said basically this during his interview, he got the job.


Neither HAVING nor GROUP BY are part of the relational model, though. Really, aggregation isn't part of the traditional relational model. People just tend to associate it with RDBMSes.


> Currently MongoDB is outperformed by Postgres

I hate hearing absolutist dogma like this. Some things are faster in Postgres, some things are faster in Mongo. We are migrating our analytics db from Mongo to PG, but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres. Mongo's aggregation framework is an immature PITA, but it performs this little trick well enough that we're pretty much stuck keeping Mongo around, unless we want to use MSSQL or Oracle or something else with a better (and much more expensive) query planner.

We still love (and prefer) Postgres but it is not a pareto improvement. There are always tradeoffs, and this kind of fanboyism just speaks to inexperience.


There's no reason for SELECT COUNT(DISTINCT x) to perform badly in Postgres, as long as you have an appropriate table design and indexes.

Also, MSSQL's query planner isn't better than Postgres'; I work with both. Postgres does have its quirks though, especially with MVCC row expiry.


> There's no reason for SELECT COUNT(DISTINCT x) to perform badly in Postgres, as long as you have an appropriate table design and indexes.

Meh. Postgres' planner doesn't know how to generate a skip-scan/loose index scan for DISTINCT. You can write it yourself, but it's a bit painful: https://wiki.postgresql.org/wiki/Loose_indexscan

If you have low cardinality, that can be a huge efficiency difference.
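For reference, the workaround on that wiki page is roughly this shape (a sketch, assuming a hypothetical table `t` with a btree index on column `x`): each step of the recursive CTE jumps straight to the next distinct value via the index instead of reading every duplicate.

```sql
-- Emulated "loose index scan" / skip scan: visits one index entry
-- per distinct value of x, rather than every row.
WITH RECURSIVE distinct_x AS (
    (SELECT x FROM t ORDER BY x LIMIT 1)
    UNION ALL
    SELECT (SELECT t.x FROM t WHERE t.x > d.x ORDER BY t.x LIMIT 1)
    FROM distinct_x d
    WHERE d.x IS NOT NULL          -- stop once no larger value exists
)
SELECT count(*) FROM distinct_x WHERE x IS NOT NULL;
```

With a few thousand duplicates per value this can turn a full index (or table) read into a handful of index probes.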


Actually that's wrong:

    EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                          QUERY PLAN                                                           
    -------------------------------------------------------------------------------------------------------------------------------
     GroupAggregate  (cost=10713.04..11381.06 rows=10 width=5) (actual time=1010.383..1073.263 rows=11 loops=1)
       Group Key: calc
       ->  Sort  (cost=10713.04..10935.68 rows=89056 width=5) (actual time=1010.321..1049.189 rows=89041 loops=1)
             Sort Key: calc
             Sort Method: external merge  Disk: 1392kB
             ->  Seq Scan on price_history  (cost=0.00..3391.56 rows=89056 width=5) (actual time=0.007..20.516 rows=89041 loops=1)
     Planning time: 0.074 ms
     Execution time: 1076.521 ms
    (8 rows)

With Index:

    EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                                         QUERY PLAN                                                                          
    -------------------------------------------------------------------------------------------------------------------------------------------------------------
     GroupAggregate  (cost=0.29..2804.82 rows=10 width=5) (actual time=0.117..47.381 rows=11 loops=1)
       Group Key: calc
       ->  Index Only Scan using price_history_calc_idx on price_history  (cost=0.29..2359.52 rows=89041 width=5) (actual time=0.054..18.579 rows=89041 loops=1)
             Heap Fetches: 83
     Planning time: 0.208 ms
     Execution time: 47.416 ms
    (6 rows)

Actually, that is called an index-only scan, and it happens when the data you need is covered by an index. Which means if you need an aggregate, you could try to index everything it touches. An aggregate usually only reads a few columns of a row, so an index is mostly not a problem.


I didn't say an index couldn't be used at all, just not to actually make the query fast. This will fetch every duplicate of a value from the index before moving on to the next value. If you have a couple thousand or more of each to-be-counted value, that'll make the query rather slow.


An index only scan isn't the same as a loose index scan. They're orthogonal tricks.


You would need to add support for skip scans to the query planner, and adding features to the query planner is rarely simple.


> We are migrating our analytics db from Mongo to PG but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres.

An alternative is PipelineDB, which is built on Postgres (and drop-in compatible, supposedly) and should provide an efficient implementation for those kinds of queries: https://www.pipelinedb.com/

I've never used it, though.


"We are migrating our analytics db from Mongo to PG but we hit a wall because SELECT COUNT(DISTINCT x) performs abysmally in Postgres."

https://github.com/aggregateknowledge/postgresql-hll ?

or even

https://www.periscopedata.com/blog/hyperloglog-in-pure-sql.h... ?
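If approximate counts are acceptable, the hll extension linked above keeps a small probabilistic sketch instead of sorting all values; a rough example (table and column names invented, and assuming the extension is installed):

```sql
-- Approximate COUNT(DISTINCT x) with postgresql-hll.
CREATE EXTENSION IF NOT EXISTS hll;

SELECT hll_cardinality(hll_add_agg(hll_hash_text(x)))
FROM events;
```

The trade-off is a small, tunable error (typically a couple of percent) in exchange for constant memory per group.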


Try:

    SELECT COUNT(*) FROM (SELECT DISTINCT x FROM table) AS temp;

The

    SELECT COUNT(DISTINCT x)

is slow because it performs a sort.


> It was created by people who had no experience in databases and are learning as they go[4]. As they started adding things that are essential for database they realized it's not that simple.

This sounds exactly like what MySQL development looked like to me early in its rise to popularity.


I conjecture that it's rare that someone just needs a document DB.

The main problem someone is thinking about may be solved by a document DB, but then they also have 1000 other problems they didn't think about that are a horrible fit for a document DB.

And due to the nature of databases, where data has greater value when combined, using a new database system for every application isn't a great option.

SQL is great for business data, and passable-to-good for everything else. So the only way to beat it is by being way better at some specific thing, and finding users where that specific thing is so important that it's OK if the data is on an island.


Can't speak for the original poster, but at this point Postgres is a very capable and performant document-store database as well, with the addition of the JSONB type, its associated indexing, and (in 9.5) modification functions. It's also more mature, has generally better durability than MongoDB, and it is easy to fall back/extend into relational queries on your document-store data if need be.

IMHO, unless you have a situation such that Mongo's horizontal scaling is actually required, it's better to use Postgres even if you're doing document-store stuff.
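For illustration, the JSONB workflow is roughly this (table, column, and document shape are all made up):

```sql
-- Documents in Postgres: a JSONB column plus a GIN index.
CREATE TABLE events (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- GIN index makes containment queries (@>) fast.
CREATE INDEX events_doc_idx ON events USING GIN (doc);

INSERT INTO events (doc)
VALUES ('{"type": "signup", "user": {"id": 42, "plan": "free"}}');

-- Mongo-style lookup by (nested) field.
SELECT doc FROM events WHERE doc @> '{"type": "signup"}';

-- In-place modification of one path (9.5+).
UPDATE events SET doc = jsonb_set(doc, '{user,plan}', '"paid"');
```

And since `events` is an ordinary table, you can join it against relational data whenever the document model stops being enough.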


http://instagram-engineering.tumblr.com/post/10853187575/sha... it certainly can be done with postgres, but isn't as simple.


Because MongoDB is chasing a moving target. Hype, built-in sharding and built-in automatic failover are what it has now. The PostgreSQL project was already strong when Mongo was conceived and has been accelerating. PostgreSQL has already gobbled up and improved on MongoDB features like BSON, and it is where you go when you grow out of MongoDB's more primitive document model. Easy sharding and failover will fall too, years before MongoDB catches up with other must-have features like transactions (you will need them eventually) or a half-decent query language, so you can actually maintain your code rather than having to plan your own queries and encode them as a sequence of JSON documents.



I don't like this one too much, because Mongo is the wrong choice here domain-wise, but this one was really convincing for me:

http://cryto.net/~joepie91/blog/2015/07/19/why-you-should-ne...


diaspora was such an interesting project, I dream about resurrecting it often.


Postgres is a better document store than Mongo.

The only time I would recommend Mongo is for storing geo-spatial data, as it has several built-ins that make it much easier to work with. Even then I find it a lot more convenient to keep that data in Postgres and replicate it to Mongo.


Why not use PostGIS for geo-spatial data?


For processing, 100% agree.

However, if the write load is very high then Mongo is better suited as the initial store. I then replicate to other dbs.


Mongo actually has worse performance than Postgres, including when inserting data.

You can of course improve Mongo's speed by using a weaker write concern, but you can weaken WAL durability in Postgres too.
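For reference, the closest Postgres equivalents of a weak write concern look something like this (a sketch, not a recommendation; the table name is made up):

```sql
-- Commit without waiting for the WAL flush; a crash can lose the
-- last few transactions, but never corrupts the database.
SET synchronous_commit = off;

-- Or skip WAL for a particular table entirely; its contents are
-- truncated after a crash.
CREATE UNLOGGED TABLE firehose (payload jsonb);
```

Unlike Mongo's write concerns, these are opt-in per session or per table, so the rest of the database keeps full durability.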


For a single server, I believe you are correct.

However I found scaling write load past a single server's limits more difficult using Postgres than MongoDB.

Fortunately that is part of what the OP addresses in his post.


Apologies, when I was talking about horizontal scaling I meant compared to other NoSQL databases[1].

The biggest selling point of NoSQL databases really is increased performance and horizontal scalability, achieved by removing some guarantees that an RDBMS provides. With MongoDB you get neither of those advantages.

Generally you can't scale Postgres easily unless you only do reads, but Postgres outperforms MongoDB. You can get some marginal improvement by running multiple MongoDB nodes, but a single Postgres will still outperform it.

Also, with benchmarks like this it is often overlooked that in a relational database you can often store data in a smarter way and get an extra boost of performance.

For example, at my previous job we had 3 MongoDB instances which were used for mapping IP addresses to zip codes. Those databases were 12GB each. They ran on beefy AWS instances because they wanted to make sure all the data could fit in RAM.

I did a POC and put the same data in Postgres, and it came to just 600MB. All queries were sub-millisecond, on the smallest instance.

How come? Well, in Postgres I stored ranges of IPs using the ip4r extension and put a GiST index on them, which works with ranges. Mongo does not understand IP addresses, so what they essentially did was compute every possible IPv4 address in each range, store it as a 64-bit integer, and place an index on it, which was probably bigger than the data itself.

Their solution also won't scale with IPv6.

[1] http://www.datastax.com/wp-content/themes/datastax-2014-08/f...
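A sketch of that layout (assuming the ip4r extension is available; table and column names are made up):

```sql
CREATE EXTENSION IF NOT EXISTS ip4r;

CREATE TABLE geoip (
    iprange ip4r NOT NULL,   -- e.g. '203.0.113.0/24'
    zip     text NOT NULL
);

-- GiST index accelerates the containment operator (>>=).
CREATE INDEX geoip_iprange_idx ON geoip USING GIST (iprange);

-- "Which range contains this address?"
SELECT zip FROM geoip WHERE iprange >>= '203.0.113.7';
```

One row per range instead of one row per address is where the 12GB-to-600MB difference comes from.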


Hm, fast bulk write is an area where PostgreSQL generally handily beats MongoDB. I do not see why it would be different for geodata. MongoDB beats PostgreSQL for some things, but not bulk load.


Seems like the poster is talking about high write load and not necessarily bulk load.


How would you handle replicating a DB to mobile devices? This is the reason why I've been using CouchDB, but if Postgres or a plugin offered something comparable I'd have gone for it for sure.


Out of curiosity, how often do your detached DBs update old records? The reason I ask is nostalgia: a company I worked with used MySQL as an embedded DB (on laptops under their control). Yesterday I thought about how absurd this was. The DB only inserted new records and read static records; an embedded SQL database would have worked great there. Would that be true for you too?


Completely depends on the app. It's a PaaS, not just a specific app.


Honestly, that's not something I would handle at the db layer. I would build a service that is responsible for keeping client DBs in sync and abstract that away from my database entirely.


I'm not a fan of rolling our own sync code if I don't have to.


RethinkDB's new project Horizon could be a solution. But of course, that's not PostgreSQL.


Interesting. Tbh. I'm not looking for solutions at the moment, I just know that back when we decided on the technology, CouchDB was pretty much the only good player in town. I was just curious what's out there today. RethinkDB is being mentioned a lot, I'll have to check it out some time.


Are you H2database author ?


Nah, my project is called Protogrid. [1]

[1] http://protogrid.com


Related ORM is another ORM supporting Postgres and MySQL. It builds the models by itself, loading the required definitions from the database. https://www.npmjs.com/package/related


I'll have to give that a shot; seems like a nice compromise between fully automated solutions like PostgREST and a conventional ORM.


Just out of curiosity, why when wordpress is involved? Doesn't wordpress support PG? What about drupal and other frameworks, would you say the same?

Our framework only has a mysql adapter for now, but it should be pretty easy to add a postgres one.


Wordpress does not support anything other than MySQL, which is somewhat unique as far as CMSes go. There seem to be a few Drupal addons that don't mix with Postgres, but other than that it should be fine. Same with all others.

Generally, if you can, you should consider supporting Postgres in your framework. It's a much saner and more robust database from an Ops point of view (replication doesn't fail as often) and more flexible from a Dev view (performance is generally more predictable, and there's a much wider feature set with document storage, PostGIS, etc.).


I like Postgresql, but it's far from superior to MySQL from an ops monitoring point of view. MySQL exposes a lot more information and run-time stats than Postgresql does. Simple things in MySQL are also impossible in Postgresql. How do you simply and effectively guarantee that your Postgresql replicas are not lagging, are indeed connected to the master, and are successfully writing down new logs on a streaming hot standby? You can, but it requires that all of your replication accounts also have superuser access on the master. Fun stuff, that.


Which is IMO preferable to MySQL servers randomly losing sync for no reason every few weeks. I don't care how many tuning knobs MySQL has if none of them makes it work as smoothly as Postgres.


What do you mean by MySQL avoiding an additional database dependency?


I think he meant that MySQL is the only supported DB for Wordpress, and while you might be able to get it running on another DB after jumping through some hoops, it's probably not worth the effort.


IIRC PHP has built-in support for MySQL out of the box.


Well, technically it's a "PHP extension", just one that several distributions include by default.





