
It looks like the NoSQL movement was just a fad. Plenty of startups got burned by it, and some are still stuck with the tech, writing inefficient workarounds for things that come out of the box with regular SQL databases.


I worked with one of the largest ad publishing companies. They wanted to track data about every client served an ad. This generated over 1.2 terabytes of data per hour, and the MySQL master started to max out. We already had the largest multi-core system available. Upgrading to SSDs to get more out of MySQL was going to cost my client $30k. Note also that we had to store this data on an expensive SAN to feed it to MySQL or PostgreSQL at a reasonable rate.

I had just learned about MongoDB and had taken 10gen's sysadmin class. On a Friday, I talked to the developer about storing the data in NoSQL using a small sharded cluster. Monday morning he asked me to set up a MongoDB cluster. Tuesday we moved over from MySQL. They ended up using much smaller servers, got rid of the 3PAR and epsilon SANs, and saved tons of money.

My point is that there are certain situations where NoSQL is still the answer, unless you can cluster your SQL write server. I've moved on from working with ad publishing clients, but I'm sure there are other places where SQL databases are not adequate.

NoSQL might be, or have been, a fad, but like any tool, when used for the right job it works.
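A minimal sketch of what "a small sharded cluster" buys you on the write side, assuming hash-based routing on a hypothetical `client_id`-style shard key and three hypothetical shard names: each write lands on exactly one shard, so no single master has to absorb the whole write load.

```python
import hashlib

# Hypothetical shard names; in a real cluster these would be separate servers.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the shard key.

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and would route the same key differently
    after a restart.
    """
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Route 1,000 writes and count how many each shard receives.
writes_per_shard = {name: 0 for name in SHARDS}
for i in range(1000):
    writes_per_shard[shard_for(f"client-{i}")] += 1

print(writes_per_shard)
```

Plain modulo routing like this reshuffles most keys when a shard is added; MongoDB's hashed sharding instead splits the hashed key space into chunks that can be migrated between shards.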


A large-scale production ad system doing 1.2 TB per hour in writes was migrated from MySQL to MongoDB in a roughly 24-hour window. Cool story, bro.


Good point. How is it possible to move that much data from SAN to mongo cluster in such a short time window?


and rewrite and validate all the code that was accessing MySQL to use MongoDB


If more comments were like their story, this would be a much higher-value comments section.


How can more made-up stories result in a higher-value comments section?


Sounds like the wrong tool. What you were producing was probably log data, which is immutable. There are far more efficient (in storage and CPU) write-only stores.


Such as?


Consider what a database does: it provides ACID properties and the ability to query data. If all you need is to write data, the fastest way is to write it directly to disk, without the overhead a database comes with.

Put a load balancer in front of a farm of cheap logging machines, and aggregate the data you need for analysis onto a suitable machine.
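A toy sketch of that layout, assuming newline-delimited JSON as the log format and a simple round-robin policy for the load balancer. `LogNode` and `RoundRobinBalancer` are hypothetical names, and the in-memory list stands in for an append-only file on each machine's local disk.

```python
import itertools
import json


class LogNode:
    """Hypothetical cheap logging machine: it only appends, never updates or indexes."""

    def __init__(self, name):
        self.name = name
        self.lines = []  # stands in for an append-only file on local disk

    def append(self, record):
        # Serialize once and append: no transactions, no indexes, no query planner.
        self.lines.append(json.dumps(record, separators=(",", ":")))


class RoundRobinBalancer:
    """Toy stand-in for a load balancer that spreads writes evenly across nodes."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def write(self, record):
        next(self._cycle).append(record)


nodes = [LogNode(f"log-{i}") for i in range(3)]
balancer = RoundRobinBalancer(nodes)
for i in range(7):
    balancer.write({"ad_id": i % 2, "event": "impression"})

print([len(node.lines) for node in nodes])  # [3, 2, 2]
```

Adding capacity is just adding another node to the rotation; the trade-off is that any question about the data now needs a separate aggregation step.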


They'd still need to query their data, I guess, so you'd need to throw in something to aggregate the logs and extract the metrics they're tracking. For most metrics that can be done entirely as a stream, without going through the logs every time.
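A minimal sketch of streaming aggregation, assuming each event is a dict with hypothetical `advertiser` and `type` fields and a hypothetical flat price per click in integer cents. Each event updates running counters in constant time and can then be discarded, so answering a billing question never requires rescanning raw logs.

```python
from collections import Counter


class ClickAggregator:
    """Keeps running totals as events stream past; raw events are not retained."""

    def __init__(self, price_per_click_cents):
        self.price_per_click_cents = price_per_click_cents  # hypothetical flat price
        self.clicks = Counter()
        self.impressions = Counter()

    def consume(self, event):
        # O(1) update per event.
        if event["type"] == "click":
            self.clicks[event["advertiser"]] += 1
        elif event["type"] == "impression":
            self.impressions[event["advertiser"]] += 1

    def bill_cents(self, advertiser):
        # Counter returns 0 for advertisers with no clicks.
        return self.clicks[advertiser] * self.price_per_click_cents


agg = ClickAggregator(price_per_click_cents=25)
stream = [
    {"advertiser": "acme", "type": "impression"},
    {"advertiser": "acme", "type": "click"},
    {"advertiser": "globex", "type": "impression"},
    {"advertiser": "acme", "type": "click"},
]
for event in stream:
    agg.consume(event)

print(agg.bill_cents("acme"))  # 50
```

Integer cents avoid floating-point rounding in billing totals.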


With that amount of data, streaming and only saving aggregated data is the only sane way. At 1.2 TB/hour there is a limit to how much historical data can be saved anyway, and we're talking about 30% utilization of a 10 Gbps network interface, so it's beyond using single machines for most use cases.
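The back-of-the-envelope number checks out. Converting 1.2 TB/hour to bits per second gives about 2.7 Gbit/s with decimal terabytes, or about 2.9 Gbit/s if "TB" means binary tebibytes:

```python
def link_utilization(bytes_per_hour, link_bits_per_second):
    """Fraction of a network link consumed by a sustained write rate."""
    bits_per_second = bytes_per_hour * 8 / 3600
    return bits_per_second / link_bits_per_second


TEN_GBPS = 10e9
decimal_tb = link_utilization(1.2e12, TEN_GBPS)        # TB = 10**12 bytes
binary_tib = link_utilization(1.2 * 2**40, TEN_GBPS)   # TiB = 2**40 bytes

print(f"{decimal_tb:.1%} / {binary_tib:.1%}")  # 26.7% / 29.3%
```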


Query? Most likely not, at least not in the traditional "let's create a dashboard on the fly" sense.


Query as in "how much do I bill this guy for his clicks". It doesn't have to be SQL, nor on the fly, of course.


Was $30k a lot of money for one of the largest ad publishers?


OTOH, if you're a consultant and you can say "hey guys, I can save you $30k, my fee is only $15k", and you can do it in a week... I mean, there are weeks where I bill less than $15k.


3k a day consulting. I'm in!


How can MongoDB use so much less in such a situation, especially before WiredTiger?

An optimised schema in a relational database should be close to the minimum possible storage.
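A rough illustration of that argument, using a hypothetical ad-impression record: with a fixed schema the column names live once in the table definition, while a self-describing document repeats them in every record. JSON stands in for BSON here (BSON also stores field names in each document), and real MySQL rows carry some per-row overhead that this sketch ignores.

```python
import json
import struct

# One ad-impression record; field names and values are hypothetical.
record = {"ad_id": 123456, "client_id": 987654321,
          "timestamp": 1500000000, "cost_micros": 2500}

# Fixed schema: each row is just the packed values
# (little-endian, standard sizes, no padding: 4 + 8 + 4 + 4 bytes).
ROW_FORMAT = "<IQIi"
row = struct.pack(ROW_FORMAT, record["ad_id"], record["client_id"],
                  record["timestamp"], record["cost_micros"])

# Self-describing document: every record carries its own field names.
doc = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(len(row), len(doc))  # 20 80
```

Most of the difference here is the repeated key names, which is the kind of overhead that block compression (as in WiredTiger) can claw back.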


The largest ad system ever, Google AdWords, ran on sharded MySQL until circa 2012, when it moved to a completely custom DB backend, F1. F1 is also a SQL database, however.

Google does use bigtable and such where appropriate, but for anything more complex than dumb key/value you can't do much better than regular DBs. Some people think they can, but 99% of the time they're mistaken.


Startups got burnt by NoSQL because they wanted to use it for everything instead of thinking through which use cases fit. It's a good thing this fad went away; now people are more likely to educate themselves about relational vs. non-relational databases before jumping into the fire pit.


There are good reasons to use NoSQL, for example to replace EAV (entity-attribute-value) tables. But it's definitely not a good idea to do everything in NoSQL, so most companies opted for a two-database solution (SQL + NoSQL) and did not get burned.
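A small sketch of why documents can replace EAV, using hypothetical table contents: in EAV each attribute of an entity is a separate row, so reading an entity back means pivoting rows (in SQL, typically one self-join or CASE expression per attribute), whereas a document keeps the entity's attributes together.

```python
from collections import defaultdict

# Entity-attribute-value rows: one row per (entity, attribute) pair.
# Table contents are hypothetical.
eav_rows = [
    (1, "color", "red"),
    (1, "size", "L"),
    (2, "color", "blue"),
    (2, "weight_g", "350"),
]


def eav_to_documents(rows):
    """Pivot EAV rows into one document (dict) per entity."""
    docs = defaultdict(dict)
    for entity_id, attribute, value in rows:
        docs[entity_id][attribute] = value
    return dict(docs)


documents = eav_to_documents(eav_rows)
print(documents[1])  # {'color': 'red', 'size': 'L'}
```

Note that EAV also stores every value as text; a document format can at least keep numbers as numbers.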

The thing, however, is that since Postgres added indexed JSONB support (which is actually faster than MongoDB), there is absolutely no point in opting for a two-database solution and making things harder for no reason.

TL;DR

Use Postgres.


Or, get the best of both worlds. ToroDB puts a mongo wire protocol in front of Postgres, which outperforms mongo significantly on the same hardware. Plus, you can get read-only views on the Pg side to join with traditional relational data.


> Or, get the best of both worlds. ToroDB puts a mongo wire protocol in front of Postgres

Postgres alone is already the best of both worlds. With ToroDB, I am restricted to the MongoDB way of dealing with my data; with Postgres, I can mix SQL and NoSQL however I like, even in a single, simple SELECT query.
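A minimal sketch of mixing a relational join with JSON fields in one SELECT. In Postgres this would typically use a `jsonb` column with `o.doc ->> 'status' = 'shipped'` or `o.doc @> '{"status": "shipped"}'` (the latter can use a GIN index). So that the sketch runs without a server, it swaps in Python's built-in `sqlite3` and SQLite's `json_extract`, assuming the SQLite build includes the JSON functions (built in since 3.38). Table and column names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        doc TEXT NOT NULL  -- schemaless JSON payload
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders (customer_id, doc) VALUES (?, ?)",
    [
        (1, json.dumps({"status": "shipped", "items": 3})),
        (2, json.dumps({"status": "pending", "gift": True})),
    ],
)

# One SELECT: a relational join plus a filter and a projection on JSON fields.
rows = conn.execute("""
    SELECT c.name, json_extract(o.doc, '$.items')
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE json_extract(o.doc, '$.status') = 'shipped'
""").fetchall()

print(rows)  # [('Ada', 3)]
```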


I have to agree that there was a lot of noise in the nosql world and a lot of people got carried away by the ease of use metric. I guess people were lured by the prospect of not having to write SQL joins.

That said I found nosql databases extremely helpful for storing and querying large unstructured data. Mostly because it was really hard to build relations and to store this data in tables. Think Wikipedia for instance. Since then my way of choosing databases is to try and model data into a relational db as much as possible and if that doesn't work out choose a nosql equivalent.


With Postgres you don't need a different database unless your workload is "big data". You can store relational data alongside unstructured JSON and query both.


99% of "Big Data" workloads will run happily in Postgres. There are petabyte-scale Postgreses (Postgri?) in production, and have been for a decade now...


It amazes me that RethinkDB never eclipsed all the other "NoSQL" offerings: a relational, realtime document store. You get the benefits of both sides.



Well, there are so many... using MongoDB for CRUD apps and then realizing they can't do joins.


Sadly some do not come to that realization and manage to implement a join "solution" anyways.
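For illustration, this is roughly what such an application-side join looks like, assuming two hypothetical collections already fetched into memory by separate queries. It is an ordinary hash join, which a relational database would plan and run for you, and nothing here enforces referential integrity: the order pointing at a nonexistent user is silently dropped.

```python
# Hypothetical collections, as fetched from a document store by two queries.
users = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
orders = [
    {"_id": 10, "user_id": 1, "total": 30},
    {"_id": 11, "user_id": 1, "total": 12},
    {"_id": 12, "user_id": 3, "total": 99},  # dangling reference: no such user
]


def hash_join(left, right, left_key, right_key):
    """Inner join in application code: index one side, then probe it with the other."""
    index = {}
    for doc in left:
        index.setdefault(doc[left_key], []).append(doc)
    pairs = []
    for doc in right:
        for match in index.get(doc[right_key], []):
            pairs.append((match, doc))
    return pairs


pairs = hash_join(users, orders, left_key="_id", right_key="user_id")
print([(user["name"], order["total"]) for user, order in pairs])  # [('Ada', 30), ('Ada', 12)]
```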


that comment is over 2000 days old. wow


Depends on what you're doing. We've used it on at least one production product at work where it works fine and probably will not be replaced. (If it's not broken don't fix it TM)


DynamoDB seems to be the flavour of the day for this, people are using it even in cases where a non-sharded SQL system would work fine.


Can we get revived WebSQL support?

We can thank a few Mozilla devs, who have long since moved on, that WebSQL was stopped. It is supported in WebKit and Blink (100% of mobile devices, 80% of notebooks), but not in Firefox. Instead, this NoSQL IndexedDB was introduced. Let's get over the NoSQL fad and support SQL in the web browser!



