at which point, if you've got to use a DB to track status, really why bother wit...

eternalban · on April 12, 2023

You do not need a database. It is trivial (and correct) to create a ~'<x>-status' topic. In the forward arc you are reliably propagating job requests (acked). In the backflow the processing status of each job is posted for anyone interested. You can even propagate retries, etc. It is an MQ, and RabbitMQ shines at defining complex dispatch topologies.
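A rough sketch of the forward-arc / backflow shape described above, using in-memory queues from Python's standard library as a stand-in for broker topics. This is not RabbitMQ code; the names `job_requests`, `job_status`, `submit`, and `run_worker` are hypothetical and only illustrate the flow.

```python
import queue

# Stand-ins for broker topics (assumption: a real setup would use
# RabbitMQ exchanges/queues with acknowledgements instead).
job_requests = queue.Queue()  # forward arc: job requests
job_status = queue.Queue()    # backflow: plays the role of '<x>-status'


def submit(job_id):
    """Publish a job request on the forward arc."""
    job_requests.put(job_id)


def run_worker(handler):
    """Drain pending requests, posting status messages for each job."""
    while True:
        try:
            job_id = job_requests.get_nowait()
        except queue.Empty:
            return
        job_status.put((job_id, "started"))
        try:
            handler(job_id)
        except Exception:
            # A retry could be propagated here by re-submitting the job.
            job_status.put((job_id, "failed"))
        else:
            job_status.put((job_id, "done"))
        finally:
            job_requests.task_done()  # rough analogue of acking the request
```

Anyone interested in job progress would consume `job_status` rather than query a table.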

marcus_holmes · on April 12, 2023

Yeah but they already have a database, so it's not like they're adding a database to the system. And (as the article says) the database already contains state, so it makes way more sense to remove RMQ and hold all the state in the database (like they did).

valenterry · on April 12, 2023

Because a queuing system offers a different thing than a (relational) database.

You can build a queuing system with a database, but you have to do that. Some of the features and constraints of the database might even make your life harder than it has to be.

Instead, view it like this: there is a need for a queuing system and a job system. Either or both can be implemented using a database for certain concerns, but it can also be a custom implementation. It's not a great idea to mix the two things unless the operational and infrastructure costs and complexity outweigh the benefits of a clear separation.

nicoburns · on April 12, 2023

There are libraries that implement queueing on top of databases that require very little setup by the user. For example https://github.com/timgit/pg-boss
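To make "queueing on top of a database" concrete, here is a minimal sketch using SQLite from Python's standard library. It is not pg-boss's API or implementation; the `jobs` table and the `init`, `enqueue`, `claim`, and `complete` helpers are hypothetical names chosen for illustration.

```python
import sqlite3


def init(conn):
    """Create the (hypothetical) jobs table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued')"
    )


def enqueue(conn, payload):
    """Insert a job and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim(conn):
    """Claim the oldest queued job, or return None if there is none."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in the WHERE clause guards against another
        # worker having claimed the same row in the meantime.
        updated = conn.execute(
            "UPDATE jobs SET status = 'running'"
            " WHERE id = ? AND status = 'queued'",
            (row[0],),
        )
        return row if updated.rowcount == 1 else None


def complete(conn, job_id):
    """Mark a job as finished."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

The job status lives in the same table as the queue itself, which is the appeal when a database is already part of the system.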

valenterry · on April 13, 2023

Right. However, please don't forget that there are very often inherent limitations regarding scaling or availability, and also that many fully fledged message queues come with a lot of perks like access management and administration / debugging tooling and interfaces.

I'm not saying that libraries like pg-boss and co. cannot sometimes replace a full queue implementation. But the tradeoffs need to be clear.

scrollaway · on April 12, 2023

I’ve had a positive experience with Procrastinate.

https://procrastinate.readthedocs.io/en/stable/

TedDoesntTalk · on April 11, 2023

When you’re dealing with billions of messages, I think queuing systems may be tuned more for it?

I’d like to hear why people chose Kafka over some RDBMS tables.

jjeaff · on April 12, 2023

Billions of jobs that each take hours to complete seems like something no one would have the resources for.

OOPMan · on April 12, 2023

Just wait till a startup convinces someone to give them money for a system that allows anyone, anywhere to queue up infinite loops XD

raverbashing · on April 12, 2023

I'm going to be honest, I think some 80% of Kafka users are overengineering it.

Kafka has specific use cases, but it seems a lot of people just go "ok use Kafka here" and wait for the load that rarely comes.

eurasiantiger · on April 12, 2023

In an online environment, having your system go down during those rare times will eventually cost you your business.

raverbashing · on April 12, 2023

Sure, then Kafka stays online and the rest doesn't because you forgot some details.

RhodesianHunter · on April 11, 2023

Performance and distributed nature.

4dayworkweek4u · on April 13, 2023

Queues are good for connecting separated systems (like 2 or more separate companies).

giovannibonetti · on April 11, 2023

It's all a matter of how much throughput you need. A queuing system can handle, in the same hardware, orders of magnitude more than a traditional SQL database that writes rows to disk in a page-oriented fashion.

If your load is, say, a few hundred writes/second, stick with the database only, and it will be much simpler.

arcticfox · on April 11, 2023

How does that help if you still have to have a DB tracking status? You still need the same order of magnitude of DB throughput.

RhodesianHunter · on April 11, 2023

No, because you only need to read and write ids and maybe timestamps to your db, both of which are trivially indexed, rather than the whole blob of your message payload.

Footkerchief · on April 11, 2023

In many cases, the message payload is (or should be) an ID anyway. It's seldom desirable for the message payload to include a copy of an external source of truth, because it can cause sync issues. There are exceptions, of course.
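A small sketch of the "payload is just an ID" pattern, assuming an in-memory dict as the source of truth and a standard-library queue in place of a broker. The `orders` store and the `publish` and `work` functions are hypothetical.

```python
import queue

# Hypothetical source of truth; in practice this would be a database.
orders = {42: {"status": "new", "total": 10}}

# Stand-in for the message queue.
q = queue.Queue()


def publish(order_id):
    """Enqueue only the ID, not a copy of the record."""
    q.put(order_id)


def work():
    """Take one ID off the queue and read the current record at processing time."""
    order_id = q.get()
    record = orders[order_id]
    return order_id, record["total"]
```

Because the worker reads the record when it runs, a change made after the message was published is still picked up, which is the sync issue that embedding a copy in the payload can cause. The cost is a lookup against the shared store for every message.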

Closi · on April 12, 2023

I don’t think it should be an ID - these platforms are really made for creating distributed event-driven systems.

jwmoz · on April 12, 2023

The idea is your task should just run off an ID, no point passing all that data around.

Closi · on April 12, 2023

How do you get the data relevant to that ID?

If the answer is a call to a shared database, you might as well not have RabbitMQ.