So if Redis is purely in-memory (and I'm asking this in a positive way), how can it be used effectively in any production system? If a spontaneous reboot, a Redis crash, or some other event occurs, don't you lose everything?
Redis is an in-memory data store that is persistent on disk: memory is used to serve data, and disk is used to persist it. This has been the Redis model from the start. What I was questioning in the linked group message is using disk not just for persistence but also to hold organized, ready-to-serve data.
Actually, on the persistence side we plan to do more work to make Redis better. For instance, Redis 2.4, which is entering the release-candidate stage, can save/load most databases on disk ten times faster.
In the future we plan to explore a new AOF format that is more compact and faster to process, and the ability to rewrite the log without a background process (BGREWRITEAOF).
Running BGSAVE from cron is not needed: you can configure Redis (and this is the default config) to save automatically every N seconds if there are at least M changes in the dataset. It is possible to configure multiple save points. More info: http://redis.io/topics/persistence
Even though this has nothing to do with persistence (though I can see why you'd think it does from the OP's title), I want to give you a different perspective (again, this doesn't really relate to what Antirez is actually saying):
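To make the save-point rule concrete, here is a minimal Python sketch of how a server could decide when to snapshot, given several (seconds, changes) save points. The `SavePoint` class and `should_save` function are hypothetical illustrations of the rule described above, not Redis's actual code; in Redis itself the rule is written as `save <seconds> <changes>` lines in redis.conf. The example values mirror the classic `save 900 1`, `save 300 10`, `save 60 10000` defaults, but treat them as an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavePoint:
    seconds: int  # minimum time since the last successful save
    changes: int  # minimum number of writes since the last successful save


# Illustrative save points (assumed values, in the spirit of redis.conf defaults).
DEFAULT_SAVE_POINTS = (
    SavePoint(900, 1),
    SavePoint(300, 10),
    SavePoint(60, 10000),
)


def should_save(elapsed_seconds, dirty_changes, save_points=DEFAULT_SAVE_POINTS):
    """Return True if ANY save point is satisfied: enough time has passed
    AND enough changes have accumulated since the last snapshot."""
    return any(
        elapsed_seconds >= sp.seconds and dirty_changes >= sp.changes
        for sp in save_points
    )


if __name__ == "__main__":
    print(should_save(61, 20000))  # True: the 60s / 10000-changes point is met
    print(should_save(61, 5))      # False: no save point is met yet
    print(should_save(901, 1))     # True: the 900s / 1-change point is met
```

The point of several save points is that a busy dataset gets snapshotted often, while a mostly idle one is still saved eventually.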
What's more reliable? Data persisted to disk on a single server in a RAID 1 array, or data stored in memory of 10 machines which are completely isolated from each other? What about 100 machines?
It's wrong to think of disks as reliable/persistent and memory as not. Think of both as stores, neither of which is 100% reliable. Memory is far less reliable than disk, much in the same way that a single disk is less reliable than RAID-6. However, just as you can make disks more reliable by replicating across other disks (and even more reliable by replicating across multiple servers in multiple locations), you can also achieve reliability with memory.
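A back-of-the-envelope calculation shows why replication helps. Assume each machine independently loses its in-memory copy during some period with probability p; then the chance that all n isolated replicas lose it at once is p to the power n. The 1% figure below is purely hypothetical, not a measured failure rate, and the independence assumption is exactly what "completely isolated" and "multiple locations" are trying to buy: correlated failures (a shared power outage, say) break it.

```python
def prob_all_lost(p_single: float, replicas: int) -> float:
    """Probability that every replica loses its copy, ASSUMING failures are
    independent. Correlated failures make the real number worse."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return p_single ** replicas


if __name__ == "__main__":
    p = 0.01  # hypothetical per-machine loss probability for some period
    for n in (1, 2, 10, 100):
        print(n, prob_all_lost(p, n))
```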
EDIT (clarify what they are talking about):
They aren't talking about persistence, though I can see why you'd think so from the OP's title. They are talking about using virtual memory when the data set doesn't fit into RAM.
This has nothing to do with persistence. Antirez isn't saying he doesn't trust Redis' persistence implementation. He's specifically talking about how the Redis VM handles more data than available memory.
He's saying: always have enough memory for all your data.
The OP's title is a bit sensationalist, so I can understand why you'd ask this. Redis saves the in-memory database to disk asynchronously. If a crash occurs, Redis reads the database from disk on startup and loads it back into memory.
So the short answer is "no", you don't lose everything. You just lose the data that was written to memory since the last sync to disk.
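The snapshot-and-reload cycle, and why only the writes since the last snapshot are lost, can be sketched in a few lines of Python. This is a toy model, not how Redis serializes its dump file: the `SnapshotStore` class and its JSON format are hypothetical, and the write-to-temp-file-then-rename step is a common technique for making a snapshot replace the old one atomically, used here as an assumption.

```python
import json
import os
import tempfile


class SnapshotStore:
    """Toy in-memory key/value store with point-in-time snapshots (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # On startup, reload the last snapshot from disk if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value  # lives only in memory until the next snapshot

    def get(self, key):
        return self.data.get(key)

    def snapshot(self):
        # Write to a temp file in the same directory, fsync it, then atomically
        # replace the old snapshot so a crash never leaves a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dump.json")
        store = SnapshotStore(path)
        store.set("a", 1)
        store.snapshot()
        store.set("b", 2)  # written after the last snapshot
        # Simulate a crash: abandon the object without snapshotting, then "restart".
        restarted = SnapshotStore(path)
        print(restarted.get("a"), restarted.get("b"))  # prints: 1 None
```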
And if you're thinking "this is amateur", let me tell you that at least 2 Very Big E-Commerce Companies I know of run their traditional DBs in the same mode.
I.e., a commit can complete successfully before the data has actually been persisted? Or just the part about all data being kept in memory? If the former, that seems like a bad idea, big e-commerce company or no.
The former. I think most sane, serious DBs cache and manipulate recently used data in memory (in addition to the logical log).
Bad idea or no, it's what they do; the performance gains are large. There is an ersatz logical log in the form of application logs, and these have been used to piece together transactional information before.
Yes. As the GP stated, all data added/updated/deleted since the last snapshot will be lost. If your data is that critical, you can use AOF with "appendfsync always", which will fsync every write operation into a log that will be replayed on startup (at a speed cost). Setting it to fsync every second (instead of always) is a good compromise.
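Here is a minimal Python sketch of the append-only-file idea: every write is appended to a log before it is applied, and the log is replayed on startup. The `AppendOnlyStore` class, its JSON-lines format, and the `fsync_always` flag are hypothetical; the flag mimics the spirit of "appendfsync always", while `fsync_always=False` just hands the bytes to the OS and lets it decide when to hit the disk. This is not Redis's implementation or file format.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key/value store that logs every write and replays the log on startup."""

    def __init__(self, path, fsync_always=True):
        self.path = path
        self.fsync_always = fsync_always
        self.data = {}
        self._replay()
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                try:
                    op = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash: treat it as lost
                if op["cmd"] == "set":
                    self.data[op["key"]] = op["value"]
                elif op["cmd"] == "del":
                    self.data.pop(op["key"], None)

    def _append(self, op):
        self.log.write(json.dumps(op) + "\n")
        self.log.flush()  # hand the bytes to the OS
        if self.fsync_always:
            os.fsync(self.log.fileno())  # block until the disk confirms: slow but safe

    def set(self, key, value):
        self._append({"cmd": "set", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"cmd": "del", "key": key})
        self.data.pop(key, None)

    def get(self, key):
        return self.data.get(key)

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "appendonly.log")
        s = AppendOnlyStore(path)
        s.set("job:1", "queued")
        s.set("job:2", "queued")
        s.delete("job:1")
        s.close()
        restarted = AppendOnlyStore(path)  # replays the log
        print(restarted.get("job:1"), restarted.get("job:2"))  # prints: None queued
        restarted.close()
```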
Hmm, the thing I'm not really clear on is how this helps performance. The database has to do the same amount of work whether you commit before or after the sync, so why not just make all the commits wait until the sync has happened?
fsync() is a blocking call that requires confirmation from the disk before your process continues. Roughly, this means that Redis can't do anything while the fsync() happens. While this isn't any more work (CPU), it is significantly slower because of all the time spent waiting.
Hmm . . . as long as the requests are not being handled on the same thread as the one doing the fsync, this shouldn't stop work in the process or make anything appreciably slower.
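The point about threads can be illustrated with a minimal Python sketch: writers append to the log and return immediately, while a separate thread calls os.fsync() on a timer, so the blocking call stays off the request path. The `BackgroundFsyncLog` class and its `interval` parameter are hypothetical; the idea is roughly analogous to an fsync-every-second policy, but this is not Redis's code. The trade-off is explicit: a power loss can drop up to about `interval` seconds of writes that were already acknowledged.

```python
import os
import tempfile
import threading
import time


class BackgroundFsyncLog:
    """Append-only log whose writers never wait for fsync(); a background
    thread fsyncs the file every `interval` seconds (hypothetical sketch)."""

    def __init__(self, path, interval=1.0):
        self.f = open(path, "a", encoding="utf-8")
        self.interval = interval
        self.lock = threading.Lock()  # serializes concurrent appenders
        self.stop = threading.Event()
        self.thread = threading.Thread(target=self._syncer, daemon=True)
        self.thread.start()

    def append(self, record):
        # Fast path: write into the OS page cache and return without fsync().
        with self.lock:
            self.f.write(record + "\n")
            self.f.flush()

    def _syncer(self):
        # Wake up every `interval` seconds and force the OS cache to disk.
        while not self.stop.wait(self.interval):
            os.fsync(self.f.fileno())

    def close(self):
        self.stop.set()
        self.thread.join()
        os.fsync(self.f.fileno())  # final fsync so nothing written is left unsynced
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = BackgroundFsyncLog(os.path.join(d, "appendonly.log"), interval=0.1)
        start = time.perf_counter()
        for i in range(1000):
            log.append(f"SET key:{i} {i}")  # never blocks on fsync
        elapsed = time.perf_counter() - start
        log.close()
        print(f"1000 appends took {elapsed:.4f}s")
```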
You can run a slave Redis replica very easily, and save db snapshots from that without impacting the master. When Salvatore says "redis on disk", he's referring to attempts to mix in-memory & on-disk portions of the data, rather than the background saving feature of Redis.
I've been using the AOF (append-only file) mode described in the doc [1] with an older version of Redis (1.2.x) in production for a year and a couple of months, specifically to allow a reboot or a crash without losing a job. It works well for us.
My use-case is processing XML files that are pushed by a third-party (see [2] for a blog post describing the setup).
One drawback is that the AOF file keeps growing (at least on my version) and could reach your system's maximum file size, if there is one. The BGREWRITEAOF command is available to work around that issue (I haven't tested it).
Redis is not meant to be run on a single node but on multiple computers, and you can decide how much to trade off between stability and redundancy.
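The idea behind rewriting the log can be sketched as: work out the current value of every key, write a new log containing one command per surviving key, and atomically swap it in. The `compact_log` function and its JSON-lines format are hypothetical and match no real Redis file. For simplicity it rebuilds state by reading the old log and ignores writes arriving during the rewrite; a real implementation such as BGREWRITEAOF, which runs as a background process, has to handle those.

```python
import json
import os
import tempfile


def compact_log(path):
    """Rewrite a JSON-lines log of set/del commands so it holds one 'set' per
    live key. Returns (old_line_count, new_line_count)."""
    state = {}
    old_lines = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            old_lines += 1
            op = json.loads(line)
            if op["cmd"] == "set":
                state[op["key"]] = op["value"]
            elif op["cmd"] == "del":
                state.pop(op["key"], None)

    # Write the compacted log next to the old one, then swap it in atomically.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for key, value in state.items():
            out.write(json.dumps({"cmd": "set", "key": key, "value": value}) + "\n")
        out.flush()
        os.fsync(out.fileno())
    os.replace(tmp, path)
    return old_lines, len(state)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "appendonly.log")
        with open(path, "w", encoding="utf-8") as f:
            for i in range(1000):  # the same counter overwritten 1000 times
                f.write(json.dumps({"cmd": "set", "key": "counter", "value": i}) + "\n")
            f.write(json.dumps({"cmd": "set", "key": "temp", "value": 1}) + "\n")
            f.write(json.dumps({"cmd": "del", "key": "temp"}) + "\n")
        print(compact_log(path))  # prints: (1002, 1)
```

The log shrinks from one line per write to one line per live key, which is why a long-running log full of overwrites compacts so well.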
In addition you can make Redis asynchronously save to the disk every so often.