The issue with a client app backing up Dropbox and OneDrive folders on your computer is the Files On-Demand feature. You could sync a 1 TB OneDrive to your 250 GB laptop, and it's fine because of smart/selective sync, a.k.a. Files On-Demand. Then Backblaze backup tries to back the folder up, requests a download of every single file, and now you have zero bytes free, still no backup, and a sick laptop.
You could OAuth the Backblaze app to access OneDrive directly, but if you want to back your OneDrive up, you need a different product IMO.
Shoutout to Arq Backup, which simply gives you an option in backup plans for what to do with cloud-only files:
- report an error
- ignore
- materialize
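A sketch of how such a per-plan policy might be wired up. All the names here are hypothetical illustrations of the three options above, not Arq's actual code or API:

```python
from enum import Enum

class CloudOnlyPolicy(Enum):
    """Hypothetical policy for what a backup plan does with cloud-only files."""
    REPORT_ERROR = "report_error"
    IGNORE = "ignore"
    MATERIALIZE = "materialize"

def handle_cloud_only(path, policy, errors, to_download, to_backup):
    """Route a cloud-only (stub) file according to the plan's policy."""
    if policy is CloudOnlyPolicy.REPORT_ERROR:
        errors.append(f"cloud-only file skipped: {path}")
    elif policy is CloudOnlyPolicy.IGNORE:
        pass  # silently skip the stub
    else:  # MATERIALIZE: fetch the real bytes first, then back them up
        to_download.append(path)
        to_backup.append(path)
```

The key point is that the user, not the backup tool, decides whether stubs become errors, no-ops, or downloads.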
Regardless, if you make backup software that doesn't give this level of control to users, and you change which files you're going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.
I honestly didn't even realize Backblaze had a client-side app. Very happy user of Arq - been running a daily scheduled dual backup of my HDD to an external NAS and Backblaze B2 for years with zero issues.
I for one would pay for Arq on Linux as I now do on Mac. It would be fantastic to be able to use the same "it just works" backup solution on all my computers.
And I've used enough "gold standard" commercial applications, like the one being discussed in this very article, that I don't trust those either. If you recoil in horror at code written by LLMs, I'm afraid that the vendors you're already working with have some really bad news for you. You can get over it now or get over it later. You will get over it.
I can audit and verify Claude's output. Code running at Backblaze, not so much. Take some responsibility for your data. Rest assured, nobody else will.
You are not wrong, but I just don't have time. My choices are pay someone or throw my hands up. I have been paying backblaze. But I recently had a drive die, and discovered the backups are missing .exe and .dll files, and so that part of the restore was worthless.
What time I do have, I've been using to try and figure out photo libraries. Nothing is working the way I need it to. The providers are a mess of security restrictions and buggy software.
My favorite Peanuts comic was always the one where Linus is standing at an intersection next to a 'Push Button To Cross Street' sign. He is sucking his thumb and clutching his blanket despondently.
In the last panel, Charlie Brown tells him, "You have to move your feet, too."
That seems like a pretty straightforward issue to solve: simply back up only those files that are actually on the system, not the stubs. If it's on your computer, it should be able to get backed up. If it's just a shadow, a pointer, it shouldn't.
Making the change without making it clear, though, that's just awful. A clear recipe for catastrophic loss and a drip-drip-drip of news in the vein of "How Backblaze Lost My Stuff".
Not what I meant: the other cloud storage services connected to the computer, e.g. OneDrive, and those files when they are just stubs. I'm saying that Backblaze could simply not back up stubs if the person isn't syncing the actual file to their drive. If they are, Backblaze should back it up.
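On Windows, Files On-Demand placeholders carry file-attribute bits that can be inspected without opening (and therefore hydrating) the file. A sketch, assuming Windows semantics; the two `RECALL_*` constants are copied from the Win32 headers because Python's `stat` module doesn't define them:

```python
import stat

# Win32 attribute bits marking cloud-placeholder files; not present in
# Python's stat module, so defined here from the Windows SDK headers.
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

def is_cloud_stub(attrs: int) -> bool:
    """True if the attribute bits describe a stub whose bytes live in the cloud."""
    return bool(attrs & (stat.FILE_ATTRIBUTE_OFFLINE
                         | FILE_ATTRIBUTE_RECALL_ON_OPEN
                         | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS))

# On Windows you would feed this os.stat(path).st_file_attributes;
# checking attributes is safe, whereas reading the file triggers a download.
```

So "detect the stub and skip it" is mechanically possible; the question in this thread is whether a backup tool should.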
The point of the stubs is so you don’t have to know which cloud files are actually on your device at any given moment, because they will be fetched automatically. Marking files as “always local” is a niche feature, and in any case has nothing to do with whether you want those files backed up.
To have this thing you’re not supposed to need to worry about affect whether your files got backed up is exactly the problem here. The goal is to back up your files, whether they’re in the cloud or not.
I sympathize with Backblaze's problem with their file change monitor, but then they should consider implementing connectors for OneDrive, Dropbox, etc. and back up files directly from the cloud.
“We only back up stuff from your computer” has always been their stance, and clearly it’s a way to reduce their costs. I can understand not wanting to engage in the “please backup these 10TB I have in Dropbox” requests.
I think backing up the materialized files is appropriate. That’s what they (used to) promise.
Of course it is. So *you* don't have to know which cloud files are actually there. That doesn't mean Backblaze can't know, and it should work within that paradigm rather than not back up anything.
As for your (the user's, not necessarily yours personally) expectation that Backblaze would back up the stubs regardless of their stub status (I'm not sure that would really matter, as you said in your own comment): that's unreasonable. Backblaze would traverse the stubs and... do what? Temporarily download them and upload them to Backblaze? That's not what they ever stated would happen, and it's a big stretch to expect what amounts to the extra service of backing up cloud drives simply because a user decides to have what amounts to an 'ln <soft link>' to a network drive. They do explicitly exclude that.
What is not reasonable on their part is to change any service at all that had previously been happening, regardless of whether it was within the ToS. Contract law likely wouldn't support a claim on their part that there was no implied contract, either; ambiguity is typically resolved by courts in favor of the injured party, especially one in a position of lesser power in the relationship. I'm not claiming that's what happened here; my reading of any ToS has the same legitimacy as this comment does. I'm saying they do claim they'll back up whatever is on the computer and not excluded, and as a matter of basic provisioning of a service to a customer, changing that quietly is wrong. That's the limit of my claim.
Imagine if they could detect a stub versus a real file, huh? Space technology, I know! Or just fucking copy the stubs as stubs, and what's actually downloaded as actually downloaded! Boggles the mind!
Or maybe just do what they do now, but WARN about it in HUGE RED LETTERS, on the website and in the app, instead of burying it in an update note like weasels!
The whole "just sync everything, and if you can't sync everything, pretend to sync everything with fake files and then download the real ones ad hoc" model of storage feels a bit ill-conceived to me. It tries to present a simple facade, but I'm not sure it actually simplifies things. It always results in nasty user surprises and sometimes data loss. I've seen Microsoft OneDrive do the same thing to people at work.
Same. I lost a lot of photos this way. I've recently moved over to Immich + Borg backup with a 3-2-1 backup between a local synology NAS and BorgBase. Painful lesson, but at least now I feel much more confident. I've even built some end-to-end monitoring with Grafana.
Thanks... hence, 3-2-1 backups with offsite :) appreciate it though. Will definitely be rolling my own NAS in the future, I just needed something easy at the time.
My own approach to simplicity generally means "hide complexity behind a simple interface" rather than pushing for simple implementations because I feel that too much emphasis on simplicity of implementations often means sacrificing correctness.
This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors is another one, for similar reasons: it always ends up being buggy and unpredictable.)
That would make sense for online-only files, but I have my Dropbox folder set to synchronize everything to my PC, and Backblaze still started skipping over it a few months ago. I reached out to support and they confirmed that they are just entirely skipping Dropbox/OneDrive/etc. folders, regardless of whether the files are stored locally or not.
That doesn't really make a lot of sense, though. Reading a file that's not actually on disk doesn't download it permanently. If I have zero of 10TB worth of files stored locally on my 1TB device, read them all serially, and measure my disk usage, there's no reason the disk should be full, or at least it should be cache that can be easily freed. The only time this is potentially a problem is if one of the files exceeds the total disk space available.
Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.
Right, but even if that’s working it breaks the user experience of services like this that ‘files I used recently are on my device’.
After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!
That shouldn't be seen as Backblaze's problem. It's Dropbox's problem that they made their product too complicated for users to reason about. The original Dropbox concept was "a folder that syncs" and there would be nothing problematic about Backblaze or anything else trying to back it up like any other folder.
Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.
There’s no reason to think that would happen - files you had from ten years ago would have been backed up ten years ago and would be skipped over today.
Good point (I’m assuming you’re right here and it trusts file metadata and doesn’t read files it’s already backed up?)
It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations.
I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made.
It's generally handled decently well now, but with three or four of these things it can make backups take annoyingly long, as without "smarts" (which are not always present) it may force a download of the entire OneDrive/Box each time - even if it never crashes out.
This is a complexity that makes it harder, but not insurmountable.
It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
> Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?
Also, a high number of small files is a problem for these services. I have a large font collection in my cloud account and, oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.
Reading your comments, it sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files. I know you haven’t technically said that, but that’s what it sounds like.
I assume you don’t think that, so I’m curious, what would you propose positively?
> I know you haven’t technically said that, but that’s what it sounds like.
Yes, I didn't technically say that.
> It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files.
I don't argue either of those.
What I said is that with "on-demand file download", traditional backup software faces a hard problem. However, there are better ways to do this, the primary candidate being rclone.
You can register a new application ID for your rclone installation for your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to backup your cloud storage. That's what I do.
I'm currently backing up my cloud storage to a local TrueNAS installation. rclone automatically hash-checks everything and downloads only the changed files. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent MITM agent to smartly pull from the cloud and push to Backblaze.
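A minimal sketch of driving rclone from a script. The remote name (`gdrive:`) and destination path are made up for illustration; `sync` and `--checksum` are real rclone CLI features, and `--checksum` makes rclone compare by hash rather than size/mtime:

```python
import subprocess

def build_rclone_sync(remote: str, dest: str) -> list[str]:
    """Command line for an rclone sync that verifies by checksum, so
    stale size/mtime metadata can't cause a silent skip."""
    return ["rclone", "sync", remote, dest, "--checksum"]

# Hypothetical remote and destination, for illustration only.
cmd = build_rclone_sync("gdrive:", "/mnt/truenas/backups/gdrive")
# subprocess.run(cmd, check=True)  # uncomment to run (needs rclone on PATH)
```

Wrapping the CLI like this also makes it easy to schedule from cron or a systemd timer.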
Also, using restic or Borg as a backup container is a good idea, since they can deduplicate and/or store only the differences between snapshots, saving tons of space in the process, plus encrypting things for good measure.
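A toy illustration of the deduplication idea: chunks are keyed by content hash, so identical chunks across snapshots are stored once. Real restic/Borg repositories use content-defined chunking and encryption; this sketch uses naive fixed-size chunks just to show why only the differences cost space:

```python
import hashlib

def dedup_store(snapshots, chunk_size=4):
    """Toy content-addressed store: each snapshot is a bytes blob, split
    into chunks keyed by SHA-256; only never-seen chunks consume space."""
    store = {}      # chunk hash -> chunk bytes (stored once)
    manifests = []  # per-snapshot list of chunk hashes
    for data in snapshots:
        chunk_ids = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            cid = hashlib.sha256(chunk).hexdigest()
            store.setdefault(cid, chunk)  # new chunks only
            chunk_ids.append(cid)
        manifests.append(chunk_ids)
    return store, manifests

# Two near-identical "snapshots": only the changed chunk is stored anew.
store, manifests = dedup_store([b"aaaabbbbcccc", b"aaaabbbbdddd"])
# len(store) == 4 unique chunks instead of 6
```

The manifests are cheap, so keeping many snapshots costs little more than the data that actually changed.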
This. You should not try to backup your local cache of cloud files as if those were your local files. Use a tool that talks to the cloud storage directly.
Use tools with straightforward, predictable semantics, like rclone, or Syncthing, or restic/Borg. (Deduplication rules, too.)
My understanding of Backblaze Computer Backup is it is not a general purpose, network accessible filesystem.[0] If you want to use another tool to backup specific files, you'd use their B2 object storage platform.[1] It has an S3 compatible API you can interact with, Computer Backup does not.
But generally speaking, I'd agree with your sentiment.
But if the files are only on the remote storage and not local, chances are they haven't been modified recently, so it shouldn't download them fully, just check the metadata cache for size / modification time and let them be if they didn't change.
So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
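A minimal sketch of that metadata-only skip. The index layout (path mapped to size and mtime from the last run) is made up for illustration; the point is that `os.stat` on a stub doesn't hydrate it, so unchanged files never get read:

```python
import os

def needs_backup(path: str, index: dict) -> bool:
    """Skip a file when size and mtime match the last backup's record;
    only changed (or never-seen) entries get read and uploaded."""
    st = os.stat(path)  # stat alone doesn't trigger a cloud download
    prev = index.get(path)
    return prev is None or prev != (st.st_size, st.st_mtime_ns)
```

After a successful upload you'd record `(st.st_size, st.st_mtime_ns)` back into the index for the next run.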
You can't trust size and modification time all the time; though mtime is a better indicator, it's not foolproof. The only reliable way is checksumming.
Interestingly, rclone supports that on many providers, but for Backblaze to support it they would need to integrate rclone, connect to the providers via that channel, and request checks, which is messy, complicated, and computationally expensive. And that's assuming you won't be hitting API rate limits on the cloud provider.
Sometimes modification time of a file which is not downloaded on computer A, but modified by computer B is not reflected immediately to computer A.
Hence, backup software running on computer A will think that the file has not been modified. This is a known problem in file synchronization. Also, some applications that modify files revert or preserve the file's mtime for their own reasons. They are rare, but they're there.
The problem is, downloading files and disk management is not in your control, that part is managed by the cloud client (dropbox, google drive, et. al) transparently. The application accessing the file is just waiting akin to waiting for a disk spin up.
The filesystem is a black box for these software since they don't know where a file resides. If you want control, you need to talk with every party, incl. the cloud provider, a-la rclone style.
Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.
> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.
The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount it over rclone or something and use the "ls" and "lsd" commands to query filenames. Otherwise it'll download, and that's how it's expected to work.
I think you might be confusing Backblaze reading files with how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support. There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files.

When Backblaze requests that a file be opened and read, Windows will ask Dropbox or whatever to open the file for it, and to read it. How that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening.

I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason: my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start sending back error 500s.
And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to allow me to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "Hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them).
Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.
If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.
The issue you’re missing is that the abstraction Dropbox/OneDrive/etc provide is not that of an NFS. When an application triggers the download of a file, it hydrates the file to the local file system and keeps it there. So if Backblaze triggers the download of a TB of files, it will consume a TB of local file system space (which may not even exist).
It does keep them permanently. Dropbox is not a NAS and does not pretend to be one.
> When you open an online-only file from the Dropbox folder on your computer, it will automatically download and become available offline. This means you’ll need to have enough hard drive space for the file to download before you can open it. You can change it back to online-only by following the instructions below.
Same exact behavior for OneDrive, though it apparently does have a Windows integration to eventually migrate unused files back to online-only if enabled.
> When you open an online-only file, it downloads to your device and becomes a locally available file. You can open a locally available file anytime, even without Internet access. If you need more space, you can change the file back to online only. Just right-click the file and select "Free up space."
Maybe it will, maybe it won't, but it'll cycle all files in the drive and will stress everything from your cloud provider to Backblaze, including everything in between, software- and hardware-wise.
That sounds very acceptable to get those files backed up.
It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.
I mean, cycling a couple of terabytes of data through a 512GB drive is at least 4 full drive writes, which is too much for that kind of thing.
> more ISPs need to improve upload.
I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.
Lo and behold, the numbers I got were at the technical limits of the technology I had at home (PON for the time being), and going higher would need very large and expensive rewiring with new hardware and technology.
4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month.
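A quick back-of-the-envelope using the numbers in this thread (2 TB of cloud data cycled through a 512 GB drive; the ~3000-cycle endurance figure is the parent comment's, not a spec):

```python
def full_drive_writes(data_tb: float, drive_gb: float) -> float:
    """Full-drive write cycles needed to stream `data_tb` of data
    through a `drive_gb` drive (decimal TB = 1000 GB)."""
    return data_tb * 1000 / drive_gb

cycles = full_drive_writes(2, 512)  # about 3.9 full writes
```

Against an endurance budget in the thousands of cycles, a one-off hydration pass is small; the dispute above is whether it stays one-off.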
> the technical limits of the technology that I had at home (PON for the time being)
Depends on your device capacity and how much is in actual use. Wear leveling also adds wear as it moves data around.
> For something you'll need to do once or twice ever?
I don't know about you, but my cloud storage is living; and even if it weren't, if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.
> Isn't that usually symmetrical? Is yours not?
GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4 Gbps down, 1.2 Gbps up. I have 1000Mbit/75Mbit at home.
How do you know how often those files need to be backed up without reading them? Timestamps and sizes are not reliable, only content hashes. How do you get a content hash? You read the file.
If timestamps aren’t reliable, you fall way outside the user that can trust a third party backup provider. Name a time when modification timestamp fails but a cloud provider will catch the need to download the file.
The issue really isn't that it's not backing up the folder (which I can see an argument for both sides and various ways to do it) - it's that they changed what they did in a surprising way.
Your backup solution is not something you ever want to be the source of surprises!