
The issue with a client app backing up Dropbox and OneDrive folders on your computer is the Files On Demand feature: you can sync a 1TB OneDrive to your 250GB laptop because of smart/selective sync, aka Files On Demand. Then Backblaze backup tries to back the folder up, requests a download of every single file, and now you have zero bytes free, still no backup, and a sick laptop. You could OAuth the Backblaze app to access OneDrive directly, but if you want to back your OneDrive up you need a different product IMO.


Shoutout to Arq backup which simply gives you an option in backup plans for what to do with cloud only files:

- report an error

- ignore

- materialize

Regardless, if you make backup software that doesn't give this level of control to users, and you make a change about which files you're going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.


I honestly didn't even realize Backblaze had a clientside app. Very happy user of Arq - been running a daily scheduled dual backup of my HDD to an external NAS and Backblaze B2 for years with zero issues.


That was their whole business originally. The B2 object storage is a newer offering.


Why no linux support?


(Arq developer here) Haven't gotten many requests for it at all over the years. I presume it's because there are so many free options for Linux.


Just wanted to say that for many years, Arq has been my backup solution. It's amazing and I recommend it to everyone I know.


I for one would pay for Arq on Linux as I now do on Mac. It would be fantastic to be able to use the same "it just works" backup solution on all my computers.


we already have restic which is really really great.


If it's open-source, Linux support is only a few hours with Claude away.

If it's not open-source, but the protocol is documented, see above.

If it's not open-source, and the protocol isn't documented, well... that makes the decision easy, doesn't it?


Backup software written by Claude? No thanks.

I've used enough Claude-coded applications that I wouldn't trust one with a backup unless it had extensive tests along with it.


And I've used enough "gold standard" commercial applications, like the one being discussed in this very article, that I don't trust those either. If you recoil in horror at code written by LLMs, I'm afraid that the vendors you're already working with have some really bad news for you. You can get over it now or get over it later. You will get over it.

I can audit and verify Claude's output. Code running at BackBlaze, not so much. Take some responsibility for your data. Rest assured, nobody else will.


You are not wrong, but I just don't have time. My choices are to pay someone or throw up my hands. I have been paying Backblaze. But I recently had a drive die, and discovered the backups were missing .exe and .dll files, so that part of the restore was worthless.

What time I do have, I've been using to try and figure out photo libraries. Nothing is working the way I need it to. The providers are a mess of security restrictions and buggy software.


The choices are maybe eat shit, or spend your own time auditing and polishing shit into something edible before eating it?


That's the general conclusion you will draw after reading the comments on this story, yes.


>You can get over it now or get over it later. You will get over it

You're forgetting the third option:

You can remain blissfully unaware of it.


> You can remain blissfully unaware of it.

And you can read many accounts of the outcome of that strategy in this very thread.


"Dear Claude, please create an extensive testing suite for this app. Love, cobertos"


"Great idea wise customer, I will certainly mock one out just for you!"


My favorite Peanuts comic was always the one where Linus is standing at an intersection next to a 'Push Button To Cross Street' sign. He is sucking his thumb and clutching his blanket despondently.

In the last panel, Charlie Brown tells him, "You have to move your feet, too."



D'oh! Thanks, it's been a while.


Love Arq!


That seems like a pretty straightforward issue to solve: simply back up only those files that are actually on the system, not the stubs. If it's on your computer, it should be able to get backed up. If it's just a shadow, a pointer, it shouldn't.

Making the change without making it clear, though, is just awful. A clear recipe for catastrophic loss and a drip, drip, drip of news in the vein of "How Backblaze Lost My Stuff".


The OP’s complaint is that the files were not backed up. If they had discovered that only stubs were backed up, I don’t think they’d be any happier.


Not what I meant: The other cloud storage services connected to the computer, eg OneDrive. Those files, when they are just stubs. I'm saying that Backblaze could simply not backup stubs, if the person isn't syncing the actual file to their drive. If they are, backblaze should back it up.
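For what it's worth, stubs are detectable without opening them: on Windows, OneDrive and Dropbox mark dehydrated "files on demand" placeholders with dedicated file-attribute bits, so a backup client can skip them from metadata alone. A minimal sketch in Python (the attribute constants are real Win32 values; the function names are mine, and on non-Windows systems the check simply reports "not a stub"):

```python
import os

# Win32 file-attribute bits used by cloud "files on demand" placeholders.
# RECALL_ON_DATA_ACCESS is what OneDrive/Dropbox set on dehydrated stubs.
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

PLACEHOLDER_MASK = (FILE_ATTRIBUTE_OFFLINE
                    | FILE_ATTRIBUTE_RECALL_ON_OPEN
                    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS)

def is_placeholder_attrs(attrs: int) -> bool:
    """True if the attribute bits mark a cloud-only stub."""
    return bool(attrs & PLACEHOLDER_MASK)

def is_placeholder(path: str) -> bool:
    """True if `path` looks like a dehydrated cloud file.

    st_file_attributes only exists on Windows; elsewhere we fall back
    to 0, i.e. "not a stub".
    """
    attrs = getattr(os.stat(path), "st_file_attributes", 0)
    return is_placeholder_attrs(attrs)
```

A crawler that checks this bit before calling open() would never trigger a hydration in the first place.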


The point of the stubs is so you don’t have to know which cloud files are actually on your device at any given moment, because they will be fetched automatically. Marking files as “always local” is a niche feature, and in any case has nothing to do with whether you want those files backed up.

To have this thing you’re not supposed to need to worry about affect whether your files got backed up is exactly the problem here. The goal is to back up your files, whether they’re in the cloud or not.

I sympathize with Backblaze's problem with their file change monitor, but then they should consider implementing connectors for OneDrive, Dropbox, etc., and back up files directly from the cloud.


“We only back up stuff from your computer” has always been their stance, and clearly it’s a way to reduce their costs. I can understand not wanting to engage in the “please backup these 10TB I have in Dropbox” requests.

I think backing up the materialized files is appropriate. That’s what they (used to) promise.


>The point of the stubs

Of course it is. So *you* don't have to know which cloud files are actually there. That doesn't mean Backblaze can't know, and it should work within that paradigm rather than not back up anything.

As for your (the user's, not necessarily you personally) expectation that Backblaze would back up the files regardless of their stub status (I'm not sure backing up the stubs themselves would really matter, as you said in your own comment), that's unreasonable: that Backblaze would traverse the stubs and... what? Temporarily download them, then upload them to Backblaze? That's not what they ever stated would happen, and it's a big stretch to expect what amounts to the extra service of backing up cloud drives simply because a user has what amounts to an 'ln -s' soft link to a network drive. They do explicitly exclude that.

What is not reasonable on their part is to silently change any service that had previously been provided, regardless of whether it was within the ToS. Contract law likely wouldn't support them either: where an implied contract arises through ambiguity, courts will typically resolve it in favor of the injured party, especially one in a position of lesser power in the relationship. I'm not claiming that's what happened here; my reading of any ToS has the same legitimacy as this comment. I'm saying they do claim they'll back up whatever is on the computer and unexcluded, and it's wrong, as a matter of basic provisioning of service, to deliver less than what was offered. That's the limit of my claim.


The stubs are the thing on your computer?


Imagine if they could detect stub or real file, huh? Space technology, I know! Or just fucking copy stubs as stubs and actually-downloaded files as actually downloaded! Boggles the mind!

Or maybe just do what they do now, but WARN about it in HUGE RED LETTERS, on the website and in the app, instead of burying it in an update note like weasels!


The whole "just sync everything, and if you can't sync everything, pretend to sync everything with fake files and then download the real ones ad hoc" model of storage feels a bit ill-conceived to me. It tries to present a simple facade, but I'm not sure it actually simplifies things. It always results in nasty user surprises and sometimes data loss. I've seen Microsoft OneDrive do the same thing to people at work.


I’ve lost data not realizing I was backing up placeholder files (iCloud).

Hiding the network always ends in pain. But never goes out of style.


Same. I lost a lot of photos this way. I've recently moved over to Immich + Borg backup with a 3-2-1 backup between a local synology NAS and BorgBase. Painful lesson, but at least now I feel much more confident. I've even built some end-to-end monitoring with Grafana.


Careful with that Synology NAS; mine's now a brick that may also have led to permanent data loss.


Thanks... hence, 3-2-1 backups with offsite :) appreciate it though. Will definitely be rolling my own NAS in the future, I just needed something easy at the time.


My own approach to simplicity generally means "hide complexity behind a simple interface" rather than pushing for simple implementations because I feel that too much emphasis on simplicity of implementations often means sacrificing correctness.

This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors is another one, for similar reasons: it always ends up being buggy and unpredictable.)


That would make sense for online-only files, but I have my Dropbox folder set to synchronize everything to my PC, and Backblaze still started skipping over it a few months ago. I reached out to support and they confirmed that they are just skipping Dropbox/OneDrive/etc. folders entirely, regardless of whether the files are stored locally or not.


At least you can fix (band-aid) that by cloning your Dropbox folder to another folder. Double the space taken, for the greater good.


The primary trouble I have with Backblaze is that this change was not clearly communicated, even if it could perhaps be justified.


That doesn't really make a lot of sense, though. Reading a file that's not actually on disk doesn't download it permanently. If I have zero of 10TB worth of files stored locally on my 1TB device, read them all serially, and measure my disk usage, there's no reason the disk should be full, or at least it should be cache that can be easily freed. The only time this is potentially a problem is if one of the files exceeds the total disk space available.

Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.


Right, but even if that’s working it breaks the user experience of services like this that ‘files I used recently are on my device’.

After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!


That shouldn't be seen as Backblaze's problem. It's Dropbox's problem that they made their product too complicated for users to reason about. The original Dropbox concept was "a folder that syncs" and there would be nothing problematic about Backblaze or anything else trying to back it up like any other folder.

Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.


If I backup a file, I need to read that file. The rest is in the management layer underneath that file.

Seems simple enough to do for Backblaze, no?


Do you really want Backblaze to ignore all the side effects of scanning through the entire contents of a badly-designed network filesystem?


What I actually want is not a backup. That is just an artefact of the process.

What i want is restores. The ability to restore anything from ideally any point back in time.

How that is achieved is not my concern.

Obviously Backblaze does not achieve that, today.


> How that is achieved is not my concern.

You're dodging the question. Wanting to ignore the side effects does not mean they won't affect you.


There’s no reason to think that would happen - files you had from ten years ago would have been backed up ten years ago and would be skipped over today.


Good point (I’m assuming you’re right here and it trusts file metadata and doesn’t read files it’s already backed up?)

It would still happen with the first backup - or first connection of the cloud drive - though, which isn’t a great post-setup new user experience. It probably drove complaints and cancellations.

I feel like I’ve accidentally started defending the concept of not backing up these folders, which I didn’t really intend to. I’d also want these backed up. I’m just thinking out loud about the reasons the decision was made.


It's generally handled decently well now, but with three or four of these things it can make backups take annoyingly long, as without "smarts" (which are not always present) it may force a download of the entire OneDrive/Box each time, even if it never crashes out.


> it may force a download of the entire OneDrive/Box each time - even if it never crashes out.

I am not aware of any evidence supporting this.


This is a complexity that makes it harder, but not insurmountable.

It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.


> Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.

When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?

Also, high number of small files is a problem for these services. I have a large font collection in my cloud account and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.


Reading your comments, it sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files. I know you haven’t technically said that, but that’s what it sounds like.

I assume you don’t think that, so I’m curious, what would you propose positively?


> I know you haven’t technically said that, but that’s what it sounds like.

Yes, I didn't technically say that.

> It sounds like you are arguing it is impossible to backup files in Dropbox in any reasonable way, and therefore nobody should backup their cloud files.

I'm not arguing either of those.

What I said is that with "on demand file download", traditional backup software faces a hard problem. However, there are better ways to do it, the primary candidate being rclone.

You can register a new application ID for your rclone installation for your Google Drive and Dropbox accounts, and use rclone as a very efficient, rsync-like tool to backup your cloud storage. That's what I do.

I'm currently backing up my cloud storages to a local TrueNAS installation. rclone automatically hash-checks everything and downloads the changed ones. If you can mount Backblaze via FUSE or something similar, you can use rclone as an intelligent MITM agent to smartly pull from cloud and push to Backblaze.

Also, using restic or Borg as the backup container format is a good idea, since they can deduplicate and store only the differences between snapshots, saving tons of space in the process, plus encrypting things for good measure.
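The deduplication those tools do boils down to content-addressed chunk storage: each snapshot is a manifest of chunk hashes, and a chunk already present in the repository is never stored twice. A toy sketch of the mechanism (real restic/Borg use content-defined chunking plus encryption; the fixed-size chunking and names here are simplifications of mine):

```python
import hashlib

def dedup_store(snapshots, chunk_size=4):
    """Toy content-addressed store.

    Each snapshot (a bytes object) becomes a manifest of chunk hashes;
    a chunk is written to the store only the first time it is seen.
    """
    store = {}      # sha256 hex -> chunk bytes
    manifests = []  # one list of hashes per snapshot
    for data in snapshots:
        hashes = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # write only if unseen
            hashes.append(digest)
        manifests.append(hashes)
    return store, manifests
```

Two mostly-identical snapshots then share almost all of their chunks, so the second one costs only the new data.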


This. You should not try to backup your local cache of cloud files as if those were your local files. Use a tool that talks to the cloud storage directly.

Use tools with straightforward, predictable semantics, like rclone, Syncthing, or restic/Borg. (Deduplication rules, too.)


My understanding of Backblaze Computer Backup is it is not a general purpose, network accessible filesystem.[0] If you want to use another tool to backup specific files, you'd use their B2 object storage platform.[1] It has an S3 compatible API you can interact with, Computer Backup does not.

But generally speaking, I'd agree with your sentiment.

[0]: https://www.backblaze.com/computer-backup/docs/supported-bac...

[1]: https://www.backblaze.com/docs/cloud-storage-about-backblaze...


But if the files are only on the remote storage and not local, chances are they haven't been modified recently, so it shouldn't download them fully; just check the metadata cache for size/modification time and leave them alone if they didn't change.

So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
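That skip-unchanged logic fits in a few lines. A sketch, assuming the client persists a manifest of (size, mtime) pairs from the previous run (the manifest shape and function name are hypothetical):

```python
def plan_incremental(previous, current):
    """Decide what to re-upload using (size, mtime) metadata only,
    so cloud-only stubs never have to be hydrated.

    `previous` and `current` map path -> (size, mtime); `current` can be
    built entirely from directory listings / the stub metadata cache.
    """
    changed = [p for p, meta in current.items() if previous.get(p) != meta]
    deleted = [p for p in previous if p not in current]
    return changed, deleted
```

Only the paths in `changed` would ever be opened (and thus downloaded); everything else is skipped from metadata.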


You can't trust size and modification time all the time; though mtime is a better indicator, it's not foolproof. The only reliable way is checksumming.

Interestingly, rclone supports that on many providers, but for Backblaze to support it, they'd need to integrate rclone, connect to the providers via that channel, and request checks, which is messy, complicated, and computationally expensive. And that's assuming you don't hit API rate limits on the cloud provider.


If you can’t trust modification time you are doing something so unusual that you probably need to be handling your backups privately anyway.


I don't think so.

Sometimes the modification time of a file which is not downloaded on computer A, but modified by computer B, is not reflected immediately on computer A.

Hence, backup software running on computer A will think the file has not been modified. This is a known problem in file synchronization. Also, some applications that modify files revert or preserve the file's mtime for their own reasons. They are rare, but they're there.


Then do it in memory, assuming those services allow you to read the files like that. It sounds like they do based on your other comments.


The problem is, downloading files and disk management are not in your control; that part is managed by the cloud client (Dropbox, Google Drive, et al.) transparently. The application accessing the file just waits, akin to waiting for a disk to spin up.

The filesystem is a black box for this software, since it doesn't know where a file actually resides. If you want control, you need to talk with every party, incl. the cloud provider, a la rclone.


Why would they do new backups of old files all the time? They would just skip those.


Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.

And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.


> Unless it does something very weird it won't trigger all those files to download at the same time. That shouldn't be a worry.

The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.

The only way to bypass it is to remount it over rclone or something and use "ls" and "lsd" functions to query filenames. Otherwise it'll download, and it's how it's expected to work.


Why would it use either of those on all the files at once? It should only be opening enough files to fill the upload buffer.
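That bounded-buffer idea is easy to express: group files so no batch hydrates more than the buffer limit at once, and free each batch before starting the next. A sketch (the names and the notion of a byte budget are mine, not anything Backblaze documents):

```python
def batched_upload_plan(file_sizes, buffer_limit):
    """Group files so no batch hydrates more than `buffer_limit` bytes.

    `file_sizes` is a list of (name, size) pairs, with sizes readable
    from directory metadata without downloading anything. Each batch can
    be downloaded, uploaded, then evicted before the next one starts, so
    peak extra disk use stays near `buffer_limit`.
    """
    plan, batch, used = [], [], 0
    for name, size in file_sizes:
        if batch and used + size > buffer_limit:
            plan.append(batch)
            batch, used = [], 0
        batch.append(name)
        used += size
    if batch:
        plan.append(batch)
    return plan
```

A single file larger than the limit still gets its own batch, which is the one case where hydration genuinely can blow past the budget.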


I think you might be confusing Backblaze reading files with how Dropbox/OneDrive/Nextcloud/etc. work. NC doesn't enable this by default (I don't think), but Windows calls it virtual file support.

There is no avoiding filling the upload buffer, because Backblaze has zero control over how Dropbox downloads files. When Backblaze requests that a file be opened and read, Windows asks Dropbox or whatever to open and read it, and how that is done is up to whatever handles the virtual files. To Backblaze, your Dropbox folder is a normal directory, with all that that entails, so Backblaze thinks it can just zip through the directory and read data from disk, even though that isn't really what's happening.

I had to exclude my Nextcloud directory from my Duplicati backups for precisely this reason: my Nextcloud is hosted on my server, and Duplicati was sending it so many requests it would cause my server to start sending back error 500s.

And no, my server isn't behind Cloudflare, primarily because I don't have $200 to throw at them to proxy arbitrary TCP/UDP ports through their network, and I don't know how to tell CF "hey, only proxy this traffic but let me handle everything else" (assuming that's even possible, given that the usual flow is to put your entire domain behind them).


No, I'm not confusing anything.

Dropbox and onedrive can handle backblaze zipping through and opening many files. The risk is getting too many gigabytes at once, but that shouldn't happen because backblaze should only open enough for immediate upload. If it does happen it's very easily fixed.

If it overloads nextcloud by hitting too many files too fast, that's a legitimate issue but it's not what OP was worried about.


The issue you're missing is that the abstraction Dropbox/OneDrive/etc. provide is not that of a network file system. When an application triggers the download of a file, it hydrates the file to the local file system and keeps it there. So if Backblaze triggers the download of a TB of files, it will consume a TB of local file system space (which may not even exist).


It won't keep it permanently. That would break under normal use.

Keeping recent files will work fine with a program that goes through them as fast as it can upload (which is not super fast).


It does keep them permanently. Dropbox is not a NAS and does not pretend to be one.

> When you open an online-only file from the Dropbox folder on your computer, it will automatically download and become available offline. This means you’ll need to have enough hard drive space for the file to download before you can open it. You can change it back to online-only by following the instructions below.

https://help.dropbox.com/sync/make-files-online-only

Same exact behavior for OneDrive, though it apparently does have a Windows integration to eventually migrate unused files back to online-only if enabled.

> When you open an online-only file, it downloads to your device and becomes a locally available file. You can open a locally available file anytime, even without Internet access. If you need more space, you can change the file back to online only. Just right-click the file and select "Free up space."

https://support.microsoft.com/en-us/office/save-disk-space-w...


Maybe it will, maybe it won't, but it'll cycle every file in the drive and stress everything from your cloud provider to Backblaze, including everything in between, software- and hardware-wise.


That sounds very acceptable to get those files backed up.

It shouldn't stress things to spend a couple weeks relaying a terabyte in small chunks. The most likely strain is on my upload bandwidth and yeah that's the cost of cloud backup, more ISPs need to improve upload.


I mean, cycling a couple of terabytes of data through a 512GB drive is at least 4 full drive writes, which is too much for that kind of thing.

> more ISPs need to improve upload.

I was yelling the same things to the void for the longest time, then I had a brilliant idea of reading the technical specs of the technology coming to my home.

Lo and behold, the numbers I got were the technical limits of the technology that I had at home (PON for the time being), and going higher would need a very large and expensive rewiring with new hardware and technology.


4 writes out of what, 3000? For something you'll need to do once or twice ever? It's fine. You might not even eat your whole Drive Write Per Day quota for the upload duration, let alone the entire month.

> the technical limits of the technology that I had at home (PON for the time being)

Isn't that usually symmetrical? Is yours not?


> 4 writes out of what, 3000?

Depends on your device capacity and how much is in actual use. Wear leveling also wears things down while it moves data around.

> For something you'll need to do once or twice ever?

I don't know about you, but my cloud storage is a living thing. And even if it weren't, if the software can't smartly ignore files, it'll pull everything in, compare, and pass without uploading, causing churn in every backup cycle.

> Isn't that usually symmetrical? Is yours not?

GPON (Gigabit PON) is asymmetric. The theoretical limits are 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.


> I don't know you, but my cloud storage is living

But you're probably changing less than 1% each day. And new changes are likely already in the cache, no need to download them.

> if the software can't smartly ignore files, it'll

Backblaze checks the modification date.

> GPON (Gigabit PON) is asymmetric. Theoretical limits is 2.4Gbps down, 1.2Gbps up. I have 1000Mbit/75Mbit at home.

2:1 is fine. If you're getting worse than 10:1 then that does sound like your ISP failed you?


How do you know whether those files need to be backed up without reading them? Timestamps and sizes are not reliable, only content hashes. How do you get a content hash? You read the file.
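And when you do read a file, the hashing itself at least needs only constant memory, whatever the sync client does with its local cache. A standard streaming hash in Python:

```python
import hashlib

def file_sha256(path, block_size=1 << 20):
    """Hash a file in 1 MiB blocks.

    Memory use stays constant regardless of file size; note the cloud
    client may still hydrate the whole file to disk behind this read.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(block_size):
            h.update(block)
    return h.hexdigest()
```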


If timestamps aren't reliable, you fall way outside the set of users who can trust a third-party backup provider. Name a case where the modification timestamp fails but a cloud provider would still catch the need to download the file.


Backblaze already trusts the modification date.


Why would it do that more than once unless you are modifying 4TB of data every day, in which case you are causing the problem.


I don't know how your client works, but reading metadata (e.g. requesting size) off any file causes some cloud clients to download it completely.

Of course I'm not modifying 4TB on a cloud drive, every day.


Can you name such a client? That sounds like a terrible experience.


The issue really isn't that it's not backing up the folder (which I can see an argument for both sides and various ways to do it) - it's that they changed what they did in a surprising way.

Your backup solution is not something you ever want to be the source of surprises!


Cloud placeholders have been a feature for years, plenty of programs have mitigations for this behavior.


The fault is with the PC manufacturers screwing you on disk space, claiming 1TB when it's only 256GB. Bait and switch.





