FR version is available. Content is displayed in original English for accuracy.
Advertisement
Advertisement
⚡ Community Insights
Discussion Sentiment
47% Positive
Analyzed from 13718 words in the discussion.
Trending Topics
#backblaze#backup#files#git#file#don#data#storage#backups#using

Discussion (404 Comments)Read Original on HackerNews
We discovered this change recently because my dad was looking for a file that Dropbox accidentally overwrote which at first we said “no problem. This is why we pay for backblaze”
We had learned that this policy had changed a few months ago, and we were never notified. File was unrecoverable
If anyone at backblaze is reading this, I pay for your product so I can install you on my parents machine and never worry about it again. You decided saving on cloud storage was worth breaking this promise. Bad bad call
I need it to capture local data, even though that local data is getting synced to Google Drive. Where we sync our data really has nothing to do with Backblaze backing up the endpoint. We don't wholly trust sync, that's why we have backup.
On my personal Mac I have iCloud Drive syncing my desktop, and a while back iCloud ate a file I was working on. Backblaze had it captured, thankfully. But if they are going to exclude iCloud Drive synced folders, and sounds like that is their intention, Backblaze is useless to me.
I have no clue why people still use it and I'd cut my losses if I were you, either backup to the cloud or pull from it, not both at the same time like an absolute tictac.
You can build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
It works perfectly fine as long as you keep how it works in mind, and probably most importantly don't have multiple users working directly on the same file at once.
I've been using these systems for over a decade at this point and never had a problem. And if I ever do have one, my real backup solution has me covered.
Now I need a new solution that will work for my parents
You can't connect to their Computer Backup service through third-party software.
- report an error
- ignore
- materialize
Regardless, if you make it back up software that doesn’t give this level of control to users, and you make a change about which files you’re going to back up, you should probably be a lot more vocal with your users about the change. Vanishingly few people read release notes.
If it's not open-source, but the protocol is documented, see above.
If it's not open-source, and the protocol isn't documented, well... that makes the decision easy, doesn't it?
Making the change without making it clear though, that's just awful. A clear recipe for catastrophic loss & drip drip drip of news in the vein of "How Backblaze Lost my Stuff"
Or maybe just do what they do now, but WARN about that in HUGE RED LETTERS, in the website and the app, instead of burying it in an update note like weasels!
Hiding the network always ends in pain. But never goes out of style.
This particular example is a useful one for me to think about, because it's a version of hiding complexity in order to present a simple interface that I actually hate. (WYSIWYG editors is another one, for similar reasons: it always ends up being buggy and unpredictable.)
Your backup solution is not something you ever want to be the source of surprises!
Hell, if I open a directory of photos and my OS tries to pull exif data for each one, it would be wild if that caused those files to be fully downloaded and consume disk space.
After a backup, you’d go out to a coffee shop or on a plane only to find that the files in the synced folder you used yesterday, and expected to still be there, were not - but photos from ten years ago were available!
Today's Dropbox is a network file system with inscrutable cache behavior that seeks to hide from the users the information about which files are actually present. That makes it impossible for normal users to correctly reason about its behavior, to have correct expectations for what will be available offline or what the side effects of opening a file will be, and Backblaze is stuck trying to cope with a situation where there is no right answer.
I am not aware of any evidence supporting this.
It would be reasonable to say that if you run the file sync in a mode that keeps everything locally, then Backblaze should be backing it up. Arguably they should even when not in that mode, but it'll churn files repeatedly as you stream files in and out of local storage with the cloud provider.
When you have a couple terabytes of data in that drive, is it acceptable to cycle all that data and use all that bandwidth and wear down your SSD at the same time?
Also, high number of small files is a problem for these services. I have a large font collection in my cloud account and oh boy, if I want to sync that thing, the whole thing proverbially overheats from all the queries it's sending.
I assume you don’t think that, so I’m curious, what would you propose positively?
So, in practice, you shouldn't have to download the whole remote drive when you do an incremental backup.
And, as a separate note, they shouldn't be balking at the amount of data in a virtualized onedrive or dropbox either considering the user could get a many-terabyte hard drive for significantly less money.
The moment you call read() (or fopen() or your favorite function), the download will be triggered. It's a hook sitting between you and the file. You can't ignore it.
The only way to bypass it is to remount it over rclone or something and use "ls" and "lsd" functions to query filenames. Otherwise it'll download, and it's how it's expected to work.
This is another example in disguise of two people disagreeing about what "unlimited" means in the context of backup, even if they do claim to have "no restrictions on file type or size" [2].
[1] https://www.reddit.com/r/backblaze/comments/jsrqoz/personal_... [2] https://www.backblaze.com/cloud-backup/personal
Always prefer businesses who are upfront and honest about what they can offer their users, in a sustainable way.
Or that they're targeting the mass retail market, where people are technically ignorant, and "unlimited" is required to compete.
And statistically-speaking, is viable as long as a company keeps its users to a normal distribution.
And there speaks marketing.
Doing a bait-and-switch on a percentage of your paying customers, no matter how small the percentage is, may be "viable" for the company, but it's a hostile experience for those users, and companies deserve to be called out for it.
So… Marketing has taken over, just as parent comment said. Got it.
It's a bit safer when you know your playbook - if there was unlimited (as it is now) and unlimited plus (where they backup "cloud storage cached files") and unlimited pro max premier (where they backup entire cloud storages) you'd at least know where you stand, and you'd change "holy shit my important file I though was backed up isn't and now it's gone forever" to "I have to pay $10 a more a month or take on this risk".
I understand this, many others do too, the only difference seems to be that we're not willing to play those games. Others are, and that's OK, just giving my point of view which I know is shared by many others who are bit stricter about where we host our backups. Instead of "statistical games" we prefer "upfront limitations", as one example.
But we'd always have a few people at the end of the semester print 493 blank pages using up all of their print quota they'd "paid for". No sir, you didn't pay for 500 pages of printing a semester, we'd let you print as much as you needed, we just had to put a quota in place to prevent some joker from wallpapering the lecture hall.
It was hard to express what we meant and "unlimited" didn't cut it.
Nobody has turned the moon into a hard drive yet.
The new and very interesting problem with their business model is that drive prices have doubled - and in some cases, more than doubled - in the last 12 months.
Backblaze has a lot of debt and at some point the numbers don't make sense anymore.
Oh well, I guess this is why we're given two kidneys.
If a company uses the word unlimited to describe their service, but then attempts to weasel out of it via their T&Cs, that doesn't constitute a disagreement over the meaning of the word unlimited. It just means the company is lying.
Unlimited however, they can offer. I don’t see how people get into mental block of thinking something is nefarious when a company offers you unlimited hosting or data. Yes, they know it’s impossible if everyone took full advantage of that. They also know most people won’t and so they don’t have to spend time worrying about it. It’s a simple actuarial exercise to work out the pricing that covers the use of your users.
Back in the early 2000s I ran a web hosting service that was predominantly a LAMP stack shared hosting environment. It had several unlimited plans and they were easy to estimate/price. The only times I had an issue of supporting a heavy user, it would turn out they were doing something unrestricted. Back then, it was usually something pron or mp3 related. So the user would get kicked off for that. I didn’t have any issues with supporting the usage load if it was within TOS. The margins were so high it was almost impossible to find a user that could give me any trouble from an economic standpoint.
I don't quite understand why it's still like this; it's probably the biggest reason why git tends to play poorly with a lot of filesystem tools (not just backups). If it'd been something like an SQLite database instead (just an example really), you wouldn't get so much unnecessary inode bloat.
At the same time Backblaze is a backup solution. The need to back up everything is sort of baked in there. They promise to be the third backup solution in a three layer strategy (backup directly connected, backup in home, backup external), and that third one is probably the single most important one of them all since it's the one you're going to be touching the least in an ideal scenario. They really can't be excluding any files whatsoever.
The cloud service exclusion is similarly bad, although much worse. Imagine getting hit by a cryptoworm. Your cloud storage tool is dutifully going to sync everything encrypted, junking up your entire storage across devices and because restoring old versions is both ass and near impossible at scale, you need an actual backup solution for that situation. Backblaze excluding files in those folders feels like a complete misunderstanding of what their purpose should be.
Why should a file backup solution adapt to work with git? Or any application? It should not try to understand what a git object is.
I’m paying to copy files from a folder to their servers just do that. No matter what the file is. Stay at the filesystem level not the application level.
It's that to back up a folder on a filesystem, you need to traverse that folder and check every file in that folder to see if it's changed. Most filesystem tools usually assume a fairly low file count for these operations.
Git, rather unusually, tends to produce a lot of files in regular use; before packing, every commit/object/branch is simply stored as a file on the filesystem (branches only as pointers). Packing fixes that by compressing commit and object files together, but it's not done by default (only after an initial clone or when the garbage collector runs). Iterating over a .git folder can take a lot of time in a place that's typically not very well optimized (since most "normal" people don't have thousands of tiny files in their folders that contain sprawled out application state.)
The correct solution here is either for git to change, or for Backblaze to implement better iteration logic (which will probably require special handling for git..., so it'd be more "correct" to fix up git, since Backblaze's tools aren't the only ones with this problem.)
When I backup my computer the .git folders are among the most important things on there. Most of my personal projects aren't pushed to github or anywhere else.
Fortunately I don't use Backblaze. I guess the moral is don't use a backup solution where the vendor has an incentive to exclude things.
That's a really important fact that's getting buried so I'd like to highlight it here.
This is a joke, but honestly anyone here shouldn't be directly backing up their filesystems and should instead be using the right tool for the job. You'll make the world a more efficient place, have more robust and quicker to recover backups, and save some money along the way.
See Fossil (https://fossil-scm.org/)
P.S. There's also (https://www.sourcegear.com/vault/)
> SourceGear Vault Pro is a version control and bug tracking solution for professional development teams. Vault Standard is for those who only want version control. Vault is based on a client / server architecture using technologies such as Microsoft SQL Server and IIS Web Services for increased performance, scalability, and security.
It's the same reason why the postgres autovacuum daemon tends to be borderline useless unless you retune it[0]: the defaults are barmy. git gc only runs if there's 6700 loose unpacked objects[1]. Most typical filesystem tools tend to start balking at traversing ~1000 files in a structure (depends a bit on the filesystem/OS as well, Windows tends to get slower a good bit earlier than Linux).
To fix it, running
> git config --global gc.auto 1000
should retune it and any subsequent commit to your repo's will trigger garbage collection properly when there's around 1000 loose files. Pack file management seems to be properly tuned by default; at more than 50 packs, gc will repack into a larger pack.
[0]: For anyone curious, the default postgres autovacuum setting runs only when 10% of the table consists of dead tuples (roughly: deleted+every revision of an updated row). If you're working with a beefy table, you're never hitting 10%. Either tune it down or create an external cronjob to run vacuum analyze more frequently on the tables you need to keep speedy. I'm pretty sure the defaults are tuned solely to ensure that Postgres' internal tables are fast, since those seem to only have active rows to a point where it'd warrant autovacuum.
[1]: https://git-scm.com/docs/git-gc
> git config --global gc.auto 1000
with the long option name, and no `=`.
I contacted the support asking WTF, "oh the file got deleted at some point, sorry for that", and they offered me 3 months of credits.
I do not trust my Backblaze backups anymore.
First thing I noticed is that if it can't download a file due to network or some other problem then it just skips it. But you can force it to retry by modifying its job file which is just an SQLite DB. Also it stores and downloads files by splitting them into small chunks. It stores checksums of these chunks, but it doesn't store the complete checksum of the file, so judging by how badly the client is written I can't be sure that restored files are not corrupted after the stitching.
Then I found out that it can't download some files even after dozens of retries because it seems they are corrupted on Backblaze side.
But the most jarring issue for me is that it mangled all non-ascii filenames. They are stored as UTF-8 in the DB, but the client saves them as Windows-1252 or something. So I ended up with hundreds of gigabytes of files with names like фикац, and I can't just re-encode these names back, because some characters were dropped during the process.
I wanted to write a script that forces Backblaze Client to redownload files, logs all files that can't be restored, fixes the broken names and splits restored files back into chunks to validate their checksums against the SQLite DB, but it was too big of a task for me, so I just procrastinated for 3 years, while keeping paying monthly Backblaze fees because it's sad to let go of my data.
I wonder if they fixed their client since then.
- - -
Hey, I tried restoring a file from my backup — downloading it directly didn't work, and creating a restore with it also failed – I got an email telling me contract y'all about it.
Can you explain to me what happened here, and what can I do to get my file(s?) back?
- - -
Hi Jan,
Thanks for writing in!
I've reached out to our engineers regarding your restore, and I will get back to you as soon as I have an update. For now, I will keep the ticket open.
- - -
Hi Jan,
Regarding the file itself - it was deleted back in 2022, but unfortunately, the deletion never got recorded properly, which made it seem like the file still existed.
Thus, when you tried to restore it, the restoration failed, as the file doesn't actually exist anymore. In this case, it shouldn't have been shown in the first place.
For that, I do apologize. As compensation, we've granted you 3 monthly backup credits which will apply on your next renewal. Please let me know if you have any further questions.
- - -
That makes me even more confused to be honest - I’ve been paying for forever history since January 2022 according to my invoices?
Do you know how/when exactly it got deleted?
- - -
Hi Jan,
Unfortunately, we don't have that information available to us. Again, I do apologize.
- - -
I really don’t want to be rude, but that seems like a very serious issue to me and I’m not satisfied with that response.
If I’m paying for a forever backup, I expect it to be forever - and if some file got deleted even despite me paying for the “keep my file history forever” option, “oh whoops sorry our bad but we don’t have any more info” is really not a satisfactory answer.
I don’t hold it against _you_ personally, but I really need to know more about what happened here - if this file got randomly disappeared, how am I supposed to trust the reliability of anything else that’s supposed to be safely backed up?
- - -
Hi Jan,
I'll inquire with our engineers tomorrow when they're back in, and I'll update you as soon as I can. For now, I will keep the ticket open.
- - -
Appreciate that, thank you! It’s fine if the investigation takes longer, but I just want to get to the bottom of what happened here :)
- - -
Hi Jan,
Thanks for your patience.
According to our engineers and my management team:
With the way our program logs information, we don't have the specific information that explains exactly why the file was removed from the backup. Our more recent versions of the client, however, have vastly improved our consistency checks and introduced additional protections and audits to ensure complete reliability from an active backup.
Looking at your account, I do see that your backup is currently not active, so I recommend running the Backblaze installer over your current installation to repair it, and inherit your original backup state so that our updates can check your backup.
I do apologize, and I know it's not an ideal answer, but unfortunately, that is the extent of what we can tell you about what has happened.
- - -
I gave up escalating at this point and just decided these aren’t trusted anymore.
The files in question are four year old at this point so it’s hard for me conclusively state, so I guess there might be a perfect storm of that specific file being deleted because it was due to expire before upgraded to “keep history forever”, but I don’t think it’s super likely, and I absolutely would expect them to have telemetry about that in any case.
If anyone from Backblaze stumbles upon it and wants to escalate/reinvestigate, the support ID is #1181161.
Especially if they allow them restoring all your data onto a drive and shipping it to you, they pretty clearly should have enough information available to them to test restorations of data, and the number of times I've heard that failure mode ("oh, we didn't track deletions well enough, so we only found out we deleted it when you tried restoring"), plus them saying they have made improvements to avoid this exact failure mode in newer client versions, makes me think they should have enough reports to investigate it.
...which makes me wonder if they did, and decided they would go bankrupt if they told people how much data they lost, so they decided to bet on people not trying restores on a lot of the lost data.
My experience using restic has been excellent so far, snapshots take 5 mins rather than 30 mins with backblaze's Mac client. I just hope I can trust it…
But if that's truly their stance, then they are being deceptive about their non-business offering at the point of sale.
EDIT - see my other comment where I found the actual email
According to support's reply just now, my backups are crippled just like every other customer. No git, no cloud synced folders, even if those folders are fully downloaded locally.
(This is also my personal backup strategy for iCloud Drive: one Mac is set to fully download complete iCloud contents, and that Mac backs up to Backblaze.)
The one thing they have to do is backup everything and when you see it in their console you can rest assured they are going to continue to back it up.
They’ve let the desktop client linger, it’s difficult to add meaningful exceptions. It’s obvious they want everyone to use B2 now.
Borg backup is a good tool in my opinion and has everything that I need (deduplication, compression, mountable snapshot.
Hetzner Storage Box is nothing fancy but good enough for a backup and is sensibly cheaper for the alternatives (I pay about 10 eur/month for 5TB of storage)
Before that I was using s3cmd [3] to backup on a S3 bucket.
[1] https://www.borgbackup.org/
[2] https://s3tools.org/s3cmd
[3] https://s3tools.org/s3cmd
[2] https://www.hetzner.com/storage/storage-box/
D'argh.
Just this weekend, my backup tool went rogue and exhausted quota on rsync.net (Some bad config by me on Borg.) Emailed them, they promptly added 100 GB storage for a day so that I could recover the situation. Plus, their product has been rock solid since a few years I've been using them.
Just to clarify - there are discounted plans that don't have free ZFS snapshots but you can still have them ... they just count towards your quota.
If your files don't change much - you don't have much "churn" - they might not take up any real space anyway.
There is 100% a difference between "dead data" (eg: movie.mp4) and "live data" (eg: a git directory with `chmod` attributes)- S3 and similar often don't preserve "attributes and metadata" without a special secondary pass, even though the `md5` might be the same.
</bzexclusions><excludefname_rule plat="mac" osVers="*" ruleIsOptional="f" skipFirstCharThenStartsWith="*" contains_1="/users/username/dropbox/" contains_2="*" doesNotContain="*" endsWith="*" hasFileExtension="*" />
That is the exact path to my Dropbox folder, and I presume if I move my Dropbox folder this xml file will be updated to point to the new location. The top of the xml file states "Mandatory Exclusions: editing this file DOES NOT DO ANYTHING".
.git files seem to still be backing up on my machine, although they are hidden by default in the web restore (you must open Filters and enable Show Hidden Files). I don't see an option to show hidden files/folders in the Backblaze Restore app.
That would be nice, they'd be able to get their history back!
Try checking bzexcluderules_editable.xml. A few years ago, Backblaze would back up .git folders for Mac but not Windows. Not sure if this is still the case.
Basically it works like this:
- I have syncthing moving files between all my devices. The larger the device, the more stuff I move there[2]. My phone only has my keepass file and a few other docs, my gaming PC has that plus all of my photos and music, etc.
- All of this ends up on a raspberry pi with a connected USB harddrive, which has everything on it. Why yes, that is very shoddy and short term! The pi is mirrored on my gaming PC though, which is awake once every day or two, so if it completely breaks I still have everything locally.
- Nightly a restic job runs, which backs up everything on the pi to an s3 compatible cloud[3], and cleans out old snapshots (30 days, 52 weeks, 60 months, then yearly)
- Yearly I test restoring a random backup, both on the pi, and on another device, to make sure there is no required knowledge stuck on there.
This is was somewhat of a pain to setup, but since the pi is never off it just ticks along, and I check it periodically to make sure nothing has broken.
[1] there is always weirdness with these tools. They don't sync how you think, or when you actually want to restore it takes forever, or they are stuck in perpetual sync cycles
[2] I sync multiple directories, broadly "very small", "small", "dumping ground", and "media", from smallest to largest.
[3] Currently Wasabi, but it really doens't matter. Restic encrypts client side, you just need to trust the provider enough that they don't completely collapse at the same time that you need backups.
Props for getting this implemented and seemingly trusted... I wish there was an easier way to handle some of this stuff (eg: tiny secure key material => hot syncthing => "live" git files => warm docs and photos => cold bulk movies, isos, etc)... along with selective "on demand pass through browse/fetch/cache"
They all have different policy, size, cost, technical details, and overall SLA/quality tradeoffs.
~ 5 years ago, I had a development flow that involved a large source tree (1-10K files, including build output) that was syncthing-ed over a residential network connection to some k8s stuff.
Desyncs/corruptions happened constantly, even though it was a one-way send.
I've never had similar issues with rsync or unison (well, I have in unison, but that's two-way sync, and it always prompted to ask for help by design).
Anyway, my decade-old synology is dying, so I'm setting up a replacement. For other reasons (mostly a decade of systemd / pulse audio finding novel ways to ruin my day, and not really understanding how to restore my synology backups), I've jumped ship over to FreeBSD. I've heard good things about using zfs to get:
saniod + syncoid -> zfs send -> zfs recv -> restic
In the absence of ZFS, I'd do:
rsync -> restic
Or:
unison <-> unison -> restic.
So, similar to what you've landed on, but with one size tier. I have docker containers that the phone talks to for stuff like calendars, and just have the source of the backup flow host my git repos.
One thing to do no matter what:
Write at least 100,000 files to the source then restore from backup (/ on a linux VM is great for this). Run rsync in dry run / checksum mode on the two trees. Confirm the metadata + contents match on both sides. I haven't gotten around to this yet with the flow I just proposed. Almost all consumer backup tools fail this test. Comments here suggest backblaze's consumer offering fails it badly. I'm using B2, but I haven't scrubbed my backup sets in a while. I get the impression it has much higher consistency / durability.
One particular issue I've encountered is that syncthing 2.x does not work well for systems w/o an SSD due to the storage backend switching to sqlite which doesn't perform as well as leveldb on HDDs, the scans of the 6TB folder was taking an excessively long time to complete compared to 1.x using leveldb. I haven't encountered any issues with mixing the use of 1.x and 2.x in my setup. The only other issues I've encountered are usually related to filename incompatibilites between filesystems.
The technical and performance implications of backing-up cloud mount-points are real, but that's zero excuse for the way this change was communicated.
This is a royal screw-up in corporate communications and I would not be surprised if it makes a huge negative impact in their bottom line and results in a few terminations.
I had no idea that it was such a good bargain. I used to be a Crashplan user back in the day, and I always thought Backblaze had tiered limits.
I've been using Duplicati to sync a lot of data to S3's cheapest tape-based long term storage tier. It's a serious pain in the ass because it takes hours to queue up and retrieve a file. It's a heavy enough process that I don't do anything nearly close to enough testing to make sure my backups are restorable, which is a self-inflicted future injury.
Here's the thing: I'm paying about $14/month for that S3 storage, which makes $99/year a total steal. I don't use Dropbox/Box/OneDrive/iCloud so the grievances mentioned by the author are not major hurdles for me. I do find the idea that it is silently ignoring .git folders troubling, primarily because they are indeed not listed in the exclusion list.
I am a bit miffed that we're actively prevented from backing up the various Program Files folders, because I have a large number of VSTi instruments that I'll need to ensure are rcloned or something for this to work.
Maybe there's something newer/better now (and I bought lifetime licenses of it long ago), but it works for me.
That said, I use Arq + Backblaze storage and I think my monthly bill is very low, like under $5. Though I haven't backed-up much media there yet, but I do have control over what is being backed-up.
I never trust them again with my data.
`git reflog` is your friend. You can recover from almost any mistake, including force-pushed branches.
You should try downloading one of your backed up git repos to see if it actually does contain the full history, I just checked several and everything looks good.
> Bob (Backblaze Help)
> Aug 5, 2021, 11:33 PDT
> Hello there,
> Thank you for taking the time to write in,
> Unfortunately .git directories are excluded by Backblaze by default. File
> changes within .git directories occur far too often and over so many files
> that the Backblaze software simply would not be able to keep up. It's beyond
> the scope of our application.
> The Personal Backup Plan is a consumer grade backup product. Unfortunately we
> will not be able to meet your needs in this regard.
> Let me know if you have any other questions.
> Regards,
> Bob The Backblaze Team
Not backing up .git folders however is completely unacceptable.
I have hundreds of small projects where I use git track of history locally with no remote at all. The intention is never to push it anywhere. I don't like to say these sorts of things, and I don't say it lightly when I say someone should be fired over this decision.
I know this is besides the point somewhat, but: Learn your tools people. The commit history could probably have been easily restored without involving any backup. The commits are not just instantly gone.
Indeed, the commits and blobs might even have still been available on the GitHub remote, I'm not sure they clean them on some interval or something, but bunch of stuff you "delete" from git still stays in the remote regardless of what you push.
Any suggestions for alternatives?
There are 2 components in my mind: the backup "agent" (what runs on your laptop/desktop/server) and the storage provider (which BB is in this context).
What do people recommend for the agent? (I understand some storage providers have their own agents) For Linux/MacOS/Windows.
What do people recommend for the storage provider? Let's assume there are 1TB of files to be backed up. 99.9% don't change frequently.
I’ve added restic to my backup routine, pointed at cloud files and other critical data
JottaCloud is "unlimited" for $11.99 a month (your upload speed is throttled after 5TB).
I've been using them for a few years for backing up important files from my NAS (timemachine backups, Immich library, digitised VHS's, Proxmox Backup Server backups) and am sitting at about 3.5TB.
But .git? It does not mean you have it synced to GitHub or anything reliable?
If you do anything then only backup the .git folder and not the checkout.
But backing up the checkout and not the .git folder is crazy.
They don't need to be in my case, I'm only using them now because of existing shortcuts and VM shares and programs configured to source information from them. That doesn't mean I don't want them backed up.
Same for OneDrive: Microsoft configured my account for OneDrive when I set it up. Then I immediately uninstalled it (I don't want it). But I didn't notice that my desktop and documents folders live there. I hate it. But by the time I noticed it, it was already being used as a location for multiple programs that would need to be reconfigured, and it was easier to get used to it than to fix it. Several things I've forgotten about would likely break in ways I wouldn't notice for weeks/months. Multiple self-hosted servers for connecting to my android devices would need to reindex (Plex, voidtools everything, several remote systems that mount via sftp and connected programs would decide all my files were brand new and had never been seen before)
No they are not. This is explicitly addressed in the article itself.
both services have internal backups to reduce the chance they lose data
both services allow some limited form of "going back to older version" (like the article states itself).
Just because the article says "sync is not backup" doesn't mean that is true, I mean it literally is backup by definition as it: makes a copy in another location and even has versioning.
It's just not _good enough_ backup for their standards. Maybe even standards of most people on HN, but out there many people are happy with way worse backups, especially wrt. versioning for a lot of (mostly static) media the only reason you need version rollback is in case of a corrupted version being backed up. And a lot of people mostly backup personal photos/videos and important documents, all static by nature.
Through
1. it doesn't really fulfill the 3-2-1 rules it's only 2-1-1 places (local, one backup on ms/drop box cloud, one offsite). Before when it was also backed up to backblaze it was 3-2-1 (kinda). So them silently stopping still is a huge issue.
2. newer versions of the 3-2-1 rule also say treat 2 not just as 2 backups, but also 2 "vendors/access accounts" with the one-drive folder pretty much being onedrive controlled this is 1 vendor across local and all backups. Which is risky.
You are using it to mean "maintaining full version history", I believe? Another important consideration.
No, they are using it to mean “backed up”. Like, “if this data gets deleted or is in any way lost locally, it’s still backed remotely (even years later, when finally needed)”.
I’m astonished so many people here don’t know what a backup is! No wonder it’s easy for Backblaze to play them for fools.
I've also configured encrypted cloud backups to a different geographic region and off-site backups to a friend's NAS (following the 3-2-1 backup rule). It does help having 2.5Gb networking as well, but owning your data is more important in the coming age of sloppy/degrading infrastructure and ransomware attacks.
(as a side note, it's funny to see see them promoting their native C app instead of using Java as a "shortcut". What I wouldn't give for more Java apps nowadays)
If you've got huge amounts of files in Onedrive and the backup client starts downloading everyone of them (before it can reupload them again) you're going to run into problems.
But ideally, they'd give you a choice.
Complete lack of communication (outside of release notes, which nobody really reads, as the article too states) is incompetence and indeed worrying.
Just show a red status bar that says "these folders will not be backed up anymore", why not?
So my idea is that it's a competency problem (lack of communication), not malice. But it's just a theory, based on my own experience.
In any case, this is a bad situation, however you look at it.
My understanding is that a modern, default onedrive setup will push all your onedrive folder contents to the cloud, but will not do the same in reverse -- it's totally possible to have files in your cloud onedrive, visible in your onedrive folder, but that do not exist locally. If you want to access such a file, it typically gets downloaded from onedrive for you to use.
If that's the case, what is Backblaze or another provider to do? Constantly download your onedrive files (that might have been modified on another device) and upload them to backblaze? Or just sync files that actually exist locally? That latter option certainly would not please a consumer, who would expect the files they can 'see' just get magically backed up.
It's a tricky situation and I'm not saying Backblaze handled it well here, but the whole transparent cloud storage situation thing is a bit of a mess for lots of people. If Dropbox works the same way (no guaranteed local file for something you can see), that's the same ugly situation.
Now, I:
- Put important stuff in a SyncThing folder and sync that out to 2 different nodes.
- Clone stuff to an encrypted external drive at home.
- Clone stuff to an encrypted external drive at work and hide it out in the datacenter (fire suppression, HVAC, etc).
It's janky but it works.
I used to use a safe deposit box but that got too tedious.
I know the post is talking about their personal backup product but it's the same company and so if they sneak in a reduction of service like this, as others have already commented, it erodes difficult-to-earn trust.
On macOS.
Edit: on top of that I've built a custom one-page monitoring dashboard, so I see everything in one place (https://imgur.com/B3hppIW) - I'll opensource, it's decent architecture, I just need to cleanup some secrets from Git history...
For stuff I care about (mostly photos), I back them up on two different services. I don't have TBs of those, so it's not very expensive. My personal code I store on git repositories anyway (like SourceHut or Codeberg or sometimes GitHub).
Trying to audit—let alone change—the finer details is a pain even for power users, and there's a non-zero risk the GUI is simply lying to everybody while undocumented rules override what you specified.
When I finally switched my default boot to Linux, I found many of those offerings didn't support it, so I wrote some systemd services around Restic + Backblaze B2. It's been a real breath of fresh air: I can tell what's going on, I can set my own snapshot retention rules, and it's an order of magnitude cheaper. [2]
____
[1] Along the lines of "We have your My Documents. Oh, you didn't manually add My Videos or My Music for every user? Too bad." Or in some cases, certain big-file extensions are on the ignore list by default for no discernible reason.
[2] Currently a dollar or two a month for ~200gb. It doesn't change very much, and data verification jobs redownload the total amount once a month. I don't backn up anything I could get from elsewhere, like Steam games. Family videos are in the care of different relatives, but I'm looking into changing that.
As for GUIs in general... Well, like I said, I just finished several years of bad experiences with some proprietary ones, and I wanted to see and choose what was really going on.
At this point, I don't think I'd ever want a GUI beyond a basic status-reporting widget. It's not like I need to regularly micromanage the folder-set, especially when nobody else is going to tweak it by surprise.
_____
[1] The downside to the dumb-store is a ransomware scenario, where the malware is smart enough to go delete my old snapshots using the same connection/credentials. Enforcing retention policies on the server side necessarily needs a smarter server. B2 might actually have something useful there, but I haven't dug into it.
Preferably cheap and rclone compatible.
Hetzner storagebox sounds good, what about S3 or Glacier-like options?
[0]: https://kopia.io/
[1]: https://blinkdisk.com/
I assume when asking such a question, you expect an honest answer like mine:
rclone is my favorite alternative. Supports encryption seamlessly, and loaded with features. Plus I can control exactly what gets synced/backed up, when it happens, and I pay for what I use (no unsustainable "unlimited" storage that always comes with annoying restrictions). There's never any surprises (which I experienced with nearly every backup solution). I use Backblaze B2 as the backend. I pay like $50 a month (which I know sounds high), but I have many terabytes of data up there that matters to me (it's a decade or more of my life and work, including long videos of holidays like Christmas with my kids throughout the years).
For super-important stuff I keep a tertiary backup on Glacier. I also have a full copy on an external harddrive, though those drives are not very reliable so I don't consider it part of the backup strategy, more a convenience for restoring large files quickly.
Don't even know why people rely on these guis which can show their magic anytime
* If your value your privacy, you need to encrypt the files on the client before uploading.
* You need to keep multiple revisions of each file, and manage their lifecycle. Unless you're fine with losing any data that was overwritten at the time of the most recent backup.
* You need to de-duplicate files, unless you want bloat whenever you rename a file or folder.
* Plus you need to pay for Amazon's extortionate egress prices if you actually need to restore your data.
I certainly wouldn't want to handle all that on my own in a script. What can make sense is using open source backup software with S3/R2/B2 as backing storage.
In terms of cloud storage... well I was using backblaze's b2 but the issues here are definitely making me reconsidering doing business with the company even if my use of it is definitely not impacted by any of them.
Most people (my mom) don't know what s3 and r2 is or how to use it.
I like how you can set multiple keys (much like LUKS) so that the key used by scheduled backups can be changed without messing with the key that I have memorized to restore with when disaster strikes.
It also means you can have multiple computers backing up (sequentially, not simultaneously) to the same repository, each with their own key.
also, you pay per-GB. the author is on backblaze's unlimited plan.
That's a good warning
> Backblaze had let me down. Secondly within the Backblaze preferences I could find no way to re-enable this.
This - the nail in the coffin
I still like backblaze, they've been nice for the days where I was running windows. Their desktop app is probably one of the best in the scene.
"The Backup Client now excludes popular cloud storage providers [...] this change aligns with Backblaze’s policy to back up only local and directly connected storage."
I guess windows 10 and 11 users aren't backing up much to Backblaze, since microsoft is tricking so many into moving all of their data to onedrive.
backup to real s3 storage.
llms on real api tokens.
search on real search api no adverts.
google account on workspace and gcp, no selling the data.
etc.
only way to stop corpos treating you like a doormat
I hope Backblaze responds to this with a "we're sorry and we've fixed this."
[1] https://www.backblaze.com/cloud-backup/personal
When trying to copy files from a OneDrive folder, the operation fails if the file must be sync'd first.
I, for one, do not think it is fair to blame Backblaze for the shortcomings of another application who breaks basic funtionality like copying files.
https://techcommunity.microsoft.com/discussions/onedriveforb...
Edit: spelling errors and cleanup
You have to give Apple credit, they nailed Time Machine. I have fully restored from Time Machine backups when buying new Macs more times than I can count. It works and everything comes back to an identical state of the snapshot. Yet, Microsoft can’t seem to figure this out.
It almost seems like they’re taking it personally as some kind of intentionally slight against them.
Most users would not want Backblaze to back up other cloud synced directories. This default is sensible.
I get that changing economics make it more difficult to honor the original "Backup Everything" promise but this feels very underhanded. I'll be cancelling.
The configuration and logging formats they use are absolutely nonsensical.
We're also seeing this play out in real time with Anthropic with their poop-splatter-llm. They've gone through like 4 rug-pulls, and people STILL pay $200/month for it. Every round, their unlimited gets worse and worse, like I outlined above.
Pay as you go is probably the more fair. But SaaS providers reallllllly hate in providing direct and easy to use tools to identify costs, or <gasp> limit the costs. A storag/backup provider could easily show this. LLM providers could show a near-realtime token utilization.
But no. Dark patterns, rug-pulls, and "i am altering the deal, pray i do not alter it further".
I’m only in my 40’s, I don’t require glasses (yet) and I have to actively squint to read your site on mobile. Safari, iPhone.
I’m pretty sure you’re under the permitted contrast levels under WCAG.
- ctrl-shift-. to show hidden files on macOS - pull down to see search box (iOS 18) - swipe from top right corner for flashlight button - swipe up from lower middle for home screen
Etc, etc
Use this command in the developer tools console to change the color.
Is this maybe a pixel density of iphone issue?
I wouldn't mind a darker and higher weight font though.
For desktop browsers, I also have a bookmarklet on the bookmarks bar with the following Javascript:
It doesn't darken the text on every webpage but it does work on this thread's article. (The Javascript code can probably be enhanced with more HTML heuristics to work on more webpages.)One day try throwing a pair on you'll be surprised. The small thin font is causing this not the text contrast. This and low light scenarios are the first things to go.
Whatever causes it, I do wear glasses (and on a recent prescription too) and the text is still very hard to read.
Firefox users: press F9 or C-A-R
What is it supposed to do?
There is no mention of F9 on this support page either:
https://support.mozilla.org/en-US/kb/keyboard-shortcuts-perf...
Am I missing something?
As for mentioning WCAG - so what if it doesn’t adhere to those guidelines? It’s his personal website, he can do what he wants with it. Telling him you found it difficult to read properly is one thing but referencing WCAG as if this guy is bound somehow to modify his own aesthetic preference for generic accessibility reasons is laughable. Part of what continues to make the web good is differing personal tastes and unique website designs - it is stifling and monotonous to see the same looking shit on every site and it isn’t like there aren’t tools (like reader mode) for people who dislike another’s personal taste.
If they start excluding random content (eg: .git) without effective notice, maybe they AREN'T backing up everything you think they are.
My naive idea: Download 100 TB every 3 month to a 2nd device, create a list of files restored, validate checksums with the original machine, make a list of files differing and missing, check which ones are supposed to be missing? That sounds like a full time job.
Pinning this squarely on user error. Backblaze could clearly have done better, but it's such a well known failure mode that it's not much far off refusing to test restores of a bunch of tapes left in the sun for a decade.
It isn't user error if it was working perfectly fine until the provider made a silent change.
Unless the user error you are referring to is not managing their own backups, like I do. Though this isn't free from trouble, I once had silent failures backing up a small section of my stuff for a while because of an ownership/perms snafu and my script not sending the reports to stderr to anywhere I'd generally see them. Luckily an automated test (every now & then it scans for differences in the whole backup and current data) because it could see the source and noticed a copy wasn't in the latest snapshot on the far-away copy. Reliable backups is a harder problem then most imagine.
Also consider e.g. ~/.cache/thumbnails. It's easy to understand as a cache, but if the thumbnails were of photos on an SD card that gets lost or immediately dies, is it still a cache? It might be the only copy of some once-in-a-lifetime event or holiday where the card didn't make it back with you. Something like this actually happened to me, but in that case, the "cache" was a tarball of an old photo gallery generated from the originals that ought to have been deleted.
It's just really hard to know upfront whether something is actually important or not. Same for the Downloads folder. Vendor goes bankrupt, removes old software versions, etc. The only safe thing you can really do is hold your nose and save the whole lot.