I have a Synology DS720+ NAS, but to be honest I don’t really like Synology. Its software may appeal to amateurs, but as a professional, I find it very awkward to use: they don’t do things the way they are normally done in Linux, you have to follow their rules, and their hardware offers very poor value for the money. (I guess they want to be like “Apple”, but they just don’t have the same level of technical prowess and design sense to make great products.) I will not be buying another Synology NAS; I’d rather buy a case and a motherboard and build my own, running an OS such as Rockstor. The only redeeming factor is that Synology supports Docker, so I can run any app I want on it.

I attached a small UPS to my NAS, so that it can safely shut down when the power goes out.

I have two 12TB hard drives configured with Btrfs + RAID1. Make sure you don’t pair Btrfs with RAID5/6, as that combination is known to be broken, but otherwise Btrfs is a great file system. I use this 12TB of space to store my “personal library”: all the data that I wish to keep for life (Documents, Photos, Books, Music, Software, Backups, etc.). I also have periodic Btrfs data scrubbing enabled.
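On a Synology NAS, DSM schedules this scrubbing for you from its storage manager. Just to make the idea concrete, here is a minimal sketch of what a scheduled scrub could look like on a plain Linux Btrfs box; the mount point and the script itself are assumptions, not anything Synology ships.

```python
#!/usr/bin/env python3
# Minimal sketch: run a Btrfs scrub in the foreground and report the result.
# Assumes a plain Linux box with btrfs-progs installed; the mount point is
# hypothetical. On a Synology NAS, DSM schedules scrubs for you instead.
import subprocess
import sys

MOUNT_POINT = "/volume1"  # hypothetical Btrfs mount point

def scrub(mount_point: str) -> int:
    # "-B" keeps the scrub in the foreground so the exit code reflects the result.
    result = subprocess.run(
        ["btrfs", "scrub", "start", "-B", mount_point],
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    # Intended to be called from a monthly cron job or systemd timer.
    sys.exit(scrub(MOUNT_POINT))
```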

I have another 10TB hard drive for storing unimportant data. For example, I use my NAS to seed on Private Trackers, and this disk hosts all the data for that purpose. Even if I lost everything on that drive, it wouldn’t matter.

Btrfs with scrubbing + RAID1 should be able to keep your data safe, but to be extra safe, we still need backups, for two reasons:

  1. If ransomware destroys all your data (or you accidentally rm -rf it), RAID1 will not help you recover it.
  2. Keeping all your data in one place is not safe; in case of fire, flood, etc., you may lose everything.

The first solution I tried was duplicacy, a backup tool that supports incremental backups. I backed up my data to Wasabi, an S3-compatible cloud storage service. (Before Wasabi, I tried Google Drive with a Business G Suite subscription, which gave you unlimited storage. But I found that Google Drive is not very reliable: I backed up 2TB of data, tried to restore from the backup, and some files came back corrupted. So I no longer trust Google Drive; or perhaps duplicacy just can’t work perfectly with Google Drive.) Wasabi costs $6 per TB per month and is generally quite reliable, so I was happy for two years, until I decided this solution isn’t very good:

  1. As my collection of data grows, the cost of backing it up grows too. I now have 5TB of data, which costs $360 every year, and that is a lot of money. (A quick back-of-the-envelope calculation follows this list.)
  2. I’m just not familiar with duplicacy’s code, and I worry that catastrophic loss is possible with it. That is to say, if one chunk gets lost or corrupted, a lot of files might be lost with it. I don’t know if this is true, but I don’t want to take the risk; I want to rest assured that I can recover all my data.
  3. Backing up to Wasabi is just very slow, given the poor network conditions in China.
  4. Incremental backup doesn’t really guard against bit rot (though that is unlikely with Btrfs and RAID) or ransomware. Suppose some of your files are corrupted by ransomware, but you don’t know it. You back up the corrupted files to Wasabi; a year passes, and all the old (working) versions of those files are auto-pruned to save space. The next year you decide you need one of those files, and now there is no way to recover it: all you will ever have is a file corrupted by ransomware.
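To put the first point in numbers, here is a tiny projection. Only the $6/TB/month price comes from above; the library sizes are made-up assumptions.

```python
# Back-of-the-envelope: yearly Wasabi cost as the library grows.
# Only the $6/TB/month price is from the post; the sizes are hypothetical.
PRICE_PER_TB_PER_MONTH = 6  # USD

for tb in (5, 7, 10):  # hypothetical library sizes in TB
    yearly_cost = tb * PRICE_PER_TB_PER_MONTH * 12
    print(f"{tb} TB -> ${yearly_cost} per year")
# 5 TB -> $360 per year, and the bill keeps climbing as the collection grows.
```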

Then I decided to write my own backup solution, and I came up with https://github.com/KevinWang15/dirsync: “An off-site data backup solution that is efficient, incremental and can prevent data degradation.”

The idea behind this tool is very simple:

  1. A .dirmap is generated for the directory you want to back up. (This .dirmap contains a list of all the files in that directory and their checksums.)
  2. A second .dirmap is generated for the backup directory.
  3. You manually review the differences between the source directory and the backup directory. (A GUI is provided.)
  4. If the diff looks as expected, you run a command to sync all the changes from the source directory to the backup directory.
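Just to make the idea concrete, here is a minimal sketch of how such a manifest could be built and diffed. This is not dirsync’s actual code or its real .dirmap format; the paths, helper names and the choice of SHA-256 are all assumptions.

```python
#!/usr/bin/env python3
# Sketch of the ".dirmap" idea: walk a directory, record a checksum per file,
# and diff two such manifests. Not dirsync's actual format or code; SHA-256
# and the JSON output are assumptions made for this example.
import hashlib
import json
import os

def file_checksum(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def build_dirmap(root: str) -> dict:
    """Map every file path (relative to root) to its checksum."""
    dirmap = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            dirmap[rel] = file_checksum(full)
    return dirmap

def diff_dirmaps(source: dict, backup: dict) -> dict:
    """Return what the manual review step would show."""
    return {
        "added":   sorted(set(source) - set(backup)),
        "removed": sorted(set(backup) - set(source)),
        "changed": sorted(p for p in set(source) & set(backup)
                          if source[p] != backup[p]),
    }

if __name__ == "__main__":
    src = build_dirmap("/volume1/library")       # hypothetical source
    dst = build_dirmap("/mnt/external/library")  # hypothetical backup
    print(json.dumps(diff_dirmaps(src, dst), indent=2))
    # Only after reviewing this diff would you sync source -> backup.
```

An entry in “changed” for a file you never touched is exactly the kind of unexpected diff the manual review step is meant to catch.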

So this is basically data scrubbing and data backup combined into one tool. With this tool, you can rest assured that your data is always in good condition. (e.g. If ransomware corrupts one of your files, you will see that unexpected diff immediately, then you can investigate further and find the root cause.)

I set up a reminder in my TODO app to do this backup twice a year. I have two external disks, normally kept unplugged and stored in protective cases. I back up my personal library to both disks, keeping one at home and the other at my workplace.

I feel very satisfied with this solution. It’s simple, it’s reliable, it gives you total control over your data, and it’s cheap.

Another tip: if you are going to back up a lot of small files (e.g. all the pictures I took with a mobile phone I no longer use), tar the directory that contains them first, so that the .dirmap will not be bloated and performance will not degrade too much. Use tar, not zip or rar, because only tar is safe from catastrophic loss: an uncompressed tar archive stores files sequentially, so corruption in one spot only damages the files stored there.
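As a rough illustration (the paths here are made up), packing such a folder into a plain, uncompressed tar archive with Python’s standard tarfile module could look like this:

```python
#!/usr/bin/env python3
# Sketch: pack a folder of many small files into one uncompressed tar archive
# before backing it up, so the manifest stays small. Paths are hypothetical.
import tarfile

SOURCE_DIR = "/volume1/library/Photos/old-phone"      # hypothetical
ARCHIVE    = "/volume1/library/Photos/old-phone.tar"  # hypothetical

# Mode "w" writes a plain, uncompressed tar; files are stored sequentially,
# which is why damage to one part of the archive stays localized.
with tarfile.open(ARCHIVE, "w") as tar:
    tar.add(SOURCE_DIR, arcname="old-phone")

# Sanity check: list a few members to confirm the archive is readable.
with tarfile.open(ARCHIVE, "r") as tar:
    for member in tar.getmembers()[:5]:
        print(member.name, member.size)
```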