pickmy.org server backup

I like things simple. My years as an electronics technician taught me that it was always the complex, over-engineered circuits that gave you the most trouble. The stuff that was simple, basic, and based on time-tested designs usually worked best. So when I built the new pickmy.org and set up the server backups, I could have gone looking for something already built to do it. In fact, many people recommended all kinds of crazy-looking packages for handling server backups.

I decided to do something simple, basic, and bash.


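```bash
#!/bin/bash
# Dump every MySQL database and archive the nginx config into /var/www
mysqldump -u root --all-databases > /var/www/full.sql
tar -czf /var/www/etc-nginx.tar.gz /etc/nginx
# Push /var/www (sites plus the two files above) to the home NAS over SSH on port 1337
rsync -avz -e 'ssh -p 1337' /var/www user@home:/media/remote/vpsbackup
# Remove the temporary copies from the server
rm /var/www/etc-nginx.tar.gz
rm /var/www/full.sql
```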

The only thing I should have to explain here is that each of my websites is served from its own subdirectory under /var/www, so dropping random files directly into /var/www doesn’t actually publish them. One guy asked me why I was “doing something stupid” by “dropping your backups on your web directory”, then ranted on about how “that directory is only for web content, not backup storage or anything.”

I just ignored him, because it’s my system and I can put whatever I want, wherever I want it.

Anyway, it should be pretty simple to see what’s going on. I dump the contents of the MySQL server to a .sql file, tar and gzip my nginx configuration directory, and then rsync all of /var/www (the sites, the dump, and the archive) over SSH to a system here at home. It actually copies directly to my NAS, which is mounted under /media/remote. (At some point I will add a link to my home network diagram/description.)
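One thing worth hardening: as written, if mysqldump fails partway, the truncated full.sql would still be rsynced over the last good copy on the NAS. Below is a minimal sketch of the same three steps with failure handling, assuming the same paths, host, and port as the script above. `WWW_DIR`, `cleanup`, and `run_backup` are hypothetical names made up for illustration; `set -euo pipefail` aborts at the first failed step, and the `EXIT` trap deletes the temporary files whether the run succeeded or not.

```bash
#!/bin/bash
# Sketch only: the original backup steps plus failure handling.
# WWW_DIR, cleanup, and run_backup are hypothetical names.
set -euo pipefail

# Directory that gets synced; the dump and archive are staged inside it
WWW_DIR=${WWW_DIR:-/var/www}

cleanup() {
    # Remove the temporary files on any exit, successful or not
    rm -f "$WWW_DIR/full.sql" "$WWW_DIR/etc-nginx.tar.gz"
}

run_backup() {
    # Arm the cleanup only once a backup run actually starts
    trap cleanup EXIT
    # Under set -e, a failed dump aborts here, before anything is pushed
    mysqldump -u root --all-databases > "$WWW_DIR/full.sql"
    tar -czf "$WWW_DIR/etc-nginx.tar.gz" /etc/nginx
    rsync -avz -e 'ssh -p 1337' "$WWW_DIR" user@home:/media/remote/vpsbackup
}
```

To use it, append a plain `run_backup` line at the end and schedule the file as before. Call it as a bare command: bash suspends `set -e` inside anything run as part of an `if` test or an `&&`/`||` list, so wrapping the call that way would let a failed dump continue.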

Once the data is on the NAS, my home systems can make further copies/backups of it. Using rsync also lets me pre-copy large files to the backup locally, which saves bandwidth: rsync skips files that already match on the destination and sends only the differences for the rest. The script runs every morning through cron, at about 0700 UTC. All my public-facing SSH servers use key authentication, so the server is set up to log in to my home SSH server automatically.
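Scheduling the script takes a single crontab line. The sketch below adds that line idempotently, so running it again never duplicates the job. It assumes the VPS clock is on UTC (cron uses the system’s local time zone); the script path /usr/local/sbin/vps-backup.sh, the log path, and the `add_backup_job` helper are all hypothetical.

```bash
#!/bin/bash
# Sketch only. Assumptions: system clock on UTC, and a hypothetical
# script path and log path.
# Fields: minute 0, hour 7, every day of month, every month, every weekday
JOB='0 7 * * * /usr/local/sbin/vps-backup.sh >> /var/log/vps-backup.log 2>&1'

# Read a crontab on stdin and print it with $JOB appended, unless an
# identical line is already present (so re-running is harmless).
add_backup_job() {
    local existing
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! grep -qxF -- "$JOB" <<< "$existing"; then
        printf '%s\n' "$JOB"
    fi
}
```

For example, after sourcing the file as root: `{ crontab -l 2>/dev/null || true; } | add_backup_job | crontab -`. The `|| true` covers the case where no crontab exists yet.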

Here’s How We Lost All The Data From The Last Server That Hosted Pickmy.org

For a couple of years, I had a very similar system in place backing up my last VPS, an OpenVZ-based one. There were a couple of differences. For starters, it only copied the www directories and the SQL databases. It also simply tarred and gzipped everything and transferred the whole archive every time. At the time I didn’t care; that server didn’t have much storage but had tons of bandwidth, so I could do full backups every day without worrying about my quota. Here I have the opposite situation: a lot more storage but a smaller bandwidth quota. I also hadn’t gotten familiar with rsync back when I first set it up, so I just did it the way that was “the easiest” for me at the time.

Everything worked until things broke on my end. The machine I was running SSH on here at the house got taken out by lightning, so the backups stopped happening on schedule. Rather than the weekly backups that had been occurring, I was doing them by hand “when I remembered”. Then, six months later, came an event that at the time didn’t seem like it would have any effect on my old server data: the almost simultaneous failure of two hard drives.

It started when the local NAS began to act up. I had gotten fed up with the factory firmware and had read about the ability to install Debian on it. It was far from a standard installation procedure. It required me to set up a TFTP server, “break” into the U-Boot network console, specify where in memory to load the boot and setup images from TFTP, tell the thing to boot, install the OS over a console, and finally get back into the U-Boot console so I could modify the EEPROM to boot Linux. That sounds like a major pain in the butt, but surprisingly the instructions were written clearly enough that I had no problem following them.

However…this process was going to wipe the drive. I had actually spent a couple of days in the weeks prior doing a *massive* organization of my “important media” (*cough* music *cough*), so I already had a copy of that on a local drive. But I still backed the entire thing up to a 3TB portable USB drive…because I actually don’t mind redundant copies of the music archive. So with all my data copied off, I got Debian on it and started loading the data back.

But…there was a problem. The performance sucked; it really sucked. I was convinced the factory firmware was doing some trickery by RAIDing LVMs on the drive (I kid you not: it was running a software RAID on partitions of a single drive). But that wasn’t the case. While debugging it I saw all kinds of problems, and reading the SMART information revealed horrors. I had just gotten the Dell PowerEdge R610 set up, so I yanked the drive out of its enclosure, plugged it into a USB-to-SATA adapter, and let a VM crank out tests on it. It failed every single one. So I needed to get a new drive. That was okay, though, because I had all the data off it.

So I got new drives and a new NAS enclosure, set everything up, and started copying data off the USB drive. When I went to check on it a few hours later, it wasn’t actually doing anything. Something had happened…the drive wasn’t returning any data, and the whole thing had gone into a panic. Windows wasn’t happy with it either…so now my backup drive had failed too.

I did ultimately run some recovery on it and retrieved about 35% of the data. I’ve been meaning to let it do step 2 and run for a few months; I just haven’t gotten around to setting up my RPi to do that. But the main problem was that the server backups were part of all that missing data. What I should have done was run a fresh full backup to replace them. That is exactly what I didn’t do.

So, as I explained in earlier posts on the blog, I was moving off that VPS. It was becoming a real pain bandwidth-wise and was no longer worth the stupidly low price I was paying for it. So I decided I would get another server before the old one was up for renewal and just do a direct server-to-server transfer. But as I found out in mid-August, that was a bad idea. The node in the datacenter suffered a hardware failure, and everything was pretty much totally lost. There’s a chance I’ll get a backup file after doing a deeper recovery, but I’m not holding my breath.