Backup, Replication and all that jazz …

I’ve been dealing with backup questions over the last little while, and it seems that many SMB operations are confused about what constitutes a backup system.  In very simple terms, a backup system is an aggregation of tools and devices that produces some sort of “copy” of your data in a location that is separate from where your data lives.

So, what constitutes such a system?  That’s a question that can get people in my industry pretty worked up in a very short period of time.  Maybe it would be a good exercise to define what it is we try to “protect” with backups and then classify what the various tools can do.

Backups perform two very simple yet very different functions: 1) they ensure you have copies of data to protect you from human error (e.g., deletion of an important file); and 2) they give you the data needed to recover from some form of disaster (e.g., a server dies and takes the data with it).  These two functions can lead to very different methods of creating the backup.

Recovery from human error really requires that you have a copy (or copies) of your data that is a “snapshot” of how things were at a certain point in time.  Very often, users do not discover for a number of days that they need to pull a file back from backup, so it is imperative for most companies that there is a backup rotation that covers a wide span of time.  This is the type of backup that is exemplified by the traditional backup tools such as Backup Exec, Acronis, and the like.  These tools allow you to create many types of backups (full, incremental, differential) on many schedules (hourly, daily, weekly, and so forth).  And, of course, these tools also allow you to perform disaster recovery (assuming you have full backups of systems), although the time a full restore takes can be a real issue in a crisis.
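To make the rotation idea a little more concrete, here is a minimal Python sketch of a retention rule.  It is not how any particular product works; the function name `backups_to_keep` and the default windows (7 dailies, 4 weeks of Sunday backups, roughly 6 months of first-of-month backups) are assumptions chosen purely for illustration.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates a simple rotation would retain.

    Hypothetical policy: keep everything from the last `daily` days,
    Sunday backups for `weekly` weeks, and first-of-month backups for
    roughly `monthly` months (approximated as 31 days each).
    """
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue                      # ignore dates in the future
        if age < daily:
            keep.add(d)                   # recent daily backups
        elif d.weekday() == 6 and age < weekly * 7:
            keep.add(d)                   # weekly (Sunday) backups
        elif d.day == 1 and age < monthly * 31:
            keep.add(d)                   # monthly backups
    return keep

# Example: one backup per day for the last 90 days.
today = date(2024, 3, 15)
history = [today - timedelta(days=n) for n in range(90)]
kept = backups_to_keep(history, today)
```

The point of the sketch is the span of time covered: a file deleted three weeks ago is still recoverable from one of the weekly copies, even though most of the daily copies from that period are long gone.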

Disaster recovery usually implies you are in “crisis mode” and you need to get back in operation FAST.  The fastest recovery is provided by replication technologies that replicate (copy) data in real time or close to real time from one server to another (or one NAS/SAN to another NAS/SAN).  Replication ensures that the replicated target is always as close to a “true” copy of the source as is possible.  The replicated target can be put online in place of the failed system in very short order, and data loss is usually limited to whatever was “in flight” when the original system failed.  But this also implies that there is no previous point-in-time copy; it is an always-current copy, so a deleted or overwritten file is deleted or overwritten on the replica as well.  Therefore, replication is an almost useless technology to rely on when you need to recover from human error, aside from those errors caused by a human doing something nasty to a piece of hardware.
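A toy model shows why replication does not protect against human error.  This is only a sketch: the classes `Replica` and `SnapshotStore` are hypothetical, and plain in-memory dictionaries stand in for real servers and storage.

```python
class Replica:
    """Toy replication target: always mirrors the current source."""
    def __init__(self):
        self.data = {}

    def sync(self, source):
        self.data = dict(source)          # overwrite with the latest state

class SnapshotStore:
    """Toy point-in-time backup: keeps a labelled copy per snapshot."""
    def __init__(self):
        self.snapshots = {}

    def take(self, label, source):
        self.snapshots[label] = dict(source)

    def restore(self, label, name):
        return self.snapshots[label].get(name)

source = {"budget.xlsx": "v1"}
replica, backups = Replica(), SnapshotStore()

backups.take("monday", source)            # point-in-time copy
replica.sync(source)                      # replica matches source

del source["budget.xlsx"]                 # human error: file deleted
replica.sync(source)                      # the deletion is replicated too

file_on_replica = "budget.xlsx" in replica.data
file_from_backup = backups.restore("monday", "budget.xlsx")
```

After the deletion, the replica faithfully no longer has the file, while the Monday snapshot still does.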

What technology is right for you?  The answer is “it depends”.  From my point of view you ALWAYS need to have point-in-time backups, as humans mess up far more frequently than computers.  And point-in-time backups can provide the basis for disaster recovery.  But a full system rebuild can take a (relatively) long time, so replication can save your behind in a full crisis situation.  You may want to consider using both types of systems if your budget will allow for it.

And, regardless of the technology used, a backup system is not worth much if recovery is not tested.  YOU NEED TO TEST your backup and recovery system(s)!  There’s nothing worse than finding out your system actually DOESN’T work when you are in crisis mode (like when the Boss is hanging over your shoulder wondering WHERE that crucial Excel file went …).
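One simple way to test a restore is to restore into a scratch directory and compare checksums against a known-good copy.  The sketch below assumes you still have that reference copy to compare against (in practice you would more likely record the checksums at backup time); the function names `checksum_tree` and `verify_restore` are made up for this example, and it reads each file fully into memory, which is fine for a demonstration but not for very large files.

```python
import hashlib
from pathlib import Path

def checksum_tree(root):
    """Map each file's relative path under `root` to its SHA-256 digest."""
    root = Path(root)
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            sums[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return sums

def verify_restore(reference_root, restored_root):
    """Return a list of problems; an empty list means the restore matched."""
    reference = checksum_tree(reference_root)
    restored = checksum_tree(restored_root)
    problems = []
    for name, digest in reference.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != digest:
            problems.append(f"changed: {name}")
    return problems
```

Running `verify_restore("/path/to/reference", "/path/to/test-restore")` (both paths are placeholders) on a schedule, and actually reading the result, is a cheap way to find out that a backup is broken before the crisis rather than during it.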