Okay, now that I’ve talked about backing up the datacenter with NetBackup and DataDomain, and backing up remote sites with NetBackup and PureDisk, it’s time to discuss how to get all that data offsite to protect against a catastrophic event at the datacenter.
As mentioned before, we have a primary datacenter with the majority of our systems including the backup environment, and a secondary “disaster recovery” datacenter to which we replicate tier 1 applications for business continuity purposes. Since we really wanted to get away from using tapes and instead store the backups on disk in our datacenter, we have a second backup environment in the DR datacenter and we replicate the backup data there.
There are several ways to replicate backup data between two sites, but most of them have drawbacks.
1.) Duplicate the backup data from disk to tape and ship the tapes to the remote site to be ready for restore. This is the easiest and probably cheapest way. But there’s that pesky tape yet again with its media handling and shipping. And restore could take a while since you have to deal with restoring the catalog from tape, then importing the media, etc.
2.) Duplicate the backup data directly from the local disk to the disk in the second location across the WAN. This is not very feasible with any significant amount of data because every byte of data that is backed up in the datacenter has to be copied across the much slower WAN. It could take many days to duplicate a single night’s backup. You’d also need a special Catalog backup job that writes to a storage device across the WAN. The good news here is that the backup application knows there is a second copy of the data and knows how to find it.
3.) Replicate the data with the backup storage device’s native replication. Whether it’s PureDisk, Avamar, or DataDomain, pretty much every source-based or target-based deduplication solution has replication built in that leverages the deduplication to reduce the amount of data that traverses the WAN. The advantage here is that you can have a copy of all of your backup data in a second location in a much shorter time than a traditional copy process. If your deduplication device stores the data with 10:1 compression, then your WAN usage is reduced by 90%. The savings in practice are actually better than that. The drawback is that the backup application (hence the catalog) has no knowledge that there is a second copy of the backup data, and after recovering the catalog you would need to import all of the disk-based media, which could take a long time.
4.) Leverage NetBackup Lifecycle Policies with Symantec OpenStorage (OST) and an OST-capable backup storage system like DataDomain or PureDisk (with PDDO). Basically this has all the advantages of option #2, where it is a catalog-aware duplication, combined with the WAN bandwidth savings from option #3. Time to copy the data offsite is much shorter due to deduplication, and time to restore is very fast since the data is already in the catalog and available on the disk.
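To put rough numbers on the difference between option #2 and options #3/#4, here is a back-of-the-envelope sketch in Python. The 5 TB nightly backup size and the 100 Mbit/s WAN link are made-up figures for illustration only (they are not our actual environment), and the math ignores protocol overhead and link contention.

```python
TB = 10**12  # decimal terabyte, in bytes


def wan_savings_percent(reduction_ratio: float) -> float:
    """Percent of bytes kept off the WAN at a given reduction ratio (10 means 10:1)."""
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be >= 1")
    return (1 - 1 / reduction_ratio) * 100


def transfer_hours(data_bytes: float, link_bits_per_sec: float,
                   reduction_ratio: float = 1.0) -> float:
    """Hours to move data_bytes over the link when only 1/reduction_ratio
    of the bytes actually cross it. Ignores protocol overhead."""
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be >= 1")
    bytes_on_wire = data_bytes / reduction_ratio
    return bytes_on_wire * 8 / link_bits_per_sec / 3600


if __name__ == "__main__":
    nightly_backup = 5 * TB   # hypothetical nightly backup size
    wan_link = 100e6          # hypothetical 100 Mbit/s WAN link

    full_copy = transfer_hours(nightly_backup, wan_link)
    dedup_copy = transfer_hours(nightly_backup, wan_link, reduction_ratio=10)

    print(f"WAN savings at 10:1: {wan_savings_percent(10):.0f}%")
    print(f"Full copy (option 2): {full_copy:.1f} hours")
    print(f"Deduplicated replication (options 3/4): {dedup_copy:.1f} hours")
```

With those made-up numbers the straight copy needs about 111 hours (more than four days) for one night's backup, while the 10:1 replication needs about 11 hours.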
OpenStorage (OST) is a network protocol that Symantec developed to interface with disk-based backup storage systems, and DataDomain was an early adopter of OST. OST allows NetBackup to control replication between OST-capable storage systems and keep track of the replicated copies of backups in the Catalog just as if NetBackup had made both copies itself. OST is also used as the protocol to send the backup data to the storage device, as opposed to CIFS/NFS or VTL. DataDomain appliances support OST, as does PureDisk when used in conjunction with the PDDO option discussed earlier. In NetBackup, replication controlled by OST is called “optimized duplication” and is controlled primarily through Lifecycle Policies.
Traditionally, when creating NetBackup job policies, the administrator will specify a Storage Unit (either a disk storage unit or a tape library or drive) that the job policy will send backups to. Lifecycle Policies are treated like Storage Units as far as the Job Policy is concerned, but the Lifecycle Policy includes a list of storage units, each with its own data retention, that the backup data must be stored onto in order for NetBackup to consider the data fully protected. Typically there is a “Backup” target, which is where the actual data coming from the client is stored, followed by one or more “Duplication” targets. After the backup job completes, NetBackup will copy the backup data from the “backup” location to all of the “duplication” locations. This works with pretty much any type of storage and you can mix and match tape and disk in the same policy. Since these are duplication operations, NetBackup will read ALL of the data from the backup location, and write ALL of the data to each duplication location. This can take a long time even on the local network, and trying to offsite a lot of data over the WAN is not very feasible.
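If it helps to picture it, a lifecycle policy is basically a named list of destinations, and the data counts as fully protected once a copy exists on every one of them. Here is a minimal Python sketch of that idea. To be clear, this is just a mental model and not NetBackup's actual data structures or API; the class names, storage unit names, and retention values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Destination:
    storage_unit: str    # hypothetical storage unit name
    operation: str       # "backup" or "duplication"
    retention_days: int  # each destination carries its own retention


@dataclass
class LifecyclePolicy:
    name: str
    destinations: List[Destination]

    def pending(self, completed: Set[str]) -> List[str]:
        """Storage units that still need a copy of the backup."""
        return [d.storage_unit for d in self.destinations
                if d.storage_unit not in completed]

    def fully_protected(self, completed: Set[str]) -> bool:
        """True only when every destination in the policy holds a copy."""
        return not self.pending(completed)


# Hypothetical policy mixing disk and tape destinations.
policy = LifecyclePolicy(
    name="datacenter-daily",
    destinations=[
        Destination("primary-dd-disk", "backup", retention_days=30),
        Destination("dr-dd-disk", "duplication", retention_days=30),
        Destination("tape-library", "duplication", retention_days=365),
    ],
)

# Right after the backup job finishes, only the backup target has a copy.
print(policy.pending({"primary-dd-disk"}))   # ['dr-dd-disk', 'tape-library']
print(policy.fully_protected(
    {"primary-dd-disk", "dr-dd-disk", "tape-library"}))  # True
```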
With OST, the lifecycle policy operates exactly the same except that it uses “optimized duplication”, instructing the storage device to copy the file rather than performing the copy through a media server. So in the case of DataDomain, OST issues the command to the DDR, and the DDR then copies the file to the second DDR in the remote site and gets all the benefits of deduplication and compression between the two. The media server doesn’t actually do any work. Once the duplication is complete, the DDR notifies NetBackup and the catalog is updated with a record of the second copy of the backup. Lifecycle Policies are fully automated; you can’t even restart a failed duplication. So in the event of a transient failure like a WAN hiccup, NetBackup will keep retrying the duplication job until it succeeds in order to satisfy the lifecycle policy.
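The control flow described above (ask the device to do the copy, keep retrying on transient failures, record the second copy in the catalog only once it succeeds) can be sketched like this in Python. This is purely illustrative and is not how NetBackup or the OST plugin is actually implemented; `run_optimized_duplication`, `make_flaky_replicator`, the dictionary standing in for the catalog, and the copy labels are all hypothetical names I made up for the example.

```python
import time
from typing import Callable, Dict, List


class TransientReplicationError(Exception):
    """Stand-in for a temporary failure such as a WAN hiccup."""


def run_optimized_duplication(image_id: str,
                              replicate: Callable[[str], None],
                              catalog: Dict[str, List[str]],
                              copy_label: str = "dr-copy",
                              retry_delay_sec: float = 0.0) -> int:
    """Ask the storage device to replicate an image, retrying until it
    succeeds, then record the new copy in the catalog.
    Returns the number of attempts that were needed."""
    attempts = 0
    while True:
        attempts += 1
        try:
            # The device does the copy itself; no backup data flows
            # through the caller (the "media server" in this analogy).
            replicate(image_id)
        except TransientReplicationError:
            time.sleep(retry_delay_sec)
            continue
        # Only a successful copy gets recorded.
        catalog.setdefault(image_id, []).append(copy_label)
        return attempts


def make_flaky_replicator(failures: int) -> Callable[[str], None]:
    """Build a fake device that fails a fixed number of times, then works."""
    remaining = {"n": failures}

    def replicate(image_id: str) -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientReplicationError(f"link down while copying {image_id}")

    return replicate


catalog = {"client1_backup": ["primary-copy"]}
attempts = run_optimized_duplication(
    "client1_backup", make_flaky_replicator(failures=2), catalog)
print(attempts)                    # 3
print(catalog["client1_backup"])   # ['primary-copy', 'dr-copy']
```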
As you can probably surmise, this is REALLY nice for a tape-less backup environment. Our DD690 offsites over 9TB of data every night DURING the backup window. When the last backup job completes, the offsite copies are complete within 30 minutes. And there is absolutely no management of the offsite process or duplication jobs besides configuring the lifecycle policies up front. The drawback to regular NetBackup lifecycle policies is that all duplications are taken from the initial backup copy, which limits what you can do with the copies.
Enter NetBackup 6.5.4… Despite the small 6.5.3 -> 6.5.4 version number change, the 6.5.4 release had quite a few new features added. The biggest one was a revamping of the Lifecycle Policy engine to allow for nested duplications. Now you can create a copy of a backup, then create multiple copies from the copy, then create copies from the other copies. Why is this useful?
Remember when I discussed using NetBackup with PDDO to back up remote sites? Well, the data backed up from the remote site is all stored in the primary datacenter and we need to get the second copy to the DR datacenter. Plus, we wanted to have a small cache of recently backed up data sitting on the remote media server for fast restore. Well, nested lifecycles are the key. The lifecycle writes the initial backup copy onto the media server’s local disk, which is configured as a capacity-managed staging area (i.e., it stores as much as it can and expires data when it needs more space for new backups). The lifecycle then creates a duplicate of the backup onto the PureDisk storage unit in the primary datacenter. Since bandwidth to the remote site is very limited, we don’t want to copy it from the remote site twice, so the lifecycle has a second duplication nested under the first to copy it to the DR datacenter. The source of the second copy is the primary datacenter copy, NOT the remote media server copy.
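One way to think about nested lifecycles is as a tree, where each duplication's source is its parent copy. The little Python sketch below walks such a tree and compares the flat (pre-6.5.4 style) layout with the nested one for the remote-site case described above. It is only a model for illustration, not NetBackup configuration syntax, and the location names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Copy:
    location: str
    children: List["Copy"] = field(default_factory=list)  # duplications sourced from this copy


def duplication_plan(root: Copy) -> List[Tuple[str, str]]:
    """Return (source, target) pairs in an order where every source
    exists before it is copied from (breadth-first walk of the tree)."""
    plan: List[Tuple[str, str]] = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        for child in node.children:
            plan.append((node.location, child.location))
            queue.append(child)
    return plan


def copies_sourced_from(root: Copy, location: str) -> int:
    """How many duplications read their data from the given location."""
    return sum(1 for source, _ in duplication_plan(root) if source == location)


# Flat layout: both duplications read from the remote site's copy.
flat = Copy("remote-staging", [Copy("primary-dc"), Copy("dr-dc")])

# Nested layout: the DR copy is sourced from the primary datacenter copy.
nested = Copy("remote-staging", [Copy("primary-dc", [Copy("dr-dc")])])

print(duplication_plan(nested))
# [('remote-staging', 'primary-dc'), ('primary-dc', 'dr-dc')]
print(copies_sourced_from(flat, "remote-staging"))    # 2
print(copies_sourced_from(nested, "remote-staging"))  # 1
```

In the nested layout the data only leaves the bandwidth-starved remote site once, which is the whole point.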
Where else can we use this? Let’s consider our tape-less datacenter backups. We back up the clients to the DataDomain in our primary datacenter, then, using a lifecycle policy and OST, create a copy on the DataDomain in the DR datacenter. If we also wanted to have a tape copy for long-term archive or vaulting, we could create a nested duplication to make a copy to a tape library in the DR datacenter from the disk copy that is also in the DR datacenter. Without nested lifecycles the only workable solution would be to create the tape in the primary datacenter. Every copy of the backup made via the lifecycle policy, whether it is using OST or not, is maintained by the catalog and easily used for restore. Furthermore, using OST as the protocol between NetBackup and DataDomain actually increases throughput to the DataDomain DDR systems by approximately 2X vs VTL/CIFS/NFS.
Now to the caveats. Optimized duplication via OST is only available when you are using OST as the protocol between the media server and the storage unit. This means it doesn’t work with VTL, even when the DataDomain IS the VTL. OST only works over an Ethernet network, which is why we skipped VTL completely and used 10Gbps networks for the DDR connections. We even skipped VTL/Tape for the NAS systems, connected them directly to the 10Gbps network, and use 3-way NDMP to back them up over the network, through the media servers, to the DataDomain. We get the benefit of lifecycle policies, optimized duplication, and, as I may have mentioned before, no pesky tape even with NDMP/NAS backups. And the interesting thing is that with the 10Gbps connection, the NDMP dumps are faster than direct fiber to tape.
There were other enhancements to NetBackup 6.5.4 centered around OST functionality but the lifecycle policy improvements were huge in my opinion.
To cover the catalog replication, we run NetBackup hot catalog backups to a CIFS share that is hosted by the DataDomain. The DDR replicates that share using DataDomain native replication to the DDR in the DR datacenter, where the same data is available via a similar CIFS share. Our standby NetBackup master server is already connected to the CIFS share for catalog restore and connected to the DDR via OST. A single operation restores the catalog from the replicated copy. In a real disaster we can begin restoring user data within 30 minutes from the DR datacenter.