February 5, 2012

BC8449 – Using VMware Site Recovery Manager with NetApp

Larry Touchette from NetApp and Arturo Fagundo from VMware presented this session on using SRM with NetApp.
Overview of VMware Site Recovery Manager
SRM allows you to do non-disruptive DR testing.  To do this, you group the VMs you want to protect into protection groups.  A protection group is the smallest unit you can fail over.  Once you have created your protection groups you build a recovery plan, which is similar to an electronic runbook.  Site Recovery Manager handles failing over the storage for you: it promotes a replica image at the recovery site, registers the VMs at the DR site from the replicated storage and powers them on.  There are two different modes for SRM: Test and Failover.  Test mode doesn't affect your production virtual machines; it creates a separate, isolated network at the recovery site to bring the VMs up for testing.  Another thing to be aware of: SRM supports bi-directional protection, since a lot of customers run production out of both locations.
High Level Configuration Info of VMware Site Recovery Manager
SRM leverages array replication technology and requires the use of a Storage Replication Adapter (SRA) that is provided by the storage vendor (NetApp, EMC, etc.).
You must have a vCenter at each site.  Since it's likely the two environments are not identical, you configure inventory mappings in SRM to map resource pools, networks and folders from one site to the other.  Protection groups correspond 1:1 to datastore groups, but you configure them by virtual machine rather than by datastore.  The recovery plan contains protection groups.
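To make the inventory mapping idea concrete, here is a minimal Python sketch of how a VM's placement could be translated from the protected site to the recovery site. This is only an illustration and not SRM's API: the class, the function names and all of the inventory names (such as "Prod-VLAN10") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryMappings:
    """Protected-site inventory names mapped to recovery-site names (hypothetical model)."""
    networks: dict = field(default_factory=dict)
    resource_pools: dict = field(default_factory=dict)
    folders: dict = field(default_factory=dict)


def place_vm(vm, mappings):
    """Translate a VM's network, resource pool and folder to the recovery site.

    Raises KeyError naming the first inventory object that has no mapping.
    """
    placement = {}
    for key, table in (
        ("network", mappings.networks),
        ("resource_pool", mappings.resource_pools),
        ("folder", mappings.folders),
    ):
        source_name = vm[key]
        if source_name not in table:
            raise KeyError(f"no {key} mapping for {source_name!r}")
        placement[key] = table[source_name]
    return placement


# All names below are made up for illustration.
mappings = InventoryMappings(
    networks={"Prod-VLAN10": "DR-VLAN110"},
    resource_pools={"Prod-Pool": "DR-Pool"},
    folders={"Prod/Web": "DR/Web"},
)
vm = {"network": "Prod-VLAN10", "resource_pool": "Prod-Pool", "folder": "Prod/Web"}
recovery_placement = place_vm(vm, mappings)
```

The point of the sketch is that every inventory object a protected VM uses needs a counterpart at the other site; a missing entry is caught before any placement is returned.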
NetApp Specific SRM Info
When performing a test recovery, SRM requests a temporary copy of the storage, which in this case is a FlexClone, and then adds the LUNs to igroups or creates NFS exports.  The SnapMirror replication continues while the test runs.  The next release of Site Recovery Manager will optionally allow requesting synchronization of replicated devices.  That means that if you are the virtualization admin, you won't need to ask the storage admin to update the SnapMirror when you want to run a DR test with the very latest data.
A recovery workflow is similar to a test, but it actually breaks the SnapMirror and promotes the destination volume to read/write.  If using LUNs, it adds them to the appropriate igroups; if using NFS, it creates the exports with appropriate permissions for the ESX hosts.
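The difference between the two workflows can be summarized in a short Python sketch. It only lists the storage-side steps described above as plain strings; it does not call the SRA or ONTAP, and the function name and the "san"/"nfs" labels are my own assumptions.

```python
def storage_steps(mode, protocol):
    """Return the storage-side steps for a test or a real failover (illustrative only)."""
    if mode not in ("test", "failover"):
        raise ValueError(f"unknown mode: {mode}")
    if protocol not in ("san", "nfs"):
        raise ValueError(f"unknown protocol: {protocol}")

    if mode == "test":
        # A test works on a temporary copy; replication is left alone.
        steps = [
            "create a FlexClone of the replica volume",
            "leave SnapMirror replication running",
        ]
    else:
        # A real recovery takes over the replica itself.
        steps = [
            "break the SnapMirror relationship",
            "promote the destination volume to read/write",
        ]

    if protocol == "san":
        steps.append("add the LUNs to the appropriate igroups")
    else:
        steps.append("create NFS exports for the ESX hosts")
    return steps


for step in storage_steps("test", "nfs"):
    print(step)
```

Only the first two steps differ between the modes; presenting the storage to the ESX hosts is the same in both cases.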
The latest version (as of this post) of the NetApp SRA is 1.4.3, which is a unified adapter: it works for both SAN and NAS (meaning a VM can have a system VMDK on NFS and an RDM device via iSCSI, which is common when using SnapManager products within VMware).  Some of the new features in 1.4.3 are:
  • Unified Adapter
  • Fully thin provision the DR test environment
  • MultiStore vFiler units as storage arrays
  • Non-quiesced SMVI snapshot recovery
If you are upgrading to the unified adapter, be aware of the following.  If you are currently a SAN-only environment, no SRM reconfiguration is required: the ONTAP version on the NetApp should be 7.2.4 or newer, and you simply uninstall the 1.4.2 SAN adapter and install 1.4.3.  If you are currently in a NAS environment, you need to delete the protection groups and array managers before uninstalling the 1.4.2 adapter; after you install 1.4.3, you must re-create your protection groups.
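As a quick checklist, the two upgrade paths could be written down like this in Python. This is a sketch based only on the notes above, not an official NetApp procedure; the function names are hypothetical, and the version check assumes a plain dotted numeric ONTAP version string such as "7.3.2".

```python
MIN_ONTAP = (7, 2, 4)  # minimum ONTAP version mentioned in the session


def ontap_ok(version):
    """Return True if a dotted numeric version string is 7.2.4 or newer."""
    parsed = tuple(int(part) for part in version.split("."))
    return parsed >= MIN_ONTAP


def upgrade_steps(environment):
    """Return the ordered upgrade steps for a 'san' or 'nas' environment."""
    if environment == "san":
        # SAN-only: no SRM reconfiguration needed.
        return [
            "uninstall the 1.4.2 SAN adapter",
            "install the 1.4.3 unified adapter",
        ]
    if environment == "nas":
        # NAS: SRM objects must be removed first and rebuilt afterwards.
        return [
            "delete the protection groups",
            "delete the array managers",
            "uninstall the 1.4.2 adapter",
            "install the 1.4.3 unified adapter",
            "re-create the protection groups",
        ]
    raise ValueError(f"unknown environment: {environment}")


print(ontap_ok("7.3.2"))
for step in upgrade_steps("nas"):
    print(step)
```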
The next version of SRM will have a new re-protect workflow to reverse replication and synchronize storage in the opposite direction.  Any changes made at the DR site would be populated back to the original primary site.  If the storage itself wasn’t destroyed in the disaster it will only transfer the delta changes (as it will find a common storage snapshot and transfer changes made since then).  More details on that were available in BC8372 – SRM Futures: Failback and more.  The next major version of SRM should be released in the second half of 2011.
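The "common snapshot" idea behind the delta transfer can be illustrated with a few lines of Python. This is a conceptual sketch, not how SnapMirror or SRM is actually implemented: the function names and snapshot names are invented, and it assumes each site's snapshot list is ordered oldest to newest.

```python
def latest_common_snapshot(dr_snapshots, original_snapshots):
    """Return the newest snapshot name present on both sides, or None."""
    available = set(original_snapshots)
    # Walk the DR side from newest to oldest and stop at the first match.
    for name in reversed(dr_snapshots):
        if name in available:
            return name
    return None


def plan_resync(dr_snapshots, original_snapshots):
    """Decide between a delta transfer and a full baseline transfer."""
    common = latest_common_snapshot(dr_snapshots, original_snapshots)
    if common is None:
        return "full baseline transfer"
    return f"delta transfer of changes since {common}"


# Hypothetical snapshot names, oldest first.
original_site = ["snap.1", "snap.2", "snap.3"]
dr_site = ["snap.2", "snap.3", "dr_change.1"]
print(plan_resync(dr_site, original_site))
```

If the original storage was destroyed, there is no shared snapshot left and the sketch falls back to a full transfer, which matches the behavior described in the session.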
Also see NetApp’s Technical Report on using VMware Site Recovery Manager with NetApp storage: http://media.netapp.com/documents/tr-3671.pdf
Also, just as a note at the end: one use case mentioned for SRM (other than the obvious) was testing application and Windows updates.  You can run a DR test, apply all the updates or make the configuration changes, do your testing, and then once it's been validated make the changes to your production systems as well.

About mike
I am currently a Consulting Architect working for Nexus Information Systems in the Twin Cities, MN area. My professional summary is available via my LinkedIn page. I can be contacted by the Contact Me link at the top of the site. I also spend (too much) time on Twitter so feel free to follow or send me a tweet.