Wednesday, February 18, 2015

Site Recovery Manager and vSphere Replication features

Overview of New Features in SRM and vSphere Replication
vCenter Site Recovery Manager 5.5 and vSphere Replication 5.5 include a number of significant improvements and new features.

New features in vCenter Site Recovery Manager 5.5 include:
New support for StorageDRS and Storage vMotion
Full integration with vSphere 5.5
Support for Multiple Recovery Points
New features in vSphere Replication include:
New User Interface
Flexible Replication Topologies
Support for Multiple Point-in-Time Recovery Points
Support for vSphere Distributed Storage
Major Performance Improvements

What's New in vSphere Replication: Support for New Replication Topologies
vSphere Replication is a feature of the vSphere platform. It copies a virtual machine to another location, within or between clusters, and makes that copy available for restoration through the VMware® vCenter Server™ Web-based user interface.
vSphere Replication continues to protect the virtual machine on an ongoing basis. It replicates to the copy the changes that are made to the virtual machine. This ensures that the virtual machine remains protected and is available for recovery without requiring restore from backup.
One of the new features of vSphere Replication 5.5 is support for flexible replication topologies. In previous versions of vSphere Replication, you were limited to a single vSphere Replication Management Server instance per vCenter Server. This generally limited you to replicating between two distinct datacenter sites and required that each site have its own vCenter Server instance.
Topologies with vSphere Replication can now be broadened to encompass inter-datacenter replication, intra-datacenter replication, and can include many different models of deployment dependent on where the vSphere Replication Server appliances are deployed.

Each vCenter Server needs to have a single, master vSphere Replication Appliance deployed and paired with it, but up to 9 further vSphere Replication Servers can be deployed to locations managed by that vCenter Server to act as the target for replication.
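
The sizing rule above can be sketched as a small check. This is a minimal illustration in plain Python, not a VMware API: the `VCenterTopology` class, its field names, and the appliance names are hypothetical, and the limit of 9 is taken from the paragraph above.

```python
from dataclasses import dataclass, field

MAX_ADDITIONAL_SERVERS = 9  # limit stated in the text above


@dataclass
class VCenterTopology:
    """Hypothetical model of one vCenter Server and its replication appliances."""
    vcenter: str
    master_appliance: str
    additional_servers: list = field(default_factory=list)

    def validate(self) -> list:
        """Return a list of problems; an empty list means the layout fits the rule."""
        problems = []
        if not self.master_appliance:
            problems.append(f"{self.vcenter}: no master vSphere Replication appliance")
        if len(self.additional_servers) > MAX_ADDITIONAL_SERVERS:
            problems.append(
                f"{self.vcenter}: {len(self.additional_servers)} additional servers, "
                f"limit is {MAX_ADDITIONAL_SERVERS}"
            )
        return problems


# Appliance names below are made up for illustration.
topology = VCenterTopology("vc-01b", "vr-master", ["vr-dr-01"])
print(topology.validate())  # prints []
```

A layout with ten additional servers would return one problem string instead of an empty list.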

We are going to configure replication between two datacenters: PROD and DR. The DR datacenter is a small regional site that does not have its own vCenter Server. We will be configuring replication for a VM running in PROD, to ensure that we have a valid replica available in DR as part of our overall BCDR strategy.

Examine master vSphere Replication appliance for PROD vCenter Server

As stated earlier, each vCenter Server instance is configured and paired with a master vSphere Replication appliance.  
Using the vSphere Web Client, navigate to the About information pane for the PROD vCenter Server master appliance.
1. Select the vCenter icon to access the inventory
2. Select vCenter Servers and click on vc-01b
3. Click on the Manage tab
4. Select the vSphere Replication sub-tab
5. Select About

Examine target site information

Click on the Target Sites button.  
You are able to pair this instance with one or more remote vCenter instances. In our example, this site has been paired with the vCenter Server for the DR datacenter (vc-01a). This pairing is required only if you wish to replicate to a site managed by a different vCenter Server.
We will be configuring replication from our primary PROD datacenter to our smaller regional DR datacenter. Both datacenters are managed by the same vCenter Server. This intra-vCenter replication is a new feature of vSphere Replication 5.5.

Examine replication servers for this vCenter instance
Click on the Replication servers button.  
You will see that we have two vSphere Replication appliances configured. One appliance is responsible for the primary PROD datacenter. The second appliance manages replication for the remote regional DR datacenter.

Complete the following steps to open the vSphere Replication configuration wizard.
1. Right-click on the Extranet Server VM
2. Select All vSphere Replication Actions
3. Click on Configure Replication

Select the vCenter site

We will be replicating a full LAMP stack from DR to PROD.
1. Select the DR vCenter instance (vra-01a)
2. Click Next to continue

Select the DR vSphere Replication server

We will be replicating the Extranet Server VM to DR
1. Click on Select vSphere Replication Server
2. Select the DR (ds-site-a-nfs01) Replication server
3. Click Next to continue

Select the desired datastore located at the DR remote regional site.
1. Select the ds-site-a-nfs01 datastore
2. Click Next to continue


Monitoring vSphere Replication and point-in-time recovery points

What's New in vSphere Replication: Multiple Point-in-Time Recovery Points
Another new and powerful feature of vSphere Replication 5.5 is the ability to support multiple recovery points.   Prior to this release, only a single recovery point (most current) was supported.   Starting with version 5.5, the administrator has the ability to specify how many point-in-time copies of the VM should be created per day, and for how many days they should be retained.  
This capability allows you to easily revert to the last known good state of a virtual machine to recover from data corruption, virus infections, etc.  
We are going to configure multiple point-in-time recovery options for a VM that is currently being replicated between PROD and DR. We have been given a directive to ensure we have four daily recovery points for this VM and that they are retained for two days.
Best Practice Note: While we can support up to 24 recovery points per VM, we do not recommend as a general practice that you implement multiple recovery points for all of your replicated VMs.  This feature should only be used when there is a specific known need, and then only retain what is necessary.  
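
The arithmetic behind this policy can be sketched as follows. This is a minimal illustration, not part of any VMware tooling: the function names are hypothetical, and it assumes that the limit of 24 applies to instances per day multiplied by days retained, and that instances are spread evenly across the day.

```python
MAX_INSTANCES_PER_VM = 24  # limit quoted in the best-practice note


def retained_instances(per_day: int, days: int) -> int:
    """Number of point-in-time instances kept once the retention window is full."""
    if per_day < 1 or days < 1:
        raise ValueError("per_day and days must both be at least 1")
    total = per_day * days
    if total > MAX_INSTANCES_PER_VM:
        raise ValueError(f"{total} instances exceeds the limit of {MAX_INSTANCES_PER_VM}")
    return total


def hours_between_instances(per_day: int) -> float:
    """Approximate spacing of instances, assuming an even spread over 24 hours."""
    return 24 / per_day


print(retained_instances(4, 2))    # 8
print(hours_between_instances(4))  # 6.0
```

Under these assumptions, the directive of four daily recovery points retained for two days keeps eight instances, roughly six hours apart.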

Examine existing replications

Using the vSphere web client, navigate back to the DR vCenter Server instance and examine the currently configured replications.
1. Navigate to the DR vCenter Server (vc-01a) in the inventory view
2. Select the Monitor tab
3. Click on vSphere Replication
4. Select the Incoming Replications button
We are replicating a full LAMP stack of VMs from DR to PROD. You can examine or modify the current replication configuration here.
Let's implement the multiple point-in-time recovery options as required by our new BCDR policy (four daily recovery points retained for two days).

Reconfigure replication settings

Make sure the DB Server VM is selected and click on the Reconfigure Replication icon to launch the configuration wizard.

Click Yes to dismiss the Security Alert if needed.

Click Next four times to keep currently configured settings and advance to the Recovery settings page.
Remember, we want to configure four daily point-in-time recovery points and retain them for two days.  Execute the following steps to implement the desired recovery capabilities.  
1. Make sure Point in time instances is Enabled by clicking the checkbox
2. Enter 4 as the number of instances per day
3. Enter 2 as the number of days to retain
4. Click Next to continue

Review your changes and click Finish to continue.

Confirm new policy

Confirm your new settings took effect.
1. Click the Refresh button
2. Examine the Point in time recovery settings and existing Instance Sync Point(s)
Congratulations. You have just configured custom multiple point-in-time recovery settings for vSphere Replication.

What's New in SRM:  Storage vMotion and Storage DRS Interoperability
Prior to the release of vCenter Site Recovery Manager 5.5, VMware explicitly stated that Storage DRS and Storage vMotion were NOT supported with SRM. The following example (taken from Cormac Hogan's excellent blog on this topic) explains the core issue preventing interoperability.
Example:  If a customer enables Storage DRS in fully automated mode on the protected site, and at 3am in the morning, Storage DRS decides that it needs to balance the datastores (either for space or for I/O load), and it Storage vMotions a VM to a different datastore, that VM is no longer protected by SRM. Let's say that at 4am, there is a disaster at the protected site. SRM does its thing and fails over to the recovery site. Unfortunately, not all the VMs are recovered because some of them were migrated to different datastores at the protected site, and were left in an unprotected state. This is not a nice situation to be in during a disaster.
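
The failure mode in this example can be modelled in a few lines. This is a simplified sketch, not SRM logic: the function, VM names, and datastore names are hypothetical, and protection is reduced to "the VM lives on a replicated datastore".

```python
def unprotected_vms(vm_datastore: dict, replicated_datastores: set) -> list:
    """Return the VMs whose current datastore is not in the replicated set."""
    return sorted(vm for vm, ds in vm_datastore.items() if ds not in replicated_datastores)


replicated = {"ds-a", "ds-b"}
placement = {"web": "ds-a", "db": "ds-b", "extranet": "ds-a"}
print(unprotected_vms(placement, replicated))  # []

# Storage DRS rebalances overnight and moves one VM to a non-replicated datastore.
placement["db"] = "ds-c"
print(unprotected_vms(placement, replicated))  # ['db']
```

Any VM returned by the second call would be missing from the recovery site after a failover.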

Starting with vCenter Site Recovery Manager 5.5, VMware now provides support for protected-site StorageDRS and Storage vMotion with vSphere Replication.  
Historically, we could not support Storage vMotion and Storage DRS with vSphere Replication. In earlier releases, the vSphere Replication persistent state files that track changed blocks for the current replication were deleted during a Storage vMotion operation. This forced a vSphere Replication full sync, which is a very expensive operation. This behavior has been changed in vSphere 5.5 so that the persistent state files are now migrated along with the VMDK file during a Storage vMotion operation.

Protected site Storage vMotion and Storage DRS now supported: array-based replication

Starting with vCenter Site Recovery Manager 5.5, VMware provides support for protected-site Storage DRS and Storage vMotion with array-based replication as well.
Support relies on the use of a StorageDRS cluster that contains only datastores that are part of the same array consistency group.  The disks in this consistency group are all replicated with the same schedule and write order fidelity is maintained.   This allows them to be moved because there will always be a recoverable set of files either at the source location (if a crash/recovery occurs during migration) or at the target location (if it completes successfully before the crash/recovery).
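
The constraint described above can be expressed as a simple check. This is an illustrative sketch only: the function and the datastore-to-group mapping are hypothetical, and in practice the consistency group membership would come from the storage array.

```python
def cluster_is_supported(cluster_datastores: list, consistency_group: dict) -> bool:
    """True when every datastore in the cluster belongs to one and the same consistency group."""
    groups = {consistency_group.get(ds) for ds in cluster_datastores}
    # A missing mapping shows up as None and makes the cluster unsupported.
    return len(groups) == 1 and None not in groups


# Hypothetical mapping of datastore name to array consistency group.
groups = {"ds-01": "cg-1", "ds-02": "cg-1", "ds-03": "cg-2"}

print(cluster_is_supported(["ds-01", "ds-02"], groups))  # True
print(cluster_is_supported(["ds-01", "ds-03"], groups))  # False
```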

What's New in SRM:  Multiple Point-in-Time Recovery Capability
SRM is fully compatible with the new Multiple Point-in-Time recovery point feature in vSphere Replication 5.5.   Using these technologies together, you can easily revert to a previous point-in-time copy of your virtual machine following the execution of an SRM recovery operation.   This can be extremely useful to revert a VM to a known good state prior to data corruption, virus infection, or other issues.
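
Choosing which point-in-time copy to revert to can be sketched as picking the newest instance taken before the problem occurred. This is a minimal illustration with made-up timestamps; the function name is hypothetical, and in the product the instance is selected in the user interface after recovery.

```python
from datetime import datetime
from typing import Optional


def last_known_good(instances: list, corrupted_at: datetime) -> Optional[datetime]:
    """Return the newest instance taken strictly before the corruption time, if any."""
    candidates = [t for t in instances if t < corrupted_at]
    return max(candidates) if candidates else None


# Four instances in one day, six hours apart (times are illustrative).
instances = [
    datetime(2015, 2, 17, 0, 0),
    datetime(2015, 2, 17, 6, 0),
    datetime(2015, 2, 17, 12, 0),
    datetime(2015, 2, 17, 18, 0),
]
print(last_known_good(instances, datetime(2015, 2, 17, 13, 30)))  # 2015-02-17 12:00:00
```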

Tuesday, February 17, 2015


We already performed a failover from Production to DR in my previous blog post.

Now we will optimize the disaster recovery plan.

We will use the reporting features of Site Recovery Manager to view the test results, see if we can identify a problem, and then reconfigure the recovery plan to address any issues.

Review the DR test report for errors

If you've left the Recovery Plan page, navigate back and open the Customer Care Web App recovery plan.
1. Click on "History"
2. The report at the top of the list will be the cleanup operation we previously ran. Make sure you select the report for the Test instead, which should be the second item in the history. Click on "Export Report"

1. Leave the file format as "Web Page" and click "Generate Report."

The report for our DR test has been saved to the desktop. Minimize the open web browser and find the icon on the desktop for the saved report. Double-click to open it. Take a few minutes to review the report.

Our plan is attempting to start the web server and DB server at the same time. For the web application to start correctly, the database must be available when the web server starts.

Add application dependencies

We need to make sure that the DB Server starts before the Web Server. Return to the vSphere Web Client.
1. Click on "Recovery Steps" at the top of the page
2. Right Click on the Web Server
3. Click on "Configure Recovery"

Configure dependencies for the web server

1. Expand "VM Dependencies"
2. Click on "Configure"

1) Check the box next to DB Server
2) Click on OK
3) When you are returned to previous window, click OK
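
The effect of this dependency can be sketched as a topological ordering. This is an illustration of the idea using Python's standard `graphlib` module, not of how SRM computes its start order internally; the dictionary simply mirrors the dependency configured above.

```python
from graphlib import TopologicalSorter

# Each key depends on the VMs in its value set.
dependencies = {
    "Web Server": {"DB Server"},
    "DB Server": set(),
}

# static_order() yields each VM only after everything it depends on.
start_order = list(TopologicalSorter(dependencies).static_order())
print(start_order)  # ['DB Server', 'Web Server']
```

A circular dependency (for example, DB Server also depending on Web Server) would make `static_order()` raise `graphlib.CycleError`, which is a useful sanity check before relying on a plan.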

Add new VMs to the DR plan

To add the newly identified Portal Server to the DR plan, we need to add it to our Customer Care Web App protection group.
1. Click on Protection Groups
2. Make sure that the Customer Care Web App protection group is highlighted in the left pane
3. Select the Summary tab
4. Click on Actions and then "Edit Protection Group"

1. Click "Next" until you get to Step 3: Virtual Machines. You will see a list of VMs that can be protected with this protection group. Notice that Portal Server is not selected!
2. Check the check box next to Portal Server to add it to the protection group. Do NOT check the box next to TestVM or you may not be able to complete later modules in this lab.
3. Click Next to move through the rest of the steps without making any additional changes. When you reach the last step click on Finish.

Monday, February 16, 2015


This is the second part of BCDR using VMware SRM. The earlier blog, USE VMware SITE RECOVERY MANAGER TO TEST DISASTER RECOVERY -1, covered failing over to DR. In this blog we will fail back.

Now we will fail the application BACK to Production DC, restoring it to its original location. Applications protected with SRM can be migrated back and forth as often as necessary, for everything from disaster recovery and avoidance to simply avoiding downtime due to planned maintenance outages.

Connect to Production DC site

1. Open the vSphere Web Client for the Production DC vCenter.  From the home tab navigate to the Site Recovery plugin.
We'll be running the planned failback from the Production DC site vCenter

Initiate recovery plan for planned migration
Navigate to Recovery Plans.
1. Make sure that the Customer Care Web App is highlighted

2. Click the red button to run the recovery plan.

Start the disaster avoidance procedure
We will use the same process that we followed when we originally failed over to DR DC.
1. Check the check box indicating that you understand the risks of a failover.
2. Make sure that you select Planned Migration.
3. Click Next and then Finish

Monitor the plan while it runs.

Once the plan is complete, it is important that we reprotect the Customer Care Web App as soon as possible.
1. Click the Reprotect button
Make sure to click the checkbox to confirm you understand what will occur during the operation before clicking Next. As with the previous Reprotect operation, this may take up to 20 minutes while the protected VMs are replicated back to DR DC.

After the plan completes, verify that SRM is now protecting Production DC using resources in DR DC.
1. Click on the Summary tab
2. Ensure that Production DC is listed as the Protected Site and DR DC as the Recovery Site.
That's it! You've failed a critical application over from one site to another to avoid a disaster, failed it back and used reprotect to make sure that it was protected from disaster regardless of the site in which it was running. 
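
The full cycle of failover, reprotect, failback, and reprotect can be summarized with a toy state model. This is a deliberately simplified sketch and not SRM's actual state machine: the `ProtectionState` class and its methods are hypothetical, and each operation is reduced to the role changes described in these posts.

```python
from dataclasses import dataclass


@dataclass
class ProtectionState:
    """Hypothetical, simplified view of one recovery plan's site roles."""
    protected_site: str
    recovery_site: str
    running_at: str
    replicating: bool = True

    def planned_migration(self) -> None:
        """Move the workloads to the recovery site; roles are not yet reversed."""
        if self.running_at != self.protected_site:
            raise RuntimeError("workloads are not running at the protected site")
        self.running_at = self.recovery_site
        self.replicating = False

    def reprotect(self) -> None:
        """Swap roles so the site now running the workloads becomes the protected site."""
        if self.running_at != self.recovery_site:
            raise RuntimeError("reprotect only makes sense after a migration")
        self.protected_site, self.recovery_site = self.recovery_site, self.protected_site
        self.replicating = True


state = ProtectionState("Production DC", "DR DC", running_at="Production DC")
state.planned_migration()  # fail over to DR DC
state.reprotect()          # DR DC is now the protected site
state.planned_migration()  # fail back to Production DC
state.reprotect()          # Production DC is protected again
print(state.protected_site, "->", state.recovery_site)  # Production DC -> DR DC
```

The model makes the reason for reprotecting quickly visible: between a migration and the following reprotect, `replicating` is false and the application has no replica to fall back on.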



In working with the business, IT has determined that they will protect the application with the following policy in mind.
All workloads in Production DC will be protected and recovered using resources in the DR DC
All workloads in DR DC will be protected and recovered using resources in the Production DC
vSphere Replication will be used to protect all workloads to save cost, since a recovery point objective of 15 minutes is acceptable for all customer data.


We need to review the status.
1. Click on vSphere Replication in the left pane
2. Click on vc-01b.corp.local to select the local vCenter
3. Click on Monitor


1. Make sure you have Outgoing Replications selected
2. View the status of outgoing VM replications. All should have "OK" as their status and have vc-01a.corp.local as their target. The status for some VMs may show an RPO violation instead of "OK". This is because of the way the Hands-On Labs are configured and can be ignored.
3. Click on Home when you are done viewing the information on this page.
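
The relationship between the 15-minute recovery point objective in the policy and the status shown for each replication can be sketched as a simple age check. This is a simplification for illustration, not how vSphere Replication evaluates RPO internally; the function name and timestamps are hypothetical, and only the status strings come from the text above.

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)  # recovery point objective from the policy above


def replication_status(last_sync: datetime, now: datetime, rpo: timedelta = RPO) -> str:
    """Report 'OK' while the newest replica is no older than the RPO."""
    return "OK" if now - last_sync <= rpo else "RPO violation"


now = datetime(2015, 2, 16, 12, 0)
print(replication_status(datetime(2015, 2, 16, 11, 50), now))  # OK
print(replication_status(datetime(2015, 2, 16, 11, 30), now))  # RPO violation
```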

From the Home page of the vSphere Web Client, click on Site Recovery on the left pane of the client. This will open the Site Recovery home page.

We need to verify that the two sites, DR DC and Production DC, are paired correctly and connected. Click on Sites to view information about the SRM pairings.


You should see the two sites, DR DC and Production DC, listed in the left pane of the client. Now you should verify that the sites are paired and connected.

1. Click on Production DC

Now you are viewing the Summary page for Production DC. The local site is Production DC and the paired site is DR DC.
1. Make sure you're viewing the Summary tab.
2. Verify that the Client and Server Connection status for Production DC is "Connected."
3. Verify that the Client and Server Connection status for DR DC is "Connected."
4. Optionally, you can view the same information from the perspective of the paired site. Click on DR DC in the left pane and you should see the same information displayed from that site's perspective.

5. When you're done, click on Site Recovery to be taken back to the main Site Recovery page.


We now need to view the details of our disaster recovery plan and verify that it takes into account the dependencies between VMs. Click on Recovery Plans to see how the VMs will be recovered in DR DC.


Click on the Customer Care Web App recovery plan.

Our web application consists of three virtual machines, a tier 1 web server, tier 1 database server, and an extranet server the business considers tier 2.  We need to make sure the VMs are in their appropriate groups.
1. Select the Monitor tab
2. Click on Recovery Steps
3. Expand Step 6 by clicking on the arrow next to orange step 2.
It appears that the VMs are all in the same tier! If all three VMs are started at the same time, the application may not work correctly. The plan needs to be modified so that the Database and Web Server VMs start before the Extranet Server.

Reconfigure virtual machine start-up order

1. Right-click on DB Server
2. Select All Priority Actions
3. Select Priority 1
Click "Yes" when asked if you want to change the priority group. Virtual machines can be part of multiple recovery plans, and changing this setting will affect every recovery plan that this VM is a member of.
Repeat these actions for the Web Server so that it is also in the Priority 1 group.
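
The resulting start-up order can be sketched by grouping VMs by priority. This is an illustration only: the function is hypothetical, and the priority value for the Extranet Server is an assumption, since the text only states that the DB Server and Web Server move to Priority 1.

```python
from collections import defaultdict


def start_batches(priorities: dict) -> list:
    """Group VMs by priority; lower-numbered groups start first."""
    groups = defaultdict(list)
    for vm, priority in priorities.items():
        groups[priority].append(vm)
    return [sorted(groups[p]) for p in sorted(groups)]


priorities = {
    "DB Server": 1,
    "Web Server": 1,
    "Extranet Server": 3,  # assumed value; any number above 1 gives the same order
}
print(start_batches(priorities))  # [['DB Server', 'Web Server'], ['Extranet Server']]
```

Within the Priority 1 batch, the two VMs still start together, so a priority change alone does not order the database ahead of the web server.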

Test disaster recovery plan

Now that you have reviewed the SRM configuration and verified that the recovery plan is correct, you can test the plan at any time to ensure that it will function as required.

Navigate to the Recovery Plans page that you previously visited
1. The "Customer Care Web App" recovery plan should be highlighted as it appears in this screenshot. If it is not, click on it once to highlight it.
2. Click the green arrow to initiate a test

1. Verify that the Protected Site and Recovery Site are correct.
2. Make sure the "Replicate recent changes to recovery site" checkbox is NOT checked.
3. Click Next
4. Click Finish on the next screen.

Click on the "Customer Care Web App" recovery plan to monitor the test progress. The test may take several minutes.
1. Select the Monitor tab.
2. Make sure that you've selected
When all steps have completed, the Plan Status will be Test Complete.

You can optionally now log in to the DR DC vCenter and navigate to the VMs and Templates screen to see that the VMs are in fact running. Open a new tab in your browser and click on the DR DC vCenter bookmark. Click the "Use Windows session authentication" check box and log in. Open the Hosts and Clusters view.
Take a look and verify...
1. The VM has started
2. VMware Tools is running
The VM is connected to an SRM created test bubble port group. This port group is not connected to any physical uplinks and is created by SRM when the test plan is run. It will be deleted when the test is complete. Alternatively, you could have built a dedicated test network for a more in depth disaster recovery test.

Clean up activities from disaster recovery test

Switch back to the Production DC vSphere client. Now that we have verified that the Customer Care Web App can be successfully recovered in DR DC, it is time to clean up the test.
1. Click the Cleanup button to initiate the cleanup process

1. Click Next to confirm the cleanup options. The "Force Cleanup" check box will be grayed out
2. Click Finish to initiate the cleanup

In the next blog, we will cover the return to normal operations: failing the application back from DR DC to Production DC.
