
EMC VPLEX enables the private cloud. But what is a “private cloud”?


Buzzword Much?

If you have seen any of EMC’s marketing for EMC World, or you are attending EMC World in Boston this week, you have no doubt noticed a ton of talk about the “Private Cloud”.  Vendors have been talking a lot more about the “cloud” and “cloud computing” lately, and you may remember that every few years the word “cloud” gets shouted by vendors of all kinds, the talk inevitably quiets, and nothing is really different.  So is it different this time?  I think so.

What is a Cloud?

In the context of IT, clouds already exist.  The Internet and the public telephone system are two examples; Facebook, Flickr, and Salesforce are others.  The common theme is that each provides some sort of service to the end user without requiring the end user to purchase or build any infrastructure to support it.  You can plug a phone into a wall and immediately call nearly anyone in the world.  “Cloud” is a fancy word (or buzzword) for providing something “as-a-service”; Salesforce.com, for example, is software-as-a-service (SaaS).

So what is the Private Cloud?  

In the context of enterprise datacenters, the focus of EMC’s vision, the Private Cloud is Infrastructure-as-a-Service (IaaS).  It enables corporate IT to transition from a necessary expense to a profit center, providing IT-as-a-Service to the rest of the business.  It decouples infrastructure from applications, delivering unprecedented levels of scalability, availability, and flexibility at lower cost.

What if…
a.) your corporate applications could run from anywhere, and users had access from anywhere?
b.) you could relocate your applications from anywhere to anywhere else, at any time, without disruption to your users?
c.) you could replace any piece of physical hardware in your infrastructure without impacting your applications?

Sounds too good to be true, right? Maybe not…

This week, EMC announced a completely new product called VPLEX.  VPLEX takes your existing storage arrays and aggregates them into a cooperative pool of storage for hosts and applications.  It then allows you to move application data within and across those arrays as needed, without disrupting the application or its users.  If you are familiar with EMC’s Invista, IBM’s SVC, or Hitachi’s USP-V products, you may be thinking that VPLEX is just another storage virtualization product, but I assure you it’s different.  VPLEX virtualizes storage within the datacenter much as those products do, but it can ALSO combine storage across multiple datacenters and allow an application to run from any of them, or all of them simultaneously, through the power of Federation.
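
To make the idea of Federation a little more concrete, here is a minimal sketch in Python.  Everything in it is hypothetical (the class names, array names, and sites are invented for illustration, and VPLEX is managed through its own interfaces, not code like this), but it models the core concept: a single volume identity backed by storage legs in more than one datacenter.

```python
# Conceptual model only -- hypothetical names, not the real VPLEX API.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class StorageLeg:
    array: str         # backing array, e.g. "VMAX-DC-A" (invented name)
    site: str          # datacenter housing that array
    capacity_gb: int

@dataclass
class DistributedVirtualVolume:
    """One volume identity presented to hosts, backed by legs at multiple sites."""
    name: str
    legs: List[StorageLeg] = field(default_factory=list)

    def sites(self) -> Set[str]:
        # Hosts at any of these sites can access the same volume.
        return {leg.site for leg in self.legs}

vol = DistributedVirtualVolume("app01_data", [
    StorageLeg("VMAX-DC-A", "Boston", 500),
    StorageLeg("CX4-DC-B", "Hopkinton", 500),
])
print(vol.sites())  # {'Boston', 'Hopkinton'} -- the active/active idea
```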

Active/Active Datacenters

With VPLEX Federation, you can move a virtual machine and all of its data from datacenter A to datacenter B in a matter of minutes without user disruption; the same goes for hundreds or even thousands of VMs.  You can run the same application in both locations, sharing a single dataset.  Armed with EMC VPLEX and VMware vSphere, you can upgrade, replace, and reconfigure any part of your infrastructure (storage, servers, network, power distribution, etc.) without ever having to take your applications offline.  How’s that for availability?

The ability to create a virtual infrastructure from the storage layer through to the server layer, and to host any application on that infrastructure, is the key to providing Infrastructure-as-a-Service, building the Private Cloud, and delivering IT-as-a-Service within your organization.  Imagine running the IT department as a business within the business, and actually showing financial value to the business.

There is a lot more to this concept, but I wanted to at least bring some context to “cloud” as well as EMC’s new VPLEX product.  There will be more to come on this topic.

Chuck Hollis wrote about VPLEX as a new Storage Platform today, and VirtualGeek called it a Virtual Machine teleporter in his quite detailed write-up of this new technology.  The key is to step back with an open mind and think about how application design and disaster recovery planning could be approached in entirely new ways when data is no longer confined to a particular physical location.

Do you have a recovery plan? You should!


In my new role at EMC, I am one of the first people to learn of major problems that my customers experience.  In general, customers seem to call their sales team before technical support when a big problem happens.  In the past week, I’ve been involved in recovery efforts with two different customers, both resulting from complete power outages in their production datacenters.

Both of these customers process millions of dollars through their global customer-facing websites.  The smaller of the two has no disaster recovery site of any kind, while the larger does have a recovery site, but it is hundreds of miles away and not designed to run 100% of production.

What became clear through both of these incidents is that having a very clear, very well-known recovery plan is critical to the business.  Interestingly, these experiences drove home the point that even if you don’t have a recovery site, aren’t using replication, and otherwise have no way to recover the data offsite, you still need a plan that encompasses what you CAN do.  More often than not, major outages are short-lived and you will be recovering in your primary datacenter anyway, so you need a pre-determined plan to prevent major issues and shorten the time to recover.

Here are some things to think about when creating a recovery plan:

  • Get the application owners together and build a list of all the applications running in your environment.  Document the purpose of each application and map dependencies that each application has on other applications.
  • Next, involve the server/systems admins and document the server names, database names, IP addresses, and DNS names for each application on the list.
  • Finally, involve the infrastructure teams (storage, network, datacenter) and document the network dependencies (subnets, routers, VPN connections, load balancers, etc.).  Document any SAN storage used by the servers/applications.  Also document how each infrastructure component affects the others (e.g., the SAN switches must be operational before servers can connect to storage arrays).
  • Work with business leaders to prioritize the applications.  The idea is to understand how much impact each application has to the business both from a productivity perspective as well as direct financial impact.  There may be legal requirements or service level agreements with customers to consider as well.
  • If possible, identify the maximum amount of time each application can be down in the event of a catastrophic event (RTO – Recovery Time Objective) and how much data can be lost without significant impact to the business (RPO – Recovery Point Objective).  These metrics are usually measured in minutes, hours, or days.
  • Document the backup method for each server and application.  How often are backups run?  What is the retention period?  How long does it take to complete backups?   What is the expected time to restore the data?  How long does it take to recall tapes from offsite storage?
  • At this point you have a prioritized list of applications; now build a step-by-step recovery plan that lists the exact order in which you must recover systems.  The list should include server names as well as validation points to ensure certain systems are working before moving on to the next step.  (One way to derive this order automatically from your dependency map is sketched in the code just after this list.)  For example:
    • Step 1: bring up the network switches and routers
    • Step 2: bring up the DNS/DHCP servers
    • Step 3: bring up Active Directory servers
    • Step 4: bring up SAN fabric switches
    • Step 5: bring up SAN storage arrays, verify health of arrays with help from vendor
    • Step 6: …
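
If you keep the dependency map in machine-readable form, the recovery order can even be derived from it instead of hand-numbered.  Below is a minimal Python sketch of that idea; the system names and dependencies are invented for illustration:

```python
# Sketch: derive a recovery order from a documented dependency map.
# The systems and dependencies below are illustrative, not a real plan.
from graphlib import TopologicalSorter  # Python 3.9+

# Each key lists the systems it depends on (its prerequisites).
dependencies = {
    "network switches/routers": [],
    "DNS/DHCP": ["network switches/routers"],
    "Active Directory": ["network switches/routers", "DNS/DHCP"],
    "SAN fabric switches": ["network switches/routers"],
    "SAN storage arrays": ["SAN fabric switches"],
    "database servers": ["SAN storage arrays", "Active Directory"],
    "web storefront": ["database servers", "DNS/DHCP"],
}

# static_order() yields each system only after all of its prerequisites.
for step, system in enumerate(TopologicalSorter(dependencies).static_order(), 1):
    print(f"Step {step}: bring up {system}")
```

A nice side effect of deriving the order this way is that a new application added to the inventory slots itself into the right place in the sequence.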

I recommend that one of the first steps before starting recovery is to contact your key vendors (storage array vendors at least) to notify them of your outage so they can get support resources ready to troubleshoot any hardware issues you may run into during the recovery.

  • Identify the key players needed in a recovery: at least primary and secondary contacts for every application, plus vendor contacts for hardware/software, facilities, UPS/generator support teams, etc.
  • Establish a standard communication plan to include at least the following…
    • A method to notify employees of an outage and give instructions
    • A method to notify key players for recovery
    • A mechanism for key players to communicate with each other during the recovery
    • Personal (not corporate/business) contact information for all of the key players

The key thing to remember here is that you cannot rely on any communication tools that are part of your own infrastructure.  You must assume your PBX/VoIP system will be down, email will be down, corporate instant messenger will be down, SharePoint will be unavailable, etc.

  • If you have a remote recovery site, with or without replication technology, and intend to use it to recover production applications in the event of a large failure, be sure to document the triggers for moving to the recovery site.  As an example, you may want to attempt recovery in the primary site first, and then move to the recovery site if recovery at the primary site will take too long; be sure to document that time threshold and get executive buy-in (a tiny sketch of such a pre-agreed trigger follows this list).  You should not hear “how long do we wait until we move to the DR site?” during an active recovery operation.  That decision needs to be made during the planning exercise.
  • Document the entire plan and store digital copies in a readily accessible place (file shares, a SharePoint site, etc.).  Keep additional copies on USB sticks or CDs stored in a safe place.  Keep even MORE copies in another location outside the primary datacenter facility (e.g., a safe deposit box or a remote office safe).  Print copies as well and store them in similarly safe places.  Assume that a building may not be accessible due to fire or flood.  I know one customer who issues fingerprint-secured USB sticks to every manager; each manager must sync their stick to a server at least monthly or upper management is notified.
  • Make sure that everyone is aware of the recovery plan, who has access to the plan, where the copies are stored, and what role each of the key players is expected to play during a recovery.
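
As one illustration of the pre-agreed trigger mentioned above, here is a tiny sketch; the threshold is invented, and the real number should come out of the planning exercise with executive sign-off:

```python
# Illustrative failover trigger -- the threshold is hypothetical and must be
# agreed on during planning, not during an active recovery.
MAX_PRIMARY_RECOVERY_HOURS = 4.0

def should_fail_over(estimated_primary_recovery_hours: float) -> bool:
    """True when the estimate exceeds the pre-agreed threshold for moving to DR."""
    return estimated_primary_recovery_hours > MAX_PRIMARY_RECOVERY_HOURS

print(should_fail_over(6.0))  # True -> invoke the DR site runbook
```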

There is far more to think about, but hopefully what I’ve listed above gives you a good start.  If you already have a recovery plan, review it regularly and think about anything that needs to be added or modified.

If you are trying to get approval for a remote recovery site and replication technology and are having trouble getting executive sign-off, going through this exercise and defining application priority with RPO/RTO for each could give you the ammo you need.  Traditional backup architectures aren’t designed for RPOs under 24 hours, while storage-array-based replication can bring RPOs down into the minutes; restoring from tape also takes far longer than restoring from replicated data.
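
A quick back-of-the-envelope calculation makes that difference tangible.  The numbers below are purely illustrative; the point is that worst-case data loss is roughly the interval between protection points:

```python
# Back-of-the-envelope RPO comparison (all numbers are illustrative).
backup_interval_hours = 24          # nightly backups
replication_interval_minutes = 5    # periodic array-based replication
transactions_per_hour = 10_000      # hypothetical transaction volume

backup_loss = backup_interval_hours * transactions_per_hour
replication_loss = (replication_interval_minutes / 60) * transactions_per_hour

print(f"Nightly backups:      up to {backup_loss:,} transactions at risk")
print(f"5-minute replication: up to {replication_loss:,.0f} transactions at risk")
```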

Last but not least, keep the plan updated as your environment changes: add new application and server details as part of the implementation process for new applications, and revisit the plan as part of change control procedures for significant changes to the infrastructure.