Tag Archives: federation

Saving $90,000 per minute with VMAX Federated Live Migration

Posted on by

One of the customers I work with regularly has a set of key MS SQL databases totaling over 100TB in size which process transactions for their customer facing systems. If these databases are down, no transactions can be processed and customers looking to spend money will go elsewhere. The booking rate related to these databases is ~$90,000 per minute, all day, every day.

Today these databases live on Symmetrix DMX storage which has been very reliable and performant. As happens in an IT world, some of these Symmetrix systems are getting long in the tooth and we are working on refreshing several older DMX systems into fewer, newer VMAX systems. Aside from the efficiency and performance benefits of sub-LUN tiering (EMC FASTVP), and the higher scalability of VMAX vs DMX as well as pretty much any other storage array on the market, EMC has added another new feature to VMAX that is particularly advantageous to this particular customer — Federated Live Migration

Tech Refresh and Data Migrations:

Tech Refresh is a fact of life and data migrations, as a result are common place. The challenge with a data migration is finding a workable process and then shoehorning that process into existing SLAs and maintenance windows. In general, data migrations are disruptive in some way or another, and the level of disruption depends on the technology used and the type of migration. EMC has a long list of migration tools that cover our midrange storage systems, NAS systems, as well as high-end Symmetrix arrays. The downside with these and other vendors’ migration tools is that there is some level of downtime required.

There are 3 basic approaches:

  1. Highest Amount of Downtime:
    • Take system down, copy data, reconfigure system, bring system online
  2. Less Downtime:
    • Copy Data, Take system down, copy recent changes, reconfigure system, bring system online (EMC Open Replicator, EMCopy, RoboCopy, Rsync, EMC SANCopy, etc)
  3. Even Less Downtime:
    • Set up new storage to proxy old storage, reconfigure system, serve data while copying in the background. (EMC Celerra CDMS, EMC Symmetrix Open Replicator)
    • (Traditional virtualization systems like Hitachi USP/VSP, IBM SVC, etc are similar to this as well since you must take some level of downtime to get hosts configured through the virtualization layer, after which you can non-disruptively move data).

Common theme in all 3? Downtime!

Non-Disruptive Migrations are becoming more realistic:

A while ago now, EMC added a feature to PowerPath called Migration Enabler which is a way to non-disruptively migrate data between LUNs and/or Storage arrays from the host perspective. PowerPath Migration Enabler could also leverage storage based copy mechanisms and help with cut over. This is very helpful and I have multiple customers who have successfully migrated data with PPME. The challenge has been getting customers to upgrade to newer versions of PowerPath that include the PPME features they need for their specific environment. Software upgrades usually require some level of downtime, at least on a per-host basis, and we run into maintenance windows and SLAs again.

EMC Symmetrix Federated Live Migration (FLM)

For Symmetrix VMAX customers, EMC has added a new capability that could make data migrations easy as pie for many customers. With FLM, a VMAX can migrate data into itself from another storage array, and perform the host cutover, automatically, and non-disruptively. The VMAX does not have to be inserted in front of the other storage array, and there is no downtime required to reconfigure the host before or after the migration.

Federation is the key to non-disruptive migrations:

The way FLM works is really pretty simple. First, the host is connected to the new VMAX storage without removing the connectivity to the old storage. The VMAX is also connected to the old storage array directly (through the fabric in reality). The VMAX system then sets up a replication session for the devices owned by the host and begins copying data. This is all pretty straightforward. The smart stuff happens during the cut over process.

As you probably know, the VMAX is an Active/Active storage system, so all paths from a host to a LUN are active. Clariion, similar to most other midrange systems is an Active/Passive storage system. On Clariion, paths to the storage processor that owns the LUN are active, while paths to the non-owner are passive until needed. PowerPath and many other path management tools support both active/active and active/passive storage systems.

Symmetrix FLM leverages this active/passive support for the cut over. Essentially the old storage system is the “owning SP” before and during the migration. At cut over time, the VMAX essentially becomes the owning SP and it’s paths go active while the old paths go passive. PowerPath follows the “trespass” and the host keeps on chugging. To make this work, VMAX actually spoofs the LUN WWN and host ID, etc from the old array, so the host thinks it’s still talking to the old array. Filesystems and LVMs that signature and track LUNs based on WWN and/or Host ID are unaffected as a result. The cut over process is nothing more than a path change at the HBA level. The data copy and cut over are managed directly from the VMAX by an administrator.

Of course there are limitations… FLM initially supports only EMC Symmetrix DMX arrays as the source and requires PowerPath. From what I gather, this is not a technical limitation however, it’s just what EMC has tested and supports. Other EMC arrays will be supported later and I have no doubt it will support non-EMC arrays as a source as well.

Solving the $5 million problem:

Symmetrix Federated Live Migration became a signature feature in our VMAX discussions with the aforementioned customer because they know that for every minute their database is down, they lose $90,000 in revenue. Even with the fastest of the 3 traditional methods of migration, they would be down for up to an hour while 12 cluster nodes are rezoned to new storage, LUNs masked, drive letters assigned, etc. A reboot alone takes up to 30 minutes. 1 hour of downtime equates to $5,400,000 of lost revenue. By the way, that’s quite a bit more than the cost of the VMAX, and FLM is included at no additional license cost, so FLM just paid for the VMAX, and then some.

EMC, Isilon, and CSX possibilities..

Posted on by

As you’ve no doubt heard, EMC has completed the tender offer to acquire Isilon (www.isilon.com)  for a Cajillion dollars (actually ~$2 Billion) and some people are asking why.  From where I sit, there are many reasons why EMC would want a company like Isilon, ranging from it’s media-minded customer base, to the technical IP, like scale-out NAS, that sets Isilon apart from the rest.

This EMC Press Release, as well as this one, and Chucks Blog are some of the many places to find out more about the acquisition…

I was thinking a lot about that technology as I worked on a high-bandwidth NAS project with a customer recently.  Isilon’s primary product is an IP-based storage solution that uses commodity based hardware components, combined with their proprietary OneFS Operating System, to deliver scale-out NAS with super simple management and scalability.  A single Isilon OneFS based filesystem can scale to over 10PB across hundreds of nodes.  Isilon also provides various versions of hardware that can be intermixed to increase performance, capacity, or both depending on customer needs.  You don’t necessarily have to add disks to an Isilon cluster to increase performance.

When looking at EMC’s own product line, you’ll find that Atmos delivers similar scale-out clustering for object-based storage, while VMAX does a similar type of scaling for high-end block storage (FC, FCoE, and iSCSI), and Greenplum provides scale-out analytics as well.  Line up Isilon’s OneFS, EMC GreenPlum, EMC Atmos, and EMC VMAX, and we can now deliver massive scale-out storage for database, object, file, and block data.  With VPLEX and Atmos, EMC also delivers block and object storage federation across distance.

Isilon’s OneFS also has technologies that mirror EMC’s but are implemented in such a way as to leverage the Scale-Out NAS model.  Take FlexProtect, for example, which is Isilon’s data protection mechanism (similar to RAID) and allows admins to apply different protection schemes (N+1 ala RAID5, N+2 ala RAID6, N+3, and even N+4 redundancy) on individual files and directories.  SmartPools, which is policy based, automatically tiers data at a file level based on read/write activity across different protection types and physical nodes, similar to how FASTVP tiers data at a block level on EMC Unified and VMAX.  Both EMC and Isilon realize that all data is not equal.

Rather than just repackage OneFS with an EMC logo (which I’m sure we’ll do at first), I wonder what else can we do with Isilon’s IP…

A recent series of blog posts by Steve Todd (Information Playground) on the topic of a Common Software Execution Environment (See CSX Technology and The Benefits of Component Assembly) got me thinking about deeper integration and how CSX can accelerate that integration.

For example…

What if EMC Engineering took the portions of code from Isilon’s OneFS that handle client load-balancing, file-level automated tiering, and flexible protection and turned them into CSX components.  Those components could be dropped into Celerra and immediately add Scale-Out NAS to EMC’s existing Unified storage platforms.  Or, imagine those components running directly in VMAX engines, providing scale-out NAS simultaneously with scale-out SAN across multiple, massive scale storage systems.  Combine the load balancing code and FlexProtect from Isilon with FASTVP in EMC Clariion to provide scale-out SAN in a midrange platform.

We could also reverse the situation and use the compression component that is in Clariion and Celerra, plus federation technology in Atmos, both added to OneFS in order reduce the storage footprint and extend Scale-Out NAS to many sites over any distance.  Add a GreenPlum component and suddenly you have a massive analytics cluster that spans multiple sites for data where you need it, when you need it.

The possibilities here really are endless, it will be very interesting to see what happens over the next 12 to 24 months.

Disclaimer: Even though I am an EMC employee, I am in no way involved in the EMC/Isilon acquisition, have no knowledge of future plans and roadmaps with regard to EMC and Isilon, and am not privy to any non-public information about this topic.  I am merely expressing my own personal views on this topic.

Resiliency vs Redundancy: Using VPLEX for SQL HA

Posted on by

A little history on my philosophy around high-availability

Around the year 2000, when I was working in network operations for a large wireless telco, a very senior network architect explained to me the company’s philosophy on building high availability solutions into the network.  The phrase I remember from that conversation was “we don’t build redundant networks, we build resilient networks..” The difference is that while redundant networks failover to secondary paths to resume traffic, resilient networks don’t go down at all.  This concept has stuck with me ever since and I tend to tackle high-availability problems of all kinds with this idea in mind.  It’s frankly been very difficult to build solutions that are resilient across the entire stack, mostly because infrastructure technology hasn’t quite gotten there yet.

Things may have changed…

I recently had a meeting with a customer to discuss local high availability for SQL.  This customer has a very large multi-node clustered SQL environment (hundreds of TBs of data, hundreds of databases, hundreds of instances, many clusters, many nodes per cluster) and has been testing SQL database mirroring as an alternative to traditional Windows Failover Clustering.  The focus of the meeting wound up focused primarily on leveraging VPLEX as an alternative to SQL mirroring, and the reasons for that decision suddenly reminded me of the Resiliency vs Redundancy discussion I had years ago.  A VPLEX solution potentially solves the same problem as DB mirroring, does it with less complexity, and less risk.

VPLEX Local as a Resilient HA solution

One of the many features of VPLEX is it’s ability to mirror data across multiple storage arrays and present that mirror as a single LUN to the host.  For customers already running large multi-node MSCS clusters, the LUN appears just like any normal storage LUN and Windows/SQL treat the LUN normally.  There are several reasons VPLEX should be considered as an alternative to database mirroring. (much of this applies to Exchange CCR as well)

VPLEX hardware is inherently Resilient.  A VPLEX cluster is an N+1 cluster of loosely coupled nodes, cooperating with each other, but not depending on each other.  Hosts can access any of the hosted data, through any of the ports, on any of the cluster nodes.  If a node fails for any reason, the remaining nodes continue serving IO for any data.  Except for a dead path on the host side (managed by PowerPath or MPIO), there is no failover process, and no cache mirroring to worry about.  The potential performance impact of a failure is equal to 1, divided by the quantity of that component in the cluster. (128 x 8gbe ports across 8 director nodes for a large VPLEX Local cluster)

In addition, because VPLEX utilizes a write-through cache, there is never any dirty cache data (data in cache that has not been committed to disk) in a VPLEX system.  A power outage or VPLEX hardware failure does not put data at risk.

Other Advantages of using VPLEX over SQL Database Mirroring

Improved Performance:

  • Compared with SQL Database mirroring, VPLEX mirroring has significantly less impact on transaction performance for writes and can improve transaction performance in some cases due to the large read cache in the VPLEX directors. (Note: I am comparing to DB Mirroring in Full-Safety mode since the customer’s requirement was a zero-data-loss solution.)

Non-Disruptive Storage Failover:

  • In the event of a storage failure, SQL Mirroring must perform a cluster node failover which takes a few seconds at best, possibly disrupting applications.  VPLEX provides completely non-disruptive failover when a storage failure occurs.  (A server hardware failure still triggers a node failover as it would in any other failover clustering scenario.)

Less Management Overhead:

  • From a management perspective, using VPLEX instead of SQL Database mirroring gives the SQL DBAs fewer SQL instances and fewer moving parts to manage on a daily basis.  The storage team just presents a mirrored LUN from VPLEX to the cluster and it’s business as usual for the DBAs.
  • VPLEX also allows the storage team to non-disruptively migrate data between storage arrays behind VPLEX to balance load, perform hardware refreshes, resolve capacity problems.  VPLEX performs the migration at the direction of the storage admins.

Reduced Risk:

  • Reducing management complexity also reduces risk.  With a high number of database instances and db mirrors involved in a large environment like this one, the chance of one of those mirrors having a problem, or being configured incorrectly, is increased.  DBAs can rely on VPLEX mirroring all of the data, 24x7x365, even when host maintenance is being performed.

Reduced Cost:

  • When compared with the SQL Database Mirroring solution, the VPLEX solution reduced the number of physical servers needed in this environment, reducing cost enough to more than offset the cost of VPLEX itself.  Combined with reductions in soft costs, like reduced DBA management overhead, VPLEX will actually save them quite a bit of money, and increased uptime during storage refresh and maintenance will increase revenues in this case as well.

A Distributed Future:

  • Next year, when a second datacenter is online nearby, the first VPLEX Local cluster can be connected to another VPLEX cluster in the new datacenter.  Then the SQL cluster nodes and data can be distributed across both datacenters, providing protection from entire datacenter outages, or solving space constraints with no changes to the application or servers, and no downtime.

I wonder how many other customers would like to build more resilient infrastructures?

If you combine a VPLEX solution with a true cluster file system and an active-active database engine (ie: Oracle RAC), you can eliminate the disruption caused by server hardware failures.  It’s just a matter of time now until the entire stack can be designed for true resiliency with very little management overhead.  I can’t wait to see what happens.

The following EMC White Paper has a lot of good information about using VPLEX in this same context:

Workload Resiliency with EMC VPLEX

EMC VPLEX enables the private cloud.. But what is a “private cloud”?

Posted on by

Buzzword Much?

If you have seen any of EMC’s marketing for EMC World, or you are attending EMC World in Boston this week, you no doubt noticed a ton of talk about the “Private Cloud”.  There has been a lot more talk from vendors as of late about the “cloud” and “cloud computing” and you may be reminded about how every few years the word “cloud” is shouted out by vendors of all kinds and how inevitably the talk quiets and nothing is really different.  So is it different this time?  I think so.

What is a Cloud?

In the context of IT, there are examples of clouds already.  The Internet and public telephone system are two examples of clouds.  Facebook, Flickr, and Salesforce are examples of clouds as well.  The common theme is that each of these examples provides some sort of service to the end user without requiring the end user to purchase or build any infrastructure to support it.  You can plug a phone into a wall and immediately call nearly anyone in the world.  Cloud is a fancy word (or buzzword) for providing something “as-a-service”.  Salesforce.com is software-as-a-service (SaaS).

So what is the Private Cloud?  

In the context of enterprise datacenters, the focus of EMC’s vision, the Private Cloud is Infrastructure-as-a-service (IaaS) and it enables corporate IT to transition from a necessary expense, to a profit center within the business, providing IT-as-a-Service to the rest of the business.  It decouples infrastructure from applications providing unprecedented levels of scalability, availability, and flexibility at lower cost.

What if…
a.) your corporate applications could run from anywhere, and users had access from anywhere?
b.) you could relocate your applications from anywhere to anywhere else, at any time, without disruption to your users.
c.) you could replace any piece of physical hardware in your infrastructure without impacting your applications.

Sounds too good to be true right? Maybe not…

This week, EMC announced a completely new product called VPLEX.  VPLEX has the ability to take your existing storage arrays and pool them into a cooperative pool of storage for hosts and applications.  It then allows you to move application data within and across those arrays as needed without disrupting the application or users.  If you are familiar with EMC’s Invista, IBM’s SVC, or Hitachi’s USP-V products you may be thinking that VPLEX is just another storage virtualization product.  But I assure you it’s different.  VPLEX virtualizes storage within the datacenter similar to how the above products can, but VPLEX can ALSO combine storage across multiple datacenters and allow an application to run from any of them or all of them, simultaneously, through the power of Federation.

Active/Active Datacenters

With VPLEX Federation, you can move a virtual machine and all of its data from datacenter A to datacenter B in a matter of minutes without user disruption; or hundreds of VMs, or thousands of VMs.  You can run the same application in both locations, sharing a single dataset.  Armed with EMC VPLEX and VMWare vSphere, you can upgrade, replace, and reconfigure any part of your infrastructure (storage, servers, network, power distribution, etc) without ever having to take your applications offline.  How’s that for availability?

The ability to create a virtual infrastructure from the storage layer through to the server layer and host any application on that infrastructure is the key to creating providing Infrastructure-as-a-Service, building the Private Cloud, and provisioning IT-as-a-Service within your organization.  Imagine running the IT department as a business within the business and actually showing financial value to the business.

There is a lot more to this concept but I wanted to at least bring some context around “cloud” as well as EMC’s new VPLEX product.  There will be more to come on this topic.

Chuck Hollis wrote about VPLEX as a new Storage Platform today, and VirtualGeek called it a Virtual Machine teleporter in his quite detailed write up of this new technology.  The key is to step back with an open mind and think about how application design and disaster recovery planning could be approached in entirely new ways when the data is no longer confined to a particular physical location.