
Saving $90,000 per minute with VMAX Federated Live Migration


One of the customers I work with regularly has a set of key MS SQL databases, totaling over 100TB, that process transactions for their customer-facing systems. If these databases are down, no transactions can be processed and customers looking to spend money will go elsewhere. The booking rate tied to these databases is roughly $90,000 per minute, all day, every day.

Today these databases live on Symmetrix DMX storage, which has been very reliable and performant. As happens in the IT world, some of these Symmetrix systems are getting long in the tooth, and we are working on refreshing several older DMX systems into fewer, newer VMAX systems. Aside from the efficiency and performance benefits of sub-LUN tiering (EMC FAST VP), and the higher scalability of VMAX versus DMX (and pretty much any other storage array on the market), EMC has added another new feature to VMAX that is particularly advantageous to this customer: Federated Live Migration.

Tech Refresh and Data Migrations:

Tech refresh is a fact of life, and data migrations, as a result, are commonplace. The challenge with a data migration is finding a workable process and then shoehorning that process into existing SLAs and maintenance windows. In general, data migrations are disruptive in one way or another, and the level of disruption depends on the technology used and the type of migration. EMC has a long list of migration tools covering our midrange storage systems, NAS systems, and high-end Symmetrix arrays. The downside with these and other vendors' migration tools is that some level of downtime is required.

There are 3 basic approaches:

  1. Highest Amount of Downtime:
    • Take system down, copy data, reconfigure system, bring system online
  2. Less Downtime:
    • Copy data, take system down, copy recent changes, reconfigure system, bring system online (EMC Open Replicator, EMCopy, RoboCopy, rsync, EMC SAN Copy, etc.)
  3. Even Less Downtime:
    • Set up new storage to proxy old storage, reconfigure system, serve data while copying in the background. (EMC Celerra CDMS, EMC Symmetrix Open Replicator)
    • (Traditional virtualization systems like Hitachi USP/VSP, IBM SVC, etc. are similar to this as well, since you must take some level of downtime to get hosts configured through the virtualization layer, after which you can move data non-disruptively.)

Common theme in all 3? Downtime!

Non-Disruptive Migrations are becoming more realistic:

A while ago, EMC added a feature to PowerPath called Migration Enabler (PPME), which non-disruptively migrates data between LUNs and/or storage arrays from the host's perspective. PPME can also leverage storage-based copy mechanisms and help with the cutover. This is very helpful, and I have multiple customers who have successfully migrated data with PPME. The challenge has been getting customers to upgrade to the newer versions of PowerPath that include the PPME features they need for their specific environment. Software upgrades usually require some level of downtime, at least on a per-host basis, and we run into maintenance windows and SLAs again.

EMC Symmetrix Federated Live Migration (FLM)

For Symmetrix VMAX customers, EMC has added a new capability that could make data migrations easy as pie. With FLM, a VMAX can migrate data into itself from another storage array and perform the host cutover automatically and non-disruptively. The VMAX does not have to be inserted in front of the other storage array, and there is no downtime required to reconfigure the host before or after the migration.

Federation is the key to non-disruptive migrations:

The way FLM works is really pretty simple. First, the host is connected to the new VMAX storage without removing its connectivity to the old storage. The VMAX is also connected to the old storage array (directly in concept, through the fabric in reality). The VMAX then sets up a replication session for the devices owned by the host and begins copying data. This is all pretty straightforward; the smart stuff happens during the cutover process.

As you probably know, the VMAX is an active/active storage system, so all paths from a host to a LUN are active. Clariion, like most other midrange systems, is an active/passive storage system: paths to the storage processor that owns the LUN are active, while paths to the non-owning SP are passive until needed. PowerPath and many other path-management tools support both active/active and active/passive storage systems.

Symmetrix FLM leverages this active/passive support for the cutover. Essentially, the old storage system is the "owning SP" before and during the migration. At cutover time, the VMAX becomes the owning SP and its paths go active while the old paths go passive. PowerPath follows the "trespass" and the host keeps on chugging. To make this work, the VMAX actually spoofs the LUN WWN, host ID, etc. from the old array, so the host thinks it is still talking to the old array. Filesystems and LVMs that signature and track LUNs based on WWN and/or host ID are unaffected as a result. The cutover process is nothing more than a path change at the HBA level, and the data copy and cutover are managed directly from the VMAX by an administrator.
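
To make the trespass idea concrete, here is a minimal Python sketch of the behavior described above. It is not PowerPath or SYMCLI code, and the class names and the WWN value are hypothetical; it just models why the host keeps running: both arrays present the same LUN identity, and the cutover is nothing more than a change in which paths are active.

```python
# A minimal, purely conceptual model of the FLM cutover. Not PowerPath or
# SYMCLI code; class names and the WWN value are hypothetical placeholders.

class ArrayPath:
    def __init__(self, array_name, presented_wwn, state="passive"):
        self.array_name = array_name
        self.presented_wwn = presented_wwn  # LUN identity the host sees on this path
        self.state = state                  # "active" or "passive"

class MultipathDevice:
    """Host-side view of one LUN, reachable over several paths."""
    def __init__(self, paths):
        self.paths = paths

    def identity(self):
        # The host signatures and tracks the LUN by the WWN presented on its paths
        wwns = {p.presented_wwn for p in self.paths}
        assert len(wwns) == 1, "all paths must present the same LUN identity"
        return wwns.pop()

    def do_io(self):
        active = [p for p in self.paths if p.state == "active"]
        return f"I/O to {active[0].array_name} for LUN {self.identity()}"

# The old DMX presents the LUN natively; the VMAX spoofs the same (hypothetical) WWN
old_path = ArrayPath("DMX", presented_wwn="600604800001909999", state="active")
new_path = ArrayPath("VMAX", presented_wwn="600604800001909999")  # spoofed identity

lun = MultipathDevice([old_path, new_path])
print(lun.do_io())      # before cutover: I/O flows to the DMX (the "owning SP")

# Cutover: nothing changes except which paths are active (the "trespass")
old_path.state, new_path.state = "passive", "active"
print(lun.do_io())      # after cutover: I/O flows to the VMAX
print(lun.identity())   # same WWN, so filesystems and LVMs are unaffected
```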

Of course there are limitations. FLM initially supports only EMC Symmetrix DMX arrays as the source and requires PowerPath. From what I gather, this is not a technical limitation; it is just what EMC has tested and supports so far. Other EMC arrays will be supported later, and I have no doubt it will support non-EMC arrays as a source as well.

Solving the $5 million problem:

Symmetrix Federated Live Migration became a signature feature in our VMAX discussions with the aforementioned customer, because they know that for every minute their database is down, they lose $90,000 in revenue. Even with the fastest of the three traditional migration methods, they would be down for up to an hour while 12 cluster nodes are rezoned to new storage, LUNs are masked, drive letters are assigned, and so on; a reboot alone takes up to 30 minutes. One hour of downtime equates to $5,400,000 in lost revenue. By the way, that is quite a bit more than the cost of the VMAX, and FLM is included at no additional license cost, so FLM just paid for the VMAX, and then some.
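
To put numbers behind that claim, the arithmetic is simple: lost revenue is the booking rate multiplied by the minutes of downtime. The sketch below uses the $90,000-per-minute rate and the roughly one-hour window from above; the shorter windows are hypothetical, just for comparison.

```python
# Lost revenue = booking rate x minutes of downtime.
# The $90,000/minute rate and the ~60 minute window come from the scenario above;
# the shorter windows are hypothetical, for comparison only.
RATE_PER_MINUTE = 90_000

for minutes in (5, 30, 60):
    print(f"{minutes:>2} min of downtime -> ${RATE_PER_MINUTE * minutes:,} in lost revenue")

#  5 min -> $450,000
# 30 min -> $2,700,000
# 60 min -> $5,400,000
```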

Compression better than Dedup? NetApp Confirms!


The more I talk with customers, the more I find that the technical details of how something works are much less important than the business outcome it achieves. When it comes to storage, most customers just want a device that will provide the capacity and performance they need, at a price they can afford, and it had better not be too complicated. Pretty much any vendor trying to sell something will attempt to make their solution fit your needs, even if they really don't have the right products. It's a fact of life: sell what you have. Along these lines, there has been a lot of back and forth between vendors about dedup vs. compression technology and which one solves customer problems best.

After snapshots and thin provisioning, data reduction technology in storage arrays has become a big focus of storage efficiency lately, and there are two primary methods of data reduction: compression and deduplication.

While EMC has been marketing compression technology for block and file data in Celerra, Unified, and Clariion storage systems, NetApp has been marketing deduplication as the technology of choice for block and file storage savings. But which one is the best choice? The short answer is: it depends. Some data types benefit most from deduplication, while others get better savings with compression.

Currently, EMC supports file compression on all EMC Celerra NS20, 40, 80, 120, 480, 960, VG2, and VG8 systems running DART 5.6.47.x+ and block compression on all CX4-based arrays running FLARE 30.x+. In all cases, compression is enabled at the volume/LUN level with a simple checkbox, and processing can be paused, resumed, or disabled completely, uncompressing the data if desired. Data is compressed out-of-band, so it has no impact on writes and only minimal overhead on reads. Any or all LUNs and/or filesystems can be compressed if desired, even if they existed prior to upgrading the array to the newer code levels.

With the release of OnTap 8.0.1, NetApp has added support for in-line compression within their FAS arrays. It is enabled per FlexVol and, as far as I have been able to determine, cannot be disabled later (I'm sure Vaughn or another NetApp representative will correct me if I'm wrong here). Compression requires 64-bit aggregates, which are new in OnTap 8, so FlexVols that existed prior to an upgrade to 8.x cannot be compressed without a data migration, which could be disruptive. Since compression is inline, it creates overhead in the FAS controller and could impact the performance of reads and writes to the data.
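
The inline vs. out-of-band distinction is the crux of the performance question, so here is a rough, generic sketch of the difference (plain Python with zlib, not FLARE, DART, or OnTap code): with inline compression, the compress step sits in the write path before the write is acknowledged, while out-of-band (post-process) compression acknowledges the raw write immediately and compresses later in a background pass.

```python
# Generic illustration of inline vs. out-of-band (post-process) compression.
# Not vendor code; it only shows where the compression cost lands.
import zlib

store = {}           # block address -> stored bytes
raw_pending = set()  # blocks written but not yet compressed (post-process model)

def inline_write(addr, data):
    # Compression happens in the write path, so the host waits for it
    store[addr] = zlib.compress(data)

def postprocess_write(addr, data):
    # The host write is acknowledged with no compression overhead...
    store[addr] = data
    raw_pending.add(addr)

def background_compression_pass():
    # ...and a background task compresses the data later, out of band
    for addr in list(raw_pending):
        store[addr] = zlib.compress(store[addr])
        raw_pending.discard(addr)

def read(addr):
    # Reads decompress on the fly if the block has already been compressed
    if addr in raw_pending:
        return store[addr]
    return zlib.decompress(store[addr])

postprocess_write(0, b"transaction log entry " * 100)  # fast acknowledgement
background_compression_pass()                          # savings arrive later
print(len(store[0]), "bytes stored for", 2200, "bytes written")
```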

Vaughn Stewart, of NetApp, expertly blogged today about the new compression feature, including some of the caveats involved, and to me the most interesting part of the post was the following graphic he included showing the space savings of compression vs. dedup for various data types.

Image Credit: Vaughn Stewart, NetApp

The first thing that struck me was how much better compression performed than deduplication for all but one data type (virtualization will usually fare well because a typical environment has many VMs with the same operating system files). In fact, according to NetApp, deduplication achieves very little savings, if any, for the majority of the data types shown.
 
The light green bar indicates savings with both dedup AND compression enabled on the same dataset. In 5 out of 9 cases, dedup adds ZERO savings over compression alone. I can't help but wonder why anyone would enable dedup on those data types if they already had compression, since both features use storage array CPU resources to find and compress or dedup data. I am aware that in some cases dedup can improve performance on NetApp systems due to dedup-aware cache, but I also believe that any performance gain is directly related to the amount of duplication in the data. Using this chart, virtualization is really the only place where dedup seems particularly effective, and hence the only place where real performance gains would likely present themselves.
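
As a back-of-the-envelope way to see why dedup can add nothing on top of compression, here is a toy calculation in plain Python with made-up data (it is not NetApp's or EMC's algorithm): block-level dedup only saves space when whole blocks repeat exactly, while compression also shrinks blocks that are unique but internally redundant.

```python
# Toy comparison of dedup vs. compression savings on made-up data.
import zlib

def dedup_savings(blocks):
    # Dedup keeps a single copy of each distinct block
    return 1 - len(set(blocks)) / len(blocks)

def compression_savings(blocks):
    raw = sum(len(b) for b in blocks)
    compressed = sum(len(zlib.compress(b)) for b in blocks)
    return 1 - compressed / raw

# "Home directory"-style data: every block unique, but highly compressible text
home_dirs = [("user%04d quarterly report text " % i).encode() * 100 for i in range(200)]

# "Virtualization"-style data: many identical blocks (same OS files in many VMs)
vm_images = [b"guest OS system file contents " * 100] * 180 + \
            [("vm%03d unique config " % i).encode() * 100 for i in range(20)]

for name, blocks in (("home dirs", home_dirs), ("VM images", vm_images)):
    print(f"{name:10s} dedup: {dedup_savings(blocks):4.0%}   "
          f"compression: {compression_savings(blocks):4.0%}")
```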
 
The challenge for NetApp customers will be getting their data into a configuration that supports compression, due to the 64-bit aggregate requirement, the lack of an easy, non-disruptive LUN migration feature (DataMotion appears to support only iSCSI and NFS and requires several additional licenses), and no way to convert an aggregate from 32-bit to 64-bit. And once compression has been enabled, if there is truly no way to disable it, any resulting performance impact will be very difficult to rectify.
 
On the other hand, any EMC customer with current maintenance can upgrade their NS or CX4 array to the newer versions of DART or FLARE, and compression can be enabled on any existing data after the fact. If performance becomes an issue for a particular dataset once compressed, the data can be uncompressed later. Both operations are completely non-disruptive and run in the background. While block compression only works on LUNs in a virtual pool, as opposed to a traditional RAID group, enabling compression on a normal LUN will automatically migrate the LUN into a virtual pool, perform zero-page reclaim, and then compress the data, and the entire process is completely non-disruptive to the application. Oh, and compressed data can still be tiered with FAST VP across SSD, FC, and SATA disks and/or benefit from up to 2TB of FAST Cache.
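
For readers unfamiliar with zero-page reclaim, here is a rough conceptual sketch of what that automated migrate-and-compress pipeline does (plain Python, not FLARE internals; the page size and data are made up): pages that are entirely zeros are simply not allocated in the virtual pool, and the remaining pages are compressed.

```python
# Conceptual sketch of the migrate -> zero-page reclaim -> compress pipeline.
# Not FLARE code; it just illustrates the two space-saving steps.
import zlib

PAGE = 8192  # hypothetical page size in bytes

def migrate_to_virtual_pool(lun_pages):
    pool = {}
    for addr, page in enumerate(lun_pages):
        if not any(page):
            continue                      # zero-page reclaim: skip all-zero pages
        pool[addr] = zlib.compress(page)  # compress the pages that hold real data
    return pool

# A toy "traditional" LUN: mostly untouched (zeroed) space plus some real data
lun = [bytes(PAGE)] * 900 + [b"db" * (PAGE // 2)] * 100

pool = migrate_to_virtual_pool(lun)
before = len(lun) * PAGE
after = sum(len(p) for p in pool.values())
print(f"{before:,} bytes allocated before -> {after:,} bytes after reclaim + compression")
```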
 
I admit that there is a place for deduplication as well as compression in reducing the footprint of customer data. However, based on what I've seen in my career as an IT professional, and with my customers in my current role at EMC, there are more use cases for compression than for deduplication when it comes to primary data, whether SAN or NAS. Either way, if I were using a new technology for the first time on a particular dataset, whether compression or deduplication, I would definitely want a backout plan in case the drawbacks outweigh the benefits.

EMC VPLEX enables the private cloud… but what is a "private cloud"?


Buzzword Much?

If you have seen any of EMC's marketing for EMC World, or you are attending EMC World in Boston this week, you have no doubt noticed a ton of talk about the "Private Cloud". There has been a lot more talk from vendors lately about the "cloud" and "cloud computing", and you may be reminded of how, every few years, the word "cloud" gets shouted by vendors of all kinds, and how inevitably the talk quiets down and nothing is really different. So is it different this time? I think so.

What is a Cloud?

In the context of IT, there are examples of clouds already. The Internet and the public telephone system are two examples of clouds; Facebook, Flickr, and Salesforce are examples as well. The common theme is that each of these provides some sort of service to the end user without requiring the end user to purchase or build any infrastructure to support it. You can plug a phone into a wall and immediately call nearly anyone in the world. "Cloud" is a fancy word (or buzzword) for providing something "as-a-service"; Salesforce.com, for example, is software-as-a-service (SaaS).

So what is the Private Cloud?  

In the context of enterprise datacenters, the focus of EMC's vision, the Private Cloud is Infrastructure-as-a-Service (IaaS), and it enables corporate IT to transition from a necessary expense to a profit center within the business, providing IT-as-a-Service to the rest of the organization. It decouples infrastructure from applications, providing unprecedented levels of scalability, availability, and flexibility at lower cost.

What if…
a.) your corporate applications could run from anywhere, and users had access from anywhere?
b.) you could relocate your applications from anywhere to anywhere else, at any time, without disruption to your users?
c.) you could replace any piece of physical hardware in your infrastructure without impacting your applications?

Sounds too good to be true right? Maybe not…

This week, EMC announced a completely new product called VPLEX. VPLEX can take your existing storage arrays and pool them into a cooperative pool of storage for hosts and applications, and it then allows you to move application data within and across those arrays as needed without disrupting the application or its users. If you are familiar with EMC's Invista, IBM's SVC, or Hitachi's USP-V products, you may be thinking that VPLEX is just another storage virtualization product, but I assure you it's different. VPLEX virtualizes storage within the datacenter much like the above products can, but VPLEX can ALSO combine storage across multiple datacenters and allow an application to run from any of them, or all of them simultaneously, through the power of Federation.

Active/Active Datacenters

With VPLEX Federation, you can move a virtual machine and all of its data from datacenter A to datacenter B in a matter of minutes without user disruption; or hundreds of VMs; or thousands. You can run the same application in both locations, sharing a single dataset. Armed with EMC VPLEX and VMware vSphere, you can upgrade, replace, and reconfigure any part of your infrastructure (storage, servers, network, power distribution, etc.) without ever having to take your applications offline. How's that for availability?

The ability to create a virtual infrastructure from the storage layer through to the server layer, and to host any application on that infrastructure, is the key to providing Infrastructure-as-a-Service, building the Private Cloud, and delivering IT-as-a-Service within your organization. Imagine running the IT department as a business within the business and actually showing financial value to the business.

There is a lot more to this concept but I wanted to at least bring some context around “cloud” as well as EMC’s new VPLEX product.  There will be more to come on this topic.

Chuck Hollis wrote about VPLEX as a new storage platform today, and VirtualGeek called it a virtual machine teleporter in his quite detailed write-up of this new technology. The key is to step back with an open mind and think about how application design and disaster recovery planning could be approached in entirely new ways when data is no longer confined to a particular physical location.