Tag Archives: emc

EMCWorld 2011 Announcements

Posted on by

After flying into Las Vegas this morning, checking in to the Cosmopolitan Hotel with 2000+ other EMCer’s, and taking the bus to the Venetian, I’m finally settling in a bit.  I watched Pat Gelsinger’s keynote via recorded video on Facebook.com/EMCCorp to get caught up on what I’ve missed so far and there was more in there than I expected.

Buried in the hour long keynote about EMC’s portfolio and how it aligns with Big Data and Cloud Computing were several important announcements.

  1. Atmos 2.0 – Improved Performance, Compatibility with Amazon S3, and a Windows native client called GeoDrive.   Atmos already offered a client for Redhat Linux which joined the Redhat client into the Atmos Cloud for direct access to objects/files.  GeoDrive provides native file access to the Atmos cloud for Windows clients.  This is particularly useful for supporting legacy applications that have not (or cannot) be modified to use the Atmos RESTful API.
  2. Isilon NL108 – The new NearLine nodes contain 108TB of disk space in each node, and this increase in node capacity has similarly increased the maximum filesystem size of an Isilon cluster from 10PB to 15PB (In a SINGLE filesystem..  And yes, that’s PetaBytes).
  3. Project Lightning – PCIe based FLASH caching adapters for Servers.  Intended to work in conjunction with FASTVP to cache data at the server and possibly distributed across servers.
  4. EMC Hadoop (aka Greenplum HD) – EMC supplied and supported Hadoop distribution which can be acquired as software or as an appliance, similar to Greenplum.     

Lot’s more to come.  Chuck Hollis just sat down here so it’s time to socialize..

Performance Analysis for Clariion and VNX – Part 4

Posted on by

<< Back to Part 3 — Part 4 — Go to Part 5 >>

Making Lemonade from Lemons.

In the last post, we looked at the storage processor statistics to check for cache health, excessive queuing, and response time issues and found that SPA has some performance degradation which seems to be related to write IO.  Now we need to drill down on the individual LUNs to see where that IO is being directed.  This is done in the LUN tab of Analyzer.  First, right click on the storage array itself in the left pane and choose deselect all -> items.  Then click the LUN tab and right click on the top level of the tree “LUNs”, choose select all -> LUNs.  Click on one of the LUNs to highlight it, then in choose Write Throughput (IO/s) from the bottom pane.  It may take a second for Analyzer to render the graph but you’ll end up with something like this…

You’ll quickly realize that this view doesn’t really help you figure out what’s going on.  With many LUNs, there is simply too much data to display it this way.  So click the clipboard button that has the I’s and O’s in it (next to the red arrow) to copy the graph data (in CSV format) into your desktop clipboard.  Now launch Microsoft Excel, select cell A1 and type Ctrl-V to paste the data.  It will look like the following image at first, with all LUNs statistics pasted into Column A.

Now we need to break out the various metrics into their own columns to make meaningful data, so go to the Data menu and click Text to Columns (see red arrow above).  Select Delimited, click Next..  Select ONLY comma as the delimiter, then next, next, finish.  Excel will separate the data into many columns (one column per LUN).  Next we’ll create a graph that can actually tell us something.  First, click the triangle button at the upper left corner of the sheet to select all of the data in the sheet at once.  Then click the area chart icon, select Area, then the Stacked Area (see Red Arrows below) icon.  Click OK.

You’ll get a nice little graph like this one below that is completely useless because the default chart has the X and Y axis reversed from what we need for Analyzer data.

To Fix this, right click on the graph, choose “Select Data”, click the Switch row/column button, and click OK.

Now you have a useful graph like the one below.  What we are seeing here is each band of color representing the Write IOPS for a particular LUN.  You’ll note that about 6 LUNs have very thick bands, and the rest of the over 100 LUNs have very small bands.  In this case, 6 LUNs are driving more than 50% of the total write IOPS on the array. Since the column header in the Excel sheet has the LUN data, you can mouse over the color band to see which LUN it represents.

Now that you know where to look, you can go back to Analyzer, deselect all LUNs and drill down to the individual LUNs you need to look at.  You may also want to look at the hosts that are using the busy LUNs to see what they are doing.  In Analyzer, check the Write IO Size for the LUNs you are interested in and see if the size is in line with your expectations for the application involved. Very large IO sizes coupled with high IOPS (ie: high bandwidth) may cause write cache contention.  In the case of this particular array, these 6 LUNs are VMFS datastores, and based on the Thin LUN space utilization and write IO loads, I would recommend that the customer convert them from Thin LUNs to Thick LUNs in the same Virtual Pool.  Thick LUNs have better write performance and lower processor overhead compared with Thin LUNs and the amount of free space in these Thin LUNs is fairly small.  This conversion can be done online with no host impact using LUN Migration.

You can use this copy/paste technique with Excel to graph all sorts of complex datasets from Analyzer that are pretty much not viewable with the default Analyzer graph.  This process lets you select specific data or groups of metrics from an complete Analyzer archive and graph just the data you want, in the way you want to see it.  There is also a way to do this as a bulk export/import, which can be scheduled too, and I’ll discuss that in the next post.

<< Back to Part 3 — Part 4 — Go to Part 5 >>

Performance Analysis for Clariion and VNX – Part 3

Posted on by

<< Back to Part 2 — Part 3 — Go to Part 4 >>

Disclaimer: Performance Analysis is an art, not a science.  Every array is different, every application is different, and every environment has a different mix of both.  These posts are an attempt to get you started in looking at what the array is doing and pointing you in a direction to go about addressing a problem.  Keep in mind, a healthy array for one customer could be a poorly performing array for a different customer.  It all comes down to application requirements and workload.  Large block IO tends to have higher response times vs. small block IO for example.  Sequential IO also has a smaller benefit from (and sometimes can be hindered by) cache.  High IOPS and/or Bandwidth is not a problem, in fact it is proof that your array is doing work for you.  But understanding where the high IOPS are coming from and whether a particular portion of the IO is a problem is important.  You will not be able to read these series of posts and immediately dive in and resolve a performance problem on your array.  But after reading these, I hope you will be more comfortable looking at how the system is performing and when users complain about a performance problem, you will know where to start looking.  If you have a major performance issue and need help, open an case.

Starting from the top…

First let’s check the health of the front end processors and cache.  The data for this is in the SP Tab which shows both of the SPs.  The first thing I like to look at is the “SP Cache Dirty Pages (%)” but to make this data more meaningful we need to know what the write cache watermarks are set to.  You can find this by right-clicking on the array object in the upper-left pane and choosing properties.  The watermarks are shown in the SP Cache tab.

Once you note the watermarks, close the properties window and check the boxes for SPA and SPB.  In the lower pane, deselect utilization and chose SP Cache Dirty Pages (%).

Dirty pages are pages in write cache that have received new data from hosts, but have not been flushed to disk.  Generally speaking you want to have a high percentage of dirty pages because it increases the chance of a read coming from cache or additional writes to the same block of data being absorbed by the cache.  Any time an IO is served from cache, the performance is better than if the data had to be retrieved from disk.  This is why the default watermarks are usually around 60/80% or 70/90%.


What you don’t want is for dirty pages to reach 100%.  If the write cache is healthy, you will see the dirty pages value fluctuating between the high and low watermarks (as SPB is doing in the graph).  Periodic spikes or drops outside the watermarks are fine, but repeatedly hitting 100% indicates that the write cache is being stressed (SPA is having this issue on this system).  The storage system compensates for a full cache by briefly delaying host IO and going into a forced flushing state.  Forced Flushes are high priority operations to get data moved out of cache and onto the back end disks to free up write cache for more writes.  This WILL cause performance degradation.  Sustained Large Block Write IO is a common culprit here.

While we’re here, deselect Dirty Pages (%) and select Utilization (%) and look for two things here:

1.) Is either SP running at a load of higher than 70%?  This will increase application response time.  Check whether the SPs seem to fluctuate with the business day.  For non-disruptive upgrades, both SPs need to be under 50% utilization.

2.) Are the two SPs balanced?  If one is much busier than the other that may be something to investigate.

Now look at Response time (ms) and make sure that, again, both SPs are relatively even, and that Response time is within reasonable levels.  If you see that one SP has high utilization and response time but the other SP does not, there may be a LUN or set of LUNs owned by the busy SP that are consuming more array resources.  Looking at Total Throughput and Total Bandwidth can help confirm this, and then graphing Read vs. Write Throughput and Bandwidth to see what the IO operations actually are.  If both SPs have relatively similar throughput but one SP has much higher bandwidth, then there is likely some large block IO occurring that you may want to track down.

As an example, I’ve now seen two different customers where a Microsoft Sharepoint server running in a virtual machine (on a VMFS datastore) had a stuck process that caused SQL to drive nearly 200MB/sec of disk bandwidth to the backend array.  Not enough to cause huge issues, but enough to overdrive the disks in that LUN’s RAID Group, increasing queue length on the disks and SP, which in turn increased SP utilization and response time on the array.  This increased response time affected other applications unrelated to Sharepoint.

Next, let’s check the Port Queue Full Count.  This is the number of times that a front end port issued a QFULL response back to the hosts.   If you are seeing QFULL’s there are two possible causes.. One is that the Queue Depth on the HBA is too large for the LUNs being accessed.  Each LUN on the array has a maximum queue depth that is calculated using a formula based on the number of data disks in the RAID Group.  For example, a RAID5 4+1 LUN will have a queue depth of 88.  Assuming your HBA queue depth is 64 then you won’t have a problem.  However, if the LUN is used in a cluster file system (Oracle ASM, VMWare VMFS, etc) where multiple hosts are accessing the LUN simultaneously, you could run into problems here.  Reducing the HBA Queue Depth on the hosts will alleviate this issue.

The second cause is when there are many hosts accessing the same front end ports and the HBA Execution Throttle is too large on those hosts.  A Clariion/VNX front end port has a queue depth of 1600 which is the maximum number of simultaneous IO’s that port can process.  If there are 1600 IOs in queue and another IO is issued, the port responds with QFULL.   The host HBA responds by lowering its own Queue Depth (per LUN) to 1 and then gradually increasing the queue depth over time back to normal.  An example situation might be 10 hosts, all driving lots of IO, with HBA Execution Throttle set to 255.  It’s possible that those ten hosts can send a total of 2550 IOs simultaneously.  If they are all driving that IO to the same front end port, that will flood the port queue.  Reducing the HBA Execution throttle on the hosts will alleviate this issue.

Looking at the Port Throughput, you can see here that 2 ports are driving the majority of the workload.  This isn’t necessarily a problem by itself, but PowerPath could help spread the load across the ports which could potentially improve performance.

In VMWare environments specifically, it is very common to see many hosts all accessing many LUNs over only 1 or 2 paths even though there may be 4 or 8 paths available.  This is due to the default path selection (lowest port) on boot.  This could increase the chances of a QFULL problem as mentioned above or possibly exceeding the available bandwidth of the ports.  You can manually change the paths on each LUN on each host in a VMWare cluster to balance the load, or use Round-Robin load balancing.  PowerPath/VE automatically load balances the IO across all active paths with zero management overhead.

Another thing to look for is an imbalance of IO or Bandwidth on the processors.  Look specifically at Write Throughput and Write Bandwidth first as writes have the most impact on the storage system and more specifically the write cache.  As you can see in this graph, SPA is processing a fair bit more write IOPS compared to SPB.  This correlates with the high Dirty Pages and Response Time on SPA in the previous graphs.

So we’ve identified that there is performance degradation on SPA and that it is probably related to Write IO.  The next step is to dig down and find out if there are specific LUNs causing the high write load and see if those could be causing the high response times.

<< Back to Part 2 — Part 3 — Go to Part 4 >>

Performance Analysis for Clariion and VNX – Part 2

Posted on by 11 comments

<< Back to Part 1 — Part 2 — Go to Part 3 >>

Okay, so you’ve got the Analyzer enabler on your array and enabled logging, and you’ve installed Unisphere Server, Unisphere Client, and Microsoft Excel on your workstation.  Next step is to download a NAR file from the array.  In Navisphere, right click on the array, go to the Analyzer menu and retrieve an archive.  You can get the archive from either SP of the array, both have the same data.  You will eventually see multiple NAR files, each covering some period of time.  Retrieve the one for the period of time you want to look at.  You can also merge multiple files together to get larger time periods into a single analyzer session.  In Unisphere, the process is essentially the same, select the array, go to Monitoring -> Analyzer.

You’ve got your workstation set up and you have a NAR file downloaded to your workstation.  Let’s get to it.  Launch Unisphere Client from the Start Menu and connect to “localhost” when prompted.  Login to Unisphere.  You’ll see something like this…

In the drop down menu change to the “Unisphere Server – 127.0.0.1” which will change the main screen to Event Notification most likely.  Click on Monitoring, then Analyzer.

Let’s set some defaults before we open a NAR file.

  1. In the left pane, click Customize Charts
    1. In the General Tab, check the Advanced box so we can see more detailed metrics in Analyzer
    2. In the Archive Tab, under Analyzer, select Performance Detail and make sure Initially Check All Tree Objects is unchecked.
  1. Click OK to save.

In the right pane, click on Open Archive , browse to the NAR file you want to view and open.

Because the NAR file can contain many hours (sometimes multiple days) or performance data, you will be prompted to set a time range.  The default times will show all data available in the archive.  If you want to narrow down to a smaller time range, change the Graph Start and End times, otherwise just click OK.

The Performance Detail window will launch and the LUN tab will be selected.  No items should be selected and as such no data will be graphed.

My personal methodology is to take a top-down approach when it comes to performance analysis and troubleshooting.

  • Check the SP’s, Cache, and SP Ports for obvious issues.  If a user is complaining of poor performance the Cache is usually the first place I look.
  • Drill down to RAID Groups, Pools, and LUNs to find the culprits
  • Drill down to the physical disk level if necessary
  • Export data to Excel for better graphs that make it easier to see whats happening

<< Back to Part 1 — Part 2 — Go to Part 3 >>

Performance Analysis for Clariion and VNX – Part 1

Posted on by
Part 1 — Go to Part 2 >>
  • Do you have an application owner complaining about performance?
  • Do you want to get a general idea of how your array is performing?
  • Do you want to turn this.. into this..?

I’ve been doing a lot of performance analysis with EMC Clariion CX3, CX4, and VNX storage recently and have a sort of an informal methodology I follow.  I’ve had a couple customers ask me to show them how to get useful data and graphs from their arrays and more recently after posting about FASTCache and FASTVP results I’ve had even more queries on the topic.  So I’ve decided to put together a sort of how-to guide.  It will take several posts to go through the whole process, so this first post will focus on making sure you have the right tools. The Tools: First, you MUST have the Navisphere/Unisphere Analyzer enabler on the storage array.  If you don’t have it, all you can really do is send an encrypted archive to EMC for help when you have a performance problem.  Analyzer is an indispensable performance analysis tool for CX/VNX systems and is really quite powerful.  Unfortunately, many customers don’t see the value during the purchase process but end up needing it someday in the future.  Make sure Analyzer is included in EVERY array purchase.

If you haven’t already, you also need to enable Statistics on the array AND in more recent versions of FLARE you need to enable Archive Logging.  Statistics logging is enabled in the array properties dialog, shown here…

Archive Logging is enabled in the Monitoring -> Analyzer -> Data Logging dialog, shown here…

In practice, 5 minutes is a good interval for archives.  Also make sure that periodic archiving is enabled which will generate a new NAR file every so often (it depends on the interval)

Next, you need an Analyzer workstation.  You can run Analyzer directly off an array through Navisphere Manager or Unisphere but I prefer installing the software directly on my PC.  It lets me work on the analysis from home or anywhere else, and since I look at data from many different customer’s arrays’ it’s easier.  You can download the latest version of Unisphere Server and Unisphere Client directly from PowerLink (Home > Support > Software Downloads and Licensing > Downloads T-Z > Unisphere Server Software).  Once you install both, you can launch the client and log in to your local Unisphere server.   You can then open Analyzer archive files (NAR files) from any array for analysis. Third, you need a graphing tool.  I currently use Microsoft Excel 2010 on the same workstation as my Unisphere installation, which happens to be my corporate laptop.  While Analyzer does graph the data you select, there is only one type of graph available and sometimes when many objects are being graphed together it’s almost impossible to actually compare them to each other.

Another reason to use Excel is that while Analyzer has a wealth of different statistics available for all sorts of array objects, there are some exceptions right now.  For example, if you are using newer features such as FASTCache or FASTVP on your array and want to see statistics for those technologies, there is not much in Analyzer to see.  I’ll go through some methods for teasing that data out as well.

Part 1 — Go to Part 2 >>

If You Are Using SSDs, You Should Be Encrypting

Posted on by

I saw the following article come across Twitter today.

http://www.zdnet.com/blog/storage/ssd-security-the-worst-of-all-worlds/1326

In it, Robin Harris describes the issues around data recovery and secure erasure specific to SSD disks.  In layman’s terms, since SSDs do all sorts of fancy things with writes to increase longevity and performance, disk erasure is nearly impossible using normal methods, and forensic or malicious data recovery is quite easy.  So if you have sensitive data being stored on SSDs, that data is at risk of being read by someone, some day, in the future.  It seems that pretty much the only way to mitigate this risk is to use encryption at some level outside the SSD disk itself.

Did you know that EMC Symmetrix VMAX offers data-at-rest encryption that is completely transparent to hosts and applications, and has no performance impact?  With Symmetrix D@RE, each individual disk is encrypted with a unique key, managed by a built-in RSA key manager, so disks are unreadable if removed from the array.   Since the data is encrypted as the VMAX is writing to the physical disk, attempting to read data off an individual disk without the key is pointless, even for SSD disks.

The beauty of this feature is that it’s set-it-and-forget it.  No management needed, it’s enabled during installation and that’s it.  All disks are encrypted, all the time.

  • Ready to decomm an old array and return it, trade it, or sell it?  Destroy the keys and the data is gone.  No need for an expensive Data Erasure professional services engagement.
  • Failed disk replaced by your vendor?  No need for special arrangements with your vendor to keep those disks onsite, or certify erasure of a disk every time one is replaced.  The key stays with the array and the data on that disk is unreadable.

If you have to comply with PCI and/or other compliance rules that require secure erasure of disks, you should consider putting that data on a VMAX with data-at-rest encryption.

Now, What if you have an existing EMC storage system and the same need to encrypt data?  You can encrypt at the volume level with PowerPath Encryption.  PowerPath encrypts the data at the host with a unique key managed by an RSA Key Manager.  And it works with the non-EMC arrays that PowerPath supports as well.

Under normal circumstances, PowerPath Encryption does have some level of performance impact to the host however HBA vendors, such as Emulex, are now offering HBAs with encryption offload that works with PowerPath.  If you combine PowerPath Encryption with Emulex Encryption HBAs, you get in-flight AND at-rest encryption with near-zero performance impact.

  • Do you replicate your sensitive data to a 3rd party remote datacenter for business continuity?  PowerPath Encryption prevents unauthorized access to the data because no host can read it without the proper key.

Real World EMC FASTVP and FASTCache results!

Posted on by

I have a customer who just recently upgraded their EMC Celerra NS480 Unified Storage Array (based on Clariion CX4-480) to FLARE30 and enabled FASTCache across the array, as well as FASTVP automated tiering for a large amount of their block data.  Now that it’s been configured and the customer has performed a large amount of non-disruptive migrations of data from older RAID groups and VP pools into the newer FASTVP pool, including thick-to-thin conversions, I was able to get some performance data from their array and thought I’d share these results.

This is Real-World data

This is NOT some edge case where the customer’s workload is perfect for FASTCache and FASTVP and it’s also NOT a crazy configuration that would cost an arm and a leg.  This is a real production system running in a customer datacenter, with a few EFDs split between FASTCache and FASTVP and some SATA to augment capacity in the pool for their existing FC based LUNS.  These are REAL results that show how FASTVP has distributed the IO workload across all available disks and how a relatively small amount of FASTCache is absorbing a decent percentage of the total array workload.

This NS480 array has nearly 480 drives in total and has approximately 28TB of block data (I only counted consumed data on the thin LUNs) and about 100TB of NAS data.  Out of the 28TB of block LUNs, 20TB is in Virtual Pools, 14TB of which is in a single FASTVP Pool.  This array supports the customers’ ERP application, entire VMWare environment, SQL databases, and NAS shares simultaneously.

In this case FASTCache has been configured with just 183GB of usable capacity (4 x 100GB EFD disks) for the entire storage array (128TB of data) and is enabled for all LUNs and Pools.  The graphs here are from a 4 hour window of time after the very FIRST FASTVP re-allocation completed using only about 1 days’ worth of statistics.  Subsequent re-allocations in the FASTVP pool will tune the array even more.

FASTCache

First, let’s take a look at the array as a whole, here you can see that the array is processing approximately ~10,000 IOPS through the entire interval.

FASTCache is handling about 25% of the entire workload with just 4 disks.  I didn’t graph it here but the total array IO Response time through this window is averaging 2.5 ms.  The pools and RAID Groups on this array are almost all RAID5 and the read/write ratio averages 60/40 which is a bit write heavy for RAID5 environments, generally speaking.

If you’ve done any reading about EMC FASTCache, you probably know that it is a read/write cache.  Let’s take a look at the write load of the array and see how much of that write load FASTCache is handling.  In the following graph you can see that out of the ~10,000 total IOPS, the array is averaging about 2500-3500 write IOPS with FASTCache handling about 1500 of that total.

That means FASTCache is reducing the back-end writes to disk by about 50% on this system.  On the NS480/CX4-480, FASTCache can be configured with up to 800GB usable capacity, so this array could see higher overall performance if needed by augmenting FASTCache further.  Installing and upgrading FASTCache is non-disruptive so you can start with a small amount and upgrade later if needed.

FASTVP and FASTCache Together

Next, we’ll drill down to the FASTVP pool which contains 190 total disks (5 x EFD, 170 x FC, and 15 x SATA).  There is no maximum number of drives in a Virtual Pool on FLARE30 so this pool could easily be much larger if desired.  I’ve graphed the IOPS-per-tier as well as the FASTCache IOPS associated with just this pool in a stacked graph to give an idea of total throughput for the pool as well as the individual tiers.

The pool is servicing between 5,000 and 8,000 IOPS on average which is about half of the total array workload.  In case you didn’t already know, FASTVP and FASTCache work together to make sure that data is not duplicated in EFDs.  If data has been promoted to the EFD tier in a pool, it will not be promoted to FASTCache, and vise-versa.  As a result of this intelligence, FASTCache acceleration is additive to an EFD-enabled FASTVP pool.   Here you can see that the EFD tier and FASTCache combined are servicing about 25-40% of the total workload, the FC tier another 40-50%, and the SATA tier services the remaining IOPS.  Keep in mind that FASTCache is accelerating IO for other Pools and RAID Group LUNs in addition to this one, so it’s not dedicated to just this pool (although that is configurable.)

FASTVP IO Distribution

Lastly, to illustrate FASTVP’s effect on IO distribution at the physical disk layer, I’ve broken down IOPS-per-spindle-per-tier for this pool as well.  You can see that the FC disks are servicing relatively low IO and have plenty of head room available while the EFD disks, also not being stretched to their limits, are servicing vastly more IOPS per spindle, as expected.  The other thing you may have noticed here is that the EFDs are seeing the majority of the workload’s volatility, while the FC and SATA disks have a pretty flat workload over time.  This illustrates that FASTVP has placed the more bursty workloads on EFD where they can be serviced more effectively.

Hopefully you can see here how a very small amount of EFDs used with both FASTCache and FASTVP can relieve a significant portion of the workload from the rest of the disks.  FASTCache on this system adds up to only 0.14% of the total data set size and the EFD tier in the FASTVP pool only accounts for 2.6% of the total dataset in that pool.

What do you think of these results?  Have you added FASTCache and/or FASTVP to your array?  If so, what were your results?

Say Hello to EMC VNX7500

Posted on by

I got these pictures from one of my customers who completed install of two new EMC VNX7500 arrays.  It may be hard to tell but each of these systems host 21 x EFD drives and over 500 x SAS drives in a single rack.  Building on EMC’s promise of efficiency, each of these VNX arrays are delivering over 60,000 heavy-write host IOPS while providing 130TB of usable capacity in a single rack, consuming less power than the 4 servers in the neighboring rack.  Combined, these two arrays provide 260TB usable of extremely high performance storage to host many hundreds of production virtual servers.

VNX7500s installed and running

 

Auto-Tiering, Cloud Storage, and Risk-Free Thin Pools

Posted on by

Some customers are afraid of thin provisioning…

Practically every week I have discussions with customers about leveraging thin provisioning to reduce their storage costs and just as often the customer pushes back worried that some day, some number of applications, for some reason, will suddenly consume all of their allocated space in a short period of time and cause the storage pool to run out of space.  If this was to happen, every application using that storage pool will essentially experience an outage and resolving the problem requires allocating more space to the pool, migrating data, and/or deleting data, each of which would take precious time and/or money.  In my opinion, this fear is the primary gating factor to customers using thin provisioning.  Exacerbating the issue, most large organizations have a complex procurement process that forces them to buy storage many months in advance of needing it, further reducing the usefulness of thin provisioning.  The IT organization for one of my customers can only purchase new storage AFTER a business unit requests it and approved by senior management; and they batch those requests before approving a storage purchase.  This means that the business unit may have to wait months to get the storage they requested.

This same customer recently purchased a Symmetrix VMAX with FASTVP and will be leveraging sub-LUN tiering with SSD, FC, and SATA disks totaling over 600TB of usable capacity in this single system.  As we began design work for the storage array the topic of thin provisioning came up and the same fear of running out of space in the pool was voiced.  To prevent this, the customer fully allocates all LUNs in the pool up front which prevents oversubscription.  It’s an effective way to guarantee performance and availability but it means that any free space not used by application owners is locked up by the application server and not available to other applications.  If you take their entire environment into account with approximately 3PB of usable storage and NO thin provisioning, there is probably close to $1 million in storage not being used and not available for applications.  If you weigh the risk of an outage causing the loss of several million dollars per hour of revenue, the customer has decided the risks outweigh the potential savings.  I’ve seen this decision made time and again in various IT shops.

Sub-LUN Tiering pushes the costs for growth down

I previously blogged about using cloud storage for block storage in the form of Cirtas BlueJet and how it would not be to much of a stretch to add this functionality to sub-LUN tiering software like EMC’s FASTVP to leverage cloud storage as a block storage tier as shown in this diagram.

Let’s first assume the customer is already using FASTVP for automated sub-LUN tiering on a VMAX.  FASTVP is already identifying the hot and cold data and moving it to the appropriate tier, and as a result the lowest tier is likely seeing the least amount of IOPS per GB.  In a VMAX, each tier consists of one or more virtual provisioned pools, and as the amount of data stored on the array grows FASTVP will continually adjust, pushing the hot data up to higher tiers and cold data down to the lower tiers  The cold data is more likely to be old data as well so in many cases the data sort of ages down the tiers over time and its the old/least used portion of the data that grows.  Conceptually, the only tier you may have to expand is the lowest (ie: SATA) when you need more space.  This reduces the long term cost of data growth which is great.  But you still need to monitor the pools and expand them before they run out of space, or an outage may occur.  Most storage arrays have alerts and other methods to let you know that you will soon run out of space.

Risk-Free Thin Provisioning

What if the storage array had the ability automatically expand itself into a cloud storage provider, such as AT&T Synaptic, to prevent itself from running out of space?  Technically this is not much different from using the cloud as a tier all it’s own but I’m thinking about temporary use of a cloud provider versus long term.  The cloud provider becomes a buffer for times when the procurement process takes too long, or unexpected growth of data in the pool occurs.  With an automated tiering solution, this becomes relatively easy to do with fairly low impact on production performance.  In fact, I’d argue that you MUST have automated tiering to do this or the array wouldn’t have any method for determining what data it should move to the cloud.  Without that level of intelligence, you’d likely be moving hot data to the cloud which could heavily impact performance of the applications.

Once the customer is able to physically add storage to the pool to deal with the added data, the array would auto-adjust by bringing the data back from the cloud freeing up that space.  The cloud provider would only charge for the transfer of data in/out and the temporary use of space.  Storage reduction technologies like compression and de-duplication could be added to the cloud interface to improve performance for data stored in the cloud and reduce costs.  Zero detect and reclaim technologies could also be leveraged to keep LUNs thin over time as well as prevent the movement of zero’d blocks to the cloud.

Using cloud storage as a buffer for thin provisioning in this way could reduce the risk of using thin provisioning, increasing the utilization rate of the storage, and reducing the overall cost to store data.

What do you think?  Would you feel better about oversubscribing storage pools if you had a fully automated buffer, even if that buffer cost some amount of money in the event it was used?

Saving $90,000 per minute with VMAX Federated Live Migration

Posted on by

One of the customers I work with regularly has a set of key MS SQL databases totaling over 100TB in size which process transactions for their customer facing systems. If these databases are down, no transactions can be processed and customers looking to spend money will go elsewhere. The booking rate related to these databases is ~$90,000 per minute, all day, every day.

Today these databases live on Symmetrix DMX storage which has been very reliable and performant. As happens in an IT world, some of these Symmetrix systems are getting long in the tooth and we are working on refreshing several older DMX systems into fewer, newer VMAX systems. Aside from the efficiency and performance benefits of sub-LUN tiering (EMC FASTVP), and the higher scalability of VMAX vs DMX as well as pretty much any other storage array on the market, EMC has added another new feature to VMAX that is particularly advantageous to this particular customer — Federated Live Migration

Tech Refresh and Data Migrations:

Tech Refresh is a fact of life and data migrations, as a result are common place. The challenge with a data migration is finding a workable process and then shoehorning that process into existing SLAs and maintenance windows. In general, data migrations are disruptive in some way or another, and the level of disruption depends on the technology used and the type of migration. EMC has a long list of migration tools that cover our midrange storage systems, NAS systems, as well as high-end Symmetrix arrays. The downside with these and other vendors’ migration tools is that there is some level of downtime required.

There are 3 basic approaches:

  1. Highest Amount of Downtime:
    • Take system down, copy data, reconfigure system, bring system online
  2. Less Downtime:
    • Copy Data, Take system down, copy recent changes, reconfigure system, bring system online (EMC Open Replicator, EMCopy, RoboCopy, Rsync, EMC SANCopy, etc)
  3. Even Less Downtime:
    • Set up new storage to proxy old storage, reconfigure system, serve data while copying in the background. (EMC Celerra CDMS, EMC Symmetrix Open Replicator)
    • (Traditional virtualization systems like Hitachi USP/VSP, IBM SVC, etc are similar to this as well since you must take some level of downtime to get hosts configured through the virtualization layer, after which you can non-disruptively move data).

Common theme in all 3? Downtime!

Non-Disruptive Migrations are becoming more realistic:

A while ago now, EMC added a feature to PowerPath called Migration Enabler which is a way to non-disruptively migrate data between LUNs and/or Storage arrays from the host perspective. PowerPath Migration Enabler could also leverage storage based copy mechanisms and help with cut over. This is very helpful and I have multiple customers who have successfully migrated data with PPME. The challenge has been getting customers to upgrade to newer versions of PowerPath that include the PPME features they need for their specific environment. Software upgrades usually require some level of downtime, at least on a per-host basis, and we run into maintenance windows and SLAs again.

EMC Symmetrix Federated Live Migration (FLM)

For Symmetrix VMAX customers, EMC has added a new capability that could make data migrations easy as pie for many customers. With FLM, a VMAX can migrate data into itself from another storage array, and perform the host cutover, automatically, and non-disruptively. The VMAX does not have to be inserted in front of the other storage array, and there is no downtime required to reconfigure the host before or after the migration.

Federation is the key to non-disruptive migrations:

The way FLM works is really pretty simple. First, the host is connected to the new VMAX storage without removing the connectivity to the old storage. The VMAX is also connected to the old storage array directly (through the fabric in reality). The VMAX system then sets up a replication session for the devices owned by the host and begins copying data. This is all pretty straightforward. The smart stuff happens during the cut over process.

As you probably know, the VMAX is an Active/Active storage system, so all paths from a host to a LUN are active. Clariion, similar to most other midrange systems is an Active/Passive storage system. On Clariion, paths to the storage processor that owns the LUN are active, while paths to the non-owner are passive until needed. PowerPath and many other path management tools support both active/active and active/passive storage systems.

Symmetrix FLM leverages this active/passive support for the cut over. Essentially the old storage system is the “owning SP” before and during the migration. At cut over time, the VMAX essentially becomes the owning SP and it’s paths go active while the old paths go passive. PowerPath follows the “trespass” and the host keeps on chugging. To make this work, VMAX actually spoofs the LUN WWN and host ID, etc from the old array, so the host thinks it’s still talking to the old array. Filesystems and LVMs that signature and track LUNs based on WWN and/or Host ID are unaffected as a result. The cut over process is nothing more than a path change at the HBA level. The data copy and cut over are managed directly from the VMAX by an administrator.

Of course there are limitations… FLM initially supports only EMC Symmetrix DMX arrays as the source and requires PowerPath. From what I gather, this is not a technical limitation however, it’s just what EMC has tested and supports. Other EMC arrays will be supported later and I have no doubt it will support non-EMC arrays as a source as well.

Solving the $5 million problem:

Symmetrix Federated Live Migration became a signature feature in our VMAX discussions with the aforementioned customer because they know that for every minute their database is down, they lose $90,000 in revenue. Even with the fastest of the 3 traditional methods of migration, they would be down for up to an hour while 12 cluster nodes are rezoned to new storage, LUNs masked, drive letters assigned, etc. A reboot alone takes up to 30 minutes. 1 hour of downtime equates to $5,400,000 of lost revenue. By the way, that’s quite a bit more than the cost of the VMAX, and FLM is included at no additional license cost, so FLM just paid for the VMAX, and then some.