Tag Archives: virtual provisioning

Real World EMC FASTVP and FASTCache results!

Posted on by

I have a customer who just recently upgraded their EMC Celerra NS480 Unified Storage Array (based on Clariion CX4-480) to FLARE30 and enabled FASTCache across the array, as well as FASTVP automated tiering for a large amount of their block data.  Now that it’s been configured and the customer has performed a large amount of non-disruptive migrations of data from older RAID groups and VP pools into the newer FASTVP pool, including thick-to-thin conversions, I was able to get some performance data from their array and thought I’d share these results.

This is Real-World data

This is NOT some edge case where the customer’s workload is perfect for FASTCache and FASTVP and it’s also NOT a crazy configuration that would cost an arm and a leg.  This is a real production system running in a customer datacenter, with a few EFDs split between FASTCache and FASTVP and some SATA to augment capacity in the pool for their existing FC based LUNS.  These are REAL results that show how FASTVP has distributed the IO workload across all available disks and how a relatively small amount of FASTCache is absorbing a decent percentage of the total array workload.

This NS480 array has nearly 480 drives in total and has approximately 28TB of block data (I only counted consumed data on the thin LUNs) and about 100TB of NAS data.  Out of the 28TB of block LUNs, 20TB is in Virtual Pools, 14TB of which is in a single FASTVP Pool.  This array supports the customers’ ERP application, entire VMWare environment, SQL databases, and NAS shares simultaneously.

In this case FASTCache has been configured with just 183GB of usable capacity (4 x 100GB EFD disks) for the entire storage array (128TB of data) and is enabled for all LUNs and Pools.  The graphs here are from a 4 hour window of time after the very FIRST FASTVP re-allocation completed using only about 1 days’ worth of statistics.  Subsequent re-allocations in the FASTVP pool will tune the array even more.

FASTCache

First, let’s take a look at the array as a whole, here you can see that the array is processing approximately ~10,000 IOPS through the entire interval.

FASTCache is handling about 25% of the entire workload with just 4 disks.  I didn’t graph it here but the total array IO Response time through this window is averaging 2.5 ms.  The pools and RAID Groups on this array are almost all RAID5 and the read/write ratio averages 60/40 which is a bit write heavy for RAID5 environments, generally speaking.

If you’ve done any reading about EMC FASTCache, you probably know that it is a read/write cache.  Let’s take a look at the write load of the array and see how much of that write load FASTCache is handling.  In the following graph you can see that out of the ~10,000 total IOPS, the array is averaging about 2500-3500 write IOPS with FASTCache handling about 1500 of that total.

That means FASTCache is reducing the back-end writes to disk by about 50% on this system.  On the NS480/CX4-480, FASTCache can be configured with up to 800GB usable capacity, so this array could see higher overall performance if needed by augmenting FASTCache further.  Installing and upgrading FASTCache is non-disruptive so you can start with a small amount and upgrade later if needed.

FASTVP and FASTCache Together

Next, we’ll drill down to the FASTVP pool which contains 190 total disks (5 x EFD, 170 x FC, and 15 x SATA).  There is no maximum number of drives in a Virtual Pool on FLARE30 so this pool could easily be much larger if desired.  I’ve graphed the IOPS-per-tier as well as the FASTCache IOPS associated with just this pool in a stacked graph to give an idea of total throughput for the pool as well as the individual tiers.

The pool is servicing between 5,000 and 8,000 IOPS on average which is about half of the total array workload.  In case you didn’t already know, FASTVP and FASTCache work together to make sure that data is not duplicated in EFDs.  If data has been promoted to the EFD tier in a pool, it will not be promoted to FASTCache, and vise-versa.  As a result of this intelligence, FASTCache acceleration is additive to an EFD-enabled FASTVP pool.   Here you can see that the EFD tier and FASTCache combined are servicing about 25-40% of the total workload, the FC tier another 40-50%, and the SATA tier services the remaining IOPS.  Keep in mind that FASTCache is accelerating IO for other Pools and RAID Group LUNs in addition to this one, so it’s not dedicated to just this pool (although that is configurable.)

FASTVP IO Distribution

Lastly, to illustrate FASTVP’s effect on IO distribution at the physical disk layer, I’ve broken down IOPS-per-spindle-per-tier for this pool as well.  You can see that the FC disks are servicing relatively low IO and have plenty of head room available while the EFD disks, also not being stretched to their limits, are servicing vastly more IOPS per spindle, as expected.  The other thing you may have noticed here is that the EFDs are seeing the majority of the workload’s volatility, while the FC and SATA disks have a pretty flat workload over time.  This illustrates that FASTVP has placed the more bursty workloads on EFD where they can be serviced more effectively.

Hopefully you can see here how a very small amount of EFDs used with both FASTCache and FASTVP can relieve a significant portion of the workload from the rest of the disks.  FASTCache on this system adds up to only 0.14% of the total data set size and the EFD tier in the FASTVP pool only accounts for 2.6% of the total dataset in that pool.

What do you think of these results?  Have you added FASTCache and/or FASTVP to your array?  If so, what were your results?

Capacity vs. Performance : Thin Provisioning

Posted on by

In my previous post, where I discussed the problem of unusable (or slack) disk space on a SAN, I promised a follow-up with techniques on how to increase storage utilization.  I realized that I should discuss some related technologies first and then follow that up with how to put it all together.  So today I start by talking about Thin Provisioning.  I will then follow up with an explanation of De-Duplication and finally talk about how to use multiple technologies together to get the most use out of your storage.

So what is Thin Provisioning?  It is a technology that allows you to create LUNs or Volumes on a storage device such that the LUN/Volume(s) appear to the host or client to be larger than they actually are.   In general, NAS clients and SAN attached hosts see “Thin Provisioned” LUNs just as they see any other LUN but the actual amount of disk space used on the storage device can be significantly smaller than the provisioned size.  How does this help increase storage utilization?  Well, with thin provisioning you provide applications with exactly the storage they want and/or need but you don’t have to purchase all of the disk capacity up front.

Let’s start with a comparison of using standard LUNs vs thin LUNs with a theoretical application set:

Say we have 3 servers, each running Windows Server.  The operating system partition is on local disk and application data drives are on SAN.  Each server runs an application that collects and stores data over time and the application owner expects that over the next year or so the data will grow to 1TB on each server.  In this particular case we also know that the application’s performance requirements are relatively low.

With traditional provisioning we might create 3 LUNs that are 1TB each and present them to the servers.  This provides the application with room for the expected growth.  Using 300GB FC disks we can carve out three 4+1 RAID5 sets, create one LUN in each and it would work fine.  Alternatively we could use wide striping (ie: a MetaLUN on EMC Clariion) and put all three LUNs on the same 15 disks.  Either way we’ve just burned 15 disks on the storage array based on uncertain future requirements.  If we were stingier with storage we could create smaller LUNs (500GB for example) and use LUN expansion technology to increase the size when the application data fills the disk to that capacity.

In the Thin Provisioning world we still create three 1TB LUNs but they would start out by taking no space.  The pool of disk that the LUNs get provisioned from doesn’t even need to have 3TB of capacity.  As the application data grows over the next 12 months or longer the pool size only needs to grow to accommodate the actual amount of data stored.  Depending on the storage array, we can add disks to the pool one at a time.  So on day one we start with 3 disks in the pool, and then add additional disks one by one throughout the year.  We can then create additional LUNs for other applications without adding disks.  As we add disks to the pool, we expand the capacity available for all of the LUNs to grow (up to each LUN’s maximum size) and we increase performance for ALL of the LUNs in the pool since we are adding spindles.  The real-world benefits come as we consolidate numerous LUNs into a single disk pool.

The nice thing about this approach is that we stop managing the size of individual LUNs and just manage the underlying disk pool as a whole.   And the cost-per-GB for SAN disk constantly goes down so we can spend only what we have to today, and when we add more later it will likely be a little cheaper.  Disk capacity utilization will be much higher in a thin model compared with the traditional/thick model.

The story gets even better in a virtual server environment such as with MS Hyper-V or VMWare ESX.  First, the virtual server OS drives are on the SAN in addition to the application data, and there can be multiple virtual disks on the same LUN.  Whether physical or virtual, we need to maintain some free space in the disks to keep applications running, plus with virtual systems we need some free space on the LUN for features of the virtualization technology like snapshots.  The net effect is that in a virtualized environment, disk utilization never gets much above 50% when slack space at both the virtual layer and inside the virtual servers is considered.  With thin provisioning we could potentially store twice the number of virtual servers on the same physical disks.

There are caveats of course.  Maintaining performance is the primary concern.  Whether used in a thick LUN or thin LUN, each disk has a specific amount of performance.   Thin provisioning has no effect on the amount of IOPS or bandwidth the application requires nor the amount of IOPS the physical disk can handle.  So even if thin provisioning saves 50% disk space in your environment, you may not be able to use all of that reclaimed space before running into performance bottlenecks.  If the storage array has QOS features (ie: EMC Clariion NQM) it is possible to prioritize the more important applications in your disk pool to maintain performance where it matters.

Other problems that you may encounter have to do with interoperability.  For starters, some applications are not “thin-friendly”; ie: they write data in such a way as to negate any benefit that thin provisioning provides.  Also, while many storage arrays support thin provisioning, each has different rules about the use of thin LUNs.  For example, in some scenarios you can’t replicate thin LUNs using native array tools.  It pays to do your homework before choosing a new storage array or implementing thin provisioning.

I didn’t cover thin provisioning in NAS environments directly but the feature works in the same manner.  Thin volumes are provisioned from pools of storage and users/clients see a large amount of available disk space even if the disk pool itself is very small.  Since NAS is traditionally used for user home directories and departmental shares, absolute performance is usually not as much of a concern so thin provisioning is much easier to implement and in many cases is the default behavior or simply a check box on NAS appliances like EMC Celerra or NetApp FAS.

Thin provisioning is a powerful technology when used where it makes sense.  In my next post I’ll explain de-duplication technology and then talk about how these technologies can be used together plus some workarounds for the caveats that I’ve mentioned.