Does EMC FASTCache work with Exchange?

Short Answer: Yes!

In my dealings with customers I’ve been requesting performance data from their storage systems whenever I can to see how different applications and environments react to new features. Today I’m going to give you some more real-world data, straight from a customer’s production EMC NS480.

I’ve pulled various stats out of Analyzer for this customer’s Exchange server, which has 3 mail databases totaling about 1TB of mail stored on the NS480 and connected via Fibre Channel. Since this customer is not extremely large (similar to most of our customers), they are using this NS480 for pretty much everything, from VMware, SQL, and Exchange to NAS, web/app content, and Business Intelligence systems. There is about 30TB of block data and another 100TB of NAS data. FASTCache is enabled for all LUNs and Pools with just 183GB of usable FASTCache space (4 x 100GB SSDs). So in this environment, with a modest amount of FASTCache and a very mixed workload, how does Exchange fare?

Let’s first take a look at the Exchange workload itself for a 24 hour period: (Note: There were no reads from the Exchange log LUNs to speak of so I left that out of this analysis.)

Total Read IOPS for the 3 databases: (the largest peak is a result of database maintenance jobs and the smaller peaks are due to backup jobs) Here it’s tough to see due to the maintenance and backup peaks, but production IO during the work day is about 200-400 IOPS. By the way, a source-deduplicating incremental-forever backup technology, such as Avamar, could drastically reduce the IO load and duration of the nightly backup.

Total Write IOPS for the 3 databases: Obviously more changes to the database occurring during the work day.

Total Write IOPS for the 3 Log files: Log data is typically cached easily in the SP cache, so FASTCache isn’t really required here, but I’m including it to show whether there is any value to using FASTCache with Exchange logs.

Now let’s look at the FASTCache hit ratios for this same set of data: (average of all 3 DBs)

First, the Read Activity: Here you can see that aside from the maintenance and backup jobs, FASTCache is servicing 70-90% of the Read IOPs. Keep in mind that a FASTCache miss could still be a Cache Hit if the data is in SP Cache. What’s interesting about this is that it looks like the nightly maintenance job is pushing the highest load.

And the Write Activity: The beauty of EMC’s FASTCache implementation is that it is a read/write cache, so the benefit extends beyond just read IO. Here you see that FASTCache is servicing 60-80% of the writes for these Exchange Databases. That’s a huge load off the backend disks.

And the Log Writes: Since Log writes are usually not a performance problem, I would say that FASTCache is not necessary here, and the average 30% hit ratio shown here is not great. If you wanted to spend the time to tune FASTCache a bit, you might consider disabling FASTCache for Log LUNs to devote the FASTCache capacity to more cache friendly workloads.

All in all you can see that for the database data, FASTCache is servicing a significant portion of the user generated workload, reducing the backend disk load and improving overall performance.

Hopefully this gives you a sense of what FASTCache could do for your Exchange environment, reducing backend disk workload for reads AND writes. I must reiterate, since an SP Cache hit is shown as a FASTCache miss, an 80% FASTCache hit ratio does not mean that 20% of the IOs are hitting disk. To illustrate this, I’ve graphed the sum of SP Cache Hits and FAST Cache Hits for a single database. You can see that in many cases we’re hitting a total of 100% cache hits.

Most interesting is the backup window where SP Cache is really handling a huge amount of the load. This is actually due to the Prefetch algorithms kicking in for the sequential read profile of a backup, something CX/VNX is very good at.
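
The point about an SP Cache hit being reported as a FASTCache miss comes down to simple addition, so here is a minimal Python sketch of it. The function name and the sample percentages are my own illustrations, not values pulled from Analyzer, and the sketch assumes every IO is counted as exactly one of: FASTCache hit, SP Cache hit, or backend disk IO.

```python
def disk_io_pct(fast_cache_hit_pct, sp_cache_hit_pct):
    """Percentage of IOs that actually reach the backend disks.

    Both arguments are percentages of *total* IOs. Because an IO served
    from SP Cache is reported as a FASTCache miss, the two hit ratios add.
    """
    total_hit_pct = fast_cache_hit_pct + sp_cache_hit_pct
    if not 0 <= total_hit_pct <= 100:
        raise ValueError("combined hit ratio must be between 0 and 100")
    return 100 - total_hit_pct


# Hypothetical sample: 80% FASTCache hits plus 15% SP Cache hits
print(f"{disk_io_pct(80, 15)}% of IOs reach disk")
```

With those sample numbers only 5% of the IOs go to disk, not the 20% you might assume from the FASTCache hit ratio alone.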

Real World EMC FASTVP and FASTCache results!

I have a customer who just recently upgraded their EMC Celerra NS480 Unified Storage Array (based on Clariion CX4-480) to FLARE30 and enabled FASTCache across the array, as well as FASTVP automated tiering for a large amount of their block data.  Now that it’s been configured and the customer has performed a large amount of non-disruptive migrations of data from older RAID groups and VP pools into the newer FASTVP pool, including thick-to-thin conversions, I was able to get some performance data from their array and thought I’d share these results.

This is Real-World data

This is NOT some edge case where the customer’s workload is perfect for FASTCache and FASTVP and it’s also NOT a crazy configuration that would cost an arm and a leg.  This is a real production system running in a customer datacenter, with a few EFDs split between FASTCache and FASTVP and some SATA to augment capacity in the pool for their existing FC based LUNs.  These are REAL results that show how FASTVP has distributed the IO workload across all available disks and how a relatively small amount of FASTCache is absorbing a decent percentage of the total array workload.

This NS480 array has nearly 480 drives in total and has approximately 28TB of block data (I only counted consumed data on the thin LUNs) and about 100TB of NAS data.  Out of the 28TB of block LUNs, 20TB is in Virtual Pools, 14TB of which is in a single FASTVP Pool.  This array supports the customer’s ERP application, entire VMware environment, SQL databases, and NAS shares simultaneously.

In this case FASTCache has been configured with just 183GB of usable capacity (4 x 100GB EFD disks) for the entire storage array (128TB of data) and is enabled for all LUNs and Pools.  The graphs here are from a 4 hour window of time after the very FIRST FASTVP re-allocation completed using only about 1 day’s worth of statistics.  Subsequent re-allocations in the FASTVP pool will tune the array even more.

FASTCache

First, let’s take a look at the array as a whole.  Here you can see that the array is processing approximately 10,000 IOPS through the entire interval.

FASTCache is handling about 25% of the entire workload with just 4 disks.  I didn’t graph it here but the total array IO Response time through this window is averaging 2.5 ms.  The pools and RAID Groups on this array are almost all RAID5 and the read/write ratio averages 60/40 which is a bit write heavy for RAID5 environments, generally speaking.
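
To put some rough numbers behind why a 60/40 mix is write heavy for RAID5, here is a back-of-the-envelope Python sketch. It assumes the textbook RAID5 small-write penalty of 4 back-end IOs per host write (read data, read parity, write data, write parity) and ignores cache coalescing and full-stripe writes, so treat the result as a worst-case estimate and not something measured on this array. The function and constant names are mine.

```python
# Assumption: classic RAID5 small-write penalty, no help from cache
RAID5_WRITE_PENALTY = 4  # read data, read parity, write data, write parity


def backend_iops(host_iops, write_pct, write_penalty=RAID5_WRITE_PENALTY):
    """Estimate back-end disk IOPS for a given host workload."""
    host_writes = host_iops * write_pct / 100
    host_reads = host_iops - host_writes
    return host_reads + host_writes * write_penalty


# The ~10,000 IOPS and 60/40 read/write ratio quoted above
print(f"Estimated back-end IOPS: {backend_iops(10_000, 40):,.0f}")
```

Under those assumptions the 40% of IOs that are writes account for well over half of the back-end disk work, which is why taking writes off the disks matters so much on RAID5.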

If you’ve done any reading about EMC FASTCache, you probably know that it is a read/write cache.  Let’s take a look at the write load of the array and see how much of that write load FASTCache is handling.  In the following graph you can see that out of the ~10,000 total IOPS, the array is averaging about 2500-3500 write IOPS with FASTCache handling about 1500 of that total.

That means FASTCache is reducing the back-end writes to disk by about 50% on this system.  On the NS480/CX4-480, FASTCache can be configured with up to 800GB usable capacity, so this array could see higher overall performance if needed by augmenting FASTCache further.  Installing and upgrading FASTCache is non-disruptive so you can start with a small amount and upgrade later if needed.
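
If you want to check the "about 50%" figure against the numbers read off the graph, the arithmetic is a one-liner. This is just a quick sketch; the function name is mine and the inputs are the approximate values quoted above (about 1500 write IOPS handled by FASTCache out of 2500-3500 total write IOPS).

```python
def absorbed_pct(cache_iops, total_iops):
    """Percentage of a workload that is serviced by the cache."""
    return 100 * cache_iops / total_iops


# Low and high end of the write load quoted above
for total_writes in (2500, 3500):
    pct = absorbed_pct(1500, total_writes)
    print(f"{total_writes} write IOPS -> FASTCache absorbs about {pct:.0f}%")
```

That works out to roughly 43-60% depending on where in the range the write load sits, so "about 50%" is a fair summary.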

FASTVP and FASTCache Together

Next, we’ll drill down to the FASTVP pool which contains 190 total disks (5 x EFD, 170 x FC, and 15 x SATA).  There is no maximum number of drives in a Virtual Pool on FLARE30 so this pool could easily be much larger if desired.  I’ve graphed the IOPS-per-tier as well as the FASTCache IOPS associated with just this pool in a stacked graph to give an idea of total throughput for the pool as well as the individual tiers.

The pool is servicing between 5,000 and 8,000 IOPS on average which is about half of the total array workload.  In case you didn’t already know, FASTVP and FASTCache work together to make sure that data is not duplicated in EFDs.  If data has been promoted to the EFD tier in a pool, it will not be promoted to FASTCache, and vice versa.  As a result of this intelligence, FASTCache acceleration is additive to an EFD-enabled FASTVP pool.   Here you can see that the EFD tier and FASTCache combined are servicing about 25-40% of the total workload, the FC tier another 40-50%, and the SATA tier services the remaining IOPS.  Keep in mind that FASTCache is accelerating IO for other Pools and RAID Group LUNs in addition to this one, so it’s not dedicated to just this pool (although that is configurable.)

FASTVP IO Distribution

Lastly, to illustrate FASTVP’s effect on IO distribution at the physical disk layer, I’ve broken down IOPS-per-spindle-per-tier for this pool as well.  You can see that the FC disks are servicing relatively low IO and have plenty of head room available while the EFD disks, also not being stretched to their limits, are servicing vastly more IOPS per spindle, as expected.  The other thing you may have noticed here is that the EFDs are seeing the majority of the workload’s volatility, while the FC and SATA disks have a pretty flat workload over time.  This illustrates that FASTVP has placed the more bursty workloads on EFD where they can be serviced more effectively.

Hopefully you can see here how a very small amount of EFDs used with both FASTCache and FASTVP can relieve a significant portion of the workload from the rest of the disks.  FASTCache on this system adds up to only 0.14% of the total data set size and the EFD tier in the FASTVP pool only accounts for 2.6% of the total dataset in that pool.
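
For anyone who wants to sanity check those percentages, here is a small Python sketch of the arithmetic. The helper names are mine, and I'm assuming binary units (1TB = 1024GB). The post doesn't state the usable capacity of the EFD tier, so the second helper only works backwards from the quoted 2.6% and the 14TB pool size to show what capacity that would imply.

```python
GB_PER_TB = 1024  # assumption: binary units


def pct_of_dataset(flash_gb, dataset_tb):
    """Flash capacity as a percentage of a dataset size."""
    return 100 * flash_gb / (dataset_tb * GB_PER_TB)


def implied_flash_gb(pct, dataset_tb):
    """Flash capacity implied by a quoted percentage of a dataset size."""
    return pct / 100 * dataset_tb * GB_PER_TB


# 183GB of FASTCache against the 128TB on the array
print(f"FASTCache share of the array: {pct_of_dataset(183, 128):.2f}%")

# What 2.6% of the 14TB FASTVP pool would be, in GB
print(f"EFD tier implied by 2.6% of 14TB: about {implied_flash_gb(2.6, 14):.0f}GB")
```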

What do you think of these results?  Have you added FASTCache and/or FASTVP to your array?  If so, what were your results?

Lies, Damn Lies, and Marketing…

Yesterday, in his blog post entitled “Myth Busting: Storage Guarantees”, Vaughn Stewart from NetApp blogged about the EMC 20% Guarantee and posted a chart of storage efficiency features from EMC and NetApp platforms to illustrate his point.  Chuck Hollis from EMC called it “chartsmithing” in a comment but didn’t elaborate specifically on the chart’s deficiencies.  Well, allow me to take that ball…

As presented, Vaughn’s chart (below) is technically factual (with one exception which I’ll note), but it plays on the human emotion of Good vs Bad (Green vs Red) by attempting to show more Red on EMC products than there should be.

The first and biggest problem is the chart compares EMC Symmetrix and EMC Clariion dedicated-block storage arrays with NetApp FAS, EMC Celerra, and NetApp vSeries which are all Unified storage systems or gateways.  Rather than put n/a or leave the field blank for NAS features on the block-only arrays, the chart shows a resounding and red NO, leading the reader to assume that the feature should be there but somehow EMC left it out.

As far as keeping things factual, some of the EMC and NetApp features in this chart are not necessarily shipping today (very soon though, and since it affects both vendors I’ll allow it here).  And I must make a correction with respect to EMC Symmetrix and Space Reclamation, which IS available on Symm today.

I’ve taken the liberty of massaging Vaughn’s chart to provide a more balanced view of the feature comparison.  I’ve also added EMC Celerra gateway on Symmetrix to the comparison as well as an additional data point which I felt was important to include.

I’ve included some footnotes in the chart to explain some of the results but I’ll explain a little here as well.

1.) I removed the block only EMC configuration devices because the NetApp devices in the comparison are Unified systems.

2.) I removed the SAN data row for Single Instance storage because Single Instance (identical file) data reduction technology is inherently NAS related.

3.) Zero Space Reclamation is a feature available in Symmetrix storage.  In Clariion, the Compression feature can provide a similar result since zero pages are compressible.

I left the 3 different data reduction techniques as individually listed even though the goal of all of them is to save disk space.  Depending on the data types, each method has strengths and weaknesses.

One question: if a bug in OnTap causes a vSeries to lose access to the disk on a Symmetrix during an online Enginuity upgrade, who do you call?  How would you know ahead of time if EMC hasn’t validated vSeries on Symmetrix like EMC does with many other operating systems/hosts/applications in eLab?

The goal of my post here really is to show how the same data can be presented in different ways to give readers a different impression.  I won’t get into too much as far as technical differences between the products, like how comparing FAS to Symmetrix is like comparing a box truck to a freight train, or how fronting an N+1 loosely coupled clustered, global cached, high-end storage array with a midrange dual-controller gateway for block data might not be in a customer’s best interest.

What do you think?

EMC Unified: The benefit of having options

I’ve been having some fun discussions with one of my customers recently about how to tackle various application problems within the storage environment and it got me thinking about the value of having “options”.  This customer has an EMC Celerra Unified Storage Array that has Fibre Channel, iSCSI, NFS, and CIFS protocols enabled.  This single storage system supports VMware, SQL, Web, Business Intelligence, and many custom applications.

The discussion was specifically centered on ensuring adequate storage performance for several different applications, each with a different type of workload…

1.)  Web Servers – Primarily VMs with general-purpose IO loads and low write ratios.

2.)  SQL Servers – Physical and Virtual machines with 30-40% write ratios and low latency requirements.

3.)  Custom Application  – A custom application database with 100% random read profiles running across 50 servers.

The EMC Unified solution:

EMC Storage already sports virtual provisioning in order to provision LUNs from large pools of disk to improve overall performance and reduce complexity.  In addition, QoS features in the array can be used to provide guaranteed levels of performance for specific datasets by specifying minimum and maximum bandwidth, response time, and IO requirements on a per-LUN basis.  This can help alleviate disk contention when many LUNs share the same disks, as in a virtual pool.  Enterprise Flash Drives (EFD) are also available for EMC Storage arrays to provide extremely high performance to applications that require it and they can coexist with FC and SATA drives in the same array.  Read and write cache can also be tuned at an array and LUN level to help with specific workloads.  With the updates to the EMC Unified Platform that I discussed previously, Sub-LUN FAST (auto tiering), and FAST Cache (EFD used as array cache) will be available to existing customers after a simple, non-disruptive, microcode upgrade, providing two new ways to tackle these issues.

So which feature should my customer use to address their 3 different applications?

Sub-LUN FAST (Fully Automated Storage Tiering)

Put all of the data into large Virtual Provisioning pools on the array, add a few EFD (SSD) and SATA disks to the mix and enable FAST to automatically move the blocks to the appropriate tier of storage.  Over time the workload would even out across the various tiers and performance would increase for all of the workloads with far fewer drives, saving on power, floor space, cooling, and potentially disk cost depending on the configuration.  This happens non-disruptively in the background.  Seems like a no-brainer right?

For this customer, FAST helps the web server VMs and the general-purpose SQL databases where the workload is predominately read and much of the same data is being accessed repeatedly (high locality of reference).   As long as the blocks being accessed most often are generally the same, day-to-day, automated tiering (FAST) is a great solution.  But what if the workload is much more random?  FAST would want to push all of the data into EFD, which generally wouldn’t be possible due to capacity requirements.  Okay, so tiering won’t solve all of their problems.  What about FAST Cache?

FAST Cache

Exponentially increase the size of the storage array’s read AND write cache with EFD (SSD) disks.  This would improve performance across the entire array for all “cache friendly” applications.

For this customer, increasing the size of write cache definitely helps performance for SQL (50% increase in TPM, 50% better response time as an example) but what about their custom database that is 100% random read?  Increasing the size of read cache will help get more data into cache and reduce the need to go to disk for reads, but the more random the data, the less useful cache is.   Okay, so very large caches won’t solve all of their problems.   EFDs must be the answer right?
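
To show why randomness hurts cache effectiveness, here is a toy simulation in Python. To be clear, this is NOT how FAST Cache or SP Cache decide what to keep; it is a plain LRU cache, and the block counts, cache size, and 90/10 skew are numbers I made up purely for illustration. It compares a uniformly random read workload with one where most reads land on a small hot set.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_size):
    """Replay a sequence of block reads through a simple LRU cache."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


rng = random.Random(42)   # fixed seed so runs are repeatable
NUM_BLOCKS = 10_000       # hypothetical dataset size, in blocks
CACHE_SIZE = 500          # cache holds 5% of the dataset
HOT_BLOCKS = 400          # hypothetical hot set
NUM_READS = 50_000

# 100% random reads spread evenly over the whole dataset
uniform = [rng.randrange(NUM_BLOCKS) for _ in range(NUM_READS)]

# 90% of reads go to the hot set, the rest anywhere in the dataset
skewed = [
    rng.randrange(HOT_BLOCKS) if rng.random() < 0.9 else rng.randrange(NUM_BLOCKS)
    for _ in range(NUM_READS)
]

uniform_ratio = lru_hit_ratio(uniform, CACHE_SIZE)
skewed_ratio = lru_hit_ratio(skewed, CACHE_SIZE)
print(f"Uniform random reads: {uniform_ratio:.1%} hit ratio")
print(f"Skewed (hot set) reads: {skewed_ratio:.1%} hit ratio")
```

With the same cache size, the skewed workload gets most of its reads from cache while the uniform random workload's hit ratio stays close to the fraction of the dataset that fits in cache, which is the problem this customer's custom database runs into.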

EFD Disks

Forget SATA and FC disks; just use EFD for everything and it will be super fast!!   EFD has extremely high random read/write performance, low latency at high loads, and very high bandwidth.  You will even save money on power and cooling.

The total amount of data this customer is dealing with in these three applications alone exceeds 20TB.  To store that much in EFD would be cost prohibitive to say the least.  So, while EFD can solve all of this customer’s technical problems, they couldn’t afford to acquire enough EFD for the capacity requirements.

But wait, it’s not OR, it’s AND

The beauty of the EMC Unified solution is that you can use all of these technologies, together, on the same array, simultaneously.

In this customer’s case, we put FC and SATA into a virtual pool with FAST enabled and provision the web and general-purpose SQL servers from it.  FAST will eventually migrate the least used blocks to SATA, freeing the FC disks for the more demanding blocks.

Next, we extend the array cache using a couple EFDs and FAST Cache to help with random read, sequential pre-fetching, and bursty writes across the whole array.

Finally, for the custom 100% random read database, we dedicate a few EFDs to just that application, snapshot the DB and present copies to each server.  We disable read and write cache for the EFD backed volumes which leaves more cache available to the rest of the applications on the array, further improving total system performance.

Now, if and when the customer starts to see disk contention in the virtual pool that might affect performance of the general-purpose SQL databases, QoS can be tuned to ensure low response times on just the SQL volumes ensuring consistent performance.  If the disks become saturated to the point where QoS cannot maintain the response time or the other LUNs are suffering from load generated by SQL, any of the volumes can be migrated (non-disruptively) to a different virtual pool in the array to reduce disk contention.

Options

If you look at offerings from the various storage vendors, many promote large virtual pools, some also promote large caches of some kind, others promote block level tiering, and a few promote EFD (aka SSDs) to solve performance problems.  But, when you are consolidating multiple workloads into a single platform, you will discover that there are weaknesses in every one of those features and you are going to wish you had the option to use most or all of those features together.

You have that option on EMC Unified.