Category Archives: solutions

EMC’s New VNX Unified Storage Systems


Today, EMC announced the new VNX and VNXe Unified Storage platforms, which merge the functionality of, and replace, EMC's popular Clariion and Celerra products. VNX is faster, more scalable, more efficient, more flexible, and easier to manage than the platforms it replaces.

Key differences between CX4/NS and VNX:

  • VNX replaces the 4Gb Fibre Channel Arbitrated Loop back-end buses with a 6Gb SAS point-to-point switched back end.
    • Fast and Reliable
  • VNX supports both 3.5” and 2.5” SAS drives in EFD (SSD), SAS, and NearLine-SAS varieties.
    • Flexible and Efficient
  • VNX has more cache, more front-end ports, and faster CPUs
    • Fast and Flexible
  • VNX systems can manage larger FASTCache configurations.
    • Fast and Efficient
  • VNX builds on the management simplicity enhancements started in EMC Unisphere on CX4/NS by adding application aware provisioning.
    • Simple and Efficient
  • VNX allows you to start with Block-only or NAS-only and upgrade to Unified later if desired, or start with Unified at deployment.
    • Cost Effective and Flexible
  • VNX will support advanced data services like deduplication in addition to FASTVP, FASTCache, Block QoS, Compression, and other features already available in Clariion and Celerra.
    • Flexible and Efficient

Just as with every manufacturer, newer products take advantage of the latest technologies (faster Intel processors and SAS connectivity in this case), but that's only part of the story with VNX.

Earlier, I mentioned Application Aware Provisioning has been added to Unisphere:

Prior to Application Aware Unisphere, if tasked with provisioning storage for Microsoft Exchange (for example), a storage admin would take the mailbox count and size requirements, use best practices and formulas from Microsoft for calculating required IOPS, and then map that data to the storage vendors' best practices to determine the best disk layout (RAID type, size, speed, quantity, etc.). Only after all of that was done could the actual provisioning of RAID Groups and/or LUNs begin.

Now with Application Aware Unisphere, the storage admin simply enters the mailbox count and size requirements into Unisphere and the rest is done automatically.  EMC has embedded the best practices from Microsoft, VMWare, and EMC into Unisphere and created simple wizards for provisioning Hyper-V, VMWare, NAS, and Microsoft Exchange storage using those best practices.

Combine Unisphere’s Application Aware Provisioning with the already included vCenter integration, and support for VMWare VAAI and you have a broad set of integration from the application layer down through to the storage system for optimum performance, simple and efficient provisioning, and unparalleled visibility.  This is especially useful for small to medium sized businesses with small IT departments.

EMC has also simplified licensing of advanced features on VNX. Rather than licensing individual software products based on the exact features you want, VNX has 5 simple Feature Packs plus a few bundle packs. The packs are organized by overall purpose rather than by individual feature (e.g., Local Protection rather than Snapshots or Clones).

  • FAST Suite includes FASTVP, FASTCache, Block QoS, and Unisphere Analyzer
  • Security and Compliance Pack includes File Level Retention for File and Block Encryption
  • Local Protection Pack includes Snapshots for block and file, full copy clones, and RecoverPoint/CDP
  • Remote Protection Pack includes Synchronous and Asynchronous replication for block and file as well as RecoverPoint/CRR for near-CDP remote replication of block and/or file data.
  • Application Protection Pack extends the application integration by adding Replication Manager for application integrated replication and Data Protection Advisor for SLA based replication monitoring and reporting.

You can also get the Total Protection Pack, which includes the Local Protection, Remote Protection, and Application Protection packs at a discounted cost, or the Total Efficiency Pack, which includes all five. That's it; there are no other software options for VNX/VNXe. Compression and Deduplication are included in the base unit, as is SANCopy. You will also find that the cost of these packs is extremely compelling once you talk with your EMC rep or favorite VAR.

So there you have it: powerful, simple and efficient storage, unified management, extensive data protection features, simplified licensing, and class-leading functionality (FASTVP, FASTCache, Integrated CDP, Quality of Service for Block, etc.) in a single platform. That's Unified. That's EMC VNX.

I didn't have time to touch on VNXe here, but there is even more cool stuff going on there. You can read more about these products here.

Using Cloud as a SAN Tier?


I came across this press release today from a company that I wasn't familiar with and immediately wanted more information. Cirtas Systems has announced support for Atmos-based clouds, including AT&T Synaptic Storage. Whenever I see these types of announcements, I read on in hopes of seeing real Fibre Channel block storage leveraging cloud-based architectures in some way. So far I've been a bit disappointed, since the closest I've seen has been NAS-based systems, at best including iSCSI.

Cirtas BlueJet Cloud Storage Controller is pretty interesting in its own right though.  It’s essentially an iSCSI storage array with a cache and a small amount of SSD and SAS drives for local storage.  Any data beyond the internal 5TB of usable capacity is stored in “the cloud” which can be an onsite Private Cloud (Atmos or Atmos/VE) and/or a Public Cloud hosted by Amazon S3, Iron Mountain, AT&T Synaptic, or any Atmos-based cloud service provider.

Cirtas BlueJet

The neat thing with BlueJet is that it leverages a ton of the functionality that many storage vendors have been developing recently such as data de-duplication, compression, some kind of block level tiering, and space efficient snapshots to improve performance and reduce the costs of cloud storage.  It seems that pretty much all of the local storage (SAS, SSD, and RAM) is used as a tiered cache for hot data.  This gives users and applications the sense of local SAN performance even while hosting the majority of data offsite.

While I haven’t seen or used a BlueJet device and can’t make any observations about performance or functionality, I believe this sort of block->cloud approach has pretty significant customer value.  It reduces physical datacenter costs for power and cooling, and it presents some rather interesting disaster recovery opportunities.

Similar to how Compellent's signature feature, tiered block storage, has been added to more traditional storage arrays, I think modified implementations of Cirtas' technology will inevitably come from the larger players, such as EMC, as a feature in standard storage arrays. If you consider that EMC Unified Storage and EMC Symmetrix VMAX both have large caches and block-level tiering today, it's not too much of a stretch to integrate Atmos directly into those storage systems as another tier. EMC already does this for NAS with the EMC File Management Appliance.

Conceptual Diagram

I can imagine leveraging FASTCache and FASTVP to tier locally for the data that must be onsite for performance and/or compliance reasons and pushing cold/stale blocks off to the cloud.  Additionally, adding cloud as a tier to traditional storage arrays allows customers to leverage their existing investment in Storage, FC/FCoE networks, reporting and performance trending tools, extensive replication options available, and the existing support for VMWare APIs like SRM and VAAI.

With this model, replication of data for disaster recovery/avoidance only needs to be done for the onsite data since the cloud data could be accessed from anywhere.  At a DR site, a second storage system connects to the same cloud and can access the cold/stale data in the event of a disaster.

Another option would be adding this functionality to virtualization platforms like EMC VPLEX for active/active multi-site access to SAN data, while only needing to store the majority of the company’s data once in the cloud for lower cost.  Customers would no longer have to buy double the required capacity to implement a disaster recovery strategy.

I'm eagerly awaiting the implementation of cloud into traditional block storage, and I can see how some vendors will be able to do this easily, while others may not have the architecture to integrate as easily. It will be interesting to see how this plays out.

Can you Compress AND Dedupe? It Depends


My recent post about Compression vs Dedupe, which was sparked by Vaughn’s blog post about NetApp’s new compression feature, got me thinking more about the use of de-duplication and compression at the same time.  Can they work together?  What is the resulting effect on storage space savings?  What if we throw encryption of data into the mix as well?

What is Data De-Duplication?

De-duplication in the data storage context is a technology that finds duplicate patterns of data in chunks of blocks (sized from 4-128KB or so depending on implementation), stores each unique pattern only once, and uses reference pointers in order to reconstruct the original data when needed.  The net effect is a reduction in the amount of physical disk space consumed.
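To make the idea concrete, here is a minimal fixed-size-chunk de-duplication sketch in Python. It is purely illustrative; the 8KB chunk size and SHA-256 fingerprints are my own assumptions for the example, not any vendor's actual implementation.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed chunk size; real systems use roughly 4-128KB

def dedupe(data: bytes):
    """Split data into fixed-size chunks, store each unique chunk once,
    and keep an ordered list of fingerprints to rebuild the original."""
    store, recipe = {}, []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # unique patterns are stored only once
        recipe.append(fp)             # reference pointers used to reconstruct
    return store, recipe

def rehydrate(store, recipe) -> bytes:
    return b"".join(store[fp] for fp in recipe)

data = b"A" * 32768 + b"B" * 16384 + b"A" * 32768   # lots of repeated chunks
store, recipe = dedupe(data)
assert rehydrate(store, recipe) == data
print(f"logical {len(data)} bytes, stored {sum(len(c) for c in store.values())} bytes")
```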

What is Data Compression?

Compression finds very small patterns in data (down to just a couple bytes or even bits at a time in some cases) and replaces those patterns with representative patterns that consume fewer bytes than the original pattern.  An extremely simple example would be replacing 1000 x “0”s with “0-1000”, reducing 1000 bytes to only 6.
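As a toy illustration of that kind of pattern substitution (real compressors such as LZ/deflate are far more sophisticated), a simple run-length encoder looks something like this:

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    # Replace each run of a repeated character with "<char><run length>"
    return "".join(f"{ch}{sum(1 for _ in run)}" for ch, run in groupby(text))

print(rle_encode("0" * 1000))   # "01000" -- 1000 bytes reduced to 5
print(rle_encode("AAABBC"))     # "A3B2C1"
```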

Compression works on a more micro level, while de-duplication takes a slightly more macro view of the data.

What is Data Encryption?

In a very basic sense, encryption is a more advanced version of compression.  Rather than compare the original data to itself, encryption uses an input (a key) to compute new patterns from the original patterns, making the data impossible to understand if it is read without the matching key.

Encryption and Compression break De-Duplication

One of the interesting things about compression and encryption algorithms is that their output bears little resemblance to the input: a small change in the source data produces a completely different compressed stream, and most encryption schemes produce different output on every run, even for identical input. This means that even if the source data has repeating patterns, the compressed and/or encrypted version of that data most likely does not. So if you are using a technology that looks for repeating patterns of bytes in fairly large chunks (4-128KB), such as data de-duplication, compression and encryption can reduce the space savings significantly, if not completely.

I see this problem a lot in backup environments with DataDomain customers. When a customer encrypts or compresses the backup data before it passes through the backup application and into the DataDomain appliance, the space savings drop and many times the customer becomes frustrated by what they perceive as a failing technology. A really common example is using Oracle RMAN or SQL LightSpeed to compress database dumps prior to backing them up with a traditional backup product (such as NetWorker or NetBackup).

Sure, LightSpeed will compress the dump by 95%, but every subsequent dump of the same database looks like unique data to a de-duplication engine, and you will get little if any benefit from de-duplication. If you leave the dump uncompressed, the de-duplication engine will find common patterns across multiple dumps and will usually achieve higher overall savings. This gets even more important when you are replicating backups over the WAN, since de-duplication also reduces replication traffic.

It all depends on the order

The truth is you CAN use de-duplication with compression, and even encryption. The key is the order in which the data is processed by each algorithm. Essentially, de-duplication must come first. After data is processed by de-duplication, there is still enough data in the resulting 4-128KB blocks for compression to be effective, and the resulting compressed data can then be encrypted. Similar to de-duplication, compression will have lackluster results with encrypted data, so encrypt last.

Original Data -> De-Dupe -> Compress -> Encrypt -> Store
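Here is a minimal sketch of that ordering, building on the chunking idea above and using only Python's standard library. It is illustrative only; the chunk size is an assumption, and the encryption step is left as a placeholder comment since a real cipher (with a key) would be applied to the compressed chunks last.

```python
import hashlib, zlib

CHUNK = 8 * 1024  # assumed chunk size

def ingest(data: bytes):
    """De-dupe first, then compress each unique chunk, then (last) encrypt."""
    store, recipe = {}, []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()   # step 1: fingerprint raw chunks
        recipe.append(fp)
        if fp not in store:
            compressed = zlib.compress(chunk)    # step 2: compress unique chunks
            # step 3 (placeholder): encrypt `compressed` with your key here.
            # Encryption must come after the fingerprinting above, or duplicate
            # chunks would no longer match each other.
            store[fp] = compressed
    return store, recipe

def restore(store, recipe) -> bytes:
    # decrypt (if used), then decompress each referenced chunk
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)

data = (b"the same database page " * 512)[:CHUNK] * 8   # 8 identical chunks
store, recipe = ingest(data)
assert restore(store, recipe) == data
print(len(data), "bytes in,", sum(len(v) for v in store.values()), "bytes stored")
```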

There are good examples of this already:

EMC DataDomain – After incoming data has been de-duplicated, the DataDomain appliance compresses the blocks using a standard algorithm.  If you look at statistics on an average DDR appliance you’ll see 1.5-2X compression on top of the de-duplication savings.  DataDomain also offers an encryption option that encrypts the filesystem and does not affect the de-duplication or compression ratios achieved.

EMC Celerra NAS – Celerra De-Duplication combines single instance store with file level compression.  First, the Celerra hashes the files to find any duplicates, then removes the duplicates, replacing them with a pointer.  Then the remaining files are compressed.  If Celerra compressed the files first, the hash process would not be able to find duplicate files.

So what’s up with NetApp’s numbers?

Back to my earlier post on Dedupe vs. Compression; what is the deal with NetApp’s dedupe+compression numbers being mostly the same as with compression alone?  Well, I don’t know all of the details about the implementation of compression in ONTAP 8.0.1, but based on what I’ve been able to find, compression could be happening before de-duplication.  This would easily explain the storage savings graph that Vaughn provided in his blog.  Also, NetApp claims that ONTAP compression is inline, and we already know that ONTAP de-duplication is a post-process technology.  This suggests that compression is occurring during the initial writes, while de-duplication is coming along after the fact looking for duplicate 4KB blocks.  Maybe the de-duplication engine in ONTAP uncompresses the 4KB block before checking for duplicates but that would seem to increase CPU overhead on the filer unnecessarily.

Encryption before or after de-duplication/compression – What about compliance?

I make a recommendation here to encrypt data last, i.e., after all data-reduction technologies have been applied. However, the caveat is that for some customers, with some data, this is simply not possible. If you must encrypt data end-to-end for compliance or business/national-security reasons, then by all means, do it. The unfortunate byproduct of that requirement is that you may get very little space savings on that data from de-duplication, both in primary storage and in a backup environment. This also affects WAN bandwidth when replicating, since encrypted data is difficult to compress and accelerate as well.

Small Innovations Can Make a Big Difference


On Friday, my local gas/electric utility decided it was time to replace the gas meter and the 40-year-old steel gas pipe between the street and my house. I had a chance to chat with the guys a bit while they were working, and I learned about a small innovation that not only makes their work easier, it provides better uptime for natural gas customers and most likely saves lives.

It all started when I looked out the window and saw the large hole they'd jackhammered into my driveway. At first I was a little worried about the jackhammer hitting the gas line, but they do 2-3 of these a day, so I figured they must know what they are doing. Then I saw them welding, in the hole! And it turns out that they were literally welding ON the gas line. So I naturally asked, "so you had to turn off the gas to the whole street to do this?" to which they replied "nope, the gas is still flowing in there." Now some of you may know how this is achieved without large fireballs in people's front yards, but I was a little stunned at first. So they explained the whole deal. It turns out that the little innovation that allows them to weld a new pipe onto an in-service gas line is called a hot tap. Actually, a hot tap is made up of several components: a flange, a valve, a few other accessories, and a hot tapping machine.

I couldn’t find a picture that showed the same hot tapping valve they used on my gas line but the following picture from http://www.flowserve.com gives you an idea of what it does…

 

Flowserve "NAVAL" Hot Tapping Valve

 

One line shows a completed hot tap in service, and the other shows the hot-tapping tool inserted with a hand drill to drive the cutter.

Basically, they weld the valve onto an existing pipe, along with a flange to better match the contours and add some “meat” to the fitting.  In the case of this picture, the hot tapping machine is inserted through the valve, sealing the opening in the valve itself, and the drill turns a magnetic cutter to cut into the working gas line.  The magnetism helps to retrieve the metal shavings from the cut.

Once the hole is complete, the hot tapping machine is backed out a bit, the valve is closed, and the machine is completely removed.  After that, you can attach a new pipe to the valve and open it up whenever you are ready.

The Pilchuck crew that was working on my line had an even fancier valve with a knob on top and a built-in cutter. So after they welded it on, they just screwed it down to cut the hole and unscrewed it once they had attached the branch line. Pretty slick, since they didn't need a separate tool to do the cut.

I was thinking about this whole process the next day, and it occurred to me just how dangerous it would be to tap live gas lines, and how the idea of a hot tap is really pretty simple, but probably saves lives. It also keeps service up for every other customer who shares the main pipeline while maintenance is performed, and I'm pretty sure it speeds up the work significantly over shutting down a gas line, cutting it, and inserting a T-fitting.

While I was looking for a suitable picture, I found out that they do this same thing with large continental pipelines as well. There are companies that will hot tap pipes over 100″ in diameter.

This is totally unrelated to storage but I thought it was interesting.

Compression better than Dedup? NetApp Confirms!


The more I talk with customers, the more I find that the technical details of how something works are much less important than the business outcome it achieves. When it comes to storage, most customers just want a device that will provide the capacity and performance they need, at a price they can afford, and it had better not be too complicated. Pretty much any vendor trying to sell something will attempt to make their solution fit your needs even if they really don't have the right products. It's a fact of life: sell what you have. Along these lines, there has been a lot of back and forth between vendors about dedup vs. compression technology and which one solves customer problems best.

After snapshots and thin provisioning, data reduction technology in storage arrays has become a big focus in storage efficiency lately; and there are two primary methods of data reduction — compression and deduplication.

While EMC has been marketing compression technology for block and file data in Celerra, Unified, and Clariion storage systems, NetApp has been marketing deduplication as the technology of choice for block and file storage savings. But which one is the best choice? The short answer is: it depends. Some data types benefit most from deduplication while others get better savings with compression.

Currently, EMC supports file compression on all EMC Celerra NS20, 40, 80, 120, 480, 960, VG2, and VG8 systems running DART 5.6.47.x+ and block compression on all CX4 based arrays running FLARE30.x+.  In all cases, compression is enabled on a volume/LUN level with a simple check box and processing can be paused, resumed, and disabled completely, uncompressing the data if desired.  Data is compressed out-of-band and has no impact on writes, with minimal overhead on reads.  Any or all LUN(s) and/or Filesystem(s) can be compressed if desired even if they existed prior to upgrading the array to newer code levels.

With the release of OnTap 8.0.1, NetApp has added support for in-line compression within their FAS arrays.  It is enabled per-FlexVol and as far as I have been able to determine, cannot be disabled later (I’m sure Vaughn or another NetApp representative will correct me if I’m wrong here.)  Compression requires 64-bit aggregates which are new in OnTap 8, so FlexVols that existed prior to an upgrade to 8.x cannot be compressed without a data migration which could be disruptive.  Since compression is inline, it creates overhead in the FAS controller and could impact performance of reads and writes to the data.

Vaughn Stewart, of NetApp, expertly blogged today about the new compression feature, including some of the caveats involved, and to me the most interesting part of the post was the following graphic he included showing the space savings of compression vs. dedup for various data types.

Image Credit: Vaughn Stewart, NetApp

The first thing that struck me was how much better compression performed over deduplication for all but one data type (Virtualization will usually fare well because in a typical environment there are many VMs with the same operating system files).  In fact, according to NetApp, deduplication achieves very little savings, if any, for the majority of the data types here.
 
The light green bar indicates savings with both dedupe AND compression enabled on the same dataset.  In 5 out of 9 cases, dedup adds ZERO savings over compression alone.  I can’t help but wonder why anyone would enable dedup on those data types if they already had compression, since both features use storage array CPU resources to find and compress or dedup data.  I am aware that in some cases, dedup can improve performance on NetApp systems due to dedup-aware cache, but I also believe that any performance gain is directly related to the amount of duplication in the data.  Using this chart, virtualization is really the only place where dedup seems particularly effective and hence the only place where real performance gains would likely present themselves.
 
The challenge for NetApp customers will be getting their data into a configuration that supports compression due to the 64-bit aggregate requirement, lack of an easy and non-disruptive LUN migration feature (DataMotion appears to only support iSCSI and NFS and requires several additional licenses), and no way to convert an aggregate from 32-bit to 64-bit.  Once compression has been enabled, if there is truly no way to disable it, any resulting performance impact will be very difficult to rectify.
 
On the other hand, any EMC customer with current maintenance can upgrade their NS or CX4 array to newer versions of DART or FLARE, and compression can be enabled on any existing data after the fact.  If performance becomes an issue for a particular dataset once compressed, the data can be uncompressed later.  Both operations are completely non-disruptive and run in the background.  While block compression only works on LUNs in a virtual pool, as opposed to a traditional RAID group, enabling compression on a normal LUN will automatically migrate the LUN into a virtual pool, perform zero-page reclaim, followed by compression, and the entire process is completely non-disruptive to the application.  Oh, and compressed data can still be tiered with FASTVP across SSD, FC, and SATA disk and/or benefit from up to 2TB of FASTCache.
 
I admit that there is a place for deduplication as well as compression in reducing the footprint of customer data. However, based on what I've seen in my career as an IT professional, and with my customers in my current role at EMC, there are more use cases for compression than there are for deduplication when it comes to primary data, whether SAN or NAS. Either way, if I were using a new technology for the first time on a particular data set, whether compression or deduplication, I would definitely want a backout plan in case the drawbacks outweigh the benefits.

Unified of the Beholder???


Apart from “The Cloud”, “Unified Storage” is the other big buzzword in the storage industry of late.  But what exactly is Unified Storage?

Merriam-Webster defines unify as “to make into a unit or coherent whole.”

So how does this apply to storage systems?  If you look at marketing messages by EMC, NetApp, and other vendors you’ll find that they all use the term in different ways in order to fit nicely with the products they have.  Based on what I see, there are generally two different approaches.

Single HW/SW Stack Approach:

Some vendors want you to believe that the only way it can be called Unified Storage is if the same physical box and software stack provides all protocols and features, even if management of the single system is not perfectly cohesive.

NetApp's FAS storage systems are an example of this strategy. A single filer provides all services, whether SAN or NAS, IP or FibreChannel. However, a single HA cluster is actually managed as two separate systems; each cluster node is managed independently using its own FilerView instance, and there are separate tools (NetApp System Manager, Operations Manager, Provisioning Manager, Protection Manager) that can bring all of the filer heads into one view. Disks are captive to a specific filer head in a cluster, and moving disks and/or volumes between filer heads is not seamless.

Single Point of Management Approach:

Others approach it more holistically and figure that as long as the customer manages it as a single system, it qualifies as “Unified”, even if there may be disparate hardware and software components providing the different services.  After all, once it’s installed you don’t really go in the datacenter to physically look at the hardware very often.

EMC’s Unified Storage (which is a combination of Celerra NAS and Clariion Block storage systems) is an example of this.  In a best-of-breed approach, EMC allows the Clariion backend to do what it does best, block storage via FC or IP, while the Celerra, which is purpose built for NAS, provides CIFS/NFS services while leveraging the disk capacity, processors, cache, and other features of the Clariion as a kind of offload engine.  Regardless of which services you use, all parts of the solution are managed from a single Unisphere instance, including other Clariions and/or Celerras in the environment.  Unisphere launches from any Clariion or Celerra management port, and regardless of which device you launch it from, all systems are manageable together.

Which approach is better?

I see advantages and disadvantages to both approaches. As a former admin of both NetApp and EMC storage, I feel that while NetApp's hardware and software stack is unified, their management stack is decidedly un-unified. EMC's Unified storage is physically "integrated" to work together as a system, but the unifying feature is the management infrastructure built in with Unisphere.

There are other advantages to EMC's approach as well. For example, if a particular workload seems to hammer the CPUs on the NAS but the backend is not a bottleneck, more Celerra datamovers can be added to take advantage of the same backend disks and improve front-end performance. Likewise, the backend can be augmented as needed to improve performance, increase capacity, etc., without having to scale up the front-end NAS head. With the NetApp approach, if your CPU or cache is stressed, you need to deploy more FAS systems (in pairs for HA) along with any required disks for that new system to store data.

Both approaches work, and both have their merits, but what do customers really want?

In my opinion, most customers don’t really care *how* the hardware works, so long as it DOES WORK, and is easy to manage.  In the grand scheme of things, if I, as an admin, can provision, replicate, snapshot, and clone storage across my entire environment, regardless of protocol,  from a “single pane of glass”, that is a strong positive.

EMC Unisphere makes it easy to do just that and it launches right from the array with no separate installation or servers required.  Unisphere can authenticate against Active Directory or LDAP and has role-based-administration built in.  And since Unisphere launches from any Clariion Storage processor or Celerra Control Station, there’s no single point of failure for storage management either.

So what do you think customers want?  If you are a customer, what do YOU want?

EMC Unified: Guaranteed Efficiency with Better Application Availability


(Warning: This is a long post…)

You have a critical application that you can’t afford to lose:

So you want to replicate your critical applications because they are, well, critical. And you are looking at the top midrange storage vendors for a solution. NetApp touts awesome efficiency, awesome snapshots, etc., while EMC is throwing considerable weight behind its 20% Efficiency Guarantee. While EMC guarantees to be 20% more efficient in any unified storage solution, there is perhaps no better scenario than a replication solution to prove it.

I’m going to describe a real-world scenario using Microsoft Exchange as the example application and show why the EMC Unified platform requires less storage, and less WAN bandwidth for replication, while maintaining the same or better application availability vs. a NetApp FAS solution.  The example will use a single Microsoft Exchange 2007 SP2 server with ten 100GB mail databases connected via FibreChannel to the storage array.  A second storage array exists in a remote site connected via IP to the primary site and a standby Exchange server is attached to that array.

Basic Assumptions:

  • 100GB per database, 1 database per storage group, 1 storage group per LUN, 130GB LUNs
  • 50GB Log LUNs, ensure enough space for extra log creation during maintenance, etc
  • 10% change rate per day average
  • Nightly backup truncates logs as required
  • Best Practices followed by all vendors
  • 1500 users (Heavy Users 0.4IOPS), 10% of users leverage Blackberry (BES Server = 4X IOPS per user)
  • Approximate IOPS requirement for Exchange: 780 IOPS for this server.
  • EMC Solution: 2 x EMC Unified Storage systems with SnapView/SANCopy and Replication Manager
  • NetApp Solution: 2 x NetApp FAS Storage systems with SnapMirror and SnapManager for Exchange
  • RPO: 4 hours (remote site replication update frequency)

Based on those assumptions we have 10 x 130GB DB LUNs and 10 x 50GB Log LUNs, and we need approximately 780 host IOPS (50/50 read/write) from the backend storage array.
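For reference, here is one way to arrive at that ~780 figure from the stated assumptions; treat it as a sketch of the arithmetic rather than an official Microsoft sizing formula:

```python
# Back out the ~780 host IOPS from the assumptions above (illustrative math).
users = 1500
iops_per_user = 0.4                     # "heavy" Exchange 2007 user profile
bes_users = int(users * 0.10)           # 10% of users are on BlackBerry
bes_multiplier = 4                      # BES roughly quadruples per-user IOPS

base = users * iops_per_user                                   # 600 IOPS
bes_extra = bes_users * iops_per_user * (bes_multiplier - 1)   # 180 IOPS
print(int(base + bes_extra))                                   # ~780 host IOPS
```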

Disk IOPS calculation: (50/50 read/write)

  • RAID10, 780 host IOPS translates to 1170 disk IOPS (r+w*2)
  • RAID5, 780 host IOPS translates to 1950 disk IOPS (r+w*4)
  • RAIDDP is essentially RAID6 so we have about 2730 disk IOPS (r + w*6)

Note: NetApp can create sequential stripes on writes to improve write performance for RAID-DP, but that advantage drops significantly as the volumes fill up and free space becomes fragmented, which is extremely likely to happen after a few months or less of activity.

Assuming 15K Fibre Channel drives can deliver 180 IOPS each with reasonable latencies for a database workload, we'd need the following (a quick sketch of this math follows the list):

  • RAID10, Database 6.5 disks (round up to 8), using 450GB 15K drives =  1.7TB usable (1 x 4+4)
  • RAID5, 10.8 disks for RAID5 (round up to 12), using 300GB 15K drives = 2.8TB usable (2 x 5+1)
  • RAID6/DP, 15.1 disks for RAID6 (round up to 16), using 300GB 15K drives = 3.9TB usable (1 x 14+2)
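The numbers above follow directly from the usual RAID write penalties; here is a short sketch that reproduces them, assuming the 180 IOPS per 15K drive figure used in this post:

```python
HOST_IOPS = 780
READS = WRITES = HOST_IOPS // 2           # 50/50 read/write mix
IOPS_PER_15K_DRIVE = 180                  # planning assumption from the text

# write penalty: each host write costs this many back-end disk IOs
write_penalty = {"RAID10": 2, "RAID5": 4, "RAID6/DP": 6}

for raid, penalty in write_penalty.items():
    disk_iops = READS + WRITES * penalty
    drives = disk_iops / IOPS_PER_15K_DRIVE
    print(f"{raid}: {disk_iops} disk IOPS -> {drives:.1f} drives")

# Prints 1170 -> 6.5, 1950 -> 10.8, and 2730 -> ~15.2 drives; the text then
# rounds each up to a whole RAID-group geometry (4+4, 2 x 5+1, 14+2).
```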

Log writes are highly cacheable, so we generally need fewer disks; for both the RAID10 and RAID5 EMC options we'll use a single RAID1 1+1 RAID group with 2 x 600GB 15K drives. Since we can't do RAID1 or RAID10 on NetApp, we'll have to use at least 3 disks (1 data and 2 parity) for the 500GB worth of Log LUNs, but we'll actually need more than that.

Picking a RAID Configuration and Sizing for snapshots:

For EMC, the RAID10 solution uses fewer disks and provides the most appropriate amount of disk space for the LUNs vs. the RAID5 solution. With the NetApp solution there really isn't another alternative, so we'll stick with the 16-disk RAID-DP config. We have loads of free space, but we need some of that for snapshots, as we'll see next. We also need to allocate more space to the Log disks for those snapshots.

Since we expect about 10% change per day in the databases (about 10GB per database) we’ll double that to be safe and plan for 20GB of changes per day per LUN (DB and Log).

NetApp arrays store snapshot data in the same volume (FlexVol) as the application data/LUN, so you need to size the FlexVols and Aggregates appropriately. We need 200GB for the DB LUNs and 200GB for the Log LUNs to cover our daily change rate, but we're doubling that to 400GB each to cover our 2-day contingency. In the case of the DB LUNs, the aggregate has more than enough space for the 400GB of snapshot data we are planning for, but we need to add 400GB to the Log aggregate as well, so we need 4 x 600GB 15K drives to cover the Exchange logs and snapshot data.

EMC Unified arrays store snapshot data for all LUNs in a centralized location called the Reserve LUN Pool, or RLP. The RLP actually consists of a number of LUNs that can be used and released as needed by snapshot operations occurring across the entire array. The RLP LUNs can be created on any number of disks, using any RAID type, to handle various IO loads, and sizing an RLP is based on the total change rate of all simultaneously active snapshots across the array. Since we need 400GB of space in the Reserve LUN Pool for one day of changes, we'll again be safe by doubling that to 800GB, which we'll provide with 6 dedicated 300GB 15K drives in RAID10.
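To make the sizing comparison explicit, here is a quick sketch of the snapshot-space arithmetic used above; all of the figures come straight from the assumptions in this post:

```python
db_luns = log_luns = 10
daily_change_per_lun_gb = 20     # 10% of a 100GB DB (~10GB), doubled for safety

# NetApp: snapshot space lives inside each FlexVol, reserved per LUN group
netapp_db_snap_gb = db_luns * daily_change_per_lun_gb * 2    # 2-day contingency
netapp_log_snap_gb = log_luns * daily_change_per_lun_gb * 2

# EMC: one shared Reserve LUN Pool sized for the whole array's change rate
rlp_gb = (db_luns + log_luns) * daily_change_per_lun_gb * 2  # shared pool

print(netapp_db_snap_gb, netapp_log_snap_gb, rlp_gb)         # 400 400 800
```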

At this point we have 20 disks on the NetApp array and 16 disks on the EMC array.  We have loads of free space in the primary database aggregate on the NetApp but we can’t use that free space because it’s sized for the IOPS workload we expect from the Exchange server.

In order to replicate this data to an alternate site, we’ll configure the appropriate tools.

EMC:

  1. Install Replication Manager on a server and deploy an agent to each Exchange server
  2. Configure SANCopy connectivity between the two arrays over the IP ports built-in to each array
  3. In Replication Manager, Configure a job that quiesces Exchange, then uses SANCopy to incrementally update a copy of the database and log LUNs on the remote array and schedule for every 4 hours using RM’s built in scheduler.

NetApp:

  1. Install SnapManager for Exchange on each Exchange server
  2. Configure SnapMirror connectivity between the two arrays over the IP ports built-in to each array
  3. In SnapManager, configure a backup job that quiesces Exchange and takes a Snapshot of the Exchange DBs and Logs, then starts a SnapMirror session to replicate the updated FlexVol (including the snapshot) to the remote array. Configure a schedule in Windows Task Scheduler to run the backup job every 4 hours.

Both the EMC and NetApp solutions run on schedule, create remote copies, and everything runs fine, until...

Tuesday night during the weekly maintenance window, the Exchange admins decide to migrate half of the users from DB1 to DB2 and DB3, and half of the users from DB4 to DB5 and DB6. About 80GB of data is moved (25GB to each of the target DBs). The transaction logs on DB1 and DB4 jump to almost 50GB, and to 35GB each on DB2, DB3, DB5, and DB6.

On the NetApp array, the 50GB log LUNs already have about 10GB of snapshot data stored, and as the migration happens, new snapshot data is tracked for all 6 of the affected DB and Log LUNs. The 25GB of new data plus the 10GB of existing snapshot data exceeds the 20GB of free space in the FlexVol that each LUN is contained in, and guess what… Exchange chokes because it can no longer write to the LUNs.

There are workarounds: first, you enable automatic volume expansion for the FlexVols, with automatic snapshot deletion as a secondary fallback. In the above scenario, the 6 affected FlexVols autoextend to approximately 100GB each, equaling 300GB of snapshot data for those 6 LUNs and another 40GB for the remaining 4 LUNs. There is only 60GB free in the aggregate for any additional snapshot data across all 10 LUNs. Now SnapMirror struggles to update the 1200GB of new data (application data + snapshot data) across the WAN link, and as it falls behind, more data changes on the production LUNs, increasing the amount of snapshot data until the aggregate runs out of space. By default, SnapMirror snapshots are not included in the "automatically delete snapshots" option, so Exchange goes down. You can set a flag to allow SnapMirror-owned snapshots to be automatically deleted, but then you have to resync the databases from scratch. In order to prevent this problem from ever occurring, you need to size the aggregate to handle more than 100% change, meaning more disks.

Consider how the EMC array handles this same scenario using SANCopy.  The same changes occur to the databases and approximately 600GB of data is changed across 12 LUNs (6 DB and 6 Log).  When the Replication Manager job starts, SANCopy takes a new snapshot of all of the blocks that just changed for purposes of the current update and begins to copy those changed blocks across the WAN.

EMC Advantages:

  • SANCopy/Inc is not tracking the changes that occur AS they occur, only while an update is in process so the Reserve LUN Pool is actually empty before the update job starts.  If you want additional snapshots on top of the ones used for replication, that will increase the amount of data in the Reserve LUN Pool for tracking changes, but snapshots are created on both arrays independently and the snapshot data is NOT replicated.  This nuance allows you to have different snapshot schedules in production vs. disaster recovery for example.
  • Because SANCopy/Inc only replicates the blocks that have changed on the production LUNs, NOT the snapshot data, it copies only half of the data across the WAN vs SnapMirror which reduces the time out of sync.  This translates to lower WAN utilization AND a better RPO.
  • IF an update was occurring when the maintenance took place, the amount of data put in the Reserve LUN pool would be approximately 600GB (leaving 200GB free for more changed data).  More efficient use of the Snapshot pool and more flexibility.
  • IF the Reserve LUN Pool ran out of space, the SANCopy update would fail but the production LUNs ARE NEVER AFFECTED.  Higher availability for the critical application that you devoted time and money to replicate.
  • Less spinning disk on the EMC array vs. the NetApp.

EMC has several replication products available that each act differently.  I used SANCopy because, combined with Replication Manager, it provides similar functionality to NetApp SnapMirror and SnapManager.  MirrorView/Async has the same advantages as SANCopy/Incremental in these scenarios and can replicate Exchange, SQL, and other applications without any host involvement.

Higher application availability, lower WAN utilization, better RPO, and fewer spinning disks, without even leveraging advanced features for even better efficiency and performance.

Why pNFS can be a big deal even if NFS4.1 isn’t…


It's been a little while since I've posted, mostly due to my life being turned on its rear after our first child was born 8 weeks ago. As things start to settle into a rhythm (as much as is possible), I've been back online more, reading blogs, following Twitter, and working with customers regularly. As some of you may know, EMC announced support for pNFS in Celerra with the release of DART 6.x, and there have been several recent posts about the technology which piqued my interest a little.

The other bloggers have done a good job of describing what pNFS is and what is new in NFS4.1 itself so I won’t repeat all of that.  I want to focus specifically on pNFS and why it IS a big deal.

Prior to coming to work for EMC, I worked in internal IT at a company that dealt with large binary files in support of product development, as well as video editing for marketing purposes. I had a chance to evaluate, implement, and support multiple clustered file system technologies. The first was for an HD video editing solution using Macs, and we followed the likely path of implementing Apple's XSAN solution, which you may know is an OEM'd version of Quantum (ADIC) StorNext. StorNext allows you to create large filesystems across many disks and access them as local disk on many clients. File open, close, byte-range locking, etc. are handled by Metadata Controllers (MDCs) across an IP network, while the actual heavy lifting of read/write IO is done over Fibre Channel directly from the clients to the storage. All the shared filesystem benefits of NAS with the performance benefits of SAN.

The second project was specifically targeted at moving large files (4+GB each) through a workflow across many computers as quickly as possible so we could ship products.  Faster processing of the workflow translated to more completed projects per person/per day which meant better margins and keeping our partners and customers happy.  The workflow was already established, using Windows based computers and a file server.  The file server was running out of steam and the amount of data being stored at any given time had increased from 500GB to 8TB over the past 12 months.  We needed a simple way to increase the performance of the file server and also allow for better scalability.  Working with our local EMC SE, we tested and deployed MPFSi using a Celerra NS40 with integrated storage.

MPFS has been around a long time (also known as High Road) and works with Windows and various *nix based platforms.  It is similar to XSAN/StorNext in that open/close/locking activity is handled over IP by the metadata controller (the Celerra datamover in the case of MPFS) while the read/write IO is handled over block storage technology (MPFS supports FibreChannel and iSCSI connectivity to storage).  The advantage of MPFS over many other solutions is that the metadata controller and storage are all built-in to the EMC Celerra storage device and you don’t have to deploy any other servers.

In our case we chose iSCSI due to the cost of FC (switches and HBAs) and used the GigE ports on the Celerra's CX3 backend for block connectivity. In testing we showed that CIFS alone provided approximately 240Mbps of throughput over GigE connections, while enabling MPFSi netted about 750Mbps, even when we used the same NIC on the client. So we tripled throughput over the same LAN by installing a software client. Had we gone the extra mile to deploy FibreChannel for the block IO, we would have seen much higher throughput.

Even better, the use of MPFS did not preclude the use of NDMP for backup to tape directly from the Celerra, accelerating backups many times over the old file server. Clients that did not have the MPFS software installed accessed the same files over traditional CIFS with no problems. Another side benefit of MPFS over traditional CIFS is that the block I/O stack is much more efficient than the NAS I/O stack, so even with increased throughput, CPU utilization is lower on the client, returning cycles to the application which is doing work for your business.

There are many clustered file system / clustered NAS solutions on the market from a variety of vendors (StorNext, MPFS, GFS, Polyserve, etc.), and most of these products are trying to solve the same basic problems of storing more data and increasing performance. The problem is they are all proprietary, and because of that you end up with multiple solutions deployed in the same company. In our case we couldn't use MPFS for the video editing solution because EMC has not provided a client for Mac OSX. And this is where pNFS really becomes attractive. Storage vendors and operating system vendors alike will be upgrading the already ubiquitous NFS stack in their code to support NFS4.1 and pNFS. And that support means that I could deploy an EMC Celerra MPFS-like solution using the same Celerra-based storage, with no extra servers and no special client software, just the native NFS client in my operating system of choice. Perhaps Apple will include a pNFS-capable client in a future version of Mac OSX.

If you look at the pNFS standard you'll see that it supports the use of not only block storage, but object and file based storage as well. So as we build out larger and larger environments and private clouds start to expand into public clouds, you could tier your pNFS data across Fibre Channel storage, object storage (think Atmos on premises), as well as out to a service provider cloud (e.g., AT&T Synaptic). Now you've dramatically increased performance for the data that needs it, saved money storing the data that you need to keep long term, and geographically dispersed the data that needs to be close to users, with a single protocol supported by most of the industry and a single point of management.

Personally I think pNFS could kill off proprietary solutions over the long run unless they include support for it in their products.

This is just my opinion of course…

Resiliency vs Redundancy: Using VPLEX for SQL HA


A little history on my philosophy around high-availability

Around the year 2000, when I was working in network operations for a large wireless telco, a very senior network architect explained to me the company's philosophy on building high-availability solutions into the network. The phrase I remember from that conversation was "we don't build redundant networks, we build resilient networks." The difference is that while redundant networks fail over to secondary paths to resume traffic, resilient networks don't go down at all. This concept has stuck with me ever since, and I tend to tackle high-availability problems of all kinds with this idea in mind. It's frankly been very difficult to build solutions that are resilient across the entire stack, mostly because infrastructure technology hasn't quite gotten there yet.

Things may have changed…

I recently had a meeting with a customer to discuss local high availability for SQL. This customer has a very large multi-node clustered SQL environment (hundreds of TBs of data, hundreds of databases, hundreds of instances, many clusters, many nodes per cluster) and has been testing SQL database mirroring as an alternative to traditional Windows Failover Clustering. The meeting wound up focusing primarily on leveraging VPLEX as an alternative to SQL mirroring, and the reasons for that decision suddenly reminded me of the Resiliency vs Redundancy discussion I had years ago. A VPLEX solution potentially solves the same problem as DB mirroring, does it with less complexity, and carries less risk.

VPLEX Local as a Resilient HA solution

One of the many features of VPLEX is its ability to mirror data across multiple storage arrays and present that mirror as a single LUN to the host. For customers already running large multi-node MSCS clusters, the LUN appears just like any normal storage LUN, and Windows/SQL treat the LUN normally. There are several reasons VPLEX should be considered as an alternative to database mirroring (much of this applies to Exchange CCR as well).

VPLEX hardware is inherently resilient. A VPLEX cluster is an N+1 cluster of loosely coupled nodes, cooperating with each other but not depending on each other. Hosts can access any of the hosted data, through any of the ports, on any of the cluster nodes. If a node fails for any reason, the remaining nodes continue serving IO for any data. Except for a dead path on the host side (managed by PowerPath or MPIO), there is no failover process and no cache mirroring to worry about. The potential performance impact of a failure is equal to 1 divided by the quantity of that component in the cluster (128 x 8Gb ports across 8 director nodes for a large VPLEX Local cluster).

In addition, because VPLEX utilizes a write-through cache, there is never any dirty cache data (data in cache that has not been committed to disk) in a VPLEX system.  A power outage or VPLEX hardware failure does not put data at risk.

Other Advantages of using VPLEX over SQL Database Mirroring

Improved Performance:

  • Compared with SQL Database mirroring, VPLEX mirroring has significantly less impact on transaction performance for writes and can improve transaction performance in some cases due to the large read cache in the VPLEX directors. (Note: I am comparing to DB Mirroring in Full-Safety mode since the customer’s requirement was a zero-data-loss solution.)

Non-Disruptive Storage Failover:

  • In the event of a storage failure, SQL Mirroring must perform a cluster node failover which takes a few seconds at best, possibly disrupting applications.  VPLEX provides completely non-disruptive failover when a storage failure occurs.  (A server hardware failure still triggers a node failover as it would in any other failover clustering scenario.)

Less Management Overhead:

  • From a management perspective, using VPLEX instead of SQL Database mirroring gives the SQL DBAs fewer SQL instances and fewer moving parts to manage on a daily basis.  The storage team just presents a mirrored LUN from VPLEX to the cluster and it’s business as usual for the DBAs.
  • VPLEX also allows the storage team to non-disruptively migrate data between storage arrays behind VPLEX to balance load, perform hardware refreshes, or resolve capacity problems. VPLEX performs the migration at the direction of the storage admins.

Reduced Risk:

  • Reducing management complexity also reduces risk.  With a high number of database instances and db mirrors involved in a large environment like this one, the chance of one of those mirrors having a problem, or being configured incorrectly, is increased.  DBAs can rely on VPLEX mirroring all of the data, 24x7x365, even when host maintenance is being performed.

Reduced Cost:

  • When compared with the SQL Database Mirroring solution, the VPLEX solution reduced the number of physical servers needed in this environment, reducing cost enough to more than offset the cost of VPLEX itself.  Combined with reductions in soft costs, like reduced DBA management overhead, VPLEX will actually save them quite a bit of money, and increased uptime during storage refresh and maintenance will increase revenues in this case as well.

A Distributed Future:

  • Next year, when a second datacenter is online nearby, the first VPLEX Local cluster can be connected to another VPLEX cluster in the new datacenter.  Then the SQL cluster nodes and data can be distributed across both datacenters, providing protection from entire datacenter outages, or solving space constraints with no changes to the application or servers, and no downtime.

I wonder how many other customers would like to build more resilient infrastructures?

If you combine a VPLEX solution with a true cluster file system and an active-active database engine (ie: Oracle RAC), you can eliminate the disruption caused by server hardware failures.  It’s just a matter of time now until the entire stack can be designed for true resiliency with very little management overhead.  I can’t wait to see what happens.

The following EMC White Paper has a lot of good information about using VPLEX in this same context:

Workload Resiliency with EMC VPLEX

While EMC users benefit from Replication Manager, NetApp users NEED SnapManager


This is a follow up to my recent post NetApp and EMC: Replication Management Tools Comparison, in which I discussed the differences between EMC Replication Manager and NetApp SnapManager.

————

As a former customer of both NetApp and EMC, and now as an employee of EMC, I noticed a big difference between NetApp and EMC in how they market their replication management tools. When I was a customer, EMC talked about Replication Manager several times, and we purchased and deployed it. NetApp made SnapManager a very central part of their sales campaign, sometimes skipping any discussion of the underlying storage in favor of showing off SnapManager functionality. This is an extremely effective sales technique, and NetApp sales teams are so good at it that many people don't even realize other vendors have similar (and, in my opinion, EMC has better) functionality. One of the reasons for this difference in marketing strategy is that NetApp users NEED SnapManager, while EMC users do not always need Replication Manager.

The reason why is both simple and complex…

EMC storage arrays (Clariion, Symmetrix, RecoverPoint, Invista) all have one technology in common that NetApp Filers do not: consistency groups. A consistency group allows the storage system to take a snapshot of multiple LUNs simultaneously, so simultaneous in fact that all of the snapshots are at the exact same point in time, down to the individual write. This means that, without taking any applications offline and without any orchestration software, EMC storage arrays can create crash-consistent copies of nearly any kind of data at any time.

The EMC Whitepaper “EMC CLARiiON Database Storage Solutions: Oracle 10g/11g with CLARiiON Storage Replication Consistency”, downloadable from EMC’s website, has the following explanation of consistency groups in general…

“…Consistent replication operates on multiple LUNs as a set such that if the replication action fails for one member in the set, replication for all other members of the set are canceled or stopped.  Thus the contents of all replicated LUNs in the set are guaranteed to be identical point-in-time replicas of their source and dependent-write consistency is maintained…”

“…With consistent replication, the database does not have to be shut down or put into “hot backup mode.”  Replicates created with SnapView or MV/S (or MV/A, Timefinder, SRDF, Recoverpoint, etc) consistency operations, without first quiescing or halting the application, are restartable point-in-time replicas of the production data and guaranteed to be dependent-write consistent.”

Consistency is important for any application that writes to multiple LUNs at the same time, such as SQL database and log volumes. SnapManager and Replication Manager actually prepare the application by quiescing the database during the snapshot creation process. This process creates "application-consistent" copies, which are technically better for recovery compared with "storage-consistent" copies (also known as crash-consistent copies).
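To see why write-order (dependent-write) consistency matters, here is a toy simulation, not any vendor's implementation: a database appends a log record to a log LUN and then writes the corresponding data page to a data LUN. Snapping the two LUNs independently at slightly different moments can capture a state the application never produced, while a consistency-group style snapshot cannot.

```python
import copy

log_lun, data_lun = [], {}   # toy stand-ins for two "LUNs"

def transaction(txid):
    # write-ahead logging: the data write depends on the log write landing first
    log_lun.append(txid)
    data_lun[txid] = f"page for tx {txid}"

def is_consistent(log_snap, data_snap):
    # every captured data page must have its log record (dependent-write rule)
    return all(txid in log_snap for txid in data_snap)

# Independent, per-LUN snapshots taken a moment apart:
transaction(1)
log_snap = copy.deepcopy(log_lun)       # snap the log LUN first...
transaction(2)                          # ...a transaction lands in between...
data_snap = copy.deepcopy(data_lun)     # ...then snap the data LUN
print(is_consistent(log_snap, data_snap))   # False: page 2 has no log record

# Consistency group: both LUNs are captured at the same write boundary
log_snap, data_snap = copy.deepcopy(log_lun), copy.deepcopy(data_lun)
print(is_consistent(log_snap, data_snap))   # True, whenever it's taken
```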

So, while I will acknowledge that quiescing the database during a snapshot/replication operation provides the best possible recovery image, that may not be realistic in some scenarios.  The first issue is that the actual operation of quiescing, snapping, checking the image, then pushing an update to a remote storage array takes some time.  Depending on the size of the dataset, this operation can take from several minutes to several hours to complete.  If you have a Recovery Point Objective (RPO) of 5 minutes or less, using either of these tools is pretty much a non-starter.

Another issue is one of application support. While EMC Replication Manager and NetApp SnapManager have very wide support for the most popular operating systems, filesystems, databases, and applications, they certainly don't support every application. A very simple example is a Novell Netware file server with an NSS pool/volume spanning multiple LUNs. Neither NetApp nor EMC have support for Novell Netware in their replication management tools. While you can certainly replicate all of the LUNs with NetApp SnapMirror, SnapMirror has no consistency technology built in to keep the LUNs write-order consistent. The secondary copy will appear completely corrupt to the Netware server if a recovery is attempted. Through the use of consistency groups with MirrorView/Async, the replication of each LUN is tracked as a group and all of the LUNs are write-order consistent with each other, keeping the filesystem itself consistent. You would need either array-level consistency technology or support for Netware in the replication management tool in order to replicate such a server. Unfortunately, NetApp provides neither.

You may have complex applications that consist of Oracle and SQL databases, NTFS filesystems, and application servers running as VMs.  Using array-based consistency groups, you can replicate all of these components simultaneously and keep them all consistent with each other.  This way you won’t have transactions that normally affect two databases end up missing in one of the two after a recovery operation, even if those databases are different technologies (Oracle and MySQL, or PostgreSQL for example).

EMC Storage arrays provide consistency group technology for Snapshots and Replication in Clariion and Symmetrix storage arrays.  In fact, with Symmetrix, consistency groups can span multiple arrays without any host software.  By comparison, NetApp Filers do not have consistency group technology in the array.  Snapshots are taken (for local replicas and for SnapMirror) at the FlexVolume level.  Two FlexVolumes cannot be snapped consistently with each other without SnapManager.

There are a couple of workarounds for NetApp users: you can snapshot an aggregate, but that is not recommended by NetApp for most customers, or you can put multiple LUNs in the same FlexVol, but that still limits you to 16TB of data including snapshot reserve space, and both options violate the database design best practice of keeping data and logs on separate spindles for recovery. Even with these workarounds, you cannot gain LUN consistency across the two controllers in an HA Filer pair, something the CLARiiON does natively, which can also help for load balancing IO across the storage processors.

In general, I recommend that EMC customers use EMC Replication Manager and NetApp customers use SnapManager for the applications that are supported, and for most scenarios. But when RPOs are short, or the environment falls outside the support matrix for those tools, consistency groups become the best or only option.

Incidentally, with EMC RecoverPoint, you get the best of both worlds.  CDP or near-CDP replication of data using consistency groups for zero or near-zero RPOs plus application-consistent bookmarks made anytime the database is quiesced.  Recovery is done from the up-to-the-second version of the data, but if that data is not good for any reason, you can roll back to another point in time, including a point-in-time when the database was quiesced (a bookmark).

So, while EMC has, in Replication Manager, an equivalent offering to NetApp’s SnapManager, EMC customers are not required to use it, and in some cases they can achieve better results using array-based consistency technologies.