If You Are Using SSDs, You Should Be Encrypting

I saw the following article come across Twitter today.

http://www.zdnet.com/blog/storage/ssd-security-the-worst-of-all-worlds/1326

In it, Robin Harris describes the data recovery and secure erasure issues specific to SSDs.  In layman’s terms, because SSDs do all sorts of fancy things with writes to increase longevity and performance (wear leveling, for example, means an overwrite usually lands on different flash cells than the original data), erasing a drive is nearly impossible using normal methods, and forensic or malicious data recovery is quite easy.  So if you have sensitive data stored on SSDs, that data is at risk of being read by someone, some day, in the future.  It seems that pretty much the only way to mitigate this risk is to use encryption at some level outside the SSD itself.

Did you know that EMC Symmetrix VMAX offers data-at-rest encryption that is completely transparent to hosts and applications, and has no performance impact?  With Symmetrix D@RE, each individual disk is encrypted with a unique key, managed by a built-in RSA key manager, so disks are unreadable if removed from the array.  Since the data is encrypted as the VMAX writes it to the physical disk, attempting to read data off an individual disk without the key is pointless, even for SSDs.

The beauty of this feature is that it’s set-it-and-forget-it.  No management is needed: it’s enabled during installation and that’s it.  All disks are encrypted, all the time.

  • Ready to decomm an old array and return it, trade it, or sell it?  Destroy the keys and the data is gone.  No need for an expensive Data Erasure professional services engagement.
  • Failed disk replaced by your vendor?  No need for special arrangements with your vendor to keep those disks onsite, or certify erasure of a disk every time one is replaced.  The key stays with the array and the data on that disk is unreadable.
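
The two scenarios above rely on the same idea: the key lives with the array, not with the disk.  Here is a minimal Python sketch of that idea.  It is not EMC's implementation; the ToyArray class, the disk name, and the toy_cipher function (a SHA-256 keystream XOR that is NOT secure and only stands in for real encryption) are all assumptions made up for illustration.

```python
import hashlib
import os


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy stand-in, NOT secure.

    XOR is symmetric, so the same call both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyArray:
    """Hypothetical array that keeps one key per disk, apart from the disks."""

    def __init__(self):
        self.keys = {}   # disk id -> key (held by the array's key manager)
        self.disks = {}  # disk id -> ciphertext (what is physically on disk)

    def write(self, disk_id: str, data: bytes) -> None:
        # Each disk gets its own random key the first time it is written.
        key = self.keys.setdefault(disk_id, os.urandom(32))
        self.disks[disk_id] = toy_cipher(data, key)

    def read(self, disk_id: str) -> bytes:
        # Raises KeyError once the key has been destroyed.
        return toy_cipher(self.disks[disk_id], self.keys[disk_id])

    def destroy_key(self, disk_id: str) -> None:
        del self.keys[disk_id]


array = ToyArray()
secret = b"sensitive customer record"
array.write("disk-07", secret)
readable_before = array.read("disk-07") == secret

pulled_disk = array.disks["disk-07"]  # what someone holding the raw disk sees
array.destroy_key("disk-07")          # "destroy the keys and the data is gone"
```

After destroy_key, the bytes are still physically on the disk, but nothing in the sketch can turn them back into the original record.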

If you have to comply with PCI and/or other compliance rules that require secure erasure of disks, you should consider putting that data on a VMAX with data-at-rest encryption.

Now, what if you have an existing EMC storage system and the same need to encrypt data?  You can encrypt at the volume level with PowerPath Encryption.  PowerPath encrypts the data at the host with a unique key managed by an RSA Key Manager.  And it works with the non-EMC arrays that PowerPath supports as well.

Under normal circumstances, PowerPath Encryption does have some performance impact on the host.  However, HBA vendors such as Emulex are now offering HBAs with encryption offload that work with PowerPath.  If you combine PowerPath Encryption with Emulex Encryption HBAs, you get in-flight AND at-rest encryption with near-zero performance impact.

  • Do you replicate your sensitive data to a 3rd party remote datacenter for business continuity?  PowerPath Encryption prevents unauthorized access to the data because no host can read it without the proper key.

Can you Compress AND Dedupe? It Depends

My recent post about Compression vs Dedupe, which was sparked by Vaughn’s blog post about NetApp’s new compression feature, got me thinking more about the use of de-duplication and compression at the same time.  Can they work together?  What is the resulting effect on storage space savings?  What if we throw encryption of data into the mix as well?

What is Data De-Duplication?

De-duplication in the data storage context is a technology that finds duplicate patterns of data in chunks of blocks (sized from 4-128KB or so depending on implementation), stores each unique pattern only once, and uses reference pointers in order to reconstruct the original data when needed.  The net effect is a reduction in the amount of physical disk space consumed.
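
To make the definition concrete, here is a minimal Python sketch of fixed-size chunk de-duplication.  The 4KB chunk size, the use of SHA-256 as the fingerprint, and the function names are assumptions for illustration; real products differ in chunking strategy and hashing.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size; implementations range from 4-128KB


def dedupe(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and store each unique chunk once."""
    store = {}     # fingerprint -> chunk bytes (the unique patterns)
    pointers = []  # ordered fingerprints needed to rebuild the original
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # keep only the first copy
        pointers.append(fingerprint)
    return store, pointers


def rehydrate(store, pointers) -> bytes:
    """Follow the reference pointers to reconstruct the original data."""
    return b"".join(store[fingerprint] for fingerprint in pointers)


# Three chunks of data, two of which are identical.
data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
store, pointers = dedupe(data)
stored_bytes = sum(len(chunk) for chunk in store.values())
```

Here 12KB of source data needs only 8KB of physical space (two unique chunks) plus three small pointers.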

What is Data Compression?

Compression finds very small patterns in data (down to just a couple bytes or even bits at a time in some cases) and replaces those patterns with representative patterns that consume fewer bytes than the original pattern.  An extremely simple example would be replacing 1000 x “0”s with “0-1000”, reducing 1000 bytes to only 6.

Compression works on a more micro level, whereas de-duplication takes a slightly more macro view of the data.
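
The “0-1000” example above is a form of run-length encoding.  A minimal Python sketch follows; the “char-count” format joined by commas is an assumption chosen to match the example, and it only works for input that does not itself contain commas.  Real compressors use far more sophisticated schemes.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with 'char-count'."""
    return ",".join(f"{char}-{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand 'char-count' items back into the original runs."""
    if not encoded:
        return ""
    # item[0] is the character, item[1] is the '-', the rest is the count.
    return "".join(item[0] * int(item[2:]) for item in encoded.split(","))


original = "0" * 1000
encoded = rle_encode(original)  # "0-1000"
```

Note that this only pays off when the data actually contains runs: “abc” encodes to “a-1,b-1,c-1”, which is longer than the input.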

What is Data Encryption?

In a very basic sense, encryption resembles compression in that it transforms the original patterns of data into new ones, although its goal is secrecy rather than space savings.  Rather than compare the original data to itself, encryption uses an input (a key) to compute new patterns from the original patterns, making the data impossible to understand if it is read without the matching key.

Encryption and Compression break De-Duplication

One of the interesting things about most compression and encryption algorithms is that their output bears little resemblance to their input, and even a tiny change in the source data can change much of the output that follows.  Many encryption schemes go a step further and deliberately randomize their output (with a unique IV or nonce per run), so the same source data encrypts to different bytes every time.  This means that even if the source data has repeating patterns, the compressed and/or encrypted version of that data most likely does not.  So if you are using a technology that looks for repeating patterns of bytes in fairly large chunks (4-128KB), such as data de-duplication, compressing or encrypting the data first reduces the space savings significantly, if not completely.
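
A rough way to see this effect is to chunk two nearly identical “dumps” before and after compression and count how many chunks repeat.  The Python sketch below uses zlib from the standard library; the synthetic row format, the 4KB chunk size, and the duplicate_fraction helper are assumptions for illustration, not a model of any particular backup product.

```python
import hashlib
import zlib

CHUNK = 4096  # assumed de-duplication chunk size


def duplicate_fraction(streams, chunk_size=CHUNK):
    """Fraction of all chunks across the streams that repeat an earlier chunk."""
    total = 0
    unique = set()
    for stream in streams:
        for offset in range(0, len(stream), chunk_size):
            chunk = stream[offset:offset + chunk_size]
            unique.add(hashlib.sha256(chunk).digest())
            total += 1
    return 1 - len(unique) / total


# Two synthetic database dumps that differ only in their very first byte.
rows = [
    f"{i:08d},customer-{i % 977:03d},balance-{(i * 7919) % 100000:05d}\n"
    for i in range(20000)
]
dump1 = "".join(rows).encode()
dump2 = b"X" + dump1[1:]

# De-duplicating the raw dumps: nearly every chunk of dump2 repeats dump1.
raw_fraction = duplicate_fraction([dump1, dump2])

# De-duplicating after compression: the one-byte change ripples through
# the compressed stream, so far fewer chunks (typically none) line up.
compressed_fraction = duplicate_fraction(
    [zlib.compress(dump1), zlib.compress(dump2)]
)
```

On the raw dumps, nearly half of all chunks are duplicates (everything in the second dump except its first chunk).  On the compressed dumps, the fraction should come out at or near zero, even though the two sources are almost byte-for-byte identical.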

I see this problem a lot in backup environments with DataDomain customers.  When a customer encrypts or compresses the backup data before it gets through the backup application and into the DataDomain appliance, the space savings drop, and many times the customer becomes frustrated by what they perceive as a failing technology.  A really common example is using Oracle RMAN or SQL LiteSpeed to compress database dumps prior to backing up with a traditional backup product (such as NetWorker or NetBackup).

Sure, LiteSpeed will compress the dump 95%, but every subsequent dump of the same database is unique data to a de-duplication engine and you will get little if any benefit from de-duplication.  If you leave the dump uncompressed, the de-duplication engine will find common patterns across multiple dumps and will usually achieve higher overall savings.  This gets even more important when you are trying to replicate backups over the WAN, since de-duplication also reduces replication traffic.

It all depends on the order

The truth is you CAN use de-duplication with compression, and even encryption.  The key is the order in which the data is processed by each algorithm.  Essentially, de-duplication must come first.  After data is processed by de-duplication, there is enough data in the resulting 4-128KB blocks to be compressed, and the resulting compressed data can be encrypted.  Similar to de-duplication, compression will have lackluster results with encrypted data, so encrypt last.

Original Data -> De-Dupe -> Compress -> Encrypt -> Store
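
The effect of the ordering can be sketched in a few lines of Python.  This is an illustration under loud assumptions: toy_encrypt is a SHA-256 keystream XOR that is NOT secure and merely stands in for real encryption, and the key, nonce, chunk size, and sample data are all made up.  It compares the bytes that end up stored when the same data goes through the pipeline in the recommended order and in the reverse order.

```python
import hashlib
import zlib

CHUNK = 4096        # assumed de-duplication chunk size
KEY = b"demo-key"   # hypothetical key, for illustration only


def toy_encrypt(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy stand-in, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def chunks(data: bytes, size: int = CHUNK):
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedupe_compress_encrypt(data: bytes) -> int:
    """Original Data -> De-Dupe -> Compress -> Encrypt; returns bytes stored."""
    store = {}
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).digest()
        if digest not in store:  # only new patterns are compressed and stored
            store[digest] = toy_encrypt(zlib.compress(chunk), KEY, digest[:8])
    return sum(len(blob) for blob in store.values())


def encrypt_compress_dedupe(data: bytes) -> int:
    """The reverse order: Encrypt -> Compress -> De-Dupe; returns bytes stored."""
    processed = zlib.compress(toy_encrypt(data, KEY, b"nonce-01"))
    unique = {hashlib.sha256(c).digest(): c for c in chunks(processed)}
    return sum(len(c) for c in unique.values())


# 64 chunks of highly repetitive data built from only 4 distinct patterns.
patterns = [(f"record type {k} ".encode() * 400)[:CHUNK] for k in range(4)]
data = b"".join(patterns[i % 4] for i in range(64))

good_order = dedupe_compress_encrypt(data)
bad_order = encrypt_compress_dedupe(data)
```

With de-duplication first, the 256KB of input collapses to four small compressed chunks.  With encryption first, the data looks random to both the compressor and the de-duplication engine, so roughly the full 256KB is stored.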

There are good examples of this already:

EMC DataDomain – After incoming data has been de-duplicated, the DataDomain appliance compresses the blocks using a standard algorithm.  If you look at statistics on an average DDR appliance you’ll see 1.5-2X compression on top of the de-duplication savings.  DataDomain also offers an encryption option that encrypts the filesystem and does not affect the de-duplication or compression ratios achieved.

EMC Celerra NAS – Celerra De-Duplication combines single instance store with file level compression.  First, the Celerra hashes the files to find any duplicates, then removes the duplicates, replacing them with a pointer.  Then the remaining files are compressed.  If Celerra compressed the files first, the hash process would not be able to find duplicate files.

So what’s up with NetApp’s numbers?

Back to my earlier post on Dedupe vs. Compression; what is the deal with NetApp’s dedupe+compression numbers being mostly the same as with compression alone?  Well, I don’t know all of the details about the implementation of compression in ONTAP 8.0.1, but based on what I’ve been able to find, compression could be happening before de-duplication.  This would easily explain the storage savings graph that Vaughn provided in his blog.  Also, NetApp claims that ONTAP compression is inline, and we already know that ONTAP de-duplication is a post-process technology.  This suggests that compression is occurring during the initial writes, while de-duplication is coming along after the fact looking for duplicate 4KB blocks.  Maybe the de-duplication engine in ONTAP decompresses the 4KB block before checking for duplicates, but that would seem to increase CPU overhead on the filer unnecessarily.

Encryption before or after de-duplication/compression – What about compliance?

I make a recommendation here to encrypt data last, i.e., after all data-reduction technologies have been applied.  However, the caveat is that for some customers, with some data, this is simply not possible.  If you must encrypt data end-to-end for compliance or business/national security reasons, then by all means, do it.  The unfortunate byproduct of that requirement is that you may get very little space savings on that data from de-duplication, both in primary storage and in a backup environment.  This also affects WAN bandwidth when replicating, since encrypted data is difficult to compress and accelerate as well.

EMC CLARiiON and Celerra Updates – Defining Unified Storage

This past week, during EMC World 2010 in Boston, EMC made several announcements of updates to the Celerra and CLARiiON midrange platforms.  Some of the most impressive were new capabilities coming to CLARiiON FLARE in just a couple short months.  Major updates to Celerra DART will coincide with the FLARE updates and if you are already running CLARiiON CX4 hardware, or are evaluating CX4 (or Celerra), you will want to check these new features out.  They will be available to existing CX4(120,240,480,960)/NS(120,480,960) systems as part of a software update.

Here’s a list of key changes in FLARE 30:

  • Unified management for midrange storage platforms including CLARiiON and Celerra today, plus RecoverPoint, Replication Manager and more in the future.  This is a true single pane of glass for monitoring AND managing SAN, NAS, and data protection and it’s built in to the platform.  “EMC Unisphere” replaces Navisphere Manager and Celerra Manager and supports multiple storage systems simultaneously in a single window. (Video Demo)
  • Extremely large cache (i.e., FASTCache) – Up to 2TB of additional read/write cache in CLARiiON using SSDs (Video Demo)
  • Block level Fully Automated Storage Tiering (i.e., sub-LUN FAST) – Fully automated assignment of data across multiple disk types
  • Block Level Compression – Compress LUNs in the CLARiiON to reduce disk space requirements
  • VAAI Support – Integrate with vSphere ESX for improved performance

These features are in addition to existing features like:

  • Seamless and non-disruptive mobility of LUNs within a storage array – (via Virtual LUNs)
  • Non-Disruptive Data Migration – (via PowerPath Migration Enabler)
  • VMware Aware Storage Management – (Navisphere, Unisphere, and vSphere Plugins giving complete visibility and self-service provisioning for VMware admins AND Storage Admins) (Video Demo)
  • CIFS and NFS Compression – Compress production data on Celerra to reduce disk space requirements including VMs
  • Dynamic SAN path load balancing – (via PowerPath)
  • At-Rest-Encryption – (via PowerPath w/RSA)
  • SSD, FC, and SATA drives in the same system – Balance performance and capacity as needed for your application
  • Local and Remote replication with array level consistency – (SnapView, MirrorView, etc)
  • Hot-swap, Hot-Add, Hot-Upgrade IO Modules – Upgrade connectivity for FC, FCoE, and iSCSI with no downtime
  • Scale to 1.8PB of storage in a single system
  • Simultaneously provide FC, iSCSI, MPFS, NFS, and CIFS access

All together, this is an impressive list of features for a single platform. In fact, while many of EMC’s competitors have similar features, none of them have all of them in the same platform, or leverage them all simultaneously to gain efficiency.  When CLARiiON CX4 and Celerra NS are integrated and managed as a single Unified storage system with EMC Unisphere there is tremendous value as I’ll point out below…

Improve Performance easily…

  • Install a couple of SSDs into a CLARiiON and enable FASTCache to increase the array’s read/write cache from the industry-competitive 4GB-32GB up to 2TB of array-based non-volatile Read AND Write cache available to ALL applications, including NAS data hosted by the array.
  • Install PowerPath on Windows, Linux, Solaris, AND VMware ESX hosts to automatically balance IO across all available paths to storage.  PowerPath detects latency and queuing occurring on each path and adjusts automatically, improving performance at the storage array AND for your hosts.  This is a huge benefit in VMware environments especially.
  • When VMware releases the updated version of vSphere ESX that supports VAAI, ESX will be able to leverage VAAI support in the CLARiiON to reduce the amount of IO required to do many tasks, improving performance across the environment again.
  • Upgrade from 1GbE iSCSI to 10GbE iSCSI, or from 4Gb Fibre Channel to 8Gb Fibre Channel, without a screwdriver or downtime.
  • Provide NAS shared file access with block-level performance for any application using EMC’s MPFS protocol.

Improve Efficiency and cost easily…

  • Create a single pool of storage containing some SSD, some FC, and some SATA drives, that automatically monitors and moves portions of data to the appropriate disk type to both improve performance AND decrease cost simultaneously.
  • Non-disruptively compress volumes and/or files with a single click to save 50% of your disk space in many cases.
  • Convert traditional LUNs to more efficient Thin-LUNs non-disruptively using PowerPath Migration Enabler, saving more disk space.

Increase and Manage Capacity easily…

  • Add additional storage non-disruptively with SSD, FC, and SATA drives in any mix up to 1.8PB of raw storage in a single CLARiiON CX4.
  • Using FASTCache, iSCSI, FC, and FCoE connectivity simultaneously does not reduce total capacity of the system.
  • Expanding LUNs, RAID Groups, and Storage Pools is non-disruptive.
  • Migrating LUNs between RAID groups and/or Storage Pools is non-disruptive using built-in CLARiiON LUN Migration, as is migrating data to a different storage array (using PowerPath Migration Enabler)!
  • Balancing workload between storage processors is non-disruptive and at individual LUN granularity.

Protect your data easily…

  • Snapshot, Clone, and Replicate any of the data to anywhere with built-in array tools that can maintain complete data consistency across a single application or multiple applications without installing software.
  • Maintain application consistency for Exchange, SQL, Oracle, SAP, and much more, even within VMware VMs, while replicating to anywhere with a single pane-of-glass.
  • Encrypt sensitive data seamlessly using PowerPath Encryption w/RSA.

Maintain Flexibility…

  • While you can do all of these things quickly and simply, you still have the flexibility to create traditional RAID sets using RAID 0, 1, 5, 6, and 10 where you need highly predictable performance, or tune read and write cache at the array and LUN level for specific workloads.  Do you want read/write snapshots? How about full copy clones on completely separate disks for workload isolation and failure protection? What about the ability to roll back data to different points in time using snapshots without deleting any other snapshots?  EMC Storage arrays have been able to do this for a long time and that hasn’t changed.

There are few manufacturers aside from EMC that can provide all of these capabilities, let alone provide them within a single platform.  That’s the definition of simple, efficient, Unified Storage in my opinion.