Tag Archives: bluejet

Auto-Tiering, Cloud Storage, and Risk-Free Thin Pools

Posted on by

Some customers are afraid of thin provisioning…

Practically every week I have discussions with customers about leveraging thin provisioning to reduce their storage costs and just as often the customer pushes back worried that some day, some number of applications, for some reason, will suddenly consume all of their allocated space in a short period of time and cause the storage pool to run out of space.  If this was to happen, every application using that storage pool will essentially experience an outage and resolving the problem requires allocating more space to the pool, migrating data, and/or deleting data, each of which would take precious time and/or money.  In my opinion, this fear is the primary gating factor to customers using thin provisioning.  Exacerbating the issue, most large organizations have a complex procurement process that forces them to buy storage many months in advance of needing it, further reducing the usefulness of thin provisioning.  The IT organization for one of my customers can only purchase new storage AFTER a business unit requests it and approved by senior management; and they batch those requests before approving a storage purchase.  This means that the business unit may have to wait months to get the storage they requested.

This same customer recently purchased a Symmetrix VMAX with FASTVP and will be leveraging sub-LUN tiering with SSD, FC, and SATA disks totaling over 600TB of usable capacity in this single system.  As we began design work for the storage array the topic of thin provisioning came up and the same fear of running out of space in the pool was voiced.  To prevent this, the customer fully allocates all LUNs in the pool up front which prevents oversubscription.  It’s an effective way to guarantee performance and availability but it means that any free space not used by application owners is locked up by the application server and not available to other applications.  If you take their entire environment into account with approximately 3PB of usable storage and NO thin provisioning, there is probably close to $1 million in storage not being used and not available for applications.  If you weigh the risk of an outage causing the loss of several million dollars per hour of revenue, the customer has decided the risks outweigh the potential savings.  I’ve seen this decision made time and again in various IT shops.

Sub-LUN Tiering pushes the costs for growth down

I previously blogged about using cloud storage for block storage in the form of Cirtas BlueJet and how it would not be to much of a stretch to add this functionality to sub-LUN tiering software like EMC’s FASTVP to leverage cloud storage as a block storage tier as shown in this diagram.

Let’s first assume the customer is already using FASTVP for automated sub-LUN tiering on a VMAX.  FASTVP is already identifying the hot and cold data and moving it to the appropriate tier, and as a result the lowest tier is likely seeing the least amount of IOPS per GB.  In a VMAX, each tier consists of one or more virtual provisioned pools, and as the amount of data stored on the array grows FASTVP will continually adjust, pushing the hot data up to higher tiers and cold data down to the lower tiers  The cold data is more likely to be old data as well so in many cases the data sort of ages down the tiers over time and its the old/least used portion of the data that grows.  Conceptually, the only tier you may have to expand is the lowest (ie: SATA) when you need more space.  This reduces the long term cost of data growth which is great.  But you still need to monitor the pools and expand them before they run out of space, or an outage may occur.  Most storage arrays have alerts and other methods to let you know that you will soon run out of space.

Risk-Free Thin Provisioning

What if the storage array had the ability automatically expand itself into a cloud storage provider, such as AT&T Synaptic, to prevent itself from running out of space?  Technically this is not much different from using the cloud as a tier all it’s own but I’m thinking about temporary use of a cloud provider versus long term.  The cloud provider becomes a buffer for times when the procurement process takes too long, or unexpected growth of data in the pool occurs.  With an automated tiering solution, this becomes relatively easy to do with fairly low impact on production performance.  In fact, I’d argue that you MUST have automated tiering to do this or the array wouldn’t have any method for determining what data it should move to the cloud.  Without that level of intelligence, you’d likely be moving hot data to the cloud which could heavily impact performance of the applications.

Once the customer is able to physically add storage to the pool to deal with the added data, the array would auto-adjust by bringing the data back from the cloud freeing up that space.  The cloud provider would only charge for the transfer of data in/out and the temporary use of space.  Storage reduction technologies like compression and de-duplication could be added to the cloud interface to improve performance for data stored in the cloud and reduce costs.  Zero detect and reclaim technologies could also be leveraged to keep LUNs thin over time as well as prevent the movement of zero’d blocks to the cloud.

Using cloud storage as a buffer for thin provisioning in this way could reduce the risk of using thin provisioning, increasing the utilization rate of the storage, and reducing the overall cost to store data.

What do you think?  Would you feel better about oversubscribing storage pools if you had a fully automated buffer, even if that buffer cost some amount of money in the event it was used?

Using Cloud as a SAN Tier?

Posted on by

I came across this press release today from a company that I wasn’t familiar with and immediately wanted more information.  Cirtas Systems has announced support for Atmos-based clouds, including AT&T Synaptic Storage.  Whenever I see these types of announcements, I read on in hopes of seeing real fiber channel block storage leveraging cloud-based architectures in some way.  So far I’ve been a bit disappointed since the closest I’ve seen has been NAS based systems, at best including iSCSI.

Cirtas BlueJet Cloud Storage Controller is pretty interesting in its own right though.  It’s essentially an iSCSI storage array with a cache and a small amount of SSD and SAS drives for local storage.  Any data beyond the internal 5TB of usable capacity is stored in “the cloud” which can be an onsite Private Cloud (Atmos or Atmos/VE) and/or a Public Cloud hosted by Amazon S3, Iron Mountain, AT&T Synaptic, or any Atmos-based cloud service provider.

Cirtas BlueJet

The neat thing with BlueJet is that it leverages a ton of the functionality that many storage vendors have been developing recently such as data de-duplication, compression, some kind of block level tiering, and space efficient snapshots to improve performance and reduce the costs of cloud storage.  It seems that pretty much all of the local storage (SAS, SSD, and RAM) is used as a tiered cache for hot data.  This gives users and applications the sense of local SAN performance even while hosting the majority of data offsite.

While I haven’t seen or used a BlueJet device and can’t make any observations about performance or functionality, I believe this sort of block->cloud approach has pretty significant customer value.  It reduces physical datacenter costs for power and cooling, and it presents some rather interesting disaster recovery opportunities.

Similar to how Compellent’s signature feature, tiered block storage, has been added to more traditional storage arrays, I think modified implementations of Cirtas’ technology will inevitably come from the larger players, such as EMC, as a feature in standard storage arrays.  If you consider that EMC Unified Storage and EMC Symmetrix VMAX both have large caches and block- level tiering today, it’s not too much of a stretch to integrate Atmos directly into those storage systems as another tier.  EMC already does this for NAS with the EMC File Management Appliance.

Conceptual Diagram

I can imagine leveraging FASTCache and FASTVP to tier locally for the data that must be onsite for performance and/or compliance reasons and pushing cold/stale blocks off to the cloud.  Additionally, adding cloud as a tier to traditional storage arrays allows customers to leverage their existing investment in Storage, FC/FCoE networks, reporting and performance trending tools, extensive replication options available, and the existing support for VMWare APIs like SRM and VAAI.

With this model, replication of data for disaster recovery/avoidance only needs to be done for the onsite data since the cloud data could be accessed from anywhere.  At a DR site, a second storage system connects to the same cloud and can access the cold/stale data in the event of a disaster.

Another option would be adding this functionality to virtualization platforms like EMC VPLEX for active/active multi-site access to SAN data, while only needing to store the majority of the company’s data once in the cloud for lower cost.  Customers would no longer have to buy double the required capacity to implement a disaster recovery strategy.

I’m eagerly awating the implementation of cloud into traditional block storage and I can see how some vendors will be able to do this easily, while others may not have the architecture to integrate as easily.  It will be interesting to see how this plays out.