
Capacity vs. Performance : Thin Provisioning


In my previous post, where I discussed the problem of unusable (or slack) disk space on a SAN, I promised a follow-up with techniques on how to increase storage utilization.  I realized that I should discuss some related technologies first and then follow that up with how to put it all together.  So today I start by talking about Thin Provisioning.  I will then follow up with an explanation of De-Duplication and finally talk about how to use multiple technologies together to get the most use out of your storage.

So what is Thin Provisioning? It is a technology that lets you create LUNs or volumes on a storage device that appear to the host or client to be larger than they actually are. In general, NAS clients and SAN-attached hosts see “Thin Provisioned” LUNs just as they see any other LUN, but the actual amount of disk space used on the storage device can be significantly smaller than the provisioned size. How does this help increase storage utilization? With thin provisioning you give applications exactly the storage they need, but you don’t have to purchase all of the disk capacity up front.

Let’s start with a comparison of using standard LUNs vs thin LUNs with a theoretical application set:

Say we have three servers, each running Windows Server. The operating system partition is on local disk, and the application data drives are on the SAN. Each server runs an application that collects and stores data over time, and the application owner expects the data to grow to 1TB per server over the next year or so. In this particular case we also know that the application’s performance requirements are relatively low.

With traditional provisioning we might create three 1TB LUNs and present them to the servers, giving the application room for the expected growth. Using 300GB FC disks we can carve out three 4+1 RAID5 sets (each yielding roughly 1.2TB usable, enough for one 1TB LUN) and create one LUN in each. Alternatively we could use wide striping (e.g., a MetaLUN on an EMC Clariion) and put all three LUNs on the same 15 disks. Either way, we’ve just burned 15 disks on the storage array based on uncertain future requirements. If we were stingier with storage we could create smaller LUNs (500GB, for example) and use LUN expansion technology to grow them as the application data approaches that capacity.
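
To make that arithmetic concrete, here’s a minimal sketch in Python of the thick-provisioning math, using the figures assumed above (300GB disks, 4+1 RAID5 sets, three 1TB LUNs):

```python
# Thick provisioning sketch: every LUN is fully sized up front.
# Assumed figures from the example: 300GB FC disks, 4+1 RAID5 sets,
# three 1TB application LUNs.

disk_gb = 300                    # raw capacity per FC disk
data_disks, parity_disks = 4, 1  # one 4+1 RAID5 set per LUN
num_luns = 3

usable_per_set_gb = data_disks * disk_gb  # 1200 GB usable, enough for one 1TB LUN
disks_burned = num_luns * (data_disks + parity_disks)

print(f"Usable per RAID5 set: {usable_per_set_gb} GB")
print(f"Disks committed on day one: {disks_burned}")  # 15
```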

In the Thin Provisioning world we still create three 1TB LUNs, but they start out consuming no space. The pool of disk that the LUNs are provisioned from doesn’t even need to have 3TB of capacity. As the application data grows over the next 12 months or longer, the pool only needs to grow to accommodate the actual amount of data stored. Depending on the storage array, we can add disks to the pool one at a time: on day one we might start with 3 disks in the pool, then add more one by one throughout the year. We can also create additional LUNs for other applications without adding disks. As we add disks to the pool, we expand the capacity available for all of the LUNs to grow (up to each LUN’s maximum size) and we increase performance for ALL of the LUNs in the pool, since we are adding spindles. The real-world benefits come as we consolidate numerous LUNs into a single disk pool.
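
Here’s a minimal sketch of how that pool might grow over the year; the monthly growth figures are hypothetical, and a real array allocates in extents with RAID overhead, which this ignores:

```python
# Thin provisioning sketch: three 1TB LUNs are provisioned, but the
# pool only needs enough disks to hold the data actually written.
# Monthly growth numbers are made up for illustration.

disk_gb = 300
provisioned_gb = 3 * 1000  # what the hosts see: 3TB
pool_disks = 3             # day-one pool (RAID overhead ignored)
written_gb = 0

monthly_growth_gb = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]

for month, growth in enumerate(monthly_growth_gb, start=1):
    written_gb += growth
    # Add one disk at a time, only when the data outgrows the pool.
    while written_gb > pool_disks * disk_gb:
        pool_disks += 1
    print(f"Month {month:2}: {written_gb:4} GB written, "
          f"{pool_disks} disks in pool, "
          f"{provisioned_gb} GB provisioned to hosts")

# After a year: ~1260 GB written on 5 disks, versus the 15 disks
# committed on day one in the thick model.
```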

The nice thing about this approach is that we stop managing the size of individual LUNs and just manage the underlying disk pool as a whole. And since the cost per GB of SAN disk keeps falling, we spend only what we have to today; the disks we add later will likely be a little cheaper. Disk capacity utilization ends up much higher in a thin model than in the traditional/thick model.

The story gets even better in a virtual server environment such as MS Hyper-V or VMware ESX. First, the virtual servers’ OS drives live on the SAN alongside the application data, and there can be multiple virtual disks on the same LUN. Whether physical or virtual, we need to maintain some free space in each disk to keep applications running, and with virtual systems we also need free space on the LUN for features of the virtualization layer, like snapshots. The net effect is that in a virtualized environment, disk utilization rarely gets much above 50% once slack space at both the virtual layer and inside the virtual servers is counted. With thin provisioning we could potentially store twice the number of virtual servers on the same physical disks.
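
The 50% figure falls out of compounding slack at the two layers. A back-of-the-envelope sketch, where the 70% fill levels are assumptions rather than measurements:

```python
# Why virtualized disk utilization stalls near 50%: slack compounds
# across layers. The fill levels below are illustrative assumptions.

guest_fill = 0.70      # fraction of each virtual server's disk holding data
datastore_fill = 0.70  # fraction of the LUN used, after snapshot/growth headroom

effective_utilization = guest_fill * datastore_fill
print(f"Physical disk actually holding data: {effective_utilization:.0%}")  # ~49%
```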

There are caveats, of course. Maintaining performance is the primary concern. Whether it backs a thick LUN or a thin LUN, each disk can deliver only a fixed number of IOPS. Thin provisioning changes neither the IOPS and bandwidth an application demands nor the IOPS the physical disks can handle. So even if thin provisioning saves 50% of the disk space in your environment, you may not be able to use all of that reclaimed space before running into performance bottlenecks. If the storage array has QoS features (e.g., EMC Clariion NQM) you can prioritize the more important applications in your disk pool to maintain performance where it matters.
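
To see how a pool can run out of performance long before it runs out of space, here’s a sketch with rule-of-thumb per-disk figures and a hypothetical mix of applications (write penalties are ignored to keep it simple):

```python
# A thin pool can run out of IOPS long before it runs out of space.
# Per-disk figures are rules of thumb; write penalties are ignored
# here for simplicity.

disks = 10
gb_per_disk = 300
iops_per_disk = 140  # typical 10K FC drive

pool_gb = disks * gb_per_disk      # 3000 GB (RAID overhead ignored)
pool_iops = disks * iops_per_disk  # 1400 IOPS

# Hypothetical applications consolidated onto the pool:
apps = [
    {"name": "sql",   "gb": 100, "iops": 800},
    {"name": "files", "gb": 900, "iops": 200},
    {"name": "web",   "gb": 200, "iops": 400},
]

used_gb = sum(a["gb"] for a in apps)
used_iops = sum(a["iops"] for a in apps)
print(f"Capacity used: {used_gb}/{pool_gb} GB ({used_gb / pool_gb:.0%})")
print(f"IOPS used:     {used_iops}/{pool_iops} ({used_iops / pool_iops:.0%})")
# Capacity is only 40% used, but the pool is at 100% of its IOPS,
# so the remaining 1800 GB is slack you can't safely use.
```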

Other problems you may encounter have to do with interoperability. For starters, some applications are not “thin-friendly”; i.e., they write data in a way that negates any benefit thin provisioning provides. Also, while many storage arrays support thin provisioning, each has different rules about the use of thin LUNs. For example, in some scenarios you can’t replicate thin LUNs using the array’s native tools. It pays to do your homework before choosing a new storage array or implementing thin provisioning.

I haven’t covered thin provisioning in NAS environments directly, but the feature works the same way: thin volumes are provisioned from pools of storage, and users/clients see a large amount of available disk space even if the disk pool itself is very small. Since NAS is traditionally used for user home directories and departmental shares, absolute performance is usually less of a concern, so thin provisioning is much easier to implement; in many cases it is the default behavior, or simply a checkbox, on NAS appliances like EMC Celerra or NetApp FAS.

Thin provisioning is a powerful technology when used where it makes sense.  In my next post I’ll explain de-duplication technology and then talk about how these technologies can be used together plus some workarounds for the caveats that I’ve mentioned.

Capacity vs. Performance : Why do I have so much free space on my SAN and why can’t I use it?


In the past, in the days of 2GB, 4GB, 9GB, 18GB, and even 36GB drives, when you were tasked with purchasing and configuring hard drives for an application, you were given the amount of storage space required and that was pretty much good enough. If you or your company were more organized, you’d analyze the application’s performance requirements (e.g., IOPS, read/write ratio, bandwidth) to make sure you had enough spindles. More often than not, the capacity requirement called for more disks than the performance requirement did, so you’d build your RAID group and fill it all the way up.

Fast forward a few years: 72GB drives are no longer available, 146GB drives are nearing end-of-sale, and 300GB, 400GB, and 600GB drives, plus terabyte SATA drives, are available for almost any storage system or server. The problem is that as these hard drives get bigger, they aren’t getting any faster. In fact, SATA drives are relatively new to the enterprise space and are slower than traditional 10,000 and 15,000 RPM SCSI drives, yet they hold terabytes of data. Today, performance is the primary requirement and capacity is second, because you generally need more spindles to meet an application’s performance requirement than to meet its capacity requirement.

As an example, let’s take a 100GB SQL database that requires 800 IOPS at 50% Read/50% Write.

Back in the day, with 18GB drives, you’d need 12 disks to provide ~100GB of space in RAID10. Using SCSI-3 10K drives, you can expect about 140 IOPS per disk, giving you 1680 IOPS available. Accounting for the RAID10 write penalty (each host write costs two back-end I/Os), a 50/50 read/write workload sees an effective 1680 / 1.5 ≈ 1100 IOPS, more than enough for your 800 IOPS workload.

Today, a single 146GB 10K disk can provide all the capacity required for this database, but you still need at least 10 disks to sustain the 800 IOPS workload with RAID10, or 15 disks with RAID5 (where each host write costs four back-end I/Os). The capacity of a RAID10 group with ten 146GB drives is approximately 680GB, leaving you with 580GB of free (or slack) space in the RAID group. The trouble is that you can’t use that space for any other application, because the SQL database requires all of the performance available in that RAID group. Change it to RAID5, or use newer, larger disks, and it’s even worse. Switching to 15K RPM drives helps, but it’s only about a 30% increase in performance.
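
Here’s a small Python sketch of the spindle math for both eras, using the rule-of-thumb figures above (140 IOPS per 10K drive, a write penalty of 2 for RAID10 and 4 for RAID5):

```python
import math

def spindles_needed(workload_iops, write_ratio, disk_iops, write_penalty,
                    capacity_gb, disk_gb, data_fraction, pair_disks=False):
    """Disks required to satisfy both the IOPS and the capacity requirement.

    write_penalty: back-end I/Os per host write (2 for RAID10, 4 for RAID5).
    data_fraction: usable fraction of raw capacity (0.5 for RAID10, 0.8 for 4+1 RAID5).
    """
    # Back-end IOPS generated once the write penalty is applied.
    backend_iops = workload_iops * ((1 - write_ratio) + write_ratio * write_penalty)
    for_perf = math.ceil(backend_iops / disk_iops)
    for_cap = math.ceil(capacity_gb / (disk_gb * data_fraction))
    disks = max(for_perf, for_cap)
    if pair_disks and disks % 2:  # RAID10 mirrors need an even disk count
        disks += 1
    return disks

# 100GB database, 800 IOPS at 50/50 read/write, 10K drives (~140 IOPS each):
old   = spindles_needed(800, 0.5, 140, 2, 100, 18, 0.5, pair_disks=True)   # 18GB, RAID10
new10 = spindles_needed(800, 0.5, 140, 2, 100, 146, 0.5, pair_disks=True)  # 146GB, RAID10
new5  = spindles_needed(800, 0.5, 140, 4, 100, 146, 0.8)                   # 146GB, 4+1 RAID5

print(old, new10, new5)  # 12 10 15: capacity drove the old count, IOPS drives the new ones
```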

If you are managing SAN storage for a large company, your management probably wants to see high disk capacity utilization on the SAN to help justify the cost of storage consolidation. But as individual disk sizes get larger, it becomes increasingly difficult to keep capacity utilization high, and for many companies it ends up dropping. Thin Provisioning and De-Duplication are all the rage right now as storage companies push their wares, and customers everywhere are hoping those buzzwords can somehow save them money by increasing capacity utilization. But be aware: if your slack space exists because of performance requirements, those technologies won’t do you any good and could even hurt you. They are useful for certain types of applications, something I’ll discuss in a later post.

So what do you do? Well, there’s not a lot you can do except educate your management on the difference between sizing for performance and sizing for capacity. They should understand that slack space is a byproduct of the ever-increasing size of hard disk drives. Some vendors now sell high-speed flash (SSD) drives for their SAN storage systems; these can be 30-50X faster than a 15K RPM drive with similar capacities, but their significant cost only makes sense if you can leverage most of the IOPS available in each disk. In the next installment I’ll discuss tiered data techniques and how they can overcome some of these problems, increasing performance in some cases while also increasing utilization rates.