Commodity‐based Clustered Storage

Scale Computing’s core offering is Commodity‐based Clustered Storage, or CCS. CCS functions as an independently controlled mass data storage device, similar in function to a Storage Area Network (SAN) or Network Attached Storage (NAS) device.
In fact, their CCS appliances, called Storage Nodes (SN) with TrueCluster™, can operate as a SAN, a NAS, or both simultaneously. Their software architecture also allows them to use commodity‐based hardware, rather than expensive proprietary designs, to deliver enterprise‐class storage features at the lowest entry price on the market.
The key difference between Scale’s Storage Nodes with TrueCluster and the current state‐of‐the‐art systems (SAN/NAS) is one of underlying architecture: TrueCluster uses both proprietary and licensed technologies that come primarily from the supercomputing (also called High Performance Computing, or HPC) sector to create a grid storage system from multiple clustered nodes. As such, the grid consists of a number of small, commodity‐based devices which, when combined with this supercomputing‐based technology, function as a single unit, delivering a storage network that provides a number of key, game‐changing advantages over previously existing technologies.
Storage Market
Detached Enterprise Storage (encompassing SAN, NAS, and similar systems) is a $40 billion market, growing at 12% annually. EMC, Hewlett Packard, Network Appliance, and Dell lead the market. Existing solutions rely heavily upon physically redundant components and RAID technology to address customer requirements. A gap exists in the marketplace for a solution that offers many of the features of enterprise‐class storage, the low per‐TB cost of the array, and the versatility of cloud‐based storage systems. Scale is filling that market need.
Market Evolution and Scale Computing
As the market continues to grow, its evolution has intensified. While the simple array has offered a low per‐TB cost, its scalability is constrained by its form factor. Customers must buy storage in excess of their current needs and can scale only to the point that the form factor allows. That solution has proved insufficient as storage needs grow exponentially.
To that end, solutions that attempt to turn arrays into simulated “cluster” or “grid” systems have entered the market. LeftHand (a solution now moving into the CCS category via HP), Compellent, and EqualLogic (both of which use monolithic architectures to simulate clustered storage) lead this transition. Current brand leaders in the enterprise include EMC and NetApp. Both have substantial market share and extensive leadership in the storage industry.
Scalability comes with a price. Simple arrays are inexpensive, but can’t scale. Monolithic and brand name systems can scale, but at a large entry price.
That’s where Scale plays the role of the disruptor. Scale’s TrueCluster architecture works on many hardware platforms and commodity‐based hardware. Unlike its SAN/NAS competitors, TrueCluster has an entry price comparable to that of simple arrays, yet its single file system scalability often exceeds that of more expensive enterprise solutions. The net result: with Scale’s TrueCluster, your entry price is low and you can expand storage incrementally, as you need it, and within a much smaller budget.
Order of Magnitude Cost Savings
A traditional SAN system that delivers generally demanded functionality (high availability, redundancy, connectivity) carries a price tag that is typically in the area of $10,000 to $20,000 per usable terabyte of storage capacity. The fundamentals of the TrueCluster architecture drive down deployment costs. By using a series of smaller devices rather than one (or a few) large devices, redundancy and high availability are achieved without the need to develop proprietary hardware. Instead, a combination of off‐the‐shelf or “commodity” hardware can be used in conjunction with software technology that manages the allocation of the data across all nodes, or “Storage Nodes” (SN), in the grid. Each node contains a number of physical hard disk drives, and it is the underlying software technology, rather than proprietary hardware and controllers, which manages the distribution of data across both individual drives and nodes in the grid.
The resulting cost savings are significant: the cost of deploying Scale’s Storage Nodes is as low as $2,500 per usable terabyte, which is array‐like pricing for a high‐end clustered storage feature set. Additionally, their TrueCluster technology requires a minimum entry purchase of only three nodes (for redundancy). Each additional node can be added 1 TB at a time, as needed. Compared to all other solutions in the market, Scale’s TrueCluster has by far the lowest cost of entry and the lowest cost per TB of any clustered storage product or simulated clustered (control unit or third‐party software based) solution.
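The per‐terabyte figures above can be turned into a quick comparison. The following is a minimal sketch, assuming a flat per‐TB price with no volume discounts or fixed costs; the function names are illustrative and not part of any Scale tooling, and the $2,500 figure is the "as low as" price, so the result is a best case.

```python
# Per-terabyte prices quoted in the text (USD per usable TB).
SAN_COST_PER_TB_LOW = 10_000
SAN_COST_PER_TB_HIGH = 20_000
SCALE_COST_PER_TB = 2_500  # "as low as" figure, so treat it as a best case


def deployment_cost(usable_tb: float, cost_per_tb: float) -> float:
    """Total cost at a flat per-TB price (a simplifying assumption)."""
    return usable_tb * cost_per_tb


def savings_ratio(san_cost_per_tb: float,
                  scale_cost_per_tb: float = SCALE_COST_PER_TB) -> float:
    """How many times cheaper the clustered option is per usable TB."""
    return san_cost_per_tb / scale_cost_per_tb
```

Under these assumptions, `savings_ratio(SAN_COST_PER_TB_LOW)` and `savings_ratio(SAN_COST_PER_TB_HIGH)` give the per‐TB price advantage at the two ends of the quoted SAN range, and `deployment_cost(3, SCALE_COST_PER_TB)` gives the cost of a hypothetical 3 TB deployment.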
Significant Scalability Improvements over Current Technologies
Current SAN/NAS technology is very limited in its ability to scale up as storage requirements grow. The most common method of scaling involves the non‐technology solution of forecasting future storage needs and grossly overbuying capacity today in order to have free capacity in the future. In some limited cases, a number of SAN devices can be interconnected, but in doing so, elements of high‐availability are lost and losing a single SAN component will take the entire array offline (see below). When the device reaches capacity, the standard procedure is to purchase an entirely new device, migrate data, and repurpose/decommission the previously used device, often at great expense.
Their TrueCluster technology makes expanding capacity straightforward. Simply adding a new node to the array expands the capacity of the entire array, without the need to migrate data or take the old nodes offline. Hundreds or even thousands of nodes can be added on an as‐needed basis, allowing customers to grow from small data stores of just a few terabytes up into the petabytes simply by adding nodes as they go.
Substantial Improvements in High Availability, Redundancy, and Recovery
The TrueCluster architecture represents a paradigm shift in the ability to recover from hardware failure without the use of RAID technology. The underlying file system technology ensures that data is mirrored and redundant elsewhere in the grid, such that data recovery in the event of a failure is fast and painless. In the event of hardware failure, the controlling software instantly begins re‐replicating the lost data so that the grid is prepared to recover from a potential subsequent failure. Redundant data copies are not kept on the same individual node, thereby ensuring a smooth recovery even if an entire node is taken offline suddenly and without warning. Likewise, the controlling software ensures that adequate disk space exists elsewhere in the grid for re‐replication to take place. Data is distributed at the block level, providing for the ability to replicate even very large individual files (such as databases or video files) in this manner. System uptime is maintained through up to N/2 – 1 node failures, where N is the number of nodes in the grid. For example, in a 100‐node grid, up to 49 nodes can generally be lost before the system is taken offline.
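The failure‐tolerance rule can be written as a small helper. This is only a sketch: the text gives the formula N/2 – 1 and an even‐sized example, and does not say how odd node counts are handled. The code below assumes a majority‐quorum reading, (N – 1) // 2, which equals N/2 – 1 for every even N; the function names are hypothetical.

```python
def max_tolerated_failures(node_count: int) -> int:
    """Node failures the grid can absorb while staying online.

    Assumption: a majority of nodes must survive. For even node counts this
    matches the N/2 - 1 rule in the text; odd counts are our interpretation.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return (node_count - 1) // 2


def stays_online(node_count: int, failed_nodes: int) -> bool:
    """True if the grid is still within its failure budget."""
    return 0 <= failed_nodes <= max_tolerated_failures(node_count)
```

For the 100‐node example, `max_tolerated_failures(100)` agrees with the figure of 49 given in the text, and `stays_online(100, 50)` is false.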
No Single Point of Failure
Other attempts to use a multiple‐device architecture have been plagued by the problem of incorporating a single point of failure somewhere in the system. Typically, these systems rely on a central “master” node for configuration, data allocation, or both, which then issues instructions or sends data information to the “slave” nodes, which are essentially passive receivers. This introduces the inherent problem of creating a single point of failure (the master node) at the point where the system is under the most stress (all data requests, user permissions, and/or configuration changes flowing through the master).
TrueCluster’s proprietary grid control technology creates an architecture in which all nodes are active nodes and no single node functions as a master. This is best illustrated by the example of configuration management: an administrator can initiate changes to the entire grid by accessing a single administrative interface, and that interface is available from any of the individual nodes in the grid.
Likewise, any individual node on the grid can identify where any data physically exists on the grid without accessing any kind of master database (which would exist on a master node). It’s in this way that re‐replication of data is possible regardless of what specific node (or disk) failure triggered the need for re‐replication.
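The text does not describe how TrueCluster locates data without a master database. Purely as an illustration of how a masterless lookup can work in general, the sketch below uses rendezvous (highest‐random‐weight) hashing, a well‐known technique that is our substitution and not a description of Scale's or IBM's implementation. Every node runs the same function over the same node list and arrives at the same answer, and the copies always land on distinct nodes. The node names and block identifier are hypothetical.

```python
import hashlib


def _score(node: str, block_id: str) -> int:
    """Deterministic pseudo-random weight for a (node, block) pair."""
    digest = hashlib.sha256(f"{node}:{block_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replica_nodes(block_id: str, nodes: list[str], copies: int = 2) -> list[str]:
    """Pick `copies` distinct nodes for a block without consulting a master.

    Any node can evaluate this locally; the result depends only on the
    block id and the set of live nodes, not on the order they are listed in.
    """
    unique_nodes = set(nodes)
    if copies > len(unique_nodes):
        raise ValueError("not enough nodes to keep every copy on a distinct node")
    ranked = sorted(unique_nodes,
                    key=lambda node: (_score(node, block_id), node),
                    reverse=True)
    return ranked[:copies]
```

If the first‐ranked node fails, calling `replica_nodes` again with the surviving nodes promotes the former second choice and selects one new node, which is the kind of behaviour re‐replication needs: only the lost copy has to be recreated.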
Throughput and Performance Improvements
Connectivity into most SAN/NAS architectures consists of one or two inputs through which all data flows, regardless of the size of the system. As such, simultaneous connections to these devices result in throughput degradation. For example, if a traditional NAS has a throughput capability of 70 MB/s, then that throughput will be shared across all simultaneous connections. Two connections will each receive half the speed, or 35 MB/s; five connections will each receive one‐fifth the speed, or 14 MB/s; and so on. The grid nature of the TrueCluster architecture, combined with the elimination of a master node, results in the ability to deliver high throughput in a parallel access environment, because each node in the grid is yet another access point into the entire system. For example, if each node can deliver 70 MB/s and the grid consists of 10 individual nodes, then 2 connections would each receive 70 MB/s; 5 connections would each receive 70 MB/s; and so on, until the point at which simultaneous connections exceed the number of individual nodes in the grid, at which time throughput would be shared among them.
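The per‐connection arithmetic in this comparison can be sketched as follows. This is a simplified model, not a benchmark: it assumes connections are spread evenly across nodes, that each node is limited only by its own 70 MB/s figure, and that network and client limits are ignored. The function names are illustrative.

```python
def nas_per_connection(device_mb_s: float, connections: int) -> float:
    """Single entry point: every connection shares the device's throughput."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    return device_mb_s / connections


def grid_per_connection(node_mb_s: float, nodes: int, connections: int) -> float:
    """Grid: each node is an entry point, so connections only start sharing
    once there are more connections than nodes (assumes even spreading)."""
    if nodes < 1 or connections < 1:
        raise ValueError("nodes and connections must be at least 1")
    if connections <= nodes:
        return node_mb_s
    return node_mb_s * nodes / connections
```

With the numbers from the text, `nas_per_connection(70, 2)` is 35 MB/s, while `grid_per_connection(70, 10, 5)` stays at 70 MB/s; under this model the grid figure only drops once a 10‐node grid serves more than 10 simultaneous connections.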
Furthermore, advancements in data storage protocols are making parallel access even more productive. For example, parallel CIFS (a standard data transfer protocol that is an improvement over standard CIFS) enables data reads and writes to be performed across multiple devices simultaneously. If a server uses this protocol in conjunction with a TrueCluster‐based storage array, massive performance improvements are attainable. For example, if a file is written over standard CIFS to a traditional NAS, the throughput is limited by the maximum throughput of the NAS, and further by the number of parallel access connections to that NAS, as previously described. However, using parallel CIFS with TrueCluster, that single file can be written across all nodes simultaneously, such that, using the previous example, a 10‐node grid could achieve 700 MB/s throughput (70 MB/s × 10 nodes) because each node is an autonomous entry point. Scale’s standard TrueCluster entry setup consists of three nodes with a combined throughput of 210 MB/s.
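The aggregate figures follow from multiplying per‐node throughput by node count. The sketch below assumes ideal linear scaling with no protocol or network overhead, which real deployments are unlikely to reach exactly; the function names and the 7,000 MB file size are hypothetical.

```python
def aggregate_throughput(node_mb_s: float, nodes: int) -> float:
    """Best-case throughput when one file is written across all nodes at once."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return node_mb_s * nodes


def transfer_seconds(file_mb: float, throughput_mb_s: float) -> float:
    """Time to move a file at a given sustained throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    return file_mb / throughput_mb_s
```

Here `aggregate_throughput(70, 10)` reproduces the 700 MB/s example and `aggregate_throughput(70, 3)` the 210 MB/s entry configuration. Under the same idealized model, a 7,000 MB file would take `transfer_seconds(7000, 70)` seconds through a single 70 MB/s entry point and one tenth of that across a 10‐node grid.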
Masterless Grid/Node Architecture
The TrueCluster architecture and the resulting Scale Storage Nodes use a robust combination of licensed intellectual property, proprietary technology, and open source software. Among the proprietary components of the system is the masterless grid/node architecture, specifically the ability to change and control the entire cluster from any node. In essence, it is a grid of equals, where each node has the same abilities as any other node and can be controlled, or initiate controls, as such.
Automation of HPC File System Management
A number of key technology components are licensed. Most notable among these is a license from IBM for a High Performance Computing (HPC) file system, developed by IBM for use primarily in the supercomputing world. Installations of this HPC file system are typically unique to each customer and involve substantial professional services work on the part of IBM to set up and then maintain these systems.
A portion of their proprietary technology is the use of artificial intelligence‐based management systems that replace the need for customized, professional service‐centric installations. This uses an expert system to handle the tasks that normally require human intervention in HPC file system management, including the complexities of node maintenance and disk provisioning. Using this system, the time to complete drive provisioning and other tasks is reduced from a full‐day project that requires considerable expertise to a simple point‐and‐click exercise taking 10 minutes or less.
TrueCluster Grid/File System Management
The management system underlying the TrueCluster architecture uses two key components. First, a proprietary expert system is used to handle the nuances of setting up and maintaining the grid architecture and the HPC file system. It is primarily due to this expert system that Scale is able to take a traditionally customized and complex architecture (grid computing) and use it in such a way that the end customers (and administrators) are shielded from that underlying complexity. A primary roadblock to using grid‐computing technologies in widespread commercial deployments is this very complexity, and Scale's expert system is a part of their competitive differentiation at the technology level.
The second component of their TrueCluster architecture is their Predictive Hardware Failure (PHF) technology. PHF uses a combination of machine learning techniques to forecast when various hardware components are likely to fail, while there is still time to migrate data and smoothly shut down those components before a sudden, unexpected failure occurs. This is done by monitoring low‐level kernel and device driver information in real time and identifying the patterns that indicate impending failure. The integration of hardware and software is key to enabling this technology, and patent claims around this technology are being pursued.
Conclusion
The growth of data storage and the increasingly robust requirements for virtualization and archiving use cases are pushing disk‐based storage vendors to incorporate and/or emulate the benefits of a clustered, node‐based grid storage system.
By using commodity‐based hardware and a combination of proprietary and proven licensed software, Scale Computing can offer the future of storage today, and at a price anyone can afford.