by Dan Tanner
former Technical Analyst
ProgresSmart


A rich, new, and complex market is growing around data protection. At the market’s edges, technologies are becoming standardized and storage devices are becoming commoditized.

But at the market’s center there’s an often bewildering array of software and appliances that, if correctly applied, can offer a business not only solid data protection but also business continuity and cradle-to-grave care for data within an Information Lifecycle Management context.

Dominant storage vendors have direct sales forces aimed squarely at Fortune 1000 enterprises and intent on selling only their own products. A Value-Added Reseller (VAR), by contrast, can provide a business with best-of-breed solutions assembled from the offerings in its partner portfolio.

A great VAR doesn’t simply sell commodities and move on. Rather, it rolls up its sleeves and helps businesses, regardless of their position on Fortune’s list, economically realize solutions they may have thought were available only to the top 1000. This paper is about TriAxis Inc., a VAR that truly adds value for its customers.

As Figure 1 shows, the edges of the storage and backup landscape are populated by two kinds of things. The first is standardized protocols for device connectivity, chosen according to such factors as distance, addressability, reliability, and cost: Fibre Channel (FC) and FC over Internet Protocol (IP), the Small Computer System Interface (SCSI) and SCSI over IP (iSCSI), and Advanced Technology Attachment (ATA, also known as IDE) and Serial ATA (SATA). The second is commodity hardware: disk and tape drives, tape loaders, and tape libraries.

The same figure also illustrates that numerous technologies exist in the sector of the storage and backup landscape that can be characterized as “management”. Even the most casual reading of the trade press, or simple common sense about how Moore’s Law keeps driving down the cost (while generally improving the quality) of commodities and technologies even as the personnel costs of management must still be borne, leads to an inescapable conclusion: Over their lifetime, storage and backup costs will be dominated by operating expense (OPEX) rather than capital expense (CAPEX).

Clearly, the heavy lifting comes in management. Businesses engaged with VARs should seek out one that adds value by helping with that heavy lifting, because everything in the middle of the storage and backup product landscape represents pieces of a complex puzzle. Or rather, pieces of several puzzles, which, rightly assembled, form a full kit that meets a business’s needs. The helpful VAR will work with a business to assess its needs and assemble the best kit for it.

So, it behooves business IT planners to take a look at that jumble of acronyms and abbreviations in the middle of the storage and backup landscape. And a word to the wise: Don’t be fooled by the apparent simplicity of storage. Yes, the word “storage” is bland and, from that, the subject may seem mundane. But the old adage “still waters run deep” has never been truer than when it applies to computer storage. Computing systems, regardless of scale — from embedded system chips to vast enterprise IT networks — do only three things with information: process it, move it, and store it. And storage is every bit as complex as its companion elements. That’s why a business needs the value-add of a real, true VAR.

The Management Group: What’s in it? What are they?

Figure 1 shows eleven items in the Management circle. Before we examine which of them may fit together and/or be needed for particular business environments, let’s deconstruct a “classic” definition of one of them, Hierarchical Storage Management (HSM). To keep things simple, and this paper reasonably short, only HSM will be dealt with in great detail. After all, this paper’s purpose is not to explain network storage fully and in depth, which would take volumes (on-line ones, too, because of rapid advancement and change), but rather to illustrate that a VAR that does its job is a treasure.

Figure 1: The Storage and Backup Product Landscape (TriAxis Solution Scope)
Hierarchical Storage Management

According to SearchStorage.com (the numbered annotations are ours, added so we can refer back to them): “HSM (Hierarchical Storage Management) is policy-based[1] management of file backup[2] and archiving[3] in a way that uses storage devices economically and without the user needing to be aware of when files are being retrieved from backup storage media[4].

Although HSM can be implemented on a standalone system, it is more frequently used in the distributed network of an enterprise. The hierarchy represents different types of storage media, such as redundant array of independent disks systems, optical storage, or tape, each type representing a different level of cost and speed of retrieval when access is needed. For example, as a file ages in an archive, it can be automatically moved to a slower but less expensive form of storage. Using an HSM product, an administrator can establish and state guidelines for how often different kinds of files are to be copied to a backup storage device. Once the guideline has been set up, the HSM software manages everything automatically[5].

HSM adds to archiving and file protection for disaster recovery the capability to manage storage devices[6] efficiently, especially in large-scale user environments where storage costs can mount rapidly. It also enables the automation of backup, archiving, and migration to the hierarchy of storage devices in a way that frees users from having to be aware of the storage policies[7]. Older files can automatically be moved to less expensive storage. If needed, they appear to be immediately accessible and can be restored transparently from the backup storage medium[8]. The apparently available files are known as stubs and point to the real location of the file in backup storage[9]. The process of moving files from one storage medium to another is known as migration[10].

An administrator can set high and low thresholds for hard disk capacity that HSM software will use to decide when[11] to migrate older or less-frequently used files to another medium. Certain file types, such as executable files (programs), can be excluded from those to be migrated.”

The first thing you may have noticed is that we’ve flagged eleven statements in the four preceding paragraphs. That’s because a possibly overzealous (or less informed) definer of the term HSM has practically defined Information Lifecycle Management (ILM, also sometimes referred to as Document Lifecycle Management, or DLM). Now, referring back to the annotations:
  1. HSM has been around since the days of mainframe computers running batch programs, when the hierarchy of memory in descending order of cost was active core (remember core?), disks (usually removable packs), and tape (7- or 9-track reel tape). And the costs of concern were CAPEX, not OPEX. Importantly, HSM in its original form (and as it still exists in some incarnations) was not policy driven at all, unless one considers an administrator’s ability to set a single “age since last access” parameter a “policy”. Rather, HSM is algorithmically driven by that single parameter. Naturally, with interactive, networked, multi-user, multi-application systems, this “one size fits all” so-called “policy” can do more harm than good.

  2. HSM doesn’t manage backup at all, let alone archiving (see the next comment). Backup (and possibly separate archiving) programs do that. Consider this: A business may need to back up critical files hourly, daily, or weekly, but the HSM file migration parameter may be set to one, two, or six months since the most recent access. Because critical application files are accessed constantly, HSM would never migrate them, so under HSM alone they would never be “backed up”!

  3. Archiving has also been around since the computer era’s early days, but both the technologies with which to archive and the very rules for archiving have changed. For years, all tape, punch-card, and microfilm archives were distinctly off-line; auto-loading libraries for reel tapes were a late invention. Of course, we now have cassette tapes, optical storage devices, autoloaders, libraries, and jukeboxes. But the reasons to archive have changed so dramatically that the very word must be carefully defined each time archiving is planned or implemented. The old archive concept applied to data we believed would never be changed and hadn’t even been referred to in a while, so we removed it from memory or on-line storage because such storage cost so much compared with archive media. Nowadays, we archive data because of legal requirements and/or corporate governance policy; because it may be “reference data”; because it’s wasteful and possibly disruptive to repeatedly back up data that will not (or cannot) change (note that here we’re concerned with the cost of the operation, not the cost of the storage); because we’re required to maintain an off-site archive; and, yes, sometimes still because we’re concerned with the cost of the storage. Some archives must be immutable, “frozen” when the file is created or received, implying that the archive device (or software) must be WORM (write once, read many). Others, such as reference data, can be allowed to change, though you wouldn’t want to change them lightly or frequently (a seismic database for mineral or oil exploration is an example), implying you may want to use content optimized storage (COS) or content-addressed storage. And some archives may be those plain old “quiescent” files that we’ve always wanted to archive. Some archives (especially medical and financial ones) must have special privacy attributes, possibly encryption, and perhaps secure access journals that can be audited.
And don’t think that archive means off-line anymore. ProgresSmart partner David Hill has written “Active Archiving is not an Oxymoron” on his Web site www.mesabigroup.com. Some archives should (or must) be “electronically shredded” according to policy, and HSM definitely doesn’t do that, although there are software packages and/or appliances that do. Lastly, some archiving may require that the data be recoverable after a very long time; certain medical records, for example, must be kept for some time longer than the patient’s life, which dictates the choice of archive method and device.

  4. HSM didn’t originally act this way, and it is ILM that is defined to provide this type of operational transparency.

  5. Wow, that’s a dangerous statement! Does a business really want to charge its system or storage IT personnel, who are the custodians of the content (that is, they manage the storage used to hold the business’s content), with actual records management? That’s a rhetorical question, and the answer is “certainly not”. Records management policy belongs in the hands of professional records managers, lines of business, corporate directors, and/or internal or external legal counsel. Proper records management may well be beyond the ken of IT managers, who, if they’re smart, will reject the responsibility. There are also cultural issues surrounding records management and custodianship that are critically important but well beyond the scope of this paper.

  6. HSM does not manage storage. That’s done by storage administrators using storage management and storage resource management (SRM) software tools. HSM only moves files. Even ILM doesn’t manage storage; it only identifies the characteristics that the storage infrastructure must provide for the content at each stage of the content’s “lifecycle”.

  7. Yes, but these days the users or their managers must articulate the retention and destruction policies. The IT administrators then become charged with implementing those policies. This is another job for ILM, not HSM.

  8. Such recall must be under strict policy control. It should work nearly instantaneously if the data has simply been migrated to another tier of on-line storage; ILM does this, but HSM typically doesn’t. If the content has been moved to near-line storage (say, a tape cartridge in a library or optical storage in a jukebox), performance degradation will, at the very least, be noticeable. There may also be reasons (policy or versioning) not to allow users such access to migrated files. Or the migrated file may be returned read-only to an application that expects to be able to write to it.

  9. Some HSM systems support stubs, others don’t. See the comment above.

  10. Migration can be placed under policy control, especially in a well-implemented ILM environment. But migrating files can often be costly, time-consuming, disruptive, and even perilous (files can be lost). Migration under HSM is typically one-way, downstream in the storage hierarchy, without transparent access (or, often, any easy access) afterwards.

  11. In the HSM context, “when” means “elapsed time since last access”, which is very different from “when” in the sense of absolute clock or calendar time, let alone any other time an administrator might choose at will.
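To make comments 1, 2, and 11 concrete, here is a minimal Python sketch of a classic, single-parameter HSM pass driven by the high and low disk-capacity thresholds described in the quoted definition. It is an illustration under stated assumptions, not any product’s algorithm: the `File` record, the two tier labels, the `.exe` exclusion, and all sizes and ages are hypothetical. Note that the only notion of “when” is days since last access, and that a file touched every day is never moved (and so never copied) at all.

```python
from dataclasses import dataclass


@dataclass
class File:
    name: str
    size_gb: float
    days_since_access: int   # HSM's only notion of "when"
    tier: str = "disk"       # hypothetical tier labels: "disk" or "tape"
    is_stub: bool = False    # True once only a stub remains on disk


def hsm_migrate(files, capacity_gb, high_pct, low_pct, excluded=(".exe",)):
    """One classic HSM pass (a sketch). If on-line usage exceeds the high
    threshold, migrate the least recently accessed eligible files down the
    hierarchy until usage falls to the low threshold. Returns migrated names."""
    def used_gb():
        return sum(f.size_gb for f in files if f.tier == "disk")

    migrated = []
    if used_gb() <= capacity_gb * high_pct:
        return migrated  # below the high-water mark: nothing to do
    candidates = sorted(
        (f for f in files if f.tier == "disk" and not f.name.endswith(excluded)),
        key=lambda f: f.days_since_access,
        reverse=True,  # longest since last access goes first
    )
    for f in candidates:
        if used_gb() <= capacity_gb * low_pct:
            break
        f.tier, f.is_stub = "tape", True  # data moves; a stub points to it
        migrated.append(f.name)
    return migrated


files = [
    File("q1_report.doc", 40, 200),
    File("ledger.db", 30, 1),      # touched daily, so never migrated (or copied)
    File("setup.exe", 20, 400),    # excluded by file type
    File("old_scan.tif", 25, 90),
]
print(hsm_migrate(files, capacity_gb=120, high_pct=0.8, low_pct=0.5))
# ['q1_report.doc', 'old_scan.tif']
```

Usage here starts at 115 GB, above the 96 GB high mark, so the two oldest eligible files move until usage drops to 50 GB, below the 60 GB low mark. The executable and the busy database stay put, which is exactly why age-based migration is no substitute for backup.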

We don’t mean to denigrate SearchStorage; we merely wish to point out that some storage and content management tasks and solutions can require deep understanding and hands-on supplier-to-customer help, which is what a VAR should provide.

Wide Area File Services (WAFS)

SearchStorage does a good job defining WAFS: “Wide-area file services (WAFS) is a storage technology that makes it possible to access a remote data center as though it were local. Among other benefits, WAFS enables businesses, academic institutions, and government agencies having numerous branch offices to centrally manage data backups in real-time[12].

Other benefits of WAFS include immediate, round-the-clock read-write access to backed-up data for all end users in the network, low latency and rapid data transfer speed comparable to local area network (LAN) technologies, continuous real-time updating of backup content, enhanced data security, and simple, rapid system recovery in the event the network is compromised or damaged. Additional security of data may be possible by backing up the data on multiple servers at different physical locations.”
  12. Good, but not perfect. “Real-time” means deterministic, which open-systems operating systems (Windows, Unix, and Linux) are not. But only very special environments (nuclear power plants and jet planes are examples) need true real-time systems, so let’s not quibble. By “real-time”, the definition’s writer meant “so fast as to appear instantaneous to a human being”, which is good enough for this definition.

A good thing about examining HSM above is that doing so led us to implicitly define many of the remaining terms in the management circle of the storage and backup landscape. In fact, we’re left with only D2D, D2D2T, and VTL, which, because of their relationships, can be dealt with as a group, plus clustering and snapshots.

D2D, D2D2T and VTL: Because Tape is a 4-letter Word

Disk to disk (D2D), disk to disk to tape (D2D2T), and virtual tape libraries (VTL) are all replacement technologies for backup (also called point-in-time, or PIT, copying) to tape.

Backup is what you must do because you know that your data is changing or will change (as opposed to archiving, which is something different you want to do with data that you know can’t or won’t change).

Backup is done for logical data protection, so that a business can recover, at the very least, the last known good copy of its data in the event of accidental deletion, or of file loss or corruption due to either accident or malice. Note that mirroring, which offers various levels of physical data protection (depending on distance and on whether it is synchronous or asynchronous), does not offer the logical protection that backup does. For complete data protection and disaster recovery, at the very least, both backup and mirroring must be used (and both to safe locations). Now, getting back to tape backup: There are many, many problems with tape backup, and D2D, D2D2T, and VTL either eliminate or mitigate them.
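The difference between physical and logical protection can be shown with a toy model. The sketch below is hypothetical (`ProtectedVolume` and its methods are invented for illustration): a synchronous mirror replays every change, including a corrupting write and a deletion, while a point-in-time backup keeps a frozen copy that can still be restored.

```python
import copy


class ProtectedVolume:
    """Toy model contrasting a synchronous mirror (physical protection)
    with point-in-time backups (logical protection)."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}    # replays every change immediately, good or bad
        self.backups = []   # frozen point-in-time copies, oldest first

    def write(self, name, data):
        self.primary[name] = data
        self.mirror[name] = data

    def delete(self, name):
        self.primary.pop(name, None)
        self.mirror.pop(name, None)  # the mirror faithfully copies the mistake

    def backup(self):
        self.backups.append(copy.deepcopy(self.primary))

    def restore(self, name):
        """Return the newest backed-up version of a file, or None."""
        for snapshot in reversed(self.backups):
            if name in snapshot:
                return snapshot[name]
        return None


vol = ProtectedVolume()
vol.write("orders.csv", "good data")
vol.backup()
vol.write("orders.csv", "corrupted")   # accident or malice
vol.delete("orders.csv")
print("orders.csv" in vol.mirror)      # False: the mirror lost it too
print(vol.restore("orders.csv"))       # good data
```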

VTL technology has been around since mainframe days and is now commonly available for open systems as well. The core VTL idea is simple: Emulate tape operation on disk. The advantages of this approach are that the business’s backup operations keep the same look and feel, backups become more reliable and typically complete in less time, and backup program license fees can be reduced because fewer tape drives appear to be in use. And because disk actually replaces tape, at least at one point in the process, disk advantages can be added, such as a “synthetic backup”: a virtual full backup created by automatically melding incremental backups. VTL products come as software and as configured appliances. Restores from a VTL, especially when the backup still exists on disk (before it’s moved, say, to a tape archive), also appear to operate much like a tape restore. That is, they usually require administrator assistance, but they are typically much faster and more reliable than recovery from tape. A VAR can help a business choose the best VTL solution for its needs.
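A synthetic full backup can be sketched as replaying incrementals, oldest first, over the last full backup. This minimal Python version assumes each backup is a mapping from file path to contents and that an incremental records a deleted file as `None`; real VTL products use their own catalog formats, so treat the names and conventions here as assumptions.

```python
def synthesize_full(full, incrementals):
    """Meld a base full backup with later incremental backups (oldest first)
    into a virtual full backup. Backups map path -> contents; in an
    incremental, None marks a file deleted since the previous backup."""
    result = dict(full)            # never modify the original full backup
    for incremental in incrementals:
        for path, data in incremental.items():
            if data is None:
                result.pop(path, None)
            else:
                result[path] = data
    return result


sunday_full = {"a.txt": "a1", "b.txt": "b1"}
monday_inc = {"a.txt": "a2"}
tuesday_inc = {"b.txt": None, "c.txt": "c1"}
print(synthesize_full(sunday_full, [monday_inc, tuesday_inc]))
# {'a.txt': 'a2', 'c.txt': 'c1'}
```

Because the melding happens on disk, no client has to send a fresh full backup to get a current full image.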

D2D2T and D2D, with Snapshots and COS

D2D2T and D2D differ from VTL in that they’re complete paradigm shifts, with no attempt made to emulate tape. The advantage is that backup can be made even more reliable by employing snapshot technology to effectively eliminate not only tape but also backup servers from the otherwise serial backup process (serial meaning it is only as fast and as strong as the slowest, weakest link in the chain). D2D2T presumes that more or less standard backup tapes will eventually be archived, and it takes care of creating them (usually as synthetically merged full backups) from time to time on a separate server in the background.
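The serial-chain argument can be put in numbers. The sketch below assumes each stage of the backup path has a throughput and an independent probability of success: the chain runs no faster than its slowest stage, and its success probability is the product of the stages’ probabilities. The stage names and figures are purely illustrative, not measurements; the second path is a hypothetical snapshot-based D2D path with the backup server and tape drive removed.

```python
from math import prod


def serial_chain(stages):
    """Each stage is (name, throughput in MB/s, probability of success).
    A serial pipeline moves data no faster than its slowest stage and,
    assuming independent failures, succeeds only if every stage succeeds."""
    name, slowest, _ = min(stages, key=lambda s: s[1])
    reliability = prod(p for _, _, p in stages)
    return name, slowest, reliability


# Illustrative figures only, not measurements of any product.
tape_path = [
    ("client", 200, 0.999),
    ("network", 100, 0.995),
    ("backup server", 150, 0.99),
    ("tape drive", 80, 0.98),
]
disk_path = [
    ("client", 200, 0.999),
    ("network", 100, 0.995),
    ("disk target", 300, 0.999),
]
for path in (tape_path, disk_path):
    name, mb_s, ok = serial_chain(path)
    print(f"bottleneck={name} ({mb_s} MB/s), success={ok:.3f}")
# bottleneck=tape drive (80 MB/s), success=0.964
# bottleneck=network (100 MB/s), success=0.993
```

Removing links from the chain helps twice: fewer stages to fail, and the slowest remaining stage is faster.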

D2D goes a step further and presumes that tapes will never be used. That can save on media and on media handling, transportation, and management costs (offset partially by the cost of a remote mirror of the backup, if desired). To fit vast amounts of backup onto the available D2D storage, the D2D vendor may employ content optimization. Content Optimized Storage (COS) is usually a proprietary technique that differs from vendor to vendor, and it may be done at the data source (requiring agent software on each source machine) or only at the storage target. The advantage of the former is that the actual transmission of the backup is also optimized.
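Because COS techniques are proprietary, the following is only one generic way to illustrate the idea: fixed-size chunking with SHA-256 content hashes, so that each unique chunk is stored once. It also shows why optimizing at the source pays off: the agent first learns which chunks the store already holds and transmits only the rest. The class and function names, and the 4 KiB chunk size, are assumptions for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; an assumption, products vary


def split_chunks(data: bytes):
    """Yield (sha256 hex digest, chunk) pairs for fixed-size chunks of data."""
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class DedupStore:
    """Target store that keeps each unique chunk once, keyed by its hash."""

    def __init__(self):
        self.chunks = {}   # hash -> chunk bytes
        self.recipes = {}  # backup name -> ordered list of chunk hashes

    def missing(self, hashes):
        return {h for h in hashes if h not in self.chunks}

    def put(self, name, hashes, new_chunks):
        self.chunks.update(new_chunks)
        self.recipes[name] = list(hashes)

    def restore(self, name):
        return b"".join(self.chunks[h] for h in self.recipes[name])


def source_side_backup(store, name, data):
    """Agent-side optimization: send hashes first, then only the chunks the
    store lacks. Returns the number of chunk bytes actually transmitted."""
    pieces = list(split_chunks(data))
    hashes = [h for h, _ in pieces]
    needed = store.missing(hashes)
    to_send = {h: c for h, c in pieces if h in needed}
    store.put(name, hashes, to_send)
    return sum(len(c) for c in to_send.values())


store = DedupStore()
monday = b"A" * 4096 + b"B" * 4096
tuesday = b"A" * 4096 + b"C" * 4096              # first chunk unchanged
print(source_side_backup(store, "mon", monday))   # 8192
print(source_side_backup(store, "tue", tuesday))  # 4096
print(store.restore("tue") == tuesday)            # True
```

Done only at the target, the same dedup would still save disk space, but the full 8 KiB would cross the network on Tuesday.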

A VAR can help a business determine whether, when, and how to transition from tape-based backup to D2D2T or D2D, and even help with keeping tape around (for comfort) for as long as desired.

Clustering

Network Attached Storage (NAS) units don’t scale well because each unit has its own file system, but the problem can be overcome by applying a single unified file system across a cluster of NAS heads. Similarly, with advanced file systems, a SAN cluster can enable data sharing across servers that would otherwise have to be partitioned from one another on the SAN.
Your VAR can help you decide when to consider using storage clustering, which type (NAS or SAN), which implementation (active/active, active/standby, for example), and so on.
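To illustrate what a single namespace across NAS heads buys, here is a hypothetical sketch in which clients name files by path alone and the cluster decides which head serves each path, using rendezvous (highest-random-weight) hashing. It assumes every head can reach shared back-end storage, so a head failure moves only the serving role for that head’s paths, not the data. Real clustered file systems rely on distributed metadata and locking and differ widely; nothing here describes a particular product.

```python
import hashlib


class ClusteredNamespace:
    """Hypothetical single namespace over several NAS heads (active/active).
    Clients address files by path; the cluster picks the serving head."""

    def __init__(self, heads):
        self.heads = list(heads)
        self.up = set(self.heads)

    def _weight(self, head, path):
        # Rendezvous hashing: every (head, path) pair gets a pseudo-random weight.
        return hashlib.sha256(f"{head}:{path}".encode()).hexdigest()

    def owner(self, path):
        live = [h for h in self.heads if h in self.up]
        if not live:
            raise RuntimeError("no NAS head available")
        # The live head with the highest weight serves this path.
        return max(live, key=lambda h: self._weight(h, path))

    def fail(self, head):
        self.up.discard(head)


ns = ClusteredNamespace(["nas1", "nas2", "nas3"])
paths = ["/finance/q1.xls", "/eng/specs.pdf", "/hr/policy.doc", "/sales/leads.csv"]
before = {p: ns.owner(p) for p in paths}
failed = before["/finance/q1.xls"]
ns.fail(failed)
after = {p: ns.owner(p) for p in paths}
# Only paths served by the failed head change hands; the rest stay put.
print(all(after[p] == before[p] for p in paths if before[p] != failed))  # True
```

Clients never need to know which head holds a file, which is the scaling problem that separate per-unit file systems create.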

Conclusion

Those still waters of storage certainly do run deep, and TriAxis is the kind of activist, intelligent, and helpful value-adding VAR that can help businesses navigate them. For example, TriAxis, on its own time and at its own expense, produced a series of free educational Web seminars on storage.

As a good VAR should, TriAxis offers businesses considerable latitude in their choice of best-of-breed solutions. For example, two TriAxis solution partners offer “commodity” iSCSI SANs, and TriAxis can help a business sort out the subtle but possibly important distinctions between them that matter to individual businesses. TriAxis provides VAR services across the entire data protection strategy tier.

