IBM Tivoli Storage Manager (ITSM)
and
Veritas NetBackup
A Comparison

The home of stix
aka Paul Ripke

Last updated:
2004-10-09 20:18:15


I note that this page gets quite a few hits, but as yet I've had no comments. Please feel free to send them in, whether positive, negative, or indifferent.

This attempts to be an objective comparison of the two products. Given that I primarily administer TSM, and attempt to dodge any work involving Veritas NetBackup, my personal views are obviously not of an objective nature. Nevertheless, I'll try.

This is a work-in-progress, and will be updated as I have ideas, time and input.


Feature Comparison

Backup Methodology
ITSM: Incremental-only backups, time-aged archiving, and full HSM support.
NetBackup: Traditional full and incremental/differential backups. "Archiving" is backup and delete.

Overall design
ITSM: TSM is a monolithic product, written and marketed by one company. This means, in general, that the interfaces presented are consistent and clear.
NetBackup: NetBackup is an amalgam of two main products, Volume Manager and Backup Plus. As such, the interface can appear to be fragmented and inconsistent.

Resource Consumption
ITSM: Being very database intensive, TSM requires large system resources for the server. CPU, RAM, I/O bandwidth and good physical database layout are important.
NetBackup: The catalog is a simple set of mostly text files, and is not used extensively during backups. NetBackup 4.5 introduces a binary catalog format, for space and performance reasons. I/O bandwidth is the main concern.

Point in time restores
ITSM: As the TSM database maintains a snapshot of which files existed within each filespace at the time the last backup completed, a point-in-time restore will only restore those files.
NetBackup: As NetBackup uses a traditional full/incremental cycle, multiple restore cycles may be required, and files deleted from filesystems between backups will still exist on the restored system. This can make systems with a high file turnover difficult to restore. NetBackup includes a feature to work around this, True Image Restore (TIR), which needs to be enabled during backup.

Multiplexing
ITSM: Multiplexing is not required, and does not make sense, with TSM. Slight correction: multiplexing has been introduced for the SAP R/3 TDP, where it does make some sense, allowing multiple client-compressed data streams to run directly to a single tape drive.
NetBackup: Multiplexing is supported, and may be required when backing up multiple slow clients simultaneously to allow tape drive streaming, and also to allow many clients to fit within a backup window. This may have a large impact on restore times.

Tape consumption
ITSM: As reclamations must be run as part of normal TSM operations, free space on tapes can be kept to a minimum. Also, for normal incremental backups, only one copy of each version of a file is kept in each copy pool.
NetBackup: Given the requirement that full backups must be periodically run, there is the potential for many copies of exactly the same files to be kept on tape, wasting valuable tape capacity.

Network utilisation
ITSM: As only changed files are sent over the network, traffic is kept to a minimum. Additionally, client-side compression can further dramatically reduce traffic, and sub-file backups can be used over slow WAN links.
NetBackup: As full backups are required periodically, network usage can be far greater. Backups over WAN links may be impractical.

Retention management
ITSM: For backups, four values (versions data exists "VERExists", versions data deleted "VERDeleted", days extra versions "RETExtra", days only version "RETOnly") are set to define retention characteristics. These may be set on an individual file-by-file basis. Moving backups between servers can maintain all retention information.
NetBackup: Nine retention levels are defined in NetBackup v3.4, and may be customized. In version 4.5, this was increased to 25 levels. It must be remembered when moving data between servers that the retention levels are specific to each server.

Configuration Clients
ITSM: A TSM server may be administered via two main methods: a single command-line interface with a rich command language, or a web-based graphical interface.
NetBackup: Five main configuration methods exist: the various command line tools and menu systems, an X-based GUI with a slightly limited view of configuration, a Java-based GUI with a different presentation of the configuration, and a Win32 administration tool.

Configuration complexity
ITSM: TSM is a large, feature rich, complex product. As such, understanding the concepts, abilities of the product, and methods of achieving goals, may take some time. However, once familiar with the concepts, installation and configuration is a moderately complex task.
NetBackup: As a group of tools, NetBackup can be difficult to manage. Each administration tool or interface presents a different view of the configuration, and not all methods allow the viewing or changing of all parameters.

Debugging
ITSM: Almost all useful information is to be found in four places: the client log, the client error log, the server activity log, and the server error log. Additionally, there are many internal undocumented trace and debugging facilities, which vendor support may utilise. Some documentation is available.
NetBackup: Logging may be turned on in various components, which may generate extremely large log files. Often, errors may go unlogged, and must be found by other means. On several occasions I have seen unknown and undocumented error codes logged.

Scheduler
ITSM: A fairly simple and powerful scheduling system is included, for both client operations and server operations.
NetBackup: The scheduling system was completely replaced in version 4, as the version 3 scheduler was well known as being broken in a variety of ways. Due to the requirement of regular full backups, the schedules may need to become large and complex to stagger full backups, to reduce network and tape drive contention.

Offsite support
ITSM: DRM is an integrated capability. It becomes an additional layer within the storage hierarchy. Data may move in both directions: onsite tapes are copied to an offsite tape pool, and damaged or lost onsite tapes may be rebuilt from offsite tapes. The database tracks the location of individual files, whether onsite or offsite.
NetBackup: In version 3, vaulting was an after-market add-on. It was complex and temperamental. In version 4, it has become a built-in component. Under version 3, tracking the location of offsite data was left up to the user; I do not know if this has changed in version 4.

Scalability
ITSM: If the posts on the ADSM-L mailing list are to be believed, there are many sites backing up over a thousand clients to a single TSM server, with some clients on WANs and mobile clients. There are other sites backing up multi-terabyte SAP instances nightly. It appears that TSM, if well configured, can scale very well. TSM started as a Hierarchical Storage Manager on IBM mainframes, and has grown downwards to become a general backup tool.
NetBackup: Due to full backup requirements and scheduling constraints, it is hard to see how NetBackup can be seen as an Enterprise class backup system. NetBackup started as a backup system ideally suited to backing up a handful of related systems. It has then been expanded and pushed into the Enterprise arena without changing many of its fundamental design decisions. It is debatable whether it is as yet truly an Enterprise class tool.
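
To make the point-in-time restore difference concrete, here is a toy Python sketch. It is not how either product stores data: the dictionaries, file names and both function names are made up for illustration. It only shows why replaying a full backup plus incrementals brings back files that were deleted in between, while restoring against a recorded file list does not.

```python
# Toy model (all names hypothetical): a backup maps file name -> content.

def restore_full_plus_incrementals(full, incrementals):
    """Replay a full backup, then each incremental in order.

    Incrementals only carry new or changed files, so nothing tells the
    restore that a file was deleted on the client in between.
    """
    restored = dict(full)
    for inc in incrementals:
        restored.update(inc)
    return restored


def restore_from_snapshot(versions, snapshot):
    """Restore only the files that existed when the last backup completed.

    `versions` maps file name -> latest backed-up content.
    `snapshot` is the set of file names present at that backup.
    """
    return {name: versions[name] for name in sorted(snapshot)}


# Sunday full backup, then two nightly incrementals.
full = {"a.txt": "v1", "b.txt": "v1", "tmp.dat": "v1"}
monday = {"a.txt": "v2"}    # a.txt changed; tmp.dat was deleted on the client
tuesday = {"c.txt": "v1"}   # c.txt created

replayed = restore_full_plus_incrementals(full, [monday, tuesday])

# Snapshot approach: one copy of each version plus the final file list.
versions = {"a.txt": "v2", "b.txt": "v1", "tmp.dat": "v1", "c.txt": "v1"}
snapshot = {"a.txt", "b.txt", "c.txt"}
exact = restore_from_snapshot(versions, snapshot)

print(sorted(replayed))  # includes tmp.dat, which no longer existed
print(sorted(exact))     # only the files present at the last backup
```

On a filesystem with high file turnover, the first approach can restore far more data than the disk held at any one time, which is the problem True Image Restore is meant to address.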

TSM Gripes

Copypool Orphans
When moving data from one storage hierarchy to another with differing copypools, you may end up with "orphaned" objects in the old copypool. These should expire through normal TSM expiration, depending on the type of files.
Collocation node groups
Rumored to be in a future release. Will allow grouping of nodes for collocation purposes, saving media and media mounts.
audit db time
Depending on TSM server size, a database audit can take a very long time to analyse and repair database damage.
Aborting migration/backup stg/reclamation during large file copy
TSM will normally only abort a job at the completion of the file in progress. This may take a long time for large files.
Migration processes run per node
A large node with data in a disk storage pool can only have one migration process. That is, migration processes will migrate data on a node by node basis.
Tape Error History
If using scratch volumes, a volume's read/write error count is lost when the tape returns to scratch status. This is less of an issue if the hardware maintains this information (e.g. 3590).
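
The per-node migration limit above can be illustrated with a rough model. The node names, sizes, throughput figure and the largest-first ordering are all assumptions made for the sake of the example, not TSM internals. The point is only that once one node dominates the disk pool, extra migration processes stop helping.

```python
import heapq


def migration_makespan(node_gb, processes, gb_per_hour):
    """Hours to drain the pool when each node is handled by exactly one process.

    Nodes are handed out largest first to whichever process frees up soonest
    (an assumed ordering, chosen to keep the model simple).
    """
    finish = [0.0] * processes
    heapq.heapify(finish)
    for size in sorted(node_gb.values(), reverse=True):
        start = heapq.heappop(finish)
        heapq.heappush(finish, start + size / gb_per_hour)
    return max(finish)


# Hypothetical disk pool contents, in GB per node.
nodes = {"bigdb": 800, "mail": 100, "web1": 50, "web2": 50}

for procs in (1, 2, 4):
    hours = migration_makespan(nodes, procs, gb_per_hour=100)
    print(procs, "process(es):", hours, "hours")
```

With these made-up numbers, going from one process to two saves some time, but going from two to four saves nothing: the single process working on the largest node sets the finish time.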

NetBackup Gripes

Failed client install overwrites client on server
I've seen this happen twice, once with 3.4, once with 4.5. The first we believe was due to rsh/rcp failure, which resulted in a Solaris client being installed on the AIX server. Oops.
Undocumented return codes
We seem to get these fairly regularly. Support says they are not possible.
"hung" or slow restores
Idle tape drives, required tapes not busy, but restore doesn't start. No idea why.
Unbalanced vaulting processes
Images appear to be split between vaulting processes with no regard given to size. One process may complete after only 1 hour, leaving the other running for 20 hours.
Multi-volume catalog backups can't be restored
This means that master servers do not scale well at all, as the catalog backup has to fit on a single volume. This is supposedly fixed in V5.
LTO drives must be defined as DLT
Small issue, but confusing to new admins.
Media/Drive failures
H/W or media problems can abort client backups, and if multiplexing, multiple client backups.
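
Why a single media problem can take out several client backups when multiplexing, and why multiplexing slows restores, can be sketched as follows. This is a simplified model, not NetBackup's on-tape format: the client names, block labels, round-robin interleaving and helper functions are all assumptions for illustration.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several client streams round-robin onto one tape."""
    tape = []
    for blocks in zip_longest(*streams.values()):
        for client, block in zip(streams, blocks):
            if block is not None:
                tape.append((client, block))
    return tape


def clients_hit(tape, first, last):
    """Clients with at least one block in the damaged region [first, last]."""
    return {client for client, _ in tape[first:last + 1]}


def blocks_scanned_for_restore(tape, client):
    """Blocks passed over to read one client's data, assuming a sequential
    read from that client's first block to its last."""
    positions = [i for i, (c, _) in enumerate(tape) if c == client]
    return positions[-1] - positions[0] + 1


streams = {
    "alpha": ["a0", "a1", "a2"],
    "beta": ["b0", "b1"],
    "gamma": ["g0", "g1", "g2"],
}
tape = multiplex(streams)

print(clients_hit(tape, 3, 5))                    # every client is affected
print(blocks_scanned_for_restore(tape, "alpha"))  # 7 blocks read for 3 of data
```

Without multiplexing, the same three-block bad spot would sit inside one client's image, and a restore would read only that client's blocks.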

All opinions expressed herein are mine and not those of my employer.

$Id: tsm-netbackup.html,v 1.1 2004/10/09 10:18:13 stix Exp $