ART: Automated Restore Testing

ART’s weekly tests help prevent restore failures and reclaim wasted space. Its consistent testing also helps satisfy SOX and HIPAA auditors.

How ART works

On each testing sweep, ART auto-discovers the client nodes on each TSM server, and:

  • Contacts the TSM server on behalf of that client. (The actual client computer is untouched.)
  • Randomly selects a file to test.
  • Tries to restore that file to the ART appliance itself.
  • Logs success or failure of the restore to its local database.
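
The sweep above can be pictured as a simple loop. The following is only a minimal sketch, not ART’s actual code: the helper callables (list_nodes, list_files, restore_file), the SQLite table layout, and the choice to record a node with no backed-up files as a failure are all assumptions made for illustration.

```python
import random
import sqlite3
from typing import Callable, Iterable, List, Optional


def run_sweep(
    servers: Iterable[str],
    list_nodes: Callable[[str], List[str]],           # hypothetical: node names on a server
    list_files: Callable[[str, str], List[str]],      # hypothetical: backed-up files for a node
    restore_file: Callable[[str, str, str], bool],    # hypothetical: True if the restore worked
    db: sqlite3.Connection,
    rng: Optional[random.Random] = None,
) -> None:
    """Test one randomly chosen file per client node and log the outcome."""
    rng = rng or random.Random()
    # Assumed schema for the local results database.
    db.execute(
        "CREATE TABLE IF NOT EXISTS results "
        "(server TEXT, node TEXT, file TEXT, ok INTEGER)"
    )
    for server in servers:
        for node in list_nodes(server):
            files = list_files(server, node)
            if not files:
                # Assumption: nothing to restore is logged as a failure.
                db.execute(
                    "INSERT INTO results VALUES (?, ?, NULL, 0)", (server, node)
                )
                continue
            chosen = rng.choice(files)  # random selection of the file to test
            try:
                ok = restore_file(server, node, chosen)
            except Exception:
                ok = False  # any error during the restore counts as a failure
            db.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?)",
                (server, node, chosen, int(ok)),
            )
    db.commit()
```

Passing the TSM-facing operations in as callables keeps the sketch independent of any particular TSM client interface; only the discover, select, restore, and log order comes from the description above.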

ART’s dashboard shows you what’s happening with the currently running sweep, and what happened in the past.

ART dashboard

Storage auditors can see proof that each client node is being tested periodically, satisfying some SOX and HIPAA requirements.  TSM admins can find the root cause of failures and prevent future problems. And storage managers usually find 15% to 40% wasted space on older TSM sites!

ART finds excessive use of storage (the “Hog Factor”) by nodes that have not been backed up in weeks.

What ART finds

ART has tested dozens of customer sites, and uncovered issues like these:

  • “Rogue” servers: A new server was installed in the production environment, but nobody told the backup team about it.  It never got registered to TSM, so all the TSM reports were blind to it.
  • Tapes not in the library: Tape volumes were removed for library maintenance.  Most were checked back in, but some were not – until ART needed one for a restore.  The admins then corrected the problem for all the missing tapes.
  • Nodes not on a schedule: ART flags nodes that have not been backed up in months.  One administrator realized he had installed TSM and done a manual backup, but had never put the node on a schedule!
  • Wasted storage: Servers decommissioned long ago, but never deleted from TSM.  TDP agents that never delete their older backups.  Duplicate backups due to user error or filesystem renaming.  ART finds all of these and more.  One customer trimmed almost 20% of total storage!
  • Broken Include/Exclude lists: If exclude or domain statements in the client’s configuration accidentally skip an entire filespace, ART will show you that.  And user-auditing of exclude results will soon be available.
  • Restores too slow: If a restore takes more than 10 minutes, your users will start complaining.  ART flags these nodes as “Failed” if they take longer than you allow.
  • Not enough tape drives: When ART finds that there are no free drives, it does not try to restore the file, but marks it Failed.  If your users tried to restore a file at that moment, they would pre-empt your Migration and Backup Stgpool jobs.  This can help make the case for buying more tape drives.
  • … and more: The examples above are from our current customer base.  But every site is different.  ART continues to sweep the cobwebs from TSM installations!
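
The two “Failed” rules described above (restores that run longer than you allow, and no free tape drives) could be expressed roughly as follows. This is a hedged sketch under stated assumptions, not ART’s implementation: the function name, the parameters, and the returned reason strings are hypothetical, and the free-drive count is assumed to be obtained elsewhere.

```python
import time
from typing import Callable, Tuple


def timed_restore(
    free_drives: int,                  # assumed to be looked up before the test
    do_restore: Callable[[], bool],    # hypothetical: performs the restore, True on success
    max_seconds: float,                # the time limit you allow, e.g. 600 for 10 minutes
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Return (status, reason) for a single restore test."""
    if free_drives <= 0:
        # No free drive: do not attempt the restore, just mark it Failed.
        return ("Failed", "no free tape drives; restore not attempted")
    start = clock()
    ok = do_restore()
    elapsed = clock() - start
    if not ok:
        return ("Failed", "restore error")
    if elapsed > max_seconds:
        # The restore worked, but too slowly to be acceptable.
        return ("Failed", f"restore took {elapsed:.0f}s, limit {max_seconds:.0f}s")
    return ("Success", f"restored in {elapsed:.0f}s")
```

Checking the drive count first mirrors the behavior described above: the test never competes with Migration or Backup Stgpool jobs for a drive.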

Simple Installation

ART is a pre-installed, web-based Virtual Appliance.

Your VMware ESX team can download it and start the appliance.

You simply browse to its built-in web server, tell it about your TSM servers, and say “Go!”

