Many companies are faced with challenging decisions regarding the need for continuous accessibility to their business systems. This document will discuss the concepts and approaches related to High Availability (HA) and Disaster Recovery (DR) specifically aligned to Infor XA on IBM servers running the IBM i operating system.
The Differences between HA and DR
While there are many discussions regarding the specifics of HA and DR, the simplest way to differentiate them is this: HA is a set of technologies that minimizes planned and unplanned system downtime. DR goes beyond HA, and beyond IT, to consider the steps needed to predictably continue or resume operations after a major disruption. In general, both approaches rely on redundant systems at one or more remote locations.
Do you need an HA or DR Strategy?
In the IBM and XA world, the most fundamental approach to protecting your data is a daily backup to tape, with off-site tape storage. While this approach has been common for many years, two key topics should be considered to determine whether current practices are sufficient.
Recovery Point Objective (RPO)
This is the amount of data that you can tolerate losing if an event affects your system. For instance, for a system with a tape backup that runs at midnight, the worst-case scenario is a major failure at 11:59 PM the following night. In this case, 23 hours and 59 minutes of data and transactions would be lost. RPO discussions center on setting that tolerance.
While it may be tolerable to recreate traditional low-volume, paper-based processes such as purchasing and invoicing, high-volume activities such as warehouse management, inventory receipts, and outbound orders represent many small transactions that are difficult to recreate, even after only a day of lost transactions. Imagine trying to redo all the inventory transactions that occur throughout one day in a high-volume shipping operation.
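The midnight-backup arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `data_loss_at` and the calendar date are assumptions made for the example, not part of any IBM or Infor tooling.

```python
from datetime import datetime, timedelta


def data_loss_at(failure_time: datetime, last_backup: datetime) -> timedelta:
    """Return the window of data lost if the system fails at failure_time
    and the most recent usable backup completed at last_backup."""
    if failure_time < last_backup:
        raise ValueError("failure_time must not precede last_backup")
    return failure_time - last_backup


# Hypothetical date; only the times of day come from the example in the text.
last_backup = datetime(2024, 1, 1, 0, 0)   # backup runs at midnight
failure = datetime(2024, 1, 1, 23, 59)     # failure at 11:59 PM the following night

loss = data_loss_at(failure, last_backup)
print(loss)  # 23:59:00
```

Comparing this figure against the amount of data the business says it can afford to lose is the core of an RPO discussion.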
A high availability system can reduce the recovery point to zero, through real-time replication from one system to the other.
Recovery Time Objective (RTO)
This is the amount of time you can tolerate being without a computing system. In our hypothetical case above, if the system has a catastrophic failure at 11:59 PM, you have not only lost a day’s worth of data but also need to pursue replacement hardware. If a redundant system is not already on-site and accessible, it is very unlikely that a replacement system could be in place in less than 2 days. Once you consider all the logistics of shipping, setup, software installation, and restore procedures, 3 to 7 days of downtime is a very likely scenario.
Given the inherent redundancy of an HA system, recovery times can range from a few minutes to a few hours, depending on the types of redundancy in place.
After understanding RPO and RTO, you can decide what your company can tolerate in the event of an outage, and then determine what steps are needed to meet your RPO and RTO requirements.
In addition to the considerations above, a key factor is planned downtime: the ability of your business to tolerate downtime for regular backups, system upgrades, etc. This will be the starting point of our discussion in Part 2 of this series, as we discuss replication and cost.
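As a rough sketch of that decision, the comparison below checks two approaches against a set of tolerances. The worst-case figures for nightly tape backup (about 24 hours of data, up to 7 days of downtime) and for HA (zero data loss, up to "a few hours") come from the discussion above; the 3-hour value chosen for "a few hours", the 1-hour and 4-hour targets, and all names in the code are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Strategy:
    name: str
    worst_rpo: timedelta  # worst-case data loss
    worst_rto: timedelta  # worst-case time without a system


def meets(strategy: Strategy, rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True if the strategy's worst case stays within both tolerances."""
    return strategy.worst_rpo <= rpo_target and strategy.worst_rto <= rto_target


STRATEGIES = [
    Strategy("nightly tape backup", timedelta(hours=23, minutes=59), timedelta(days=7)),
    # "A few hours" is taken as 3 hours purely for illustration.
    Strategy("HA with real-time replication", timedelta(0), timedelta(hours=3)),
]

# Hypothetical business tolerances: lose at most 1 hour of data, be down at most 4 hours.
rpo_target = timedelta(hours=1)
rto_target = timedelta(hours=4)

acceptable = [s.name for s in STRATEGIES if meets(s, rpo_target, rto_target)]
print(acceptable)  # ['HA with real-time replication']
```

With looser tolerances, for example two days of data and ten days of downtime, both approaches would pass, which is why the targets need to be agreed on before choosing a strategy.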
Questions? Would you like to talk to someone about your IBM and Infor environment?