Business Continuity Management (BCM) is not just about plans and documents; it is also about developing and maturing a capability to recover the organisation whenever required. A key component in this capability is ensuring the necessary infrastructure and facilities are available to meet the organisation's requirements at the time of a disaster, incident, or disruption and ensuring that this recovery environment can operate as required.
The Continuity Requirements Analysis (CRA; sometimes included within the BIA) will highlight the essential items needed to continue the most critical activities and will estimate how quickly these items need to be provided. The simplest way to validate these requirements (especially if a CRA did not form part of the company's Business Continuity programme) is to bring the relevant business managers together, give them a blank sheet of paper, and ask them to list their most critical requirements for continuing the most important activities identified in the BIA.
Once the business has identified the systems or applications it requires, IT staff can identify the dependencies and components needed beyond the basic hardware and software (e.g., network links, data communication links, SAN or NAS storage). In today's complex business environments, there are often data dependencies between systems, so the availability of one system can depend on one or more others. These other systems therefore need to be recovered together to maintain data and transaction consistency and integrity.
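The dependency analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the system names and the `DEPENDS_ON` map are hypothetical, and the functions `recovery_set` and `recovery_order` are names invented for this example. It uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it needs data from.
DEPENDS_ON = {
    "payments": {"customer_db", "ledger"},
    "ledger": {"customer_db"},
    "customer_db": set(),
    "intranet": set(),
}

def recovery_set(required, depends_on):
    """Return every system that must be recovered to support `required`."""
    needed, stack = set(), list(required)
    while stack:
        system = stack.pop()
        if system not in needed:
            needed.add(system)
            # Follow the dependencies of each newly added system.
            stack.extend(depends_on.get(system, ()))
    return needed

def recovery_order(required, depends_on):
    """Order the recovery set so each system comes after its dependencies."""
    needed = recovery_set(required, depends_on)
    graph = {system: depends_on.get(system, set()) for system in needed}
    return list(TopologicalSorter(graph).static_order())

print(recovery_order({"payments"}, DEPENDS_ON))
```

In this hypothetical example, the business asks only for "payments", but the recovery set also includes "customer_db" and "ledger", because "payments" cannot run consistently without them. A circular dependency would make `static_order()` raise `graphlib.CycleError`, which is itself a useful signal that the systems involved need to be restored as a single unit.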
For those organisations that decide to implement recovery and standby facilities, such as a business recovery site (office) for staff and an IT Disaster Recovery data centre for backup systems, IT equipment, and communication links, the solution does not stop there. Implementation is the easy part and only the "tip of the iceberg" for the work required to achieve the capability of recovery.
The most challenging part of operating recovery facilities is their maintenance and upkeep. Once the facilities are in place, any change to or within the organisation that could affect their usefulness should be assessed and, where necessary, the recovery environment should be amended to reflect it, so that the recovery environment is ready for use at all times. Here is a list of the things most likely to change in your organisation and require changes in your recovery environment:
Business recovery site
PCs: Quantity, performance and specifications, operating system, anti-virus software, personal firewalls, security patches and associated updates.
PC Software: The software installed on the desktop (including its version and any updates), any files stored locally, browser settings (such as favourites), drive letter mappings, templates and any embedded code (such as Excel macros).
Printers: The make/model, printer drivers, and spare toner cartridges.
Multi-functional devices (MFDs): The list of users, PINs, and email addresses (to email scanned documents).
Telephones & fax machines: Pre-programmed numbers, short dial codes, and direct-dial numbers (especially for fax numbers given to clients and third parties).
Stationery: Basic supplies of office stationery are always required, as well as stocks of pre-printed forms and headed stationery (letterheads, compliment slips, etc.).
IT DR data centre
Servers: Configuration, specifications, operating system, anti-virus updates, database versions, security patches and updates/bug fixes for all software.
Communication links: Addition of new communication links or changes in bandwidth of existing links.
Network storage: Configuration, storage space, and specifications.
System priority: New applications, introduction of new functionality, or launch of new products or services could change the importance of a system and therefore how quickly it needs to be recovered.
Data or system dependencies: Changes to links or dependencies between systems may alter the priority, sequence, or requirement for restoration and may increase the need to validate data integrity and consistency.
Many other changes could also affect an organisation's ability to recover effectively. The only way to gain reassurance is to enforce a tight change-control regime, so that every change within the organisation is assessed for its impact on recovery and, where relevant, applied to the recovery environment at the same time as it is implemented at the main site.
The cost of these changes, updates, or upgrades at the recovery location(s) should be included in the original cost of the change at the main location, rather than justified separately afterwards.
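One way to support such a change-control regime is a routine comparison of the recorded configuration of the main site against that of the recovery site. The sketch below assumes that both configurations are available as simple key/value inventories; the item names, the values, and the function name `find_drift` are hypothetical and serve only to illustrate the idea.

```python
def find_drift(main_site, recovery_site):
    """Return {item: (main_value, recovery_value)} for every mismatch.

    An item missing from one site is reported with None for that site.
    """
    drift = {}
    for item in sorted(set(main_site) | set(recovery_site)):
        main_value = main_site.get(item)
        recovery_value = recovery_site.get(item)
        if main_value != recovery_value:
            drift[item] = (main_value, recovery_value)
    return drift

# Hypothetical inventories, for illustration only.
main_site = {"os_patch_level": "2024-05", "db_version": "15.4", "link_mbps": 200}
recovery_site = {"os_patch_level": "2024-03", "db_version": "15.4", "link_mbps": 100}

for item, (main_value, recovery_value) in find_drift(main_site, recovery_site).items():
    print(f"{item}: main={main_value} recovery={recovery_value}")
```

Any item reported here represents a change made at the main site that has not yet reached the recovery environment, or the reverse. A check of this kind does not replace change control; it only helps to confirm that the process is being followed.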
Here are some examples of situations that have resulted in recovery environments not being properly available for use when required:
Changes to the recovery site(s) are not made as part of the changes to the main site but are grouped together and then performed at a more convenient time in the future. Therefore, the recovery environment is out of date for a while (and may not work fully) until these updates are completed.
Staff are allowed to store files and data locally on their PCs, and neither IT nor the staff back up this local information and store it safely offsite. Often, critical information ends up being stored locally and is not backed up, causing recovery problems for the organisation.
The recovery site is used as an expansion space for the normal office or as space for "special" projects, resulting in desks, filing space, and PCs being used and not being configured for recovery usage. This usage results in recovery delays for the organisation while the space is converted back into a recovery space at the time of an incident.
Equipment at the recovery site is used as "spare" equipment to replace faulty or damaged equipment at the main office/site, resulting in missing equipment at the recovery site and delays in recovery while equipment is located or purchased. Replacing equipment at the recovery site often takes a significant amount of time.
Organisations rely on being able to purchase equipment at the time of a disaster or incident. However, at the time of an incident, many other organisations will be doing the same thing. As a result, demand will be high, while stocks and supply will be low. Thus, the organisation may have to do without all or some of the equipment it needs to continue doing business or to recover its essential activities.
Centralised IT backups for different systems may be performed at different times (e.g., because a backup drive is shared across systems or servers). When these backups are restored during recovery, the data may be inconsistent across systems, leaving incomplete information or transactions that confuse or delay the organisation's recovery activities.
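The backup-timing problem in the last example can be checked before an incident occurs. The following sketch assumes that the completion time of the most recent backup is known for each system and that groups of interdependent systems have already been identified; the system names, timestamps, tolerance, and function names are all hypothetical.

```python
from datetime import datetime, timedelta

def backup_skew(backup_times, systems):
    """Return the gap between the earliest and latest backup in a group."""
    times = [backup_times[system] for system in systems]
    return max(times) - min(times)

def inconsistent_groups(backup_times, groups, tolerance):
    """Return the groups whose backups are further apart than `tolerance`."""
    return [
        sorted(group)
        for group in groups
        if backup_skew(backup_times, group) > tolerance
    ]

# Hypothetical backup completion times, for illustration only.
backup_times = {
    "payments": datetime(2024, 1, 10, 22, 0),
    "ledger": datetime(2024, 1, 11, 2, 30),
    "hr": datetime(2024, 1, 10, 23, 0),
    "payroll": datetime(2024, 1, 10, 23, 10),
}
groups = [{"payments", "ledger"}, {"hr", "payroll"}]

print(inconsistent_groups(backup_times, groups, timedelta(minutes=15)))
```

With these sample values, the "payments" and "ledger" backups are four and a half hours apart, so transactions recorded in one system during that window may be missing from the other after a restore. The acceptable tolerance is a business decision and will differ between organisations and systems.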