Developing a Disaster Recovery Planning Checklist
When disaster strikes, businesses stand to permanently lose assets and information. However, businesses that maintain a robust Disaster Recovery Plan can limit the resulting loss and damage. If this seems obvious, you might think that every business maintains a disaster recovery plan. That would be a faulty assumption.
Many businesses operate without a robust disaster recovery plan and some have no disaster recovery plan in place. Reasons for not maintaining a disaster recovery plan range from not believing the risk is high enough to warrant the expense, to not understanding how to develop a disaster recovery plan.
In this post, we will not attempt to persuade you about the potential consequences of not having a disaster recovery plan in place. You can read about that in the CIS blog post Disaster Recovery and Business Continuity. Instead, we will provide you with a stepwise guide, which will help you understand how to develop and implement an economical and robust disaster recovery plan.
NOTE: Diverse companies, across a wide range of industries and operating scenarios, use similar procedures for developing disaster recovery plans. In this post, however, we focus only on data protection and recovery of IT infrastructure and information assets, as they apply to any business, regardless of industry.
1. Perform Risk Assessment
Consistent with NIST/FISMA guidelines for the protection of information assets, we recommend conducting a risk assessment as a means of identifying risks, controls, and mitigations. The effectiveness of a Disaster Recovery Plan depends on a robust risk assessment.
In the first step of a risk assessment, we identify hazards that, if not contained, have the potential to cause harm. Additionally, we define the potential consequences of the uncontrolled release of the hazard. Next, we identify controls in place to contain the hazards. Finally, we identify threats that jeopardize the effectiveness of the controls.
Below we provide some considerations for risk assessment applied specifically to averting potential data and information loss due to a disaster, and to assist the development of a data recovery plan.
Identify Hazards – We can group hazards into two main categories: naturally occurring hazards and man-made hazards. Naturally occurring hazards include damaging events such as hurricanes, tornadoes, or floods. Man-made hazards can include events such as a fire or a ransomware attack. Exposure to any of these hazards can result in a disastrous loss of data and information.
Define Controls – We attempt to contain hazards by putting controls in place that will limit exposure sufficiently to eliminate damage. Controls for limiting exposure to natural hazards may include offsite data backup to a location deemed safe, or isolated from the hazard. Controls for limiting exposure to man-made hazards may include cybersecurity controls such as firewalls, AV/AM scanning, employee training, etc.
Identify Threats to Controls – We should rarely consider controls 100% effective, because threats can jeopardize their effectiveness. For example, hackers finding ways around firewalls represent a threat to firewalls as a control against unwanted entry to a computer network.
Assess Risk – We can rate risk based on a combination of the severity of consequences and the probability of occurrence. Typical risk severity categories include acceptable, tolerable, undesirable, and intolerable. Similarly, typical probability of occurrence categories include improbable, possible, and probable. Assessing risk in this manner facilitates decision making on the level of protection to deploy.
The number of probability and severity categories, and the nomenclature used, are arbitrary. In a simple example with risk rankings from 1 (lowest) to 4 (highest), a company may consider a situation (i.e., a threat and control pair) with a risk ranking of 1 as needing no further protection. Conversely, a company may consider situations with a risk ranking of 4 as unacceptable. Situations with a risk ranking of 2 or 3 may be accepted conditionally, with additional controls and/or mitigations.
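To make the ranking idea concrete, here is a minimal sketch of a risk matrix lookup in Python. The category names come from the text above, but the specific ranking assigned to each severity and probability pair, and the names RISK_MATRIX, risk_ranking, and recommended_action, are our own illustrative assumptions. Your company should assign rankings that reflect its own risk tolerance.

```python
# Severity categories from the text, ordered from least to most severe.
SEVERITY = ["acceptable", "tolerable", "undesirable", "intolerable"]

# ASSUMPTION: a hypothetical assignment of rankings (1 = lowest, 4 = highest).
# Each row is a probability category; columns follow the SEVERITY order.
RISK_MATRIX = {
    "improbable": [1, 1, 2, 3],
    "possible":   [1, 2, 3, 4],
    "probable":   [2, 3, 4, 4],
}


def risk_ranking(severity: str, probability: str) -> int:
    """Look up the ranking for one threat and control situation."""
    return RISK_MATRIX[probability][SEVERITY.index(severity)]


def recommended_action(ranking: int) -> str:
    """Map a ranking to the decision described in the text."""
    if ranking == 1:
        return "no further protection needed"
    if ranking == 4:
        return "unacceptable"
    return "conditional: add controls and/or mitigations"


# Example: a ransomware attack judged 'possible' with 'intolerable' consequences.
ranking = risk_ranking("intolerable", "possible")
print(ranking, recommended_action(ranking))
```

Keeping the matrix as plain data makes it easy to review with stakeholders and to adjust when the number of categories changes.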
2. Refine Controls and Define Mitigations for Moderate to High Risks
As a follow-up to the risk assessment of the current data management system, we need to determine economical solutions to reduce unacceptable or high risks to tolerable levels. We can reduce risk levels for specific threats in two ways. By refining (i.e., adding or changing) controls, we can reduce the probability that a threat breaks through established barriers and causes harm. We can also implement mitigating solutions to reduce consequences when a natural hazard strikes or a threat breaks through an established barrier.
Add or Modify Controls – We can reduce the probability of a threat breaking through a barrier (i.e., a control) by improving the effectiveness of the control or implementing additional controls. For example, we could upgrade system entry requirements from single-factor authentication to multi-factor authentication. Similarly, we could upgrade anti-virus and anti-malware software to protect against a wider range of threats and increase the level of data protection.
Identify Mitigations – Mitigation, simply defined, refers to a method for stopping the escalation of damage upon release of a hazard. The potential always exists for the release of a hazard, even with controls in place. In other words, for some risks we cannot reduce the probability of occurrence to a negligible level. This situation may arise due to the expense of implementing adequate controls, or because no perfect control solution exists. Both of these limitations exist for data backup and recovery and for data protection.
Accordingly, we need to implement mitigations for risks that carry significant potential negative consequences, even with refined controls in place. A Disaster Recovery Plan serves as a primary mitigating solution to limit damage when natural or man-made hazards strike and controls fail.
3. Develop a Disaster Recovery Plan
As detailed above, a robust risk assessment enables the identification of relevant risks, effective controls, and mitigations. We can consider a Disaster Recovery Plan as a set of mitigations.
Below we provide a list of some key elements of a Disaster Recovery Plan that will enable your business to stop the escalation of damage due to loss of access to critical data and information.
Develop a Response Strategy – When disaster strikes, we need a quick response, one designed to eliminate the threats to the breached barriers. Without a preplanned strategy, we cannot expect quick problem resolution and containment of the hazard. A quality response strategy includes definitions of RTO (Recovery Time Objective) and RPO (Recovery Point Objective). RTO refers to the target amount of time within which all applications and data must be recovered. RPO refers to the maximum amount of data, usually expressed as a span of time before the disaster, that you can tolerate losing. Defining these terms specifically helps to determine the value of the different Disaster Recovery systems considered for deployment. Your disaster response strategy also should consider the real cost of downtime and data loss, including lost sales, etc.
Define Incident Response Team – In order to respond quickly to a disaster, a business needs to have personnel pre-trained and prepared to conduct specific disaster recovery roles.
Define Data Backup Requirements – Value assessment of a Data Recovery Plan includes determining the level of data backup needed. Data storage comes at a cost; therefore, you should consider the value of each type of data and compare this to the cost of protecting it. You may find that some data sets do not significantly affect your business. In this case, you can decide to forgo more expensive data storage options for less valuable data.
You also should consider where to locate the data infrastructure. Offsite storage provides some protection against threats to the local computing environment. You need to balance the benefits of offsite storage against the need for isolation of mission-critical data from external threats.
Develop Plan for Notification to Affected Parties – When data is compromised, your company needs to notify all affected parties, including suppliers, customers, relevant government agencies, internal staff, etc. With a well-thought-out communication plan in place, you can reduce the negative impact of the disaster and streamline post data recovery operations.
Define Interim Operating Procedures during Disaster Recovery Operations – Disaster response will almost certainly affect your normal business workflow. Thinking through how data access supports your critical workflows allows you to plan how to continue operations to the fullest extent possible during data recovery operations. You can support this goal by having specific interim operating procedures prepared in advance.
Define Details for Restoring Data to Primary Operating System – Understanding specific operational details for data recovery will help you expedite a return to a fully functional primary operating system.
Develop Strategy for Post Data Recovery Operations – After you have restored access to your data and affected applications, you should not expect an immediate return to 100% operating efficiency. You need to consider the near-term activities required to get your business back on track to normal operations and profitability. This includes testing and monitoring the system for signs of lingering problems, such as bugs in the restored primary operating system.
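As an illustration of how the RTO and RPO definitions in the response strategy can guide a choice between recovery systems, here is a minimal Python sketch. It assumes that the worst-case data loss equals one full backup interval and that downtime cost scales linearly with restore time. The class names, field names, and example figures are hypothetical and not drawn from any particular product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rto_hours: float  # target time to recover applications and data
    rpo_hours: float  # maximum tolerable span of lost data


@dataclass
class RecoveryOption:
    name: str
    backup_interval_hours: float    # time between successive backups
    estimated_restore_hours: float  # expected time to restore service


def meets_objectives(option: RecoveryOption, objectives: RecoveryObjectives) -> bool:
    """ASSUMPTION: worst-case data loss is one full backup interval."""
    return (option.backup_interval_hours <= objectives.rpo_hours
            and option.estimated_restore_hours <= objectives.rto_hours)


def downtime_cost(option: RecoveryOption, revenue_per_hour: float) -> float:
    """ASSUMPTION: lost sales accrue at a constant hourly rate during restore."""
    return option.estimated_restore_hours * revenue_per_hour


# Hypothetical example figures for two candidate systems.
objectives = RecoveryObjectives(rto_hours=4, rpo_hours=1)
options = [
    RecoveryOption("nightly local backup", 24, 12),
    RecoveryOption("hourly offsite replica", 1, 2),
]
for option in options:
    print(option.name, meets_objectives(option, objectives),
          downtime_cost(option, revenue_per_hour=500))
```

A comparison like this will not choose a system for you, but it puts the cost of each option next to the objectives it does or does not meet.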
CAUTION: Please consider the ‘Disaster Recovery Planning Checklist’ shown above as a checklist from a general perspective only. Every business has a unique business risk profile, which affects the selection of appropriate and economically efficient elements of disaster recovery. Let a seasoned IT MSP help you define the best detailed solution for your company.
4. Maintain and Test Disaster Recovery Plan
Regardless of the level of effort put into developing a robust Disaster Recovery Plan, you need to train all personnel involved and test the plan by conducting drills. Additionally, you need to ensure preparedness for executing the plan by monitoring specific elements of the Disaster Recovery system on a routine basis.
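One element you might monitor routinely is the age of the most recent backup for each system. The Python sketch below flags any system whose last backup is older than an allowed maximum age. The function name, system names, and timestamps are hypothetical, and in practice the timestamps would come from your backup tool's own reports.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_times: dict[str, datetime],
                  now: datetime,
                  max_age: timedelta) -> list[str]:
    """Return the names of systems whose last backup is older than max_age."""
    return sorted(name for name, finished_at in last_backup_times.items()
                  if now - finished_at > max_age)


# Hypothetical example: two systems checked against a 24-hour limit.
now = datetime(2024, 1, 10, 12, 0)
last_backup_times = {
    "crm-db": datetime(2024, 1, 10, 6, 0),      # 6 hours old
    "file-share": datetime(2024, 1, 7, 6, 0),   # more than 3 days old
}
print(stale_backups(last_backup_times, now, timedelta(hours=24)))
```

The same check works for other dated items, such as the last completed recovery drill, by passing in those dates and a longer maximum age.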
Filling In the Details
You may have noticed that we presented the Disaster Recovery Planning Checklist in a somewhat generic fashion. This is necessary because of the wide variability of different data and information storage and maintenance solutions. Accordingly, the details within each element of a Disaster Recovery Plan will vary to suit the needs of the client and the client’s computing system architecture and related support systems.
CIS has significant experience helping clients develop robust disaster recovery plans, and can provide the necessary expertise to help you make adjustments identified in your risk assessment in a cost-efficient manner. CONTACT US today to learn more.