ITS 4350 - Disaster Recovery


Chapter 10, Disaster Recovery: Operation and Maintenance

Objectives:

This lesson is about chapter 10. Objectives important to this lesson:

  1. Key challenges
  2. Preparing for DR
  3. Response phase
  4. Recovery phase
  5. Resumption phase
  6. Restoration phase
Concepts:
Chapter 10

This chapter begins with our text's illustrative company having a fire. No details are available, but a report is made by a person who just arrived at the site and found out that a disaster has happened. I am reminded of a standard news teaser. In this case it might be "Breaking news! A fire has shut us down! Film at 11!"

Yes, there is a time and a place for acting like Paul Revere, but that time passes quickly. Soon, you need more information to do anything worth doing. Our hero reports to a higher authority, who acts on unconfirmed information to activate two plans. I hope the plans are flexible, and that they start with gathering confirmed information.

The text moves on to discuss a different scenario. Sometimes the disaster that affects our organization is a bigger disaster that affects everyone near our location. When this happens, we have to take in the big picture. Our disaster plan may assume that standard services in our community (e.g., utilities, transportation services, communication services, sanitation services, standard vendor services) are all available outside our organization. This may not be so in the cases of power outages, huge storms, or worse disasters that affect large areas.

On page 413, the text discusses several locations in the United States that need to prepare for disasters that typically do not happen in the other named locations. The list is not exhaustive. For example, if you have followed the weather news for the last few years, you may have noticed that bad storms are more frequent than they have been in the past. Hurricanes and floods can happen in almost any coastal area. Snow storms, ice storms, and wind storms (or all three at once) can happen quickly and dangerously. An emergency consists of a problem that you are not ready for. Our goal must be to plan ahead, and to be ready for whatever we can imagine.

The text has talked about making plans in several chapters, so we will assume that subject matter experts have been consulted and plans have been created. The text moves on to discuss three distributions for our plans:

  • office/work location - A paper copy of each plan, plus electronic copies on critical systems (computers or networks). Keeping copies on portable devices would also protect accessibility.
  • out of the office - Responsible staff should have copies, paper and electronic, at their homes.
  • online - Depending on the nature of the disaster, electronic files may be accessible on remote systems, which may be web sites that are not hosted at the work location, or storage sites hosted by outside providers.

The text continues with a list of trigger events that can lead to implementation of a plan:

  • management decision - Management may notify all staff or key staff that an event has occurred (or will occur soon), and that we are beginning to run a named/numbered plan.
  • employee notification - An employee may notify management that an event is in progress, which will cause a plan to be put into use. Management decisions are typically required, but the employee notification is the trigger, and a responsible employee may have to begin the plan implementation if the usual authority is not available.
  • emergency management - A state or federal agency may declare that an emergency exists, which may trigger a related plan for our organization.
  • local emergency - As in the example in the text, a fire or other local disaster may affect our organization, causing us to activate an emergency plan.
  • media (news) outlet - A responsible news entity may announce that an emergency, a disaster, or an act of terrorism has occurred. If the event involves our organization, this should also trigger the use of an emergency plan.
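The trigger list above can be treated as a small data structure. The following is a minimal sketch, not something from the text: the `Report` fields and the decision rule in `next_step` are my own assumptions, based on the earlier point that a plan may be activated on unconfirmed information as long as it starts with gathering confirmed information.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """The five trigger events listed in the text."""
    MANAGEMENT_DECISION = "management decision"
    EMPLOYEE_NOTIFICATION = "employee notification"
    EMERGENCY_MANAGEMENT = "emergency management"
    LOCAL_EMERGENCY = "local emergency"
    MEDIA_OUTLET = "media (news) outlet"


@dataclass
class Report:
    """A hypothetical record of one incoming report (fields are assumptions)."""
    trigger: Trigger
    confirmed: bool   # has the information been verified?
    affects_us: bool  # does the event involve our organization?


def next_step(report: Report) -> str:
    """Assumed decision rule for illustration; a real plan defines its own."""
    if not report.affects_us:
        return "monitor"
    if not report.confirmed:
        return "activate plan and confirm details"
    return "activate plan"


# Example: an employee arrives on site and reports a fire, with no details yet.
fire = Report(Trigger.EMPLOYEE_NOTIFICATION, confirmed=False, affects_us=True)
print(next_step(fire))
```

The point of the sketch is only that every trigger leads to the same two questions: does this involve us, and do we know it is true?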

The text continues with a useful list. It gives us a working outline that includes principles and tasks that should be part of a disaster plan:

  • Let's reorder this item. It should say to eliminate or reduce these risks:
      • loss of human life
      • potential for injuries
      • damage to facilities
      • loss of assets and records
        Preserve life, health, safety, and belongings in that order. That matches the image on the right pretty well. It is often called Maslow's Pyramid or Maslow's Hierarchy of Needs.
    • Minimize disruption.
    • Minimize loss.
    • Plan to resume limited operations, then normal operations.
    • Reduce exposure to liability.
  • Invoke the powers that are granted in the plan to those managing the disaster. Minimize the effects and plan to recover as soon as possible.
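The ordering in the first item above (life, then health, then facilities, then belongings) can be expressed as a simple sort key. This is a minimal sketch under my own assumptions: the `Priority` names and the sample tasks are hypothetical and do not come from the text.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Lower value is handled first, following the reordered risk list."""
    HUMAN_LIFE = 1
    INJURIES = 2
    FACILITIES = 3
    ASSETS_AND_RECORDS = 4


def order_tasks(tasks):
    """Return (description, Priority) pairs sorted by priority.

    sorted() is stable, so tasks with equal priority keep their given order.
    """
    return sorted(tasks, key=lambda task: task[1])


# Hypothetical task list, for illustration only.
tasks = [
    ("Move backup tapes off site", Priority.ASSETS_AND_RECORDS),
    ("Evacuate the building", Priority.HUMAN_LIFE),
    ("Shut off the gas line", Priority.FACILITIES),
    ("Give first aid", Priority.INJURIES),
]

for description, priority in order_tasks(tasks):
    print(priority.name, "-", description)
```

However the tasks arrive, the ones that protect people come out first.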

The next section of the text repeats what has been said before about the material above, and about planning for the inevitable disasters. Let's spend a couple of minutes on Abraham Maslow instead. The book should mention him, and it does not. His work gives us a perspective on the hierarchy established above.

As part of the discussion of what to plan for, the author gives us a list of teams that might be needed during the disaster. The list gives us something to think about, regarding what they will do for us and how they will make the situation better.

  • Disaster management team
  • Communications team
  • Computer hardware recovery team
  • Systems recovery team
  • Network recovery team
  • Storage recovery team
  • Applications recovery team
  • Data management team
  • Vendor contact team
  • Damage assessment and salvage team
  • Business interface team
  • Logistics team

The only one that seems to need an explanation is the Business Interface team. They are the interface between the IT department and the rest of the organization. Some of what they do might be included in other teams, depending on how you set up the teams.

The chapter concludes with a series of phases that follow the trigger event.

  • Response phase - The people dealing with the disaster contain it and protect resources according to the plan's hierarchy, which will probably match the one explained above.
  • Recovery phase - The things that keep us in business are recovered first, as addressed in your business continuity plan, or your disaster recovery plan if there is no BCP.
  • Resumption phase - Once the BCP items have been dealt with, the other items in the business impact plan are recovered in an order dictated by their dependencies.
  • Restoration phase - This phase has more to do with restoration of the business location. Restore, rebuild, or relocate? It depends greatly on what went wrong and how badly the original site was damaged.
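The Resumption phase above mentions recovering items in an order dictated by their dependencies. One common way to compute such an order is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the system names and their dependencies are hypothetical examples of mine, not taken from the text.

```python
from graphlib import CycleError, TopologicalSorter


def recovery_order(dependencies):
    """Return items in an order where each item follows what it depends on.

    `dependencies` maps an item to the set of items it needs first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency map, for illustration only.
dependencies = {
    "network": {"power"},
    "storage": {"power"},
    "database": {"storage", "network"},
    "email": {"network"},
}

try:
    for step, item in enumerate(recovery_order(dependencies), start=1):
        print(step, item)
except CycleError as error:
    print("Circular dependency, cannot order recovery:", error)
```

More than one valid order may exist, so the exact sequence printed can vary for items that do not depend on each other. What matters is that nothing is recovered before the things it needs.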

Assignments

The assignment for this week has not yet been determined.