ITS 305 - Security Policies and Auditing

Chapter 3, Planning for Contingencies

Objectives:

This lesson discusses high level planning for the organization's information security interests. Objectives important to this lesson:

  1. Need for contingency planning
  2. Components of contingency planning
  3. Creating contingency plans
  4. Testing contingency plans

Concepts:

This chapter covers contingency planning, which the text tells us is about plans for unexpected events. This is true, but also a bit naive. We have to recognize that things will go wrong, people will attempt to attack our systems, people will attempt to rob us. These events are not so much unexpected as they are unscheduled. Contingency plans are what we have proactively decided to do when something goes wrong. We must also realize that our plans will need to be flexible, because we cannot predict what will happen. A wise saying that I have heard translated several ways warns us that no plan of battle ever survives first contact with the enemy. (We might refer to this as von Moltke's First Rule.) We must be ready to change our plans as the situation requires.

So, what do we do about some of the things we expect to go wrong?

  • a contingency can be defined as something that depends on something else, or as something that can go wrong
  • a contingency plan is the plan that is followed if an anticipated problem occurs
  • controls are proactive measures you put in place to prevent mistakes, minimize risks, and increase system availability
  • the business cannot continue operating if some necessary technology stops working; we need plans for what to do when different types of technological failures occur
  • a severe failure becomes a disaster, and the text warns us that we need separate plans to handle disasters
  • contingency plans and disaster plans should both lead to the restoration of normal operations and service levels

Several different kinds of plans are made, called by different names that relate to the circumstances of the event and the scope of the plan.

  • Business Impact Analysis - The green highlight on this bullet is to show that this step should be done when times are good and we can examine our systems performing normally.
    Before you can plan for what to do, you have to figure out what is normal for your business, what can go wrong, and what can be done to minimize the impact of incidents and problems/disasters (see the bullets below).
    • What are the business's critical functions? Can we construct a prioritized list of them?
    • What are the resources (IT and other types as well) that support those functions?
    • What would be the effect of a successful attack on each resource?
    • What controls should be put in place to minimize the effects of an incident or disaster? (Controls are proactive measures to prevent or minimize threat exposure.)
  • Incident Response Planning - The red highlight on this bullet is to acknowledge that the plans made in this step are used when there is an emergency for one or more users. (Shields up, red alert? Why were the shields down?)
    The text is consistent with the ITIL guidelines that call a single occurrence of a negative event an incident. An incident response plan is a procedure that would be followed when a single instance is called in, found, or detected. For example, a user calls a help desk to report a failure of a monitor that is under warranty. (Note that this is an example of an IT incident, not an IT security incident. What further details might make this part of a security incident?) There should be a common plan to follow to repair or replace the monitor. Incident Response Plans (Procedures) may be used on a daily basis.
  • Business Continuity Planning - The orange highlight is meant to indicate that these plans are not concerned with fighting the fire, but with conducting business while the fire is being put out.
    Business continuity means keeping the business running, typically while the effects of a disaster are still being felt. If we have no power, we run generators. If we cannot run generators (or our generators fail), we go where there is power and we set up an alternate business site. Or, if the scope of the event is small (one or two users out of many) maybe we pursue incident management for those users and business continuity is not a problem.
  • Disaster Recovery Planning - The yellow highlight here is to indicate that the crisis should be over and we are cleaning up the crime scene with these plans.
    Despite the emotional response of the person in the text's parable this week, one person having a desk full of paper ruined by water damage is not a disaster. (For perspective, consider the legend about Isaac Newton, who reportedly handled a worse circumstance with more grace. Hint: your dog did not mean it, and he loves you.) A disaster requires widespread effects that must be overcome. A disaster might be most easily understood if you think of a hurricane, consequent loss of power, flooding that follows, and the rotting of your workplace along with the ruined computers and associated equipment, to say nothing of the loss of life that would have occurred as well. A disaster plan is what we do to restore the business to operational status after the disaster is over. There may be specific plans to follow for disasters under the two bullets above, but the disaster recovery plan is used after the crisis, unless this term is applied differently in your working environment.
  • Your text says multiple incidents (or multiple occurrences of a given incident) can become a disaster, or may lead us to realize that there is one, especially if there is no plan to overcome them.
  • By the way, in ITIL terms, a series of incidents may lead us to discover what ITIL calls a problem, something that is inherently wrong in a system that might affect all its users. Your book seems to call this a disaster. The organization you work for may use all three terms, or any two of them to mean different scopes of trouble. You need to know the vocabulary to use in the setting where you work, and you need to call events by the names they use.
  • Is there a condition for a blue highlight? We might pretend there can be, but it is unlikely that the IT Security staff would ever feel that safe and serene. Condition Blue? Maybe someday, but don't expect it.

For all the areas of planning above:

  • What are the contingencies we can plan for, and what plans should we select from the possible plans?
  • What are our official plans and procedures? Have appropriate staff been trained?
  • How will we implement our plans, and how will we test them? This includes a schedule for regular testing.
  • What is our schedule for revisiting and revising our plans?

The text tells us more about Business Impact Analysis and, as you might imagine, expands the threat analysis to cover each work unit a threat might affect. In fact, the text proposes that we produce a detailed table, like the one in Table 3-1, for each kind of threat we can imagine. It is easier to create tables like this from experience than from imagination. Luckily, we do not have to experience each kind of attack ourselves to learn about it. We can harvest information about them (or use the library of information about them) on the web sites maintained by the various security software vendors.

Creating these charts is a good procedure but it may lead to a false sense of confidence. No matter what we prepare for, there will be an attack in the future that is different from what we have seen before. Having a large number of procedures to follow should not lead us to become complacent. We should monitor our systems, and watch for unexpected behaviors. In "the old days" the IBM company used to put signs on the wall that simply said THINK. I would find such a poster more appropriate and more motivating than the numerous motivational slogans I've seen for many years.

The text proposes that the Business Impact Analysis should include estimated costs of the best, worst, and most likely outcomes of threats and attacks. These estimates should be clearly labeled as estimates, not the guarantees that some might hope they are.
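
One way to keep such estimates organized is a small table in code. The sketch below is only an illustration: the class and function names, the threat names, and the cost figures are all invented placeholders. The weighted average it computes is the common three-point (PERT-style) estimate, which is my assumption and not something the text prescribes.

```python
from dataclasses import dataclass


@dataclass
class ImpactEstimate:
    """One row of a hypothetical Business Impact Analysis table."""
    threat: str
    best_case: float     # estimated cost if things go as well as possible
    most_likely: float   # estimated cost of the most likely outcome
    worst_case: float    # estimated cost if things go as badly as possible

    def weighted_estimate(self) -> float:
        # Three-point (PERT-style) average; an assumption, not from the text.
        return (self.best_case + 4 * self.most_likely + self.worst_case) / 6


def rank_by_estimate(rows):
    """Return the rows sorted from highest to lowest weighted estimate."""
    return sorted(rows, key=lambda r: r.weighted_estimate(), reverse=True)


# Placeholder figures for illustration only; these are estimates, not guarantees.
rows = [
    ImpactEstimate("Power outage", 500, 2_000, 20_000),
    ImpactEstimate("Malware outbreak", 1_000, 10_000, 100_000),
]

for row in rank_by_estimate(rows):
    print(f"{row.threat}: estimated cost about {row.weighted_estimate():,.0f}")
```

Ranking the rows this way only suggests where to look first. The numbers going in are guesses, so the ranking coming out is a guess as well.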

More vocabulary lessons under the Incident Response section:

  • threat - a potential form of loss or damage; many threats are only potential threats
  • attack - a threat that has become a real event; sometimes called an exploit, but exploit actually means the method that is used
  • incident - The text says an attack is an IT security incident if it is directed at IT assets, if it may succeed, and if it threatens the assets' confidentiality, integrity, or availability. A great deal of help desk work could be considered incident response work, although much of it does not involve IT security. Not all incidents that happen to your users will be security incidents.
  • incident response - The procedure that is followed when an incident is detected or reported. The text confuses the idea by telling us that an incident response plan may have components for what to do before, during, and after an attack. First, not all incidents are attacks. Second, what you do during and after an incident will certainly be a response, but what you do before an incident should really be called something else like defensive measures or controls.
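
The text's three-part test for an IT security incident can be written as a simple check. This is a minimal sketch, and the class, field, and function names are hypothetical. It assumes someone has already judged each of the three conditions for the event in question, which is the hard part in practice.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A reported event; the field names are hypothetical."""
    description: str
    directed_at_it_assets: bool
    may_succeed: bool
    threatens_cia: bool  # confidentiality, integrity, or availability


def is_security_incident(event: Event) -> bool:
    # All three conditions from the text's definition must hold.
    return (event.directed_at_it_assets
            and event.may_succeed
            and event.threatens_cia)


def classify(event: Event) -> str:
    """Label an event as a security incident or an ordinary incident."""
    if is_security_incident(event):
        return "IT security incident"
    return "incident (not security related)"


# Illustrative events only.
monitor = Event("Monitor under warranty has failed", False, False, False)
phish = Event("Phishing message harvesting passwords", True, True, True)

for event in (monitor, phish):
    print(f"{event.description}: {classify(event)}")
```

The failed monitor is still an incident that the help desk must handle, which matches the point above that not all incidents are security incidents.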

The text continues with several pages about what might be done to detect and handle security incidents. Some useful observations from it:

  • a security incident may be indicated by monitoring software, user reports, presence or execution of unknown programs or processes
  • stronger indicators are use of dormant (inactive) or machine accounts, changes in logs where such changes should not exist, hacker tools found on hard drives
  • strategies for containment of viruses and attackers: disconnect affected devices/circuits, engage firewall rules to limit or refuse traffic, disable user/device accounts as necessary, disable compromised services
  • an analysis of how the incident occurred should be done to determine how to prevent such incidents in the future
  • security incidents typically must be reported to your own organization, and may have to be reported as crimes
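
The difference between ordinary and stronger indicators in the list above can be sketched as a lookup. The indicator strings come from the bullets, but the function name and the labels "possible incident" and "probable incident" are my own shorthand and are not terms from the text. A real detection process would need far more context than a set of strings.

```python
# Indicators that may point to a security incident (from the first bullet).
POSSIBLE_INDICATORS = {
    "monitoring software alert",
    "user report",
    "unknown program or process",
}

# Stronger indicators (from the second bullet).
STRONGER_INDICATORS = {
    "use of dormant account",
    "use of machine account",
    "unexpected log changes",
    "hacker tools on disk",
}


def assess(observations):
    """Return a rough label based on the strongest indicator observed."""
    observed = set(observations)
    if observed & STRONGER_INDICATORS:
        return "probable incident"
    if observed & POSSIBLE_INDICATORS:
        return "possible incident"
    return "no known indicator"


print(assess(["user report"]))
print(assess(["user report", "hacker tools on disk"]))
```

A result of "no known indicator" does not mean nothing is wrong. It only means nothing on our list was seen, which is another reason to keep watching for unexpected behaviors.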

The text moves on to discuss Disaster Recovery Plans for a few pages:

  • disaster - The text proposes that a disaster is what you are having if damage cannot be contained or controlled, or if it will take some time to recover to normal status.
  • disasters can be classified as man-made or natural, which might be most interesting to your insurance carrier (covered or not covered)
  • flexibility - the text finally considers the idea that your best plans may not be enough, and should be flexible to accommodate reality
  • Disaster recovery plans may or may not include procedures to follow while the disaster is occurring. As stated before, the organization you work for may call these plans by different names or put them in a more reasonable order. If the plan does not contain concurrent procedures, those will be in a Business Continuity Plan.

Business Continuity Plans are discussed next.

In the case of a disaster that makes a work site unusable, such as a fire or flood, it becomes necessary to have a plan for alternate means of continuing business. The text lists three types of off-site operation plans. Here are four:

  • cold site - a basic site with office space, but without computers or other devices that you would have to supply, without established connectivity, without a data copy unless you can supply it. Sort of like having to take your laptop to a hotel that may or may not have Internet access.
  • warm site - has office space, hardware, and may have connectivity; may have a recent backup of your data, but it will have to be loaded on computers that may also have to be configured. Sort of like borrowing office space from another part of your company.
  • hot site - a functional duplicate of the site that has gone down, including office space, computers, connectivity to the Internet, telephone service, and the capacity to either load a backup of your data that is stored there, or to use a copy of your data that is already in place. Welcome to the mirror universe.
  • mobile by design - an option the text does not cover is being able to work from anywhere you have connectivity, perhaps by using a laptop, using services and drive space on the Internet, and using a software phone that runs on your laptop. Alternatively, some organizations rely on field staff using cell phones, smart phones, or tablets in place of "standard" office equipment to increase mobility and make better use of their time.
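
The three site types from the text can be compared side by side as a small table. This is a sketch based only on the descriptions above. The names are hypothetical, and the "maybe" entries reflect the wording "may have" in the warm site bullet. The mobile option is left out because it depends on each worker's own devices rather than on a site.

```python
from enum import Enum


class Readiness(Enum):
    NO = "no"
    MAYBE = "maybe"
    YES = "yes"


NO, MAYBE, YES = Readiness.NO, Readiness.MAYBE, Readiness.YES

# What each site type provides, per the descriptions in the list above.
SITES = {
    "cold site": {"office space": YES, "hardware": NO,
                  "connectivity": NO, "data": NO},
    "warm site": {"office space": YES, "hardware": YES,
                  "connectivity": MAYBE, "data": MAYBE},
    "hot site": {"office space": YES, "hardware": YES,
                 "connectivity": YES, "data": YES},
}


def needs_attention(site_type):
    """List what you must supply or verify before working at this site."""
    return [item for item, ready in SITES[site_type].items()
            if ready is not Readiness.YES]


for name in SITES:
    gaps = needs_attention(name)
    print(f"{name}: {', '.join(gaps) if gaps else 'ready to use'}")
```

A table like this makes the tradeoff visible: the fewer gaps a site has, the more it generally costs to keep available.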

Several variables are discussed, but they are best considered as examples. You need to make plans and arrangements based on the realities of your organization.

  • Do you have multiple sites?
  • Are all sites affected by the disaster?
  • Can your staff function away from their regular location?