ITS 4350 - Disaster Recovery
Chapter 1, An Overview of Information Security and Risk
Management
Objectives:
This chapter presents an overview of IT security concepts and
how they relate to plans for handling incidents and disasters.
Objectives important to this lesson:
- Key concepts
- Risk management
- Contingency planning
- Relating security policy to contingency planning
Concepts:
Chapter 1
Our text defines contingency
planning as being the process that makes us prepared for
incidents and disasters related to our organization's IT assets.
We are given a few examples of historic incidents and a statistic
that makes a good point. The author tells us that "80%
of businesses affected by a disaster either never
reopen or close
within 18 months of the event". These are organizations that
either had no disaster plans, or they had plans that were
inadequate for the disasters they encountered.
Having made this point, the author spends several pages
discussing terms that you will need to know. Many should be
familiar to you.
- Information Security - protection of
information and the systems that collect, store, disperse, and
use it
The next set of terms is one you see in a lot of texts about
security:
- Confidentiality - information should only be
accessible to users who have been granted access to it for valid
reasons. Only authorized users can access data if it is
protected properly, and if authorized users do not violate
security policy.
- Integrity - data may not be changed except by
authorized users or processes. This means that data must be
protected from alteration, deletion, or other changes to its
intended form.
- Availability - authorized users can access
data when they need to do so. Availability includes the idea
that proper access methods are provided only to authorized
users, not to everyone.
The text confuses the issue with a graphic on page 3 that the
author does not explain. The classic CIA
concept defines security from the point of view of the IT
Security
staff. The text should explain that an expansion of this
concept goes by several names, one being the McCumber
Cube, another being the CNSS Security model, which is the
name used in the text. The model provides three
different perspectives on security, which should be
considered together to make better security decisions:
- IT Security perspective:
Confidentiality, Integrity, Availability
How do we protect the information, make sure it is not tampered
with, and provide access to those who need it?
- IT Operations perspective:
Storage, Processing, Transmission
How do we perform the basic IT functions of storing, processing,
and transmitting data? Are our processes secure?
- Business perspective:
Policy, Education, Technology
How do we make the rules
for employees about protecting information, educate
our staff about protecting it, and use the technology
we have to run our business safely?
The author continues with more terms:
- Threat - a potential form of loss or
damage; many threats are only potential threats, but we
plan for them because they might happen
- Threat agent - a vector for the threat,
a way for the threat to occur; could be a person, an event, or a
program running an attack
- Vulnerability - a weak spot where an
attack is more likely to succeed
- Exploit - a method of attack
- Control - A process
that we put in place to reduce the impact and/or
probability of a risk. The author mentions that a control
can also be called a safeguard
or a countermeasure.
On page 5, the text presents a list of threat types ranked from 1
to 12 in two different surveys. The rankings changed a bit, but
the list is given an aura of accuracy by showing us the same
categories in both survey years. Do not assume that these
are the only threat categories that matter. The surveys were done
by the same people, so they used the same categories. The author
discusses each of the twelve categories to give you a feeling for
what they are. Browse through any that are unfamiliar to you.
On page 13, the author changes topics, and discusses risk.
As some of you know, we have classes just about risk management,
so this section is not all there is to know about it. However, the
graphic on page 13 gives us a nice overview of a workable process
for managing risk.
- In the first phase, we identify risks by inventorying and classifying
all our assets, and then identifying the threats that apply to those
assets, and the vulnerabilities those threats could use against us.
- The second phase takes us to the selection of appropriate controls,
and justification of their cost and value to decision makers in our
organization.

The chapter continues with an expansion on each of the topics in
the graphic.
- Know something about the big picture - Who and what are we
protecting, from whom and what. To know details about these
subjects, do everything on the green chart.
- Identify, classify, and assign values to assets - To know our
exposure to risk, we have to know what we have and how it is
exposed. Assets can be classified by their level of secrecy,
their value to the organization, their need to be protected, or
by combinations of these factors as well as others.
In the chart on page 28, we see five information assets, each on
a separate line. Each is given a rating (from 0 to 1) on each of
three measures of how a compromise of that asset would affect
the company. The three measures in this case are impact on
revenue, impact on profitability, and impact on image. Assuming
these are the most important impacts our organization cares
about, each is given a relative percentage score. In the
example, the organization cares 30% about revenue, 40% about
profitability, and 30% about image. That is the criterion
weight. For each asset, its score for a given criterion is
multiplied by that criterion's weight, producing three weighted
criterion scores for each asset. The asset's total weighted
score is the sum of its three weighted criterion
scores. For instance, the first asset has a score of .8
for revenue (weighted criterion score is .8 times 30 = 24), .9
for profitability (weighted criterion score is .9 times 40 =
36), and .5 for image (weighted criterion score is .5 times 30 =
15), so its total weighted score for the comparison is 75.
Compare that score to the other lines, and you see that this
asset is the third most important asset in this comparison.
Warning: do not compare
scores from one table to another unless they use the same
criteria and the same weights.
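The arithmetic for the first asset can be sketched in a few lines of Python. This is only an illustration: the function name weighted_score and the dictionary layout are assumptions made here, not something from the text. The weights and scores are the ones quoted above.

```python
# Criterion weights from the example, as whole-number percentages.
WEIGHTS = {"revenue": 30, "profitability": 40, "image": 30}


def weighted_score(scores, weights=WEIGHTS):
    """Sum of (criterion score x criterion weight) for one asset.

    `scores` maps each criterion name to a rating from 0 to 1.
    """
    if sum(weights.values()) != 100:
        raise ValueError("criterion weights should add up to 100")
    return sum(scores[name] * weight for name, weight in weights.items())


# Scores quoted above for the first asset in the chart.
first_asset = {"revenue": 0.8, "profitability": 0.9, "image": 0.5}
total = weighted_score(first_asset)
print(round(total, 1))  # 24 + 36 + 15
```

Because the weights are passed in as a parameter, a second table with different criteria would need its own weights, which is one way to see why scores from different tables should not be compared.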
The text provides another example of rating assets, based on a
military scale that uses four levels for secrecy. A scale like
this may be more useful for assets that do not have a particular
effect on the organization unless they are compromised.
- Threats must be identified, and matched with assets affected
by them. Not all threats will affect all assets.
- Assets must be examined again, with respect to the threats
that could affect them. How vulnerable is each asset to each of
its possible threats? This evaluation gets us ready to do the
big one in a few pages.
Assuming you have followed the steps so far, there is an important
calculation to do.
- Each asset needs to be given a value, based on its replacement cost,
its current value to the organization, or the value of the income it
generates. Pick one of those or some other value you care about.
This is the Asset Value. Let's choose $100 as an example for Asset Value.
- Next, we need to determine, for each exploit, what the probable loss would
be if that exploit occurs successfully. Would we lose the entire asset? Half of
it? Some other percentage? The percentage we pick is the Exposure Factor
of a single occurrence of that exploit for this asset. Let's choose 50% as
an example for Exposure Factor.
- We are still not where we want to be. Asset Value times Exposure Factor equals
the Single Loss Expectancy. This is the Impact if the event occurs. In this
example, it is $50.
- Now, we still need the Likelihood that the event will occur. The classic
way to do this is to consult your staff about the frequency of successful
attacks of this type, or to consult figures from vendors like Symantec,
McAfee, or Sophos about expected attack rates for your industry or
environment. Let's assume we have done that, and we are confident that we
expect 10 successful attacks per year in our example. This is the
Annualized Rate of Occurrence.
- Taking the numbers we have so far, we should multiply the Annualized Rate
of Occurrence times the Single Loss Expectancy, which will give us the
Annualized Loss Expectancy for this asset from this kind of attack. This
corresponds to the Risk Exposure. In the example we are considering, that
amounts to $500.
All that work led us to just one loss expectancy for one asset from one
kind of attack. That gives you an idea of the work involved in
calculating the numbers for each asset, each asset vulnerability,
and each kind of attack on those vulnerabilities.
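The chain of multiplications above is short enough to sketch in Python. The function and variable names below are assumptions chosen for this illustration; the numbers are the ones from the example ($100 Asset Value, 50% Exposure Factor, 10 occurrences per year).

```python
def single_loss_expectancy(asset_value, exposure_factor):
    """SLE = Asset Value x Exposure Factor (a fraction from 0 to 1)."""
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle, annualized_rate_of_occurrence):
    """ALE = Single Loss Expectancy x Annualized Rate of Occurrence."""
    return sle * annualized_rate_of_occurrence


# Numbers from the example above.
asset_value = 100        # dollars
exposure_factor = 0.50   # half the asset is lost per successful exploit
aro = 10                 # expected successful attacks per year

sle = single_loss_expectancy(asset_value, exposure_factor)
ale = annualized_loss_expectancy(sle, aro)
print(f"SLE = ${sle:.2f}, ALE = ${ale:.2f}")  # SLE = $50.00, ALE = $500.00
```

In practice the same two multiplications would be repeated for every pairing of asset and exploit, which is why a spreadsheet or a small script is usually worth the effort.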
The next idea is to identify controls that can reduce or
eliminate our risk. The text mentions five control strategies that
are often considered. The terms are a little different from some
other texts:
- Defense - also called Avoidance, this means to use policies, training,
and technology to avoid the situations that can be exploited.
- Transferal - this means to hire expertise when you do not have it, or
to pay a fee to another department or organization that is in the
business of managing risk.
- Mitigation - this means to reduce the damage that will be done in a
successful attack, such as by not putting all assets of a given type in
the same place, protected by the same defenses.
- Acceptance - this is when we decide that a risk is not as costly to us
as the controls that might be used to avoid or mitigate that risk.
- Termination - this means that we decide to stop doing the things that
put us at risk; we simply stop doing the things that use or produce the
assets that a risk applies to.
So what do we do if we know that there are risks, and that we can't
protect ourselves from all of them? The text introduces four topics
from the next several chapters.
Business impact analysis
- This process is used to determine the effect that successful
attacks would have on our organization. We determine what could
happen, what the effects of that event would be, and what state
the organization's functions would be in at that time.
Incident response plan -
For known incident types, given the BIA done in the section above,
what should we do to handle the incident? Who do we call? How do
we stop the attack and its effects? This plan is about handling the
incident.
The text has the next two topics out of order. Let's fix that.
Business continuity plan
- How do we continue business when we have an incident? Do we
change our procedures? Do we use alternate locations or resources
to continue business? How do we continue providing products or
services when part of our organization has been damaged,
compromised, disabled, or destroyed? Business continuity plans
discuss keeping the business in business during the incident.
Disaster recovery plan -
A disaster has occurred. How do we get back to normal, or what
will be the new normal? The incident(s) has/have been handled.
What do we do to return to our undamaged state, stronger and wiser
than we were before?
All four of the major topics above are part of Contingency
Planning, what we do when we know things can go wrong.
The level of detail in each of the plans will be determined by the
size and complexity of the organization making that plan. The text
presents more plans identified by NIST that would apply to
organizations like federal agencies.
Earlier in the chapter there are several pages on Information
Security Policy. This section introduces the components you
might find in a very detailed policy. It begins with some
definitions:
A policy is a rule, or a set of rules,
that affects how we want our organization and its employees
to function. The idea behind a policy may start with a principle,
which is often a broad, general statement of what we believe to be
right, true, or beneficial. A policy is
more detailed, and more specific about what we expect our people
to do. Related concepts:
- Principle - a general statement about what we believe
or require in our area of authority (we will use only two
computer vendors at a time); what we expect
- Policy - rules about the conduct of our organization
with regard to particular actions (we will limit ourselves to
particular models chosen by the IT department); how we will
approach the expectation
- Standard - a method or process that may be procedural
or technical (orders are to be placed by approved requesters
within each work area); what steps are to be followed to assure
general compliance with policy
- Procedure - a detailed set of steps to follow to be in
compliance (requests are to be made to your manager, who will
forward approved requests to your authorized requester);
variations or limitations that apply to specific work areas, to
be followed if they apply to your area
- Guideline - a suggested addition to any of the items
above that is recommended but optional (submit your requests two
weeks before the end of a quarter to allow processing time); do
this to make it work better
On pages 13 through 18, we see a very detailed outline of the parts
of a policy.
- Statement of the policy - what it is, where it applies, and
who has to do what
- Authorized access - who is and is not allowed to use equipment
or software related to the policy, and what is private about any
related data
- Prohibited use - what users are not allowed to do with the equipment
or software related to the policy
- System management - who runs it, who watches it, how it is to
be protected, secured, and/or encrypted
- Violations of policy - a graduated scale of offenses and
discipline to be applied for various types of violations of the
policy itself
- Policy review and modification - how often the review will
take place, who will do it, and the process to change or remove
the policy
- Limitations of liability - standard lawyer section
Chapter 2, Planning for Organizational Readiness
Objectives:
This chapter is the first chapter about contingency
planning. Objectives important to this lesson:
- Support of management
- Forming a planning committee
- Business impact analysis
- Collecting data for a BIA
- Budgeting contingency operations
Concepts:
Chapter 2
If you browse the scenario that opens chapter 2, you will see
several people role playing a contingency. They are doing it in
two parts. One is to run through a plan that is already in their
operations manual. The other part is to involve the people in the
room, to develop questions about the planned response, and to
determine whether they are doing what should be done. In this
case, they have a plan for a contingency, they are walking through
the plan, and they are learning about the merits of the plan. They
should not just be reading their assigned parts. They should be
thinking about the reality of doing what they are assigned and
proposing amendments to improve the plan.
The text talks about forming a body that will be responsible for
creating contingency plans. That body has a number of duties, some
of which should be done before it starts:
- Obtaining commitment from senior management - There
needs to be a commitment to empower the committee to even form,
to do its job, to examine its output, to revise output as
needed, and to require all employees and
departments to accept and follow the plans. Management must also allow
for improvisation as needed; things never go exactly as planned.
- Managing the contingency planning process -
Assign members and other staff to gather information and
assemble procedures.
- Writing the master document - There needs to be a
place to start, and once it starts, there may be area/division
specific procedures that will be written by subject matter
specialists.
- Conducting the business impact analysis - Document threats,
vulnerabilities, and attacks, and relate them to documented
business functions.
- Developing teams to create manuals for incident response,
business continuity, disaster recovery, and crisis management.
The text offers suggestions about the kinds of knowledge that
will be necessary to create the contingency plans. Note the
section about representatives from other business units, which
include the company's actual business, IT management, and IT
security management. Remember that these are the three axes of the
CNSS security model. The text tells us that there also needs to be
commitment from management in these areas to create useful plans
that will be available when they are needed.
In the scenario at the beginning of the chapter, we are left
wondering whether the manuals being used by people from different
work areas were identical, or different in some ways. There should
be specific pages for some staff, depending mostly on the
specialization or security level of their jobs. However, there
should also be a master copy with all the sections in case someone
needs to step in to do another person's job.
The text delivers a lot of details about a lot of details for
several pages. Moving on to more focused material, let's continue
on page 57 where we see five "Keys to BIA Success":
- Set the scope to cover the necessary work areas of the
organization for each risk to be addressed.
- Get information from experts in your organization about the
impact an exploit will have.
- Keep the information factual where possible, to avoid opinions
that may be mistaken.
- Determine the areas you need to report on before gathering the
data. This will save time, and be of more interest to approvers.
- Get approval of your BIA and your risk assessment. Without it,
your process will stop there.
The first of three phases of a business impact analysis begins
on page 58 with making a list of the critical
processes/functions of the organization you are studying. In
the first column of the example chart on page 59, you see seven
functions performed by a company. (The text notes that this chart
of seven functions is an example, and that a real chart for a real
company would be much longer.)
Function | Profitability: 40% | Strategic Value: 30% | Internal Ops: 20% | Public Image: 10% | Total Weighted Score (100%)
New business | 8 | 8 | 3 | 6 | 6.8
Maintain old business | 8 | 7 | 6 | 7 | 7.2
ISP service | 10 | 8 | 4 | 8 | 8
Internet services | 9 | 10 | 4 | 8 | 8.2
Help desk | 5 | 6 | 6 | 8 | 5.8
Advertising services | 6 | 9 | 4 | 9 | 6.8
Public relations | 4 | 6 | 2 | 10 | 4.8
In the columns to the right of each function name, four kinds of
impact are considered. Each kind of impact has been given a
percentage rating, reflecting how much the company cares
about it. Note that the sum of the percentages is 100%. If
it were not, we would have to assume we are not measuring the
impacts correctly, or we are measuring the wrong impacts.
Each function is rated on a scale, probably from 1 to 10, on how
much its loss would affect each kind of impact. Note that the columns
and the rows do not add up to a specific value.
There is no presumption that they should. To get the weighted
score for a function, its raw score for each column is multiplied
by the percentage for that column, and those
weighted scores are added together. The weighted score for
each function is a measure of its criticality to the
company's ongoing business.
In the chart above, the three functions with the highest weighted
scores are Internet services (8.2), ISP service (8), and Maintain old
business (7.2). They are the ones we need to protect
the most. Preparing a chart of this sort, or a series of them,
leads to our knowing which business functions should get our
attention first and foremost.
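The weighted scores in the chart can be reproduced, and the functions ranked, with a short Python sketch. The data comes from the chart; the names WEIGHTS, FUNCTIONS, total_weighted_score, and rank_functions are assumptions made for this illustration.

```python
# Criterion weights from the chart, as fractions that add up to 1.
WEIGHTS = {"profitability": 0.40, "strategic_value": 0.30,
           "internal_ops": 0.20, "public_image": 0.10}

# Raw scores from the chart, in the same column order as WEIGHTS.
FUNCTIONS = {
    "New business":          (8, 8, 3, 6),
    "Maintain old business": (8, 7, 6, 7),
    "ISP service":           (10, 8, 4, 8),
    "Internet services":     (9, 10, 4, 8),
    "Help desk":             (5, 6, 6, 8),
    "Advertising services":  (6, 9, 4, 9),
    "Public relations":      (4, 6, 2, 10),
}


def total_weighted_score(raw_scores, weights=WEIGHTS):
    """Multiply each raw score by its column weight and add the results."""
    return sum(raw * w for raw, w in zip(raw_scores, weights.values()))


def rank_functions(functions=FUNCTIONS):
    """Return (name, score) pairs, most critical function first."""
    scored = [(name, round(total_weighted_score(raw), 1))
              for name, raw in functions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Print the three most critical functions.
for name, score in rank_functions()[:3]:
    print(f"{name}: {score}")
```

Sorting by the total puts Internet services, ISP service, and Maintain old business at the top, which matches the totals in the last column of the chart.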
Page 60 brings up a related idea. Some functions will have a
different criticality when we consider which to restore from a
damaged or disabled state first. Some functions rely on other
functions to make them possible. In cases like this, it is
necessary to prioritize the recovery of any service that a
critical service depends on. With that in mind, it makes sense to
consider the downtime metrics discussed in the text. (This is
considered further in the table on page 60.)
- Maximum Tolerable Downtime (MTD) - In the text, we see the example of
a system that can only be down as much as 4 hours in a month. A more
specific tolerance would state that the four hours need to be scheduled
as one hour each week, on a day or shift during which the system is not
needed. There is a difference between that standard, and four hours all
at once, and four hours spread evenly across a month. This measure of
time refers to normal operations.
- Recovery Time Objective (RTO) - This time is similar to the measure
above, but it refers to the time a system can be allowed to be down
during a recovery. It assumes that a disaster has occurred, that
emergency procedures are in effect, and that we intend to restore this
system to normal operation. Shorter times are assigned to this measure
for systems that are critical to our operations. The more critical a
system is, the less time we can do without it.
- Recovery Point Objective (RPO) - This one is harder. It is a measure
of how current the most recent backup is, and how much data we can
expect to have to load to that backup to become current again. The text
expresses it as a number of work hours that will need to be captured and
added to the most recent backup once it is restored. In the discussion
on page 61, the text gives us another way to look at it. It calls the
RPO the amount of data we can live without during the recovery process.
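One way to make the RTO and RPO concrete is to test a failure scenario against them. The sketch below is an assumption-laden illustration, not a method from the text: the class RecoveryTargets, the function meets_targets, and all of the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    """Hypothetical targets for one system, both in hours."""
    rto_hours: float  # longest acceptable outage during recovery
    rpo_hours: float  # most work we can afford to lose or re-enter


def meets_targets(targets, hours_since_last_backup, estimated_restore_hours):
    """Compare one failure scenario against the RTO and RPO.

    Returns a pair of booleans: (RTO met, RPO met).
    """
    rto_met = estimated_restore_hours <= targets.rto_hours
    rpo_met = hours_since_last_backup <= targets.rpo_hours
    return rto_met, rpo_met


# Made-up numbers: a nightly backup means up to 24 hours of work is at risk.
order_entry = RecoveryTargets(rto_hours=8, rpo_hours=4)
print(meets_targets(order_entry, hours_since_last_backup=24,
                    estimated_restore_hours=6))  # (True, False)
```

In this made-up case the system can be restored quickly enough, but a nightly backup leaves more work at risk than the RPO allows, so the backup schedule would need to change.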
In the small chart on page 61, the text shows us two curves plotted
on the same graph. The Cost to Recover is highest if we have a
constant live copy of the data, and lowest if we use an
old-fashioned tape backup system. The old system looks good until
we note that the time it takes to use it causes a much longer
disruption time, which has associated costs that go higher the
longer the disruption takes.
The simplified version of this chart, shown on the right, makes
the same point, but may be easier to see. The projected cost to
the organization in any scenario using these curves is the sum
of the red and the blue lines at any given point.
Reasonable expectations about tolerable downtime and recovery
time lead to a compromise that the text shows as the Cost Balance
Point. If we spend more on our recovery system, we can expect less
time that the system will be down, lowering the costs that down
time creates. You need to plot those two curves for your own
organization to determine what the best choice is.
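If you do not want to plot the curves by hand, the same comparison can be sketched in Python. Everything here is hypothetical: the recovery options, their costs, the hours of disruption, and the cost per hour of downtime are invented for illustration, and treating the balance point as the option with the lowest combined cost is an assumption of this sketch.

```python
# Hypothetical recovery options: (name, cost of the recovery system,
# expected hours of disruption if we have to use it). All numbers are made up.
OPTIONS = [
    ("tape backup",        5_000, 72),
    ("nightly disk image", 15_000, 24),
    ("warm standby",       40_000, 4),
    ("live mirrored copy", 90_000, 0.5),
]

COST_PER_HOUR_DOWN = 1_500  # assumed cost of one hour of disruption


def total_cost(recovery_cost, hours_down, cost_per_hour=COST_PER_HOUR_DOWN):
    """Cost to recover plus the cost of the disruption itself."""
    return recovery_cost + hours_down * cost_per_hour


def cost_balance_point(options=OPTIONS):
    """Return the option with the lowest combined cost."""
    return min(options, key=lambda opt: total_cost(opt[1], opt[2]))


for name, recovery_cost, hours_down in OPTIONS:
    print(f"{name}: {total_cost(recovery_cost, hours_down):,.0f}")
print("lowest total:", cost_balance_point()[0])
```

With these invented numbers the cheapest recovery system (tape) has the highest total cost, and the most expensive one (a live copy) is not the best choice either, which is the point the chart makes.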
The text returns to the idea of determining the priority of each
process to the business, and each system or asset to the functions
that use or depend on it. This is discussed again on page 62. At
this point, you would think you would know a lot about the
company's assets, functions, and priorities. The text turns to
data collection. Eight data collection methods are listed on page
63, then discussed for several pages. Which one is best? The one
that gets you to the truth. Keep that in mind.
People often answer questions with either your agenda or their own
in mind. Ask without pressure, and you may get better responses.
The last major topic in the chapter is budgeting. The text lists
four operations that will require a budget. They are four familiar
areas by now.
- incident response - The text points out that this is a normal,
expected IT operation. Not every incident is a widespread
emergency, but they all need to be handled.
- disaster recovery - The text proposes that the largest cost of
recovery is insurance. The best recommendation we can make about
it may be to consult industry associations to determine what are
considered to be best practices.
- business continuity - This one requires you to estimate and
collect money for extra locations, employees, equipment, and
data devices, to be used during emergencies.
- crisis management - This item concerns large scale disasters
that bring huge physical losses and long term psychological
damage. It may also concern more predictable losses and expenses
to employees, such as funeral costs. A lot depends on the
benefit packages your employees may already have.