ITS 3050 - Security Policies and Auditing
Managing Risk in Information Systems
Chapter 14, Mitigating Risk with a Disaster Recovery Plan
Chapter 15, Mitigating Risk with a Computer Incident Response Team Plan
Objectives:
This lesson presents the last two chapters, discussing the
last two major plans about risk mitigation. Objectives important to
this lesson:
- Disaster recovery plans
- Critical success factors
- Elements of a DRP
- How a DRP mitigates risk
- Best practices
- Computer incident response team plans
- Purpose of a CIRT plan
- Elements of a CIRT plan
- How a CIRT plan mitigates risk
- Best practices for CIRT plans
Concepts:
Chapter 14
In the last chapter, we had a brief exposure to Disaster Recovery
Plans (DRPs) in the recovery phase of a Business Continuity Plan (BCP).
The defining feature of a DRP is that it is activated and used once the
disaster is over. In the opinion of our current author, the DRP may
actually be part of the BCP. I think I see his point, that we begin to
transition to normal operations while still using a BCP. However, we
ended the last chapter with the thought that we would stop using the
BCP when the disaster ends, and we have just settled on the idea that
the DRP starts once the disaster is over, so the most I think we should
allow is a little overlap from the BCP to the DRP.
On page 372, the text offers five more terms that the author
associates with DRPs. However, some of them (like business interruption planning)
sound too much like BCP, and not enough like DRP.
The author continues his custom of assuming we did not read
the previous chapter by presenting a set of five critical concepts,
four of which we have seen before (in this book, blast it!). There is
one new one:
- Recovery Time Objectives (RTOs) - we are told that these are spans
of time within which we seek to recover from an outage. The text
explains that each RTO must be lower than the associated Maximum
Acceptable Outage (MAO).
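The rule that each recovery time objective must be lower than the
associated maximum outage time can be checked mechanically once the
figures exist. The following is a minimal sketch, not anything from the
text: the function names and the hour values are hypothetical, and real
values would come from your business impact analysis.

```python
from datetime import timedelta

# Hypothetical figures for illustration only; real values come from the BIA.
functions = {
    "order processing": {"rto": timedelta(hours=4), "max_outage": timedelta(hours=8)},
    "payroll": {"rto": timedelta(hours=72), "max_outage": timedelta(hours=48)},
}


def rto_violations(functions):
    """Return the names of functions whose RTO is not lower than the
    maximum outage the organization can accept for that function."""
    return sorted(
        name
        for name, times in functions.items()
        if times["rto"] >= times["max_outage"]
    )


print(rto_violations(functions))
```

With the sample data, only payroll is reported, because its recovery
objective is longer than the outage the organization says it can accept.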
The text briefly addresses reasons for having a DRP, which
should be known to us by now. If there is a disaster, the other plans
we have prepare for it, deal with it, and carry on during it. The DRP
covers returning to normal operations, which our organization needs to
do. The author complicates the issue on page 373, discussing saving
lives and ensuring business continuity, which were covered in other
chapters, in other plans. I suppose he is taking the point of view that
if we only had a DRP, it would have to include the other plans as
well. Aye, and if my grandmother had wheels, she'd be a wagon. (Go to
video time 1:20.)
When our plan book has a chapter for every type of plan, we
are most likely to use a DRP for its intended purpose, which is
returning to our pre-crisis operational state.
The text cautions us to include Critical Success Factors (CSFs) when
we write a DRP. These are things that make it possible to create a
useful DRP. You could argue that they would contribute to the success
of any of our plans, but the ones listed here are aimed at producing a
good DRP.
- management support - support to create and test the plan,
endorsement of the project and its usage, prioritization of goals for
the plan
- knowledge and authority for DRP creators - knowledge of
disaster recovery techniques and processes, knowledge of the
organization and its assets, and authority to gather information for
the plan and for the plan to make assignments to staff
- clearly stated primary concerns - Recovery Time Objectives
must be set lower than Maximum Acceptable Outage times; backups must be
available offsite for crises that disable the workplace; alternate
locations must be located, available, and classified as hot, warm, or
cold to make plans for their use; access methods must be arranged for
staff, management, and customers
- a disaster recovery budget - there must be accessible funds
for all of our plans, including this one; no plan will run very long
without a budget to pay for executing it
The text lists nine elements of a DRP on page 384. It
elaborates on them through page 394. You should glance at this material
to get an idea about each area that is not familiar to you. Try not to
get confused by the overlap with other plans. I believe that the author
is working from the point of view that the DRP may be our only contingency
plan.
- Purpose - Typically, to restore our organization to normal
operations
- Scope - You might divide your DRP into segments that each
cover a division of your organization, one or more functions, or one or
more groups of assets
- Disaster/Emergency declaration - As noted on other
documents, you need specific conditions under which to activate the
plan, persons to notify, and actions to take
- Communications - Appropriate notifications should be sent
to required staff, excused staff, system users, and our customers
- Emergency options - Emergencies that are encountered at
different stages of a crisis may be handled differently; the author
leaves me unsure if he is addressing emergencies while we are in
recovery, or emergencies that create the crisis
- Activities - As with all of these bullets, there is a difference
between the disaster recovery activities that are done in their proper
place, and those that are actually done in earlier parts of the crisis
- Recovery procedures - There should be actual steps to
follow to recover sites, recover operations, recover systems
- Critical operations, customer service, and operations
recovery - Separate recovery steps may be needed for critical
operations, which may or may not include customer service, depending on
what happened to customer service during the crisis and whether it is a
CBF for your organization
- Restoration and normalization - In this last phase, the
author conducts what I would have called the total activity of a DRP,
assuming we also had and used an Incident Response Plan and a Business
Continuity Plan.
As should be familiar by now, the text cautions us to test and
update a DRP as often as we can. When reviewing a DRP, make sure to
review the systems, assets, and locations that are parts of it, to
determine whether changes in them require changes in the DRP.
The text explains, in more words than it takes, that a DRP
reduces risk by reducing the impact of a disaster. A good plan keeps us
going, and puts us back in working order as soon as possible.
The best practices mentioned at the end of the chapter should
sound very familiar:
- Complete BIAs.
- Complete your purpose and scope statements so you include only the
right things in your process.
- Review and update.
- Test the DRP before it is actually needed.
Chapter 15
The last chapter in our first text presents another opportunity to
misunderstand the author. He tells us that there is no difference
between a computer incident and a computer security incident. This is
not so in all organizations, and it is not so in the one where I work.
We call the part of our help desk that handles calls about problems
with computers our Incident Response Team. What my organization calls
incidents, the text calls events. We have a separate division that
deals with computer security issues. Customized terminology is not
useful if you and I call the same things by different names. In the
context of this chapter, the author is only talking about the people
and procedures that address computer security incidents, not the
day-to-day problems that are also called incidents by people taking
calls at a general help desk.
So, in the context of the author's terminology, a computer
incident is a violation of a security policy or a security practice.
The organizations where you will work may or may not have the same
terminology.
The text offers a short list of examples of security incidents:
- Denial of Service attack
- Malware attack
- Unauthorized access
- Inappropriate usage - This one is not in the same realm as
the others. It is about users who violate a policy by using their
equipment for other than intended/allowed purposes. This could also be a
minor legal infraction or a felony, depending on the particulars.
In the context of the chapter, we are talking about the creation of a
Computer Incident Response Team (CIRT) plan. (Again, he is talking
about security incidents. I wish he would use the word more often. Our
second text refers to this team as the Security Incident Response Team
(SIRT), and both acronyms would be pronounced the same way.) On page
403, the text discusses the purpose of this plan: to prepare for
security incidents. That is a very broad concept, considering the
variety of security incidents that can occur. The author sensibly
proposes that we begin dealing with an incident by seeking to
understand what is happening. He suggests using a familiar framework:
explain the situation in terms a reporter would use. A good idea is to
memorize these lines from Rudyard Kipling about six honest serving men:
I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
(The rest of the poem is not important in this context. Yes,
you should read poetry. And science fiction, and detective stories, and
lots of other things. In case you don't know, Kipling was a poet, a
novelist, and a reporter. A human being should be able to do many
things.)
In the context of this chapter, we are asking:
- what has happened?
- how did it happen?
- where and when did it start, and where is it affecting our
systems?
- who did this, and who is it affecting?
- answering why it was done may not be easy, but it can be
enlightening if the information is available, and it may help us
prevent some attacks in the future
The text moves on to the elements of a CIRT (SIRT, CERT, etc.)
plan. We are informed that a plan
should list people (or job roles) that will be involved,
and should include information on the policies
that will be enforced by this staff.
The text describes three models from NIST SP 800-61 Revision
2, which relate to organizing your CIRT staff with different scopes of
authority and responsibility.
- central incident response team - for a small organization
in one location, a centralized staff responsible for all incidents is
the only real choice
- distributed incident response teams - for organizations
spread over large areas, you may need to distribute your staff across
several locations, with the team in each location reporting to upper
management at a central authority
- coordinating team - this method expands on the distributed
method, giving more autonomy to local teams, usually with the central
team acting in an advisory role
The list of roles on page 406 may bear no resemblance to the
roles you may see in an actual organization. The lesson to take from
the list is that there are many jobs in large organizations, some of
which are quite specialized, but everyone contributes in some way to
the security of those organizations. CIRT staff are more directly
involved in security than HR staff, but everyone participates in
his/her own security experience.
The list of duties on page 407 captures the essence of what
people working in this field do:
- develop procedures - basic procedures for all incidents and
detailed procedures for specific types of incidents
- investigate incidents, determining the cause of incidents -
this includes requesting help from other IT staff as needed,
documenting the problems and their solutions, and sharing information
with people who need it
- recommend controls - when you determine that there is a
problem, you may also determine that we need to patch more often or in
a more effective way, or you may determine that we need to do something
that we are not presently doing, which should be documented and
submitted as a request for change
- protect collected evidence, and use chain of custody
protocols - when investigating problems, be mindful that you may need
to preserve evidence of an attack, or of wrongdoing, so that our
attackers might be tracked and/or punished.
In addition to these bullets, the text offers three
suggestions on page 408 about investigating: acquire evidence,
authenticate the evidence (prove it is what you say it is), and analyze
the evidence.
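One common way to support the "authenticate the evidence" step is to
record a cryptographic hash of each file when it is acquired, then
recompute the hash later to show the file has not changed. The text
does not prescribe a method, so treat the following as a minimal sketch
under that assumption; the function names are hypothetical, and only
Python's standard hashlib module is used.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large evidence files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_unchanged(path, recorded_digest):
    """Return True if the file still matches the digest that was
    recorded when the evidence was acquired."""
    return sha256_of_file(path) == recorded_digest
```

The digest recorded at acquisition would normally be written into the
chain of custody log alongside who collected the file and when, so that
anyone handling the evidence later can repeat the check.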
The text provides a lengthy discussion of policies and
procedures that guide the actions of a CIRT employee. Note the
discussion on page 408 about attacking an attacker. This might make
sense to many of us, but the reality is that counter attacking may only
increase the animosity of the attacker. Our objective is to defend,
which is often best done without emotional investment. Some advice is
offered for specific types of frequently seen attacks:
- DoS, ping-based attack - if using an IDS that can change
firewall rules, set it to block ICMP when large numbers of packets of
this type are seen
- DoS, SYN flood - configure the IDS to block the IP address of the
originator after a reasonable number of tries
- Malware - use, configure, and update antivirus software.
Note that the Department of Homeland Security instructed Federal
Agencies to stop using Kaspersky security products in September of
2017. This is the kind of news you need to be aware of, and may want to
emulate in your own organization.
- Malware - teach your users to be suspicious of email
attachments, web sites whose URLs make no sense, and programs a
"friendly" email asks them to run.
- Malware - when infection is found, isolate the devices and
clean them as needed, getting help from your antivirus vendor if
possible. Contain the problem, eradicate the problem, then recover the
system. (page 425)
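The two DoS bullets above both come down to counting traffic from a
source and blocking it once a threshold is passed. The sketch below
shows that decision logic only, as a sliding-window counter. It is not
how any particular IDS product works; the class name, the limit, and
the window length are all hypothetical, and a real IDS would push a
firewall rule instead of just remembering the address.

```python
from collections import defaultdict, deque


class FloodDetector:
    """Flag a source once it sends more than `limit` packets within
    any span of `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.seen = defaultdict(deque)  # source address -> recent timestamps
        self.blocked = set()

    def observe(self, source, timestamp):
        """Record one packet and return True if the source is blocked."""
        if source in self.blocked:
            return True
        times = self.seen[source]
        times.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) > self.limit:
            self.blocked.add(source)
            return True
        return False
```

Choosing the limit is the hard part. Set it too low and legitimate
users get blocked; set it too high and the attack gets through, which
is why the text speaks of "a reasonable number of tries".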
The chapter ends with some recommendations for best
practices:
- Define a computer security incident. This needs to be done
to specify the circumstances that will empower the CIRT staff to take
actions that some will consider to be blocking their ability to work.
- Provide policies for the CIRT staff to follow.
- Provide ongoing technical training to CIRT staff and
provide ongoing awareness training to all employees.
- Include checklists as controls in procedures that require
that steps be done in a prescribed order, and as easy reference in
other procedures.
- Subscribe to security notifications from recognized
agencies, such as US-CERT.