Managing Risk in Information Systems: Chapter 14, Mitigating Risk with a Disaster Recovery Plan; Chapter 15, Mitigating Risk with a Computer Incident Response Team Plan

ITS 3050 - Security Policies and Auditing

Objectives:

This lesson presents the last two chapters, discussing the last two major plans about risk mitigation. Objectives important to this lesson:

Disaster recovery plans
Critical success factors
Elements of a DRP
How a DRP mitigates risk
Best practices
Computer incident response team plans
Purpose of a CIRT plan
Elements of a CIRT plan
How a CIRT plan mitigates risk
Best practices for CIRT plans

Concepts:

Chapter 14

In the last chapter, we had a brief exposure to Disaster Recovery Plans in the recovery phase of a Business Continuity Plan. The biggest difference with a Disaster Recovery Plan is that it is activated and used once the disaster is over. In the opinion of our current author, the DRP may actually be part of the BCP. I think I see his point: we begin to transition to normal operations while still using a BCP. However, we ended the last chapter with the thought that we would stop using the BCP when the disaster ends, and we have just settled on the idea that the DRP starts once the disaster is over, so the most I think we should allow is a little overlap between the BCP and the DRP.

On page 372, the text offers five more terms that the author associates with DRPs. However, some of them (like business interruption planning) sound too much like BCP, and not enough like DRP.

The author continues his custom of assuming we did not read the previous chapter by presenting a set of five critical concepts, four of which we have seen before (in this book, blast it!). There is one new one:

Recovery Time Objectives (RTO) - we are told that these are spans of time within which we seek to recover from an outage. The text explains that each RTO must be shorter than the associated Maximum Acceptable Outage (MAO).
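As a minimal sketch of that rule, the check below compares each recovery objective with the longest outage the organization says it can accept. The function names and hour values are invented for illustration; they are not from the text.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    function: str            # hypothetical business function name
    rto_hours: float         # Recovery Time Objective
    max_outage_hours: float  # longest outage the organization can accept


def rto_violations(targets):
    """Return the functions whose RTO is not shorter than the allowed outage."""
    return [t.function for t in targets if t.rto_hours >= t.max_outage_hours]


# Illustrative values only.
targets = [
    RecoveryTarget("order entry", rto_hours=4, max_outage_hours=8),
    RecoveryTarget("payroll", rto_hours=72, max_outage_hours=48),
]

violations = rto_violations(targets)
print(violations)  # ['payroll']
```

Any function that shows up in the list needs either a faster recovery method or a fresh look at how long an outage is really acceptable.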

The text briefly addresses reasons for having a DRP, which should be known to us by now. If there is a disaster, the other plans we have prepare for it, deal with it, and carry on during it. The DRP covers returning to normal operations, which our organization needs to do. The author complicates the issue on page 373, discussing saving lives and ensuring business continuity, which were covered in other chapters, in other plans. I suppose he is taking the point of view that if we only had a DRP, it would have to include the other plans as well. Aye, and if my grandmother had wheels, she'd be a wagon.

When our plan book has a chapter for every type of plan, we are most likely to use a DRP for its intended purpose, which is returning to our pre-crisis operational state.

The text cautions us to include Critical Success Factors (CSFs) when we write a DRP. These are things that make it possible to create a useful DRP. You could argue that they would contribute to the success of any of our plans. These CSFs are focused to help create a good DRP.

management support - support to create and test the plan, endorsement of the project and its usage, prioritization of goals for the plan
knowledge and authority for DRP creators - knowledge of disaster recovery techniques and processes, knowledge of the organization and its assets, and authority to gather information for the plan and for the plan to make assignments to staff
clearly stated primary concerns - Recovery Time Objectives must be set lower than Maximum Acceptable Outage times; backups must be available offsite for crises that disable the workplace; alternate locations must be located, available, and classified as hot, warm, or cold to make plans for their use; access methods must be arranged for staff, management, and customers
a disaster recovery budget - there must be accessible funds for all of our plans, including this one; no plan will run very long without a budget to pay for executing it

The text lists nine elements of a DRP on page 384. It elaborates on them through page 394. You should glance at this material to get an idea about each area that is not familiar to you. Try not to get confused by the overlap with other plans. I believe that the author is working from the point of view that the DRP may be our only contingency plan.

Purpose - Typically, to restore our organization to normal operations
Scope - You might divide your DRP into segments that each cover a division of your organization, one or more functions, or one or more groups of assets
Disaster/Emergency declaration - As noted on other documents, you need specific conditions under which to activate the plan, persons to notify, and actions to take
Communications - Appropriate notifications should be sent to required staff, excused staff, system users, and our customers
Emergency options - Emergencies that are encountered at different stages of a crisis may be handled differently; the author leaves me unsure if he is addressing emergencies while we are in recovery, or emergencies that create the crisis
Activities - As with all of these bullets, there is a difference between the disaster recovery activities that are done in their proper place and those that are actually done in earlier parts of the crisis
Recovery procedures - There should be actual steps to follow to recover sites, recover operations, recover systems
Critical operations, customer service, and operations recovery - Separate recovery steps may be needed for critical operations, which may or may not include customer service, depending on what happened to customer service during the crisis and whether it is a CBF for your organization
Restoration and normalization - In this last phase, the author covers what I would have called the total activity of a DRP, assuming we also had and used an Incident Response Plan and a Business Continuity Plan.

As should be familiar by now, the text cautions us to test and update a DRP as often as we can. When reviewing a DRP, make sure to review the systems, assets, and locations that are parts of it, to determine whether changes in them require changes in the DRP.

The text explains, in more words than it takes, that a DRP reduces risk by reducing the impact of a disaster. A good plan keeps us going, and puts us back in working order as soon as possible.

The best practices mentioned at the end of the chapter should sound very familiar:

Complete BIAs.
Complete your purpose and scope statements so that your process includes only the right things.
Review and update.
Test the DRP before it is actually needed.

Chapter 15

The last chapter in our first text presents another opportunity to misunderstand the author. He tells us that there is no difference between a computer incident and a computer security incident. This is not so in all organizations, and it is not so in the one where I work. We call the part of our help desk that handles calls about problems with computers our Incident Response Team. What my organization calls incidents, the text calls events. We have a separate division that deals with computer security issues. Customized terminology is not a useful thing if you and I call the same things by different names. In the context of this chapter, the author is only talking about the people and procedures that address computer security incidents, not the day-to-day problems that are also called incidents by people taking calls at a general help desk.

So, in the context of the author's terminology, a computer incident is a violation of a security policy or a security practice. The organizations where you will work may or may not have the same terminology.

The text offers a short list of examples of security incidents:

Denial of Service attack
Malware attack
Unauthorized access
Inappropriate usage - This one is not in the same realm as the others. It is about users who violate a policy by using their equipment for other than intended/allowed purposes. This could also be a minor legal infraction or a felony, depending on the particulars.

In the context of the chapter, we are talking about the creation of a Computer Incident Response Team plan. (Again, he is talking about security incidents. I wish he would use the word more often. Our second text refers to this team as the Security Incident Response Team. And both acronyms would be pronounced the same way.) On page 403, the text discusses the purpose of this plan: to prepare for security incidents. That is a very broad concept, considering the variety of security incidents that can occur. The author sensibly proposes that we begin dealing with an incident by seeking to understand what is happening. He suggests using a familiar framework. Explain the situation in terms a reporter would use. A good idea is to memorize a line from Rudyard Kipling about six honest serving men:

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.

(The rest of the poem is not important in this context. Yes, you should read poetry. And science fiction, and detective stories, and lots of other things. In case you don't know, Kipling was a soldier, a poet, a writer, and a reporter. A human being should be able to do many things.)

In the context of this chapter, we are asking:

what has happened?
how did it happen?
where and when did it start, and where is it affecting our systems?
who did this, and who is it affecting?
answering why it was done may not be easy, but it can be enlightening if the information is available, and it may help us prevent some attacks in the future
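One way to keep these questions in front of a responder is a simple record with one field per question. This is only a sketch under my own assumptions: the class name, field layout, and sample answers are hypothetical and do not come from the text.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class IncidentSummary:
    what: Optional[str] = None
    how: Optional[str] = None
    where: Optional[str] = None
    when: Optional[str] = None
    who: Optional[str] = None
    why: Optional[str] = None  # may stay unknown for a long time


def unanswered(summary):
    """List the questions that still have no answer recorded."""
    return [f.name for f in fields(summary) if getattr(summary, f.name) is None]


# Hypothetical early report: only three questions answered so far.
report = IncidentSummary(
    what="malware found on a workstation",
    where="accounting office",
    when="first alert Monday morning",
)
print(unanswered(report))  # ['how', 'who', 'why']
```

The list of open questions gives the team a running to-do list as the investigation proceeds.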

The text moves on to the elements of a CIRT (SIRT, CERT, etc.) plan. We are informed that a plan should list people (or job roles) that will be involved, and should include information on the policies that will be enforced by this staff.

The text describes three models from NIST SP 800-61 Revision 2, which relate to organizing your CIRT staff with different scopes of authority and responsibility.

central incident response team - for a small organization in one location, a centralized staff responsible for all incidents is the only real choice
distributed incident response teams - for organizations spread over large areas, you may need to distribute your staff across several locations, and the team in each location will be managed by a central authority
coordinating team - this method expands on the distributed method, giving more autonomy to local teams, usually with the central team acting in an advisory role

The list of roles on page 406 may bear no resemblance to the roles you may see in an actual organization. The lesson to take from the list is that there are many jobs in large organizations, some of which are quite specialized, but everyone contributes in some way to the security of those organizations. CIRT staff are more directly involved in security than HR staff, but everyone participates in his/her own security experience.

The list of duties on page 407 captures the essence of what people working in this field do:

develop procedures - basic procedures for all incidents and detailed procedures for specific types of incidents
investigate incidents, determining the cause of incidents - this includes requesting help from other IT staff as needed, documenting the problems and their solutions, and sharing information with people who need it
recommend controls - when you determine that there is a problem, you may also determine that we need to patch more often or in a more effective way, or you may determine that we need to do something that we are not presently doing, which should be documented and submitted as a request for change
protect collected evidence, and use chain of custody protocols - when investigating problems, be mindful that you may need to preserve evidence of an attack, or of wrongdoing, so that our attackers might be tracked and/or punished.

In addition to these bullets, the text offers three suggestions on page 408 about investigating: acquire evidence, authenticate the evidence (prove it is what you say it is), and analyze the evidence.
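The text does not say how to authenticate evidence. A common approach, shown here purely as an assumption on my part, is to record a cryptographic hash when the evidence is acquired and compare it again later. The item label, handler name, and log layout below are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the evidence as a hex string."""
    return hashlib.sha256(data).hexdigest()


def custody_entry(item, handler, action, data):
    """Build one chain-of-custody log entry (layout is an assumption)."""
    return {
        "item": item,
        "handler": handler,
        "action": action,
        "sha256": fingerprint(data),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def is_unchanged(entry, data):
    """True if the data still matches the hash recorded in the entry."""
    return entry["sha256"] == fingerprint(data)


evidence = b"copy of a suspicious log file"
log = [custody_entry("log-001", "analyst A", "acquired", evidence)]

print(is_unchanged(log[0], evidence))                 # True
print(is_unchanged(log[0], evidence + b" tampered"))  # False
```

A matching hash supports the claim that the evidence is what you say it is; the log entries show who handled it and when.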

The text provides a lengthy discussion of policies and procedures that guide the actions of a CIRT employee. Note the discussion on page 408 about attacking an attacker. This might make sense to many of us, but the reality is that counter attacking may only increase the animosity of the attacker. Our objective is to defend, which is often best done without emotional investment. Some advice is offered for specific types of frequently seen attacks:

DoS, ping-based attack - if using an IDS that can change firewall rules, set it to block ICMP when large numbers of packets of this type are seen
DoS, SYN flood - configure the IDS to block the originator's IP address after a reasonable number of tries
Malware - use, configure, and update antivirus software. Note that the Department of Homeland Security instructed Federal Agencies to stop using Kaspersky security products in September of 2017. This is the kind of news you need to be aware of, and may want to emulate in your own organization.
Malware - teach your users to be suspicious of email attachments, web sites whose URLs make no sense, and programs a "friendly" email asks them to run.
Malware - when infection is found, isolate the devices and clean them as needed, getting help from your antivirus vendor if possible. Contain the problem, eradicate the problem, then recover the system. (page 425)
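The two DoS suggestions above both come down to counting packets from a source and acting when the count passes a threshold. Real IDS products have their own rule languages, so the following is only a sketch of the counting logic. The class name, threshold, window length, and addresses (taken from documentation-only ranges) are my assumptions.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Block a source that sends too many packets inside a time window."""

    def __init__(self, max_packets, window_seconds):
        self.max_packets = max_packets
        self.window_seconds = window_seconds
        self.seen = defaultdict(deque)  # source address -> recent timestamps
        self.blocked = set()

    def observe(self, source, timestamp):
        """Record one packet; return True if the source is blocked."""
        if source in self.blocked:
            return True
        times = self.seen[source]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_packets:
            self.blocked.add(source)
            return True
        return False


watcher = RateWatcher(max_packets=5, window_seconds=1.0)
# Simulated traffic: one noisy source and one quiet source.
for i in range(10):
    watcher.observe("198.51.100.7", timestamp=i * 0.05)
watcher.observe("192.0.2.10", timestamp=0.2)
print(sorted(watcher.blocked))  # ['198.51.100.7']
```

Choosing the threshold is the hard part: too low and you block legitimate users, too high and the attack gets through. Attackers can also forge source addresses, so treat per-address blocking as one control among several.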

The chapter ends with some recommendations for best practices:

Define a computer security incident. This needs to be done to specify the circumstances that will empower the CIRT staff to take actions that some will consider to be blocking their ability to work.
Provide policies for the CIRT staff to follow.
Provide ongoing technical training to CIRT staff and provide ongoing awareness training to all employees.
Include checklists as controls in procedures that require steps be done in a prescribed order, and as easy reference in other procedures.
Subscribe to security notifications from recognized agencies, such as US-CERT.

Assignments

Assignments for these chapters will be found in Canvas. We will explore that in class.
