ITS 4350A - Incident Response and Disaster Recovery


Chapter 4, Incident Response: Planning; Chapter 5, Organizing and Preparing the CSIRT

Objectives:

This lesson is about chapter 4. Objectives important to this lesson:

  1. Incident response policy
  2. Incident response planning
  3. During the incident and after
  4. Before the incident
Concepts:
Chapter 4

Incident response covers all incidents handled by a particular team. In the context of this chapter, incident response will cover the actions of teams that handle security incidents. Be aware that not all incidents that threaten a company involve security, but those are the incidents that concern us in this course.

The text spends several pages on organizing a committee, on choosing a model to follow, and on getting ready to plan for security incidents. We can skim that material and continue on page 107, where the material is still mysteriously engaged in forming an official statement and commitment from the organization that it is going to handle incidents. We should all expect that they will, shouldn't we?

Moving on to page 112, the author considers creating an Incident Response Plan. Some terminology is reviewed:

  • adverse event, incident - These two terms mean the same thing in the context of chapter 4: an event that has compromised, or may compromise, our security. Be aware that there can be events and incidents that have nothing to do with information security.
  • incident response - A set of procedures that will vary with the nature and severity of the incident. The goals begin with containment, identification, and remediation.
  • information security incident - An incident can be classified as an information security incident if three tests are met.
    1. Information assets have been placed at risk.
    2. The threat may succeed.
    3. The IT assets' confidentiality, integrity, and/or availability are at risk.
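
The three tests above can be written as a simple all-or-nothing check. This is only a minimal sketch to show that every test must be met. The class name, field names, and sample events are illustrative assumptions, not something taken from the text.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    """Hypothetical record of an adverse event (field names are assumptions)."""
    description: str
    assets_at_risk: bool       # test 1: information assets have been placed at risk
    threat_may_succeed: bool   # test 2: the threat may succeed
    cia_at_risk: bool          # test 3: confidentiality, integrity, and/or availability at risk


def is_information_security_incident(event: AdverseEvent) -> bool:
    """Return True only when all three tests from the list are met."""
    return event.assets_at_risk and event.threat_may_succeed and event.cia_at_risk


# Two made-up events for illustration.
stolen_laptop = AdverseEvent("unencrypted laptop stolen", True, True, True)
blocked_probe = AdverseEvent("probe dropped at the firewall", True, False, True)
```

With these sample values, the stolen laptop meets all three tests and the blocked probe fails the second one, so only the first would be classified as an information security incident.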

The text cautions us to remember that we are assuming that an adverse event has occurred. IR is concerned with reacting to the event, not preventing it. This is not helpful, and not very accurate. On page 113, the author introduces three aspects of the IR documentation that the IR committee must produce, one of which addresses a different attitude:

  • during the incident - Who must do what, given the kind of incident being discussed.
  • after the incident - How do we proceed once the incident is over? Do we go into disaster recovery? Do we simply resume normal operations if the incident was not severe? How about making notes on what went right and what went wrong with the "during the incident" plan? Revising the plan for next time can't wait until the next time it happens.
  • before the incident - With their knowledge of the contents of the documents prepared above, the committee can state what the organization should do in order to be prepared to carry out the steps listed above. How do we store backups, and when are they made? How do we notice that an incident is happening, and how do we handle it as soon as possible? What are the contents of the Business Continuity plan and the Disaster Recovery plan? What should we do during an incident to flow into those plans better?

Up to now, it has not been clear that you should have a separate incident response plan for each kind of incident that you consider possible. It may be useful to regularly consider incidents that have occurred to similar organizations, and to create plans to use should that kind of incident happen to your organization. The text addresses this, mentioning that a different skill set will be needed for different kinds of incidents. This should be addressed in the incident plans as well.

As an example, the text discusses various steps in the life of an incident:

  • a trigger event occurs - There may be a call from a user, a device failure that particular staff notice, a performance degradation, or anything else that causes someone to ask why something is not working.
  • IT staff are contacted - Somehow, possibly through a help desk, a connection is made to IT staff who become what the text calls a reaction force. In the real world, this is not very formal, except in the cases of high priority callers or security related events.
  • actions are taken - The actions taken depend greatly on the nature of the event, but we can assume that they will include gathering facts and evidence, troubleshooting and diagnosing, and containment and remediation. This phase continues until the incident is over.
  • after the incident - Typically, information is shared, documents are created and completed, and recommendations are made for changes to the incident plan, or creation of one for new incidents.
  • before the incident - See below.
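
The steps in the list above can be pictured as a cycle. The sketch below is one hedged way to model it. The enum names are my own labels for the bullets, and the idea that "before the incident" loops back to the next trigger event is an assumption based on treating it as the time between incidents.

```python
from enum import Enum


class Phase(Enum):
    """Labels for the steps in the list above (names are illustrative)."""
    TRIGGER = "a trigger event occurs"
    CONTACT = "IT staff are contacted"
    ACTION = "actions are taken"
    AFTER = "after the incident"
    BEFORE = "before the incident"


# Assumption: "before the incident" is also the time between incidents,
# so it leads back to the next trigger event.
NEXT_PHASE = {
    Phase.TRIGGER: Phase.CONTACT,
    Phase.CONTACT: Phase.ACTION,
    Phase.ACTION: Phase.AFTER,
    Phase.AFTER: Phase.BEFORE,
    Phase.BEFORE: Phase.TRIGGER,
}


def advance(phase: Phase) -> Phase:
    """Return the phase that follows the given one in the cycle."""
    return NEXT_PHASE[phase]


def walk(start: Phase, steps: int) -> list:
    """List the phases visited when moving `steps` times from `start`."""
    visited = [start]
    for _ in range(steps):
        visited.append(advance(visited[-1]))
    return visited
```

Calling walk(Phase.TRIGGER, 5) visits every phase once and ends back at the trigger phase, which is the point of the model: preparation done after one incident feeds directly into the handling of the next.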

The longest section of the chapter covers the time before an incident happens. It is placed at this point in the chapter because a proper plan for any kind of incident is greatly enlightened by the planners having seen such an incident, and having the insights that come from dealing with it. It is likely that the material gathered during and after an incident will include phrases like "if only we had known x, or had watched y". This kind of insight leads to practices that will create faster, more accurate diagnoses in the future, and may lead to prevention of such incidents.

We could also refer to the time before an incident as the time between incidents. Security incidents do not happen every day. Many are stopped at firewalls by good firewall rules. However, following the logic of the text, we will assume that a recent event has been handled well and is now well documented. An IR plan has been revised or written for it. In that case, the text recommends that several steps take place to be ready for next time.

  • desk check - Responsible experts should read copies of the plan, suggest revisions, and pass the revised plan to the next stage.
  • structured walk-through - People who would be involved in carrying out the plan read through it together, making sure everyone understands their intended roles and actions.
  • simulation - An exercise is scheduled that requires each person in the walk-through to "almost" carry out their tasks in the plan. This will bring out problems that would not appear in the earlier stages, such as not having a resource, or not having assumed access to assets.
  • parallel testing - This is not really different from the actions in the simulation, but it is meant to be one step closer to actually carrying out the plan.
  • full interruption - This is a live test of the plan, actually isolating LAN segments or turning off access as the plan requires. One learns more about the actual work involved and about the validity of the plan.
  • war gaming - This is a simulation of real system attack and defense. It is compared in the text to the National Collegiate Cyber Defense Competition, and to other events that provide the chance to use real skills.
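
The testing steps above run from least to most disruptive, and each one hands the plan to the next. A minimal sketch of that ordering follows. The names and numbering are illustrative assumptions, and treating war gaming as the final rung is my reading of the list order, not a rule stated in the text.

```python
from enum import IntEnum
from typing import Optional


class PlanTest(IntEnum):
    """Testing steps in the order listed above (numbering is an assumption)."""
    DESK_CHECK = 1
    STRUCTURED_WALK_THROUGH = 2
    SIMULATION = 3
    PARALLEL_TESTING = 4
    FULL_INTERRUPTION = 5
    WAR_GAMING = 6


def next_test(completed: Optional[PlanTest]) -> Optional[PlanTest]:
    """Return the next test to schedule, or None when every step is done.

    Pass None when the plan has not been tested at all yet.
    """
    if completed is None:
        return PlanTest.DESK_CHECK
    if completed is PlanTest.WAR_GAMING:
        return None
    return PlanTest(completed + 1)
```

A plan that has only been through its desk check would be scheduled for a structured walk-through next, and a plan that fails a step would simply be revised and sent through the same step again.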

The last major portion of the chapter discusses training the writers of plans, the staff who will carry out plans, the general staff of an organization, and general management of the organization. Everyone needs to know that things happen, and what to do when they happen. The chapter finishes with some recommendations about preparing and storing the IR plan documents. Note that they recommend a hard copy, easily identified and readable in case there is an outage that includes limited or no access to your computer systems.


Chapter 5, Organizing and Preparing the CSIRT

Objectives:

This lesson is about chapter 5. Objectives important to this lesson:

  1. Detecting incidents
  2. Intrusion systems
  3. Processes and services
  4. Port scanning
  5. Decision making
Concepts:
Chapter 5

The chapter begins with a phrase that it will use frequently: Computer Security Incident Response Team (CSIRT). It tells us that the same concept may be called different names in various environments, but we should be clear that we are discussing the staff who will address security incidents relating to computer systems.

The text begins a set of eight sections about developing a CSIRT. This is from a plan proposed by the respected Computer Emergency Response Team at Carnegie Mellon University. (Their website explains who they are and what they are responsible for.) Some thought has been put into a few of these sections, so we should consider them.

  1. Obtain management support and buy-in - I am getting a little tired of this concept. How many organizations put together their own teams and divisions and then get approval to do so? It is far more likely that someone "upstairs" had a reason to reorganize the outfit, leading to the creation of a new business entity. Let's assume that someone in authority has decreed that such a team should and will exist. Huzzah.
  2. Determine the CSIRT strategic plan - Oh, great, we had a decree, so now we need a proclamation. It's going to have a lot of details.
    • Assuming this is a new group, we declare that it will be created by (fill in a date), and we will need to hire (list of specialties and number of employees).
    • Services will be provided as we promise, so we need to specify operating hours, or specify a way to provide 24/7 coverage. Hiring staff to work around the clock is expensive, but so is paying overtime for too few staff. Make your choices.
    • Staff will need particular skills. The text offers a list: malware identification/elimination, system recovery, system administration, network administration, firewall management, IDPS usage, cryptography skill, and documentation. If you don't spend time and effort here, you will have no product and no service to offer the organization. Either you hire skilled people, or you hire good learners who can become skilled people. Note: this concept is about staffing, not about proclaiming that you have a great staff.
    • An organizational structure for the CSIRT will be needed, one that fits into the existing organizational structure of the business we work for. This will include a plan based on the size of our organization, its geographic scope, and decisions that will be made about full time staff, part time staff, contract staff, and outsourced staff.
    • Ongoing training will be needed as time goes on, regardless of the skill our employees have when they start. We should make plans to provide training or provide incentives to induce employees to take necessary training, education, and skill building courses.
  3. Gather relevant information - This is a background task to be done when the team forms, but gathering information about IT is always relevant because the landscape and the landmarks are always changing. Ask me about a story I heard this week about MAC addresses and assigned IP addresses. Sounds simple, doesn't it? Not any more.
  4. Design the CSIRT vision - This one is too much like item 1. Do you think we are creating this team in a vacuum? Surely the points in this section have been determined before anyone was put in charge of it, much less hired to do the work. Well, we do what needs doing. Dilly Dilly.
  5. Communicate the CSIRT's vision and operational plan - Let's just agree that no one pays any attention to anything unless it comes from their own management (their boss). Some information needs to come through appropriate channels, starting at the top and hitting all levels. That's the part you need to follow up. Did everyone get the memo? You won't know until you contact someone who does not work with or for you. When you do, and they have never heard of you, have your credentials ready, meaning the authorization from upper management that you can and should have cooperation while doing your job.
  6. Begin CSIRT implementation - The text includes hiring, initial training, form creation, and software selection in this step. Some of these tasks will be ongoing. All need to be done before services are provided.
  7. Announce the operational CSIRT - Step 5, rinse and repeat. Tell the world. The world, if it has been paying attention, will probably say it's about time this service actually started.
  8. Evaluate CSIRT effectiveness - You should be doing this with all your teams, so you should expect to do it with this one. Continuous Quality Improvement.

The text revisits the idea of outsourcing. It may be necessary, for example, to hire contract staff who are experts about new equipment or software for a transitional period in which our existing staff should learn about the new assets. It may be cost effective to use contract staff if they are paid from a different budget. The overall cost of operation increases, but the separate funding can make it possible to have additional staff who are not paid by your operating budget.

The text offers a list of concerns that should be considered when outsourcing CSIRT duties. These are a few of them:

  • Quality of work is a consideration because a contract employee may not be as motivated to do a good job as an actual employee. This is not always so, but it is something that should be monitored.
  • If we are concerned about contract staff making decisions about our security problems, the text suggests we should consider having them make recommendations based on their analysis of incidents. Those recommendations should be considered by our senior CSIRT staff, and appropriate decisions can be made. This may slow down the application of solutions, and that may not be acceptable.
  • It is inevitable that anyone working on CSIRT issues will gain knowledge about sensitive data and operational information. That being so, the question is whether we want a contract employee learning these things when that contractor may be working for a competitor when their contract is over. In my day job, it is common for contract employees to have long term relationships with us, leading to their being familiar with our systems and to their being trusted as much as regular staff. If this situation is possible, there is less reason for concern.
  • As noted above, a contractor with a long association with our organization will be familiar with our operations, policies, and equipment. This is not so when a new contractor joins us, or when a new contract begins with an outside agency. It is also true that a new full time employee will not be familiar with our systems, so the question is again a matter of trust. Do we trust the people we are paying to do our services? If not, can we rely on legal contracts to keep our secrets safe? If the concern is about having the big picture of our organization, do we have an effective onboarding process for new staff, full time, part time, and contract?