ITS 305 - Security Policies and Auditing

Chapter 12, Incident Response Team (IRT) Policies

Objectives:

This lesson covers chapter 12. It discusses policies that relate to incident response teams and procedures. Objectives important to this lesson:

  1. Incident definition
  2. Incident response policy definition
  3. Incident classifications
  4. Incident response teams
  5. Procedures for IRTs

Concepts:

Chapter 12

Incidents

Okay, let's turn this around. Defining an IRT before you define an incident is nonsense. Sorry, Mr. Johnson.

Let's start at the bottom of page 330. The short version is that an incident (actually, a security incident, because there are other types) is an event that significantly violates a security policy. It can be any kind of disruption of service or violation of the CIA standard, as long as it puts the organization at risk. In other environments, the simple word incident is more generic, so be aware that this chapter is about security incidents.

Incident Response Policy

That being understood, the chapter actually opens with the idea that there should be a multidisciplinary team in place to handle significant security issues. This can be confusing because it is not uncommon for there to be a general incident response team on the organization's help desk, whose job is to handle any incident that does not have significant security implications. It is important that the right people be available to defend our assets when, for instance, there is an ongoing attack on some aspect of the organization. The text also discusses the idea that there will be smaller violations of security policies that will not require the attention of this team of specialists. Those problems, such as sharing a password, will be handled by local management, without needing to involve the security incident response team (SIRT).

This is why there is a need for a security incident response policy, a set of criteria that make it easier to determine when there is a security incident, and when the current problem is only a security infraction, which will be handled and properly reported. The essence of the distinction is this: is this an emergency? If so, the SIRT should be consulted.

Incident Classification

Your security incident policy should include a classification method; otherwise, the SIRT will receive the wrong trouble calls, and will fail to receive the right ones. The text informs us that there is no definitive triage list for this purpose. This is probably because the duties of the SIRT staff vary from one organization to another.

The text tells us that Visa requires a report of any breach of customer information, and that it tracks these reports by exploit type. Tracking this information provides a rough estimate of how many attacks of each type we might expect to see in a given time period. Compare the list of types tracked by Visa on page 331 to the list of types tracked by NIST standards on page 332. There are unique items on each list. Do the Visa merchants never encounter a DoS attack? Do the agencies using NIST standards never have misconfigured networks? Perhaps those issues are infrequent in their respective environments, or they are unlikely to cause security problems for those organizations.
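To make the "rough estimate" idea concrete, here is a minimal sketch of tallying incident reports by type and turning the tally into an average per month. The log entries and category names are made up for illustration; they are not the Visa or NIST lists from the text. The sketch also assumes that every month in the reporting period shows up in the log at least once.

```python
from collections import Counter

# Hypothetical incident log: (month, incident type).
# The categories are illustrative only, not taken from Visa or NIST.
incident_log = [
    ("2023-01", "malware"),
    ("2023-01", "phishing"),
    ("2023-02", "malware"),
    ("2023-03", "misconfiguration"),
    ("2023-03", "malware"),
    ("2023-04", "phishing"),
]


def monthly_rate_by_type(log):
    """Return the average number of incidents per month for each type.

    Assumption: only months that appear in the log are counted, so a
    month with no incidents at all would be missed by this sketch.
    """
    months = {month for month, _ in log}
    counts = Counter(kind for _, kind in log)
    return {kind: n / len(months) for kind, n in counts.items()}


rates = monthly_rate_by_type(incident_log)
for kind, rate in sorted(rates.items()):
    print(f"{kind}: about {rate:.2f} per month")
```

A tally like this only tells you what has been reported in the past. It is a starting point for an estimate, not a prediction.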

Regardless of tracking the type of incident, it is a common practice to handle minor incidents at the help desk or within the work area. However, there must be a definition of what is a minor incident, and what is a major incident. As the text says, it is easy to see the difference at the top of the scale. It is more difficult in the middle, so there must be measurements we can use. When there is a potential for loss of life, it is a major incident. When it affects all or a significant number of users, it is a major incident.  In practice, the number of users affected or threatened is often a measure of the severity of the incident. If incidents can be measured on that scale, that is a good method. However, there may be other factors. In small, outlying locations, it may be more meaningful to measure the percentage of staff affected, rather than the raw number. If there are only 10 staff at one location, and 100 at another, the effect of 5 people being unable to work is more significant in the first location than in the second.
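The measurements described above can be sketched as a simple rule. This is only an illustration: the function name, the 25 percent threshold, and the two-level result are my assumptions, not something the text prescribes. Your organization's policy would set its own criteria.

```python
def classify_incident(affected, site_staff, life_safety_risk=False,
                      major_fraction=0.25):
    """Classify an incident as "major" or "minor".

    affected         -- number of staff unable to work
    site_staff       -- total staff at the affected location
    life_safety_risk -- True if there is a potential for loss of life
    major_fraction   -- hypothetical threshold; a real policy sets its own
    """
    if site_staff <= 0:
        raise ValueError("site_staff must be a positive number")
    # A potential for loss of life is always a major incident.
    if life_safety_risk:
        return "major"
    # Otherwise, measure the share of the location that is affected.
    if affected / site_staff >= major_fraction:
        return "major"
    return "minor"


# The example from the paragraph above: 5 people unable to work.
small_site = classify_incident(affected=5, site_staff=10)
large_site = classify_incident(affected=5, site_staff=100)
print(small_site, large_site)
```

With the assumed threshold, the same five affected people count as a major incident at the 10-person location and a minor one at the 100-person location, which is the point of measuring by percentage.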

Incident Response Teams

On page 333, the text turns to the ways an incident response team might be organized and empowered. It may begin with a charter, which is a commonly used business document that establishes the purpose of a group or project, and the extent of the authority granted to the staff involved in it. The reason for a charter is to make it clear to all staff that the security incident response team has been given authority to take charge of an incident, to act to resolve it, and to expect cooperation from all concerned staff.

The text lists three scopes that a security incident response team might be empowered to act under.

  • On-site response - The security incident response team is empowered to take a hands-on approach to incidents, taking charge of them and performing the necessary tasks to resolve them. The text explains that political realities may require the SIRT staff to advise a local expert about what to do. This is still within the scope of this scenario.
  • Support role - This scope is more likely when the organization is complex, having many systems whose maintenance is done by experts who are the appropriate staff to handle their problems. It is also appropriate when other staff have expertise in handling incidents. The SIRT members will provide advice and management of the situation.
  • Coordination role - When the organization is larger than one geographic location, it may be best to have the SIRT staff act as a central authority whose role is to manage the activities of local staff at every location. This arrangement is hard to make work if there are many locations, few staff to distribute among them, and no remote management software.

The text moves on to consider the specialties that might be useful for the members of a SIRT. If you are not reading carefully, you might miss the fact that security staff form the core of the team, so I will add a bullet point for them:

  • SIRT core members who are security experts
  • Experts in systems that are affected
  • Human resources staff when needed, such as when there is an internal attack
  • Legal staff who may interface with police agencies and/or advise the organization about legal and regulatory responsibilities

The other staff listed on page 336 are more useful to the business side of the organization than to the technical solution side. Even the data owner may be a business person who has official control of the data, but who does nothing on the technical side. The text beats this concept to death for a few more pages, but we don't have to watch the beating.

On page 340, the text presents a case for having Business Impact Analysis done. It relates to a concept we have seen before, establishing what resources we need to restore, with what speed, and in what order in various incident scenarios. Once this information is prepared, we can write incident policies and procedures that describe what must be done in particular circumstances.
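As a rough illustration of what a Business Impact Analysis produces, the sketch below orders resources for restoration by priority and by how long the business can tolerate losing them. The resource names, priorities, and downtime figures are invented for the example, and the sorting rule is one reasonable assumption, not the method from the text.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    priority: int              # 1 = most critical business function
    max_downtime_hours: float  # how long the business can do without it


def restore_order(resources):
    """Return resources in the order they should be restored.

    Assumed rule: most critical first; ties go to the resource
    that can tolerate the least downtime.
    """
    return sorted(resources, key=lambda r: (r.priority, r.max_downtime_hours))


# Hypothetical inventory for illustration only.
inventory = [
    Resource("file server", priority=2, max_downtime_hours=24),
    Resource("payment system", priority=1, max_downtime_hours=2),
    Resource("intranet wiki", priority=3, max_downtime_hours=72),
    Resource("email", priority=2, max_downtime_hours=8),
]

ordered_names = [r.name for r in restore_order(inventory)]
print(ordered_names)
```

The value of the exercise is in the conversations needed to fill in the priority and downtime numbers. The sorting itself is the easy part.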

As usual, we find the actual steps to perform during or after an incident in the procedures associated with it. The graphic on page 342 shows a circular set of processes, each of which would have procedures to follow to achieve the desired result. The information in this text in the remaining pages in the chapter was summarized better in our last text. Documentation should take place at all stages:

  • Business Impact Analysis - The green highlight on this bullet is to show that this step should be done when times are good and we can examine our systems performing normally.
    Before you can plan for what to do, you have to figure out what is normal for your business, what can go wrong, and what can be done to minimize the impact of incidents and problems/disasters (see the bullets below).
    • What are the business's critical functions? Can we construct a prioritized list of them?
    • What are the resources (IT and other types as well) that support those functions?
    • What would be the effect of a successful attack on each resource?
    • What controls should be put in place to minimize the effects of an incident or disaster? (Controls are proactive measures to prevent or minimize threat exposure.)

  • Incident Response Planning - The red highlight on this bullet is to acknowledge that the plans made in this step are used when there is an emergency for one or more users. (Shields up, red alert? Why were the shields down?)
    The text is consistent with the ITIL guidelines that call a single occurrence of a negative event an incident. An incident response plan is a procedure that would be followed when a single instance is called in, found, or detected.

    For example, a user calls a help desk to report a failure of a monitor that is under warranty. (Note that this is an example of an IT incident, not an IT security incident. What further details might make this part of a security incident?) There should be a common plan to follow to repair or replace the monitor. Incident Response Plans (Procedures) may be used on a daily basis.

  • Business Continuity Planning - The orange highlight is meant to indicate that these plans are not concerned with fighting the fire, but with conducting business while the fire is being put out.

    Business continuity means keeping the business running, typically while the effects of a disaster are still being felt. If we have no power, we run generators. If we cannot run generators (or our generators fail), we go where there is power and we set up an alternate business site. Or, if the scope of the event is small (one or two users out of many) maybe we pursue incident management for those users and business continuity is not a problem.

  • Disaster Recovery Planning - The yellow highlight here is to indicate that the crisis should be over and we are cleaning up the crime scene with these plans.

    A disaster involves widespread effects that must be overcome. A disaster might be most easily understood if you think of a hurricane, consequent loss of power, flooding that follows, and the rotting of the workplace along with the ruined computers and associated equipment.

    A disaster plan is what we do to restore the business to operational status after the disaster is over. There may be specific plans to follow for disasters under the two bullets above, but the disaster recovery plan is used after the crisis, unless this term is applied differently in your working environment.

  • By the way, in ITIL terms, a series of incidents may lead us to discover what ITIL calls a problem, something that is inherently wrong in a system that might affect all its users. When a problem knocks out a critical service, we have a disaster. The organization you work for may use all three terms, or any two of them to mean different scopes of trouble. You need to know the vocabulary to use in the setting where you work, and you need to call events by the names they use.

The text also mentions analysis of the incident and our response. Analysis of the incident should begin during the incident, to lead us to a good solution. Analysis after the incident can examine what actually happened, whether the steps we took were effective, and what we should recommend or require to avoid such an event in the future.