CSS 211 - Introduction to Network Security

Lesson 8: Chapter 13, Business Continuity

Objectives:

This lesson covers chapter 13 in the text. It discusses business continuity planning and activities. Objectives important to this lesson:

  1. Environmental controls
  2. Redundancy planning
  3. Disaster recovery procedures
  4. Incident response procedures

Concepts:

Business Continuity

Chapter 13 sets a new standard for obliviousness on the part of our author. Its opening pages discuss much of the destruction that was caused by Hurricane Katrina in 2005 in Florida, Mississippi, Georgia, and Alabama. These pages ignore the damage this hurricane caused in the state of Louisiana, specifically to the city of New Orleans. Are those victims of the storm so well known that he thought he only needed to mention the other states? A better discussion is on Wikipedia, at the link provided.

Once we are past the author's omission, we may consider his remarks about the effects of a disaster that might be reduced by detailed planning and action based on such planning.

Fire

The first threat considered is fire. Some statistics are given and the author presents a list of four elements that must be present for a fire to exist. His fourth element is the fire itself, so it does not belong on the list. The other three are those we discussed in class earlier in the term.

For a fire to exist, three factors are needed:

  • oxygen
  • fuel
  • heat

If you can eliminate any one of these factors, the fire will go out. This is why carbon dioxide extinguishers work: the CO2 displaces the oxygen in the immediate vicinity of a fire, and the fire stops. Smothering a campfire works about the same way.

A fire break is an example of fighting a fire by depriving it of fuel. Forest fires can be fought this way. Somewhat similarly, I once walked into a rest room in an office and found that someone had placed a roll of toilet paper on top of the light fixture over the sink. I noticed it because it was on fire. I grabbed the roll of paper and tossed it into the sink. This established a fire break between the fire and the rest of the building. I then put out the fire on the roll of paper with water (which works mainly by removing heat, and also keeps some oxygen away from the fuel).

Keeping your computer system cool, so that a fire will not ignite, is your most effective form of firefighting: don't let it start.

Fire Extinguishers - Fire extinguishers are classed by the kind of fire they are able to put out. The links below will take you to sites with more information about fire classes and extinguishers. In surveying several sites, I found that there are currently at least four classes of fires, and that the symbols for them have been updated to use pictures instead of letters. Some sites list a Class K for cooking oils (Kitchen fires), but this does not seem to be universal. The chart below contains American symbols:

Description of Extinguisher Class | Letter and Shape Symbol for Class | Picture for Class
Class A: paper, cloth, wood | A in a triangle | Icon for class A fires
Class B: oil, gasoline, kerosene, propane | B in a square | Icon for class B fires
Class C: electrical | C in a circle | Icon for class C fires
Class D: combustible metals, such as magnesium, potassium, sodium | D in a star | Icon for class D fires
Class K: combustible cooking oils | K symbol for kitchen fires | Icon for class K fires

Information from FEMA

Information from Underwriters Laboratories

Information from the University of Oklahoma Police Department

The table below is from a Wikipedia article on fire classes. It shows that the same kind of fire is called by a different name in different places:
Comparison of fire classes
American | European | Australian/Asian | Fuel/Heat source
Class A | Class A | Class A | Ordinary combustibles
Class B | Class B | Class B | Flammable liquids
Class B | Class C | Class C | Flammable gases
Class C | Unclassified | Class E | Electrical equipment
Class D | Class D | Class D | Combustible metals
Class K | Class F | Class F | Cooking oil or fat

In most cases, a multiclass extinguisher is preferred. On extinguishers I examined at my workplace, multiple picture symbols were used, showing the pictures for classes A, B, and C.

Electromagnetic Shielding

The text discusses the fact that all kinds of electrical equipment radiate electromagnetic emissions to one degree or another. In this section it is important to know a few facts.

  • Faraday cage - a metal enclosure that blocks an electromagnetic field from crossing it; a metal PC case acts as a Faraday cage that keeps most emissions from leaving the immediate area; the Faraday cage is named for Michael Faraday
  • TEMPEST - a set of standards, developed by the NSA, to reduce and shield emissions; the name may not actually be an acronym, and the shielding may not actually be necessary (see the disclaimer on page 446)

HVAC

The text discusses some ideas about heating, ventilation, and air conditioning, which include some concerns about humidity. Why do we care about humidity? Air that is too dry encourages static buildup, while adequate humidity inhibits ESD (Electrostatic Discharge).

Static electricity - ESD, or Electrostatic Discharge, can be a serious cause of problems. Some numbers from a previous text may help you understand the situation:

  • A human can't feel a static discharge until it is 3,000 volts or more.
  • Normal motion, like moving a chair or a foot, can generate 1,000 volts.
  • Simply walking across a carpeted area can generate 1,500 to 35,000 volts.
  • Handling a plastic envelope can generate 600 to 7,000 volts.
  • Picking up a plastic bag can generate 1,200 to 20,000 volts.
  • Damage can be done to computer parts with 20 to 30 volts.

The damage from a low-voltage discharge may not cause immediate failure, so you may never know the cause of the failure that eventually happens.

Servers

Servers are often set up in a cluster, a group of servers that provide services redundantly. If one of the servers in a cluster goes down, the other (or others) provide the services the down server would have provided. The text lists two types of clusters:

  • asymmetric cluster - one server is designated as the standby (replacement) for another; the standby server does nothing unless the first server fails
  • symmetric cluster - each server in the cluster provides services at all times; if one fails, its services are provided by the remaining servers
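The difference between the two cluster types can be sketched in a few lines of Python. This is only an illustration: the function name route_requests, the server names, and the round-robin sharing rule are assumptions made for the example, not anything the text specifies.

```python
def route_requests(servers, up, mode, requests):
    """Return a dict mapping each server name to the requests it handles.

    servers: list of server names; in "asymmetric" mode the first name is
             the primary and the next one is its standby.
    up:      set of server names that are currently running.
    mode:    "asymmetric" or "symmetric" (hypothetical labels).
    """
    handled = {name: [] for name in servers}
    running = [name for name in servers if name in up]
    if not running:
        raise RuntimeError("no server in the cluster is available")
    if mode == "asymmetric":
        # Only one server works at a time: the primary if it is up,
        # otherwise the standby takes over.
        active = running[:1]
    elif mode == "symmetric":
        # Every running server shares the load at all times.
        active = running
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumed sharing rule: hand out requests round-robin.
    for i, request in enumerate(requests):
        handled[active[i % len(active)]].append(request)
    return handled


both_up = {"srv1", "srv2"}
only_standby = {"srv2"}
print(route_requests(["srv1", "srv2"], both_up, "asymmetric", [1, 2, 3, 4]))
print(route_requests(["srv1", "srv2"], only_standby, "asymmetric", [1, 2, 3, 4]))
print(route_requests(["srv1", "srv2"], both_up, "symmetric", [1, 2, 3, 4]))
```

In the asymmetric case the standby handles nothing until the primary is removed from the running set; in the symmetric case both servers work all the time, and the survivor simply inherits everything when one fails.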

RAID

The text discusses RAID, which has been defined several ways. Eventually, all hard drives fail, and RAID allows a system to keep running through most single-drive failures. One common meaning is Redundant Array of Independent Drives. The word "independent" seems unnecessary, and is in fact misleading: hard drives set up in a RAID array perform functions that relate to each other. Several kinds of RAID exist to provide for redundant storage of data or to provide a means to recover lost data. The text discusses four types. Follow the link below to a nice summary of RAID level features not listed in these notes, as well as helpful animations to show how they work. Note that RAID 0 does not provide fault tolerance, the ability to survive a device failure.

RAID levels and features:

  • RAID 0: Disk striping - writes to multiple disks, does not provide fault tolerance. Performance is increased, because each successive block of data in a stream is written to the next device in the array. Failure of one device will affect all data, so striping actually decreases fault tolerance compared to a single drive.
  • RAID 1: Mirroring and Duplexing - provides fault tolerance by writing the same data to two drives. Two mirrored drives use the same controller card. Two duplexed drives each have their own controller card. Aside from that difference, mirroring and duplexing are the same: Two drives are set up so that each is a copy of the other. If one fails, the other is available.
  • RAID 5: Parity saved separately from data - provides fault tolerance by a different method. Data is striped across several drives (at least three are needed), and parity data for each stripe is saved on a drive that does not hold data for that stripe. If one drive fails, its contents can be rebuilt from the remaining data and the parity. In older versions of Windows, software RAID 5 was supported only by server operating systems, not workstations; hardware RAID controllers and some other operating systems do not share that limit.
  • RAID 0+1: Striping and Mirroring - uses a striped array like RAID 0, but mirrors the striped array onto another array, as RAID 1 would
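The parity idea behind RAID 5 is easier to see with a small example. The sketch below assumes XOR parity, which is the usual way parity is computed, and uses made-up four-byte blocks to stand in for one stripe; the function name xor_blocks is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks on three drives, parity on a fourth drive.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Pretend the drive holding the second block has failed.
surviving = [data_blocks[0], data_blocks[2]]

# XOR of the surviving data blocks and the parity gives back the lost block.
rebuilt = xor_blocks(surviving + [parity])
print(rebuilt == data_blocks[1])
```

The same calculation explains why RAID 5 survives the loss of one drive but not two: with two blocks of a stripe missing, the single parity block does not carry enough information to rebuild both.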

Network Redundancy

The text mentions that some entities need redundant connections to and through networks. It does not give specific details about this concept.

Power Redundancy

Power can be supplied to a computer system through an Uninterruptible Power Supply (UPS) that is essentially a smart battery that kicks in when the main power is lost. The text describes two kinds of UPS:

  • off-line (also called standby) - keeps a charge on a battery which it uses to supply power in case of a total loss
  • on-line (also called inline) - also has a battery, but it constantly provides power from it, while continuously charging it from the standard electrical power

The off-line (standby) model has a short lag time in the event of a power loss before the battery circuit starts working. The on-line (inline) model does not have this lag time. A typical UPS works with software that detects a power loss and alerts administrators when it occurs. Depending on the capacity of the UPS and the load placed on it, it may allow operation for hours, for minutes, or only long enough to perform an orderly shutdown of the system it is protecting.
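To see how capacity and load trade off, here is a rough back-of-the-envelope estimate in Python. The function name, the 500 watt-hour battery, the two loads, and the 90% efficiency figure are all assumptions chosen for illustration; real batteries deliver less under heavy load, so a vendor's runtime chart is the better source for actual planning.

```python
def ups_runtime_minutes(battery_watt_hours, load_watts, efficiency=0.9):
    """Rough estimate of how long a UPS can carry a load, in minutes.

    efficiency is an assumed allowance for inverter losses, not a figure
    from the text.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_watt_hours = battery_watt_hours * efficiency
    return usable_watt_hours / load_watts * 60


# Same hypothetical UPS, two different loads.
light_load = ups_runtime_minutes(battery_watt_hours=500, load_watts=150)
heavy_load = ups_runtime_minutes(battery_watt_hours=500, load_watts=900)
print(round(light_load), "minutes at 150 W")
print(round(heavy_load), "minutes at 900 W")
```

The point of the sketch is the ratio: multiply the load by six and the runtime drops to roughly one sixth, which is the difference between riding out an outage and having just enough time to shut down cleanly.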

Backup generators are typical in large installations, such as data centers that support a large population or enterprise.

Emergency Sites

In the case of a disaster that makes a work site unusable, such as a fire or flood, it becomes necessary to have a plan for alternate means of continuing business. The text lists three types of off site operation plans:

  • cold site - a basic site with office space, but no computers or other devices (you would have to supply them), no established connectivity, and no copy of your data unless you can supply it
  • warm site - has office space, hardware, and may have connectivity; may have a recent backup of your data, but it will have to be loaded on computers that may also have to be configured
  • hot site - a functional duplicate of the site that has gone down, including office space, computers, connectivity to the Internet, telephone service, and the capacity to either load a backup of your data that is stored there, or to use a copy of your data that is already in place

The definitions of hot, warm, and cold sites vary between sources, but the basic idea is always the same. The three types of sites provide different levels of service and different time frames in which you would be ready to resume business. Obviously, the hot site is best but it requires the most money and effort to maintain. The cold site is cheapest, but it has additional costs that will be added as soon as you need to use it.

Disaster Recovery Procedures

The previous section was about what could be done while a disaster is occurring. Disaster recovery can be more about what to do once the disaster is over. The text divides this concept into three sections: planning, exercises, and data backup.

Planning

Although a disaster recovery plan is used after the disaster, it should be made well before a disaster occurs. The text provides a general outline that a formal Disaster Recovery Plan document might follow. You should look over the sections of the model in the text. Most of the items in it are those that have been discussed already.

Disaster Exercises

Testing the disaster plan is the purpose of a disaster exercise. It should be carried out regularly, and the outcome of the exercise should be examined to determine what updates need to be made to the plan.

Data Backups

Four backup strategies, or schedules, are explained. You should know them. First some terms:

  • Archive bit - a one-bit file attribute, stored with the file by the file system, that is turned ON when the file is changed; it is used to flag files that have changed since the last backup
  • Target - the device, volume, folder, or group of files being backed up
  • Full - a backup of all files in the target; sets the archive bit of each file to OFF
  • Incremental - a backup of target files that are new or changed since the last backup; depends on the fact that programs that change files typically set the archive bit to ON when a change is made; sets archive bit to OFF for all files it copies
  • Differential - a backup of all files new or changed since the last Full backup; copies all files whose archive bit is set to ON; does not change the archive bit of files it copies
  • Copy - like a Full backup, but does not change the archive bits of files it copies. This is typically not part of a standard backup strategy, but an option for making an extra copy without disturbing the regular backup cycle.
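The way each backup type reads and changes the archive bit can be simulated in a few lines of Python. This is a sketch of the rules in the list above, not of any real backup program; the file names and the function names run_backup and week are made up for the example.

```python
def run_backup(files, kind):
    """Simulate one backup run.

    files maps a file name to its archive bit (True means ON).
    Returns the sorted names that were copied; bits are updated in place.
    """
    if kind in ("full", "copy"):
        copied = sorted(files)  # everything in the target
    elif kind in ("incremental", "differential"):
        copied = sorted(name for name, bit in files.items() if bit)
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    if kind in ("full", "incremental"):
        # Only Full and Incremental turn the archive bit OFF.
        for name in copied:
            files[name] = False
    return copied


def week(kind):
    """Full backup, then two daily backups of the chosen kind."""
    files = {"a.doc": True, "b.xls": True, "c.txt": True}
    log = [run_backup(files, "full")]
    files["a.doc"] = True  # a.doc is edited on Monday
    log.append(run_backup(files, kind))
    files["b.xls"] = True  # b.xls is edited on Tuesday
    log.append(run_backup(files, kind))
    return log


print(week("incremental"))
print(week("differential"))
```

With Incremental, Tuesday's run copies only b.xls, because Monday's run turned the bit on a.doc OFF. With Differential, Tuesday's run copies both a.doc and b.xls, because nothing has cleared a bit since the Full backup.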

This needs more explanation. Assume we use a tape drive to make backups. In a Full backup strategy, the entire target is backed up to tape every time we make a backup tape. This strategy consumes the most time and the most tapes to carry out a backup. To restore, we simply restore the most recent tape(s). This is the least time consuming strategy for restoring, but the most time consuming for creating backups.

The second method, Incremental backup, means that we start with a Full backup of the target, and then each successive backup tape we create only backs up the elements that are new or changed since the last backup was created. This means that successive backups will not always be the same length, but each one holds only the most recent changes. As a result, this is the least time consuming backup, but the most time consuming restore. To restore, we must first restore the last Full backup made, and then restore EVERY tape made since then, in order, to ensure getting all changes.

The third strategy, Differential backup, also starts with a Full backup tape. Then each successive tape made will contain all the files changed since the last Full backup was made. This means that we will have to restore only one or two tapes in a restore operation. If the last tape made was a Full tape, we restore only that one. If the last tape made was a Differential tape, we restore the last Full tape, then the last Differential tape.

The fourth strategy, Copy, is no different from Full in terms of backup or restore time.

In both Incremental and Differential backup strategies, you will typically use a rotation schedule. For example, you could have a one week cycle. Once a week, you make a Full backup, then every day after that you make the other kind you have chosen to use: Incremental or Differential.
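The restore rules for the strategies above can be written out as a short Python function. It is a sketch under the assumption that a cycle uses one strategy only (all Incremental or all Differential after the Full backup); the function name tapes_to_restore and the labels are hypothetical.

```python
def tapes_to_restore(history):
    """Return the positions of the tapes to restore, in restore order.

    history lists the backups made so far, oldest first, using the
    labels "full", "incremental", and "differential".
    """
    if "full" not in history:
        raise ValueError("a restore needs at least one Full backup")
    # Position of the most recent Full backup.
    last_full = len(history) - 1 - history[::-1].index("full")
    later = history[last_full + 1:]
    if not later:
        return [last_full]
    if all(kind == "incremental" for kind in later):
        # Full tape, then EVERY incremental tape made since, in order.
        return list(range(last_full, len(history)))
    if all(kind == "differential" for kind in later):
        # Full tape, then only the most recent differential tape.
        return [last_full, len(history) - 1]
    raise ValueError("this sketch assumes one strategy per cycle")


incremental_week = ["full"] + ["incremental"] * 4
differential_week = ["full"] + ["differential"] * 4
print(tapes_to_restore(incremental_week))
print(tapes_to_restore(differential_week))
```

After four daily backups, the Incremental cycle needs five tapes for a restore, while the Differential cycle needs two no matter how many days have passed since the Full backup.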

To keep them straight in your mind, remember that:

  • a Full backup copies everything. Resets all archive bits.
  • an Incremental backup copies everything different from the last backup. Resets the archive bits of files it copies.
  • a Differential copies everything "different from Full". (Different from the last Full backup.) Does not reset any archive bits.
  • a Copy makes a Full backup, and does not reset any archive bits.

The time required to create backup tapes should be considered along with the time to restore a backup. When you consider the two concepts as two sides of the answer to a question (What method should I use?), the answer may be the most common choice: Differential. It is the best compromise in terms of backup time versus restore time. Note also that all three standard methods require a Full backup on a regular cycle. The recommendation is usually to run a Full backup weekly.

The discussion above assumes that your backups are being written to tapes, which has been the most common method for many years. The text discusses three other methods, each requiring different hardware. Copying to other drives is faster, but only if they are connected by a fast channel, such as being in the same computer. That creates a new problem: the copy sits in the same location as the original, so it still has to be moved somewhere else to survive a disaster at that site. Copying to a disk in another data center is possible, and fast if the sites are connected by fiber, but costly in terms of setup.

Incident Response Procedures

An incident can be an event of any sort, but some texts, ours included, call an incident an event caused by an attack. The last five pages of the chapter concern the actions that should be taken when an incident has been detected.

Forensics

A forensic investigation is typically one that concerns a crime. This section is about computer forensics, investigations into crimes that involve computers and other information system equipment. The text discusses four aspects of an investigation:

  • secure the scene - The team mentioned in the text may be called an Incident Response Team or a Forensics Response Team, or another title that means the same thing. They are responsible for taking possession of devices that might hold any data that might contain evidence of the crime being investigated.
  • preserve the evidence - This aspect is closely related to the first, in that the response team may have to take images of data in RAM that would be lost if not recorded before the power is turned off.
  • establish (and maintain) the chain of custody - There must be a continuous documentation of who has had access to seized devices and data, who has done what with it, and who it is turned over to at each change in custody.
  • examine for evidence - Although the other discussions have used the word "evidence" several times, this one brings up the point that not everything you find is actually evidence. Only things that indicate or prove a crime was committed can be considered as evidence that will be presented in court.
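A chain of custody is at heart a log, and a small Python sketch can show what one entry might record. The use of a SHA-256 hash to show that an image has not changed between hand-offs is an assumption added here for illustration (the list above does not mention hashing), and the names, item labels, and sample bytes are all made up.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def custody_entry(item_id, holder, action, digest):
    """Build one custody log record: who, what, when, and the hash seen."""
    return {
        "item": item_id,
        "holder": holder,
        "action": action,
        "sha256": digest,
        "time": datetime.now(timezone.utc).isoformat(),
    }


# Stand-in for the contents of a seized drive image.
image = b"example bytes standing in for a disk image"

log = [custody_entry("drive-01", "A. Responder", "seized and imaged", sha256_of(image))]
log.append(custody_entry("drive-01", "B. Examiner", "received for analysis", sha256_of(image)))

# If every hand-off records the same digest, the image was not altered.
unchanged = all(entry["sha256"] == log[0]["sha256"] for entry in log)
print(unchanged)
```

Each change in custody adds one record, so the log answers the three questions in the chain of custody item: who had the item, what they did with it, and who received it next.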

The text elaborates on memory and storage locations that should be examined for meaningful data. What you can expect to find there may surprise you:

  • Windows page file - This is a hidden file, typically on the boot drive, that Windows uses to store "memory pages" that it thinks you are not using presently, like memory devoted to an application that is minimized. The file is probably in the root of the drive, and is probably called pagefile.sys. You should expect to see pieces of anything that the computer was used to work on, especially if it was minimized while the user worked on something else.

  • RAM slack - This will take a minute. When Windows saves files, it saves to sectors (track sectors) on a drive. Sectors are grouped into clusters, and the number of sectors in a cluster depends on how the drive was formatted. When a file is saved, it takes a certain number of clusters to hold it, but the file itself may not fill the last sector used in the last cluster used to store it. When this happens, Windows does something you may never have heard about. It has to write a whole sector, so it pads the rest of that sector. Older versions of Windows padded it with whatever happened to be in RAM at the time (newer versions are generally reported to pad with zeros instead). This data is called RAM slack, a copy of a piece of RAM that has been stored in the slack space at the end of a sector. You never know what you'll find in it.

    In the illustration below a file has been saved to two four-sector clusters, but only fills six and a half sectors. The second half of the seventh sector (item F) is filled with RAM slack.

    Note: for many years, a track sector has held a standard 512 bytes regardless of what device it was on. As of January 2011, this is no longer true. A device using Advanced Format on a system that understands it may use sectors that hold 4096 (4K) bytes. This leads to lots more room in a RAM slack situation. If a computer using such a device is running an older OS, such as Windows XP, the 512 byte limit for sectors still applies.

  • Drive slack - So, if the cluster holds a specific number of sectors, what if the file only used some of those sectors when it was saved? Does Windows fill the rest of those sectors, too? No, but something else interesting happens. If anything was ever written to those sectors before, it remains there undisturbed until there is a need to write to them. This means that some sectors at the end of a cluster may hold old data that the user thought was deleted. The data held in those sectors is called Drive slack. Again, you never know what might be in it.

    In the illustration below, the last sector of the second cluster (item G) is Drive slack. The new file has not overwritten whatever was in that sector already.
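The arithmetic behind the two kinds of slack can be checked with a short Python sketch. It uses the same example as the illustrations: 512-byte sectors, four sectors per cluster, and a file that fills six and a half sectors (3,328 bytes). The function name slack_space and the dictionary keys are made up for the example.

```python
def slack_space(file_bytes, sector_bytes=512, sectors_per_cluster=4):
    """Return how much RAM slack and Drive slack a saved file leaves."""
    if file_bytes <= 0:
        raise ValueError("file_bytes must be positive")
    cluster_bytes = sector_bytes * sectors_per_cluster
    # Round up with integer arithmetic: partial units still take a whole one.
    sectors_used = -(-file_bytes // sector_bytes)
    clusters_used = -(-file_bytes // cluster_bytes)
    # RAM slack: the unused tail of the last sector the file touches.
    ram_slack = sectors_used * sector_bytes - file_bytes
    # Drive slack: whole sectors left over at the end of the last cluster.
    drive_slack = clusters_used * cluster_bytes - sectors_used * sector_bytes
    return {
        "clusters_used": clusters_used,
        "sectors_used": sectors_used,
        "ram_slack_bytes": ram_slack,
        "drive_slack_bytes": drive_slack,
    }


example = slack_space(3328)
print(example)
```

For the example file this gives two clusters, seven sectors touched, 256 bytes of RAM slack (the second half of the seventh sector), and 512 bytes of Drive slack (the untouched eighth sector). Changing sector_bytes to 4096 shows how much more room a 4K sector can leave at the end of a file.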