The scenario that opens chapter 3 leads us to a convincing story. A mail room employee spills coffee, mops up the mess, and breaks open an envelope filled with white powder. At this moment, we readers should pause to express our gratitude that someone at this company anticipated this event, planned actions to take, trained staff, and made everyone familiar enough with the plan that its execution began correctly and continued without a reported glitch.
The title of the chapter is explained briefly with a review of earlier material. Incident Response (IR) is what we do when something happens. Business Continuity (BC) is what we do to keep the business operating while we can't operate the way we normally do. Disaster Recovery (DR) is what we do to return to normal operations.
The first half of the chapter is about data and application resumption, methods that help us when we lose or lose access to data and applications. On page 93, the primary topic is backup and retention strategy: what kind of backup do we make, where do we keep it, and how long will we keep it? The text suggests you should find out if there are legal requirements you must follow before you set a policy that winds up being useless.
In the example table on page 93, there are three levels of priority for data: low, moderate, and high. Each higher level represents data that is more important to the organization, whose loss would be more damaging, and whose replacement is needed sooner than data on lower levels.
In case you are not aware, a cold site may just be office space with a potential for computers and data. A warm site has computers that need to be brought up and loaded from recent backups. A hot site has computers that are running, with our data already available on them.
The text considers online backups and cloud services on page 77. It points out that free services usually have no guarantees that they will be available, and they often have file space limits. Cloud storage is simply storing data on someone else's computer, accessed across the Internet or through a dedicated data line. Cloud computing is different: it means that you are running an application on someone else's computer. For example, you may be running virtual machines on computers owned by Amazon. That could be part of a business continuity plan, or it could be a regular part of your business operation.
Regarding cloud storage, the text lists three categories that may be helpful:
Assume we use a "tape drive" to make backups. In a Full backup strategy, the entire target is backed up to tape every time we make a backup tape. This strategy consumes the most time and the most tapes to carry out a backup. To restore, we simply restore the most recent tape(s). This is the least time consuming strategy for restoring, but the most time consuming for creating backups.
The second method, Incremental backup, means that we start with a Full backup of the target, and then each successive backup tape we create only backs up the elements that are new or changed since the last backup was created. Because each tape holds only what changed since the previous backup, successive backups will not always be the same length, and they are usually short. This makes Incremental the least time consuming strategy for creating backups, but the most time consuming for restoring. To restore, we must first restore the last Full backup made, and then restore EVERY tape made since then, in order, to ensure getting all changes.
The third strategy, Differential backup, also starts with a Full backup tape. Then each successive tape made will contain all the files changed since the last Full backup was made. This means that we will have to restore only one or two tapes in a restore operation. If the last tape made was a Full tape, we restore only that one. If the last tape made was a Differential tape, we restore the last Full tape, then the last Differential tape.
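The restore rules for the three strategies above can be sketched in a few lines of Python. This is only an illustration, not something from the text: the function name tapes_to_restore and the list-of-strings history format are made up for the example, and it assumes a cycle uses a single strategy after each Full backup.

```python
def tapes_to_restore(history):
    """Return the positions of the tapes needed to restore the latest state.

    `history` (a hypothetical format) lists the backup types in the order
    the tapes were made: "full", "incremental", or "differential".
    Assumes only one non-Full strategy is used after each Full backup.
    Raises ValueError if no Full backup exists, since nothing can be restored.
    """
    # Every restore starts from the most recent Full backup.
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    after_full = list(range(last_full + 1, len(history)))

    if not after_full:
        # The last tape made was a Full tape: restore only that one.
        return [last_full]
    if history[-1] == "differential":
        # The last Full tape, then only the last Differential tape.
        return [last_full, len(history) - 1]
    # Incremental: the last Full tape, then EVERY tape made since, in order.
    return [last_full] + after_full


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]
print(tapes_to_restore(incremental_week))   # [0, 1, 2, 3]
print(tapes_to_restore(differential_week))  # [0, 3]
```

The Incremental week needs all four tapes, while the Differential week needs only two, no matter how many days have passed since the Full backup.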
The fourth strategy, Copy, is no different from Full in terms of backup or restore time, assuming it is a full copy. In both Incremental and Differential backup strategies, you will typically use a rotation schedule. For example, you could have a one week cycle. Once a week, you make a Full backup, then every day after that you make the other kind you have chosen to use: Incremental or Differential.
To keep them straight in your mind, remember these facts:
The time required to create backups should be considered along with the time required to restore from them. When you treat the two as two sides of the answer to one question (What method should I use?), the answer may be the most common choice: Differential. It is the best compromise in terms of backup time versus restore time. Note also that all standard methods require a Full backup on a regular cycle. The recommendation is usually to run a Full backup weekly.
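To see why Differential is called a compromise, here is a rough comparison over a one week cycle. The numbers are assumptions chosen for the example, not figures from the text: a 100 GB Full backup, 5 GB of different data changing each day, and a Full backup on the first day of the cycle. The function name weekly_cost is also made up for the example.

```python
def weekly_cost(strategy, full_gb=100, daily_change_gb=5, days=7):
    """Return (GB written during the cycle, tapes to restore on the last day).

    All sizes are illustrative assumptions. Day 0 is always a Full backup.
    """
    if strategy not in ("full", "incremental", "differential"):
        raise ValueError(f"unknown strategy: {strategy}")

    written = 0
    for day in range(days):
        if day == 0 or strategy == "full":
            written += full_gb                 # the whole target, every time
        elif strategy == "incremental":
            written += daily_change_gb         # only changes since yesterday
        else:
            written += daily_change_gb * day   # all changes since the Full

    if strategy == "full":
        tapes = 1      # just the most recent tape
    elif strategy == "differential":
        tapes = 2      # last Full plus last Differential
    else:
        tapes = days   # last Full plus every Incremental since
    return written, tapes


for name in ("full", "incremental", "differential"):
    print(name, weekly_cost(name))
# full (700, 1)
# incremental (130, 7)
# differential (205, 2)
```

Under these assumptions, Differential writes far less than a daily Full backup and restores from far fewer tapes than Incremental, which is the trade-off described above.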
The text discusses fault tolerance, by which it means the ability of a system to tolerate the failure of a part. In particular, it is concerned about the failure of a hard drive that holds important data. Eventually, all hard drives fail. Systems that provide tolerance for this kind of event typically use a form of RAID, which allows a system to continue running in most cases. The acronym has been defined several ways. One common meaning is Redundant Array of Independent Drives. The word "independent" seems unnecessary, and is in fact misleading: hard drives set up in a RAID array perform functions that relate to each other. Several kinds of RAID exist to provide for redundant storage of data or to provide a means to recover lost data. The text lists several types and discusses a few. Follow the link below to a nice summary of RAID level features not listed in these notes, as well as helpful animations to show how they work. Note that RAID 0 does not provide fault tolerance, the ability to survive a device failure. It only improves read and write times.
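As an illustration of "a means to recover lost data," here is a small sketch of XOR parity, the idea behind parity-based levels such as RAID 5. These notes do not describe parity in detail, so treat this as a supplementary example: the drive contents and the function name xor_blocks are invented, and a real array works on disk blocks rather than short byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives (made-up contents) and one parity block computed from them.
data = [b"ACCT", b"MAIL", b"LOGS"]
parity = xor_blocks(data)

# Suppose the second drive fails. XOR of the survivors and the parity block
# reproduces the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'MAIL'
```

This works because XOR-ing a value with itself cancels it out, so combining the parity block with the surviving data leaves only the lost block. It also shows the limit of the scheme: with one parity block, only one failed drive can be rebuilt at a time.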
The chapter mentions that backups for database
systems are more complex, partly due to needing to take
them down for the backup, and partly due to needing to use a
specialized backup program that maintains the relations between
data elements. For a database that is in constant use, a lock and
copy method is not practical. It is better to pursue a live copy
method, such as the continuous database protection scheme
mentioned on page 88.
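As a small-scale illustration of copying a database without taking it down, Python's standard sqlite3 module offers Connection.backup, which copies a database while the source connection remains open. This is not the continuous database protection scheme the text describes, only a sketch of the live copy idea. The table, the rows, and the function name live_copy are invented for the example.

```python
import sqlite3


def live_copy(source, destination, pages_per_step=100):
    """Copy `source` into `destination` while `source` stays open for use.

    Copying a limited number of pages per step lets other work on the
    source database proceed between steps.
    """
    source.backup(destination, pages=pages_per_step)


# A stand-in "production" database with made-up contents.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders (item) VALUES (?)",
                 [("widget",), ("gadget",)])
live.commit()

copy = sqlite3.connect(":memory:")
live_copy(live, copy)

# The source is still usable after the copy is made.
live.execute("INSERT INTO orders (item) VALUES (?)", ("gizmo",))
live.commit()
```

The copy holds the rows that existed when the backup ran. Changes made afterward, like the third order here, are not in it, which is the gap that transaction-based schemes are meant to close.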
Backup plans should be part of normal operations, but recovery plans are typically part of contingency planning. Since recovery cannot be done if backups were never done, the text includes both concepts in this chapter.
Another new concept is on page 86. Electronic vaulting is off-site storage of large volumes of your data. It may use old-fashioned leased or dedicated data lines. In the images on the right, users at multiple locations are using data lines whose use has been purchased from a telephone company. This was a common method before people began to rely on the Internet for transmission of public and private data.
Note that the general concept of passing data in this image is
not really different from using Internet connections, except that
there is a guarantee of quality of service and bandwidth available
on a leased line. The text remarks that electronic vaulting will
probably be slower than local solutions due to the WAN links
involved in it. Leased lines with sufficient bandwidth can
overcome this problem.
A less detailed but more robust solution is Remote Journaling, discussed on page 87. This solution is a transaction recording and copying system, so it tracks transactions on your system, but it does not copy your entire database. Restoring to a lost state from the journal also requires a reference copy of the databases and systems whose transactions were saved in the journal.
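The relationship between the journal and the reference copy can be sketched as follows. The record format, the account data, and the function name replay are all invented for the example. A real journaling product records database transactions, not dictionary updates.

```python
def replay(reference_copy, journal):
    """Rebuild the lost state by applying journaled transactions, in order,
    to a reference copy of the data. The reference copy is left unchanged."""
    restored = dict(reference_copy)
    for operation, key, value in journal:
        if operation == "set":
            restored[key] = value
        elif operation == "delete":
            restored.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return restored


# Reference copy: the data as of the last full copy (made-up balances).
reference = {"alice": 100, "bob": 50}

# Journal: the transactions recorded off-site since that copy was made.
journal = [
    ("set", "alice", 80),
    ("set", "carol", 20),
    ("delete", "bob", None),
]

restored = replay(reference, journal)
print(restored)  # {'alice': 80, 'carol': 20}
```

The journal alone is not enough: without the reference copy there is nothing to apply the transactions to, which is the point made in the paragraph above.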
A more complete solution is discussed on page 88. Database (Databank) Shadowing records transactions, and it also keeps a copy of the relevant data, so it is more like a live copy of your operational system.
The text briefly discusses NAS and SAN, two technologies that provide more storage solutions to network users. Network Attached Storage is typically provided by a device that is added to the network, or by a dedicated server that provides storage space to users. Standard network protocols are used to read and write to the new storage space, and access can be granted by normal means. There is typically some latency in these systems. Storage Area Network technology requires direct connection to a dedicated storage network, typically through wide bandwidth connections. Only users connected to the SAN can use it.
The author closes this section of the chapter with a discussion of virtual servers. A virtual machine is like a program that runs on an actual machine. The virtual machine can run any operating system that the hardware of the actual (host) machine can support. The attraction of doing this is that you can run several virtual machines on one well equipped host machine, and if any of them goes down, it can be brought back up very quickly without having to worry about the damage to the OS that might have happened on a dedicated server. Virtual machines typically run in a memory management environment provided by one of the three products listed on page 91: