ITS 4350 - Disaster Recovery
Chapter 3, Contingency Strategies for IR/DR/BC
Objectives:
This lesson is about chapter 3. The chapter is divided into
two parts,
each with its own objective. Objectives important to this lesson:
- Data and application resumption
- Site resumption
Concepts:
Chapter 3
The
scenario that opens chapter 3 leads us to a convincing story. A
mail room employee spills coffee, mops up the mess, and breaks open an
envelope filled with white powder. At this moment, we readers should
pause to express our gratitude that someone at this company
anticipated this event, planned actions to take, trained staff, and
made everyone familiar enough with the plan that its execution began
correctly and continued without a reported glitch.
The title of the chapter is explained briefly with a review of
earlier material. Incident Response
(IR) is what we do when
something happens. Business Continuity
(BC) is what we do to keep the
business operating while we can't operate the way we normally do. Disaster Recovery (DR) is what we do to return to
normal operations.
The first half of the chapter is about data and application
resumption, methods that help us when we lose or lose access to data
and applications. On page 93, the primary topic is backup and retention
strategy: what kind of backup do we make, where do we keep it, and how
long will we keep it? The text suggests you should find out if there
are legal requirements you must follow before you set a policy that
winds up being useless.
In the example table on page 93, there are three levels of
priority for data: low, moderate, and high. Each higher level
represents data that is more important to the organization, whose loss
would be more damaging, and whose replacement is needed sooner than
data on lower levels.
Priority | Backup/Recovery
Low. We might need it eventually. | Tape backup, and put it where we can find it when we have time, at a cold site.
Moderate. We need it, but we can bring it back in a little while. | Optical disc, copied over a WAN, stored in a warm site if the boss cares, a cold one if not.
High. We need this right away. | Live copy (mirror) system, advanced RAID, kept at a hot site.
In case you are not aware, a cold
site may just be office space with a potential for computers and data.
A warm site has computers that
need to be brought up and loaded from recent backups. A hot site has computers that are
running, with our data already available on them.
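The priority table and the three site types can be written down as a small lookup, which is sometimes a handy way to keep a policy consistent. This is only a sketch: the names BACKUP_PLAN, SITE_READINESS, and plan_for are made up for illustration, and the entries simply restate the table and definitions above.

```python
# Hypothetical lookup tables restating the priority table and site definitions.
BACKUP_PLAN = {
    "low": {"medium": "tape backup", "site": "cold"},
    "moderate": {"medium": "optical disc copied over a WAN", "site": "warm"},
    "high": {"medium": "live copy (mirror) on advanced RAID", "site": "hot"},
}

SITE_READINESS = {
    "cold": "office space with a potential for computers and data",
    "warm": "computers present, but they must be brought up and loaded from recent backups",
    "hot": "computers running, with our data already available on them",
}


def plan_for(priority):
    """Return (medium, site, readiness) for a data priority level."""
    entry = BACKUP_PLAN[priority.lower()]
    return entry["medium"], entry["site"], SITE_READINESS[entry["site"]]


for level in ("Low", "Moderate", "High"):
    print(level, plan_for(level))
```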
The text considers online backups and cloud services on page
94. It points out that free services usually have no guarantees that
they will be available, and they often have file space limits. Cloud
storage is simply storing data on someone else's computer,
accessed
across the Internet or through a dedicated data line. Cloud computing
is different: it means that you are running an application on someone
else's computer. For example, you may be running virtual machines on
computers owned by Amazon. That could be part of a business continuity
plan, or it could be a regular part of your business operation.
Regarding cloud storage, the text lists three categories that
may be helpful:
- public cloud - service is available to the public over
Internet connections
- community cloud - a shared solution using common equipment
that is not accessible by the public; example: a shared cloud for local
and county government offices, funded by each of them for their common
use
- private cloud - a solution that is run by and for a
particular organization, but accessible without members having to be on
the corporate network to use it
The text discusses modified backup
strategies, and most seem to be
layered concepts, such as backups going to a local cluster of drives
before being sent to tape or cloud storage. This leads to the
discussion of some classic backup
strategies. Each strategy is the same
regardless of the medium being
used. It doesn't matter whether we are
using tape, disc, hard drive, or external storage. First some terms,
and the names of the strategies:
- Target - the device, volume, folder, or
group of files being backed up; the source of the material in a backup
operation
- Archive bit - a one-bit attribute of a file
that is
turned ON when the file is changed;
it is used to flag files that have changed since the last backup; most
backup programs look for files whose archive bits are set to ON, copy
those files, then reset the archive bits (turn them OFF) on the target
files
- Full - a backup of all
files in the target; sets the archive bit of each file to OFF
once the backup is made; your text assumes you know what a full backup
is
- Incremental - a backup of target
files that are new or changed since the last
backup; depends on the fact that programs that change files typically
set the archive bit to ON when a change is made; sets
archive bit to OFF for all files it copies
- Differential - a backup of all
files new or changed since the last Full
backup; copies all files whose archive bit is set to ON;
does not change the archive bit of files it copies
because they will be copied again in the next differential backup
- Copy - like a Full backup, but it
does not change the archive bits of files it copies.
This is typically not part of a standard backup strategy, but an option
to work around the system.
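The archive bit rules in the list above can be tried out with a toy model. This is a sketch only: the "file system" is a dictionary that maps a file name to its archive bit (True means ON), and all of the function names are invented for the example. Real backup programs work on real file attributes.

```python
# Toy model: file name -> archive bit (True = ON, changed since the bit was last reset).

def full(files):
    """Copy every file, then turn every archive bit OFF."""
    copied = sorted(files)
    for name in files:
        files[name] = False
    return copied


def incremental(files):
    """Copy files whose bit is ON, then turn those bits OFF."""
    copied = sorted(name for name, bit in files.items() if bit)
    for name in copied:
        files[name] = False
    return copied


def differential(files):
    """Copy files whose bit is ON, leaving the bits untouched."""
    return sorted(name for name, bit in files.items() if bit)


def copy_backup(files):
    """Copy every file, leaving the bits untouched."""
    return sorted(files)


def modify(files, name):
    """A program changes (or creates) a file, which sets its bit ON."""
    files[name] = True


def run(strategy):
    """Full backup, change a.txt, back up, change b.txt, back up again."""
    files = {"a.txt": True, "b.txt": True, "c.txt": True}
    full(files)
    modify(files, "a.txt")
    first = strategy(files)
    modify(files, "b.txt")
    second = strategy(files)
    return first, second


print(run(incremental))   # (['a.txt'], ['b.txt'])
print(run(differential))  # (['a.txt'], ['a.txt', 'b.txt'])
```

Notice that the second differential backup picks up a.txt again, because nothing has reset its archive bit since the Full backup.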
Assume we use a "tape drive" to
make backups. In a Full backup strategy, the entire target is
backed up to tape every time we make a backup tape. This strategy
consumes the most time and the most tapes to carry out
a backup. To restore, we simply restore the most recent
tape(s). This is the least time consuming strategy for restoring,
but the most time consuming for creating backups.
The second method, Incremental backup, means that we
start with a Full backup of the target, and then each
successive backup tape we create only backs up the elements that are
new or changed since the last backup was created. This means
that successive backups will not always be the same length. Because each
tape holds only the changes made since the previous tape,
this is the least time consuming backup, but the most time
consuming restore. To restore, we must first restore the last
Full backup made, and then restore EVERY tape made since
then, to ensure getting all changes.
The third strategy, Differential backup, also starts
with a Full backup tape. Then each successive tape made will
contain all the files changed since the last Full backup was
made. This means that we will have to restore only one or two
tapes in a restore operation. If the last tape made was a Full tape, we
restore only that one. If the last tape made was a Differential tape,
we restore the last Full tape, then the last Differential tape.
The fourth strategy, Copy,
is no different from Full in
terms of backup or restore time, assuming it is a full copy. In both
Incremental and Differential
backup strategies, you will typically use a rotation schedule.
For example, you could have a one week cycle. Once a
week, you make a Full backup, then every day after that you make the
other kind you have chosen to use: Incremental or Differential.
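The restore rules for the three strategies can be sketched as a small function. This is an illustration, not something from the text: tapes_to_restore is a made-up name, a tape history is a list of strings with the oldest tape first, and the sketch assumes a cycle uses only one kind of tape (Incremental or Differential) after each Full, as described above.

```python
def tapes_to_restore(history):
    """Return the positions of the tapes to restore, in restore order.

    history lists tape kinds, oldest first: "full", "incremental",
    or "differential". Raises ValueError if there is no Full tape.
    Assumes only one kind of tape is used after each Full.
    """
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    later = list(range(last_full + 1, len(history)))
    if not later:
        return [last_full]  # the last tape made was a Full tape
    if history[-1] == "differential":
        return [last_full, len(history) - 1]  # last Full, then last Differential
    return [last_full] + later  # last Full, then EVERY Incremental since


incremental_week = ["full", "incremental", "incremental", "incremental"]
differential_week = ["full", "differential", "differential", "differential"]

print(tapes_to_restore(incremental_week))   # [0, 1, 2, 3]
print(tapes_to_restore(differential_week))  # [0, 3]
print(tapes_to_restore(["full", "incremental", "full"]))  # [2]
```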
To keep them straight in your mind, remember these facts:
Backup type | What does it back up? | What does it do to the archive bit?
Full | copies everything | Resets all archive bits in the target set.
Incremental | copies everything different from the last backup | Resets the archive bits of the target files it copies.
Differential | copies everything "different from Full" (different from the last Full backup) | Does not reset any archive bits.
Copy | makes a Full or selected items backup | Does not reset any archive bits.
The time required to create backups should be
considered along with the time to restore a backup. When you
consider the two concepts as two sides of the answer to a question
(What method should I use?), the answer may be the most common choice: Differential.
It is the best compromise in terms of backup time versus restore time.
Note also that all standard methods require a full backup on a regular
cycle. The recommendation is usually to run a Full backup weekly.
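A little arithmetic shows why Differential is often called the compromise. The numbers below are assumptions invented for the example, not figures from the text: a 100 GB target, 5 GB of different files changing each day, and a one week cycle with a Full backup followed by six daily backups.

```python
# Assumed figures for illustration only.
FULL_GB = 100          # size of the whole target
DAILY_CHANGE_GB = 5    # new or changed data per day (different files each day)
DAYS_AFTER_FULL = 6    # daily backups that follow the weekly Full


def weekly_cost(strategy):
    """Return (GB written in one cycle, tapes needed for a worst-case restore)."""
    daily_sizes = []
    for day in range(1, DAYS_AFTER_FULL + 1):
        if strategy == "full":
            daily_sizes.append(FULL_GB)
        elif strategy == "incremental":
            daily_sizes.append(DAILY_CHANGE_GB)        # only that day's changes
        elif strategy == "differential":
            daily_sizes.append(DAILY_CHANGE_GB * day)  # everything since the Full
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    written = FULL_GB + sum(daily_sizes)
    worst_case_tapes = {
        "full": 1,
        "incremental": 1 + DAYS_AFTER_FULL,
        "differential": 2,
    }[strategy]
    return written, worst_case_tapes


for name in ("full", "incremental", "differential"):
    print(name, weekly_cost(name))
# full (700, 1)
# incremental (130, 7)
# differential (205, 2)
```

Under these assumptions, Differential writes a bit more than Incremental each week, but it never needs more than two tapes to restore.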
The text
discusses fault tolerance, by which it means the ability of a system to
tolerate the failure of a part. In particular, it is concerned about
the failure of a hard drive that holds important data. Eventually, all
hard drives fail. Systems that
provide tolerance for this kind of event typically use a form of RAID,
which allows a system to continue in most cases. The acronym has been
defined several ways. One common meaning is Redundant
Array of Independent Drives.
The word "independent" seems unnecessary, and is in fact misleading.
Hard drives set up in a RAID array perform functions that relate to
each other. Several kinds of RAID exist to provide for redundant
storage of data or to provide for a means to recover lost data. The
text lists several types and discusses a few. Follow the link below to
a nice summary of RAID level features not listed in these notes, as
well as helpful animations to show how they work. Note that RAID
0 does not provide fault tolerance, the
ability to survive a device failure. It only improves read and write
times.
RAID levels and
features:
- RAID 0: Disk striping
- writes to multiple disks, does not provide fault tolerance.
Performance is increased, because each successive block of data in a
stream is written to the next device in the array. Failure of one
device will affect all data, so striping does not improve fault
tolerance, it in fact decreases it.
- RAID 1: Mirroring and Duplexing
- provides fault tolerance by writing the same data
to two drives. Two mirrored drives
use the same controller card. Two duplexed
drives each have their own controller card. Aside from that difference,
mirroring and duplexing are the same: Two drives are set up so that
each is a copy of the other. If one fails, the other is available.
- RAID 2: Bit-level disk
striping with an error correcting (Hamming) code. Not widely used. Neither are RAID 3 and RAID 4, which stripe data and keep parity on a dedicated drive.
- RAID 5: Parity saved
separately from data - Provides fault tolerance by a different method. Data
is striped across several drives, but parity
data for each stripe is saved on a drive that does not hold data for
that stripe. In Windows, software RAID 5 has traditionally been offered
only by the server editions, not the workstation ones.
- RAID 0+1: Striping and Mirroring
- uses a striped array like RAID 0,
but mirrors the striped array onto another
array, kind of like RAID 1
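The parity idea behind RAID 5 is easier to see with a small example. Parity is commonly computed as the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the blocks that survive. The sketch below is a toy, not a real RAID implementation: the function names are made up, each "drive" holds one short block of bytes, and the way the parity drive rotates from stripe to stripe is a simplifying assumption (real controllers use their own layouts).

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length blocks of bytes together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks, stripe_number):
    """Lay one stripe across len(data_blocks) + 1 drives.

    The parity block goes on a different drive for each stripe
    (simple rotation, assumed for illustration).
    """
    drive_count = len(data_blocks) + 1
    parity_drive = stripe_number % drive_count
    stripe = list(data_blocks)
    stripe.insert(parity_drive, xor_blocks(data_blocks))
    return stripe, parity_drive


def rebuild(stripe, failed_drive):
    """Recompute the block on the failed drive from the surviving drives."""
    survivors = [block for i, block in enumerate(stripe) if i != failed_drive]
    return xor_blocks(survivors)


stripe, parity_drive = write_stripe([b"AAAA", b"BBBB", b"CCCC"], stripe_number=1)
print(parity_drive)        # 1
print(rebuild(stripe, 2))  # b'BBBB'
```

The same rebuild step works whether the failed drive held data or parity for that stripe, which is why the array survives the loss of any single drive.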
The chapter mentions that backups for database systems
are more complex, partly due to needing to take them down for the
backup, and partly due to needing to use a specialized backup program
that maintains the relations between data elements. For a database that
is in constant use, a lock and copy method is not practical. It is
better to pursue a live copy method, such as the continuous database
protection scheme mentioned on page 101.
Backup plans should be part of normal operations, but recovery
plans are typically part of contingency planning. Since recovery cannot
be done if backups were never done, the text includes both concepts in
this chapter.
Another new concept is on page 103. Electronic vaulting is off-site
storage of large volumes of your data. It may use old fashioned leased
or dedicated data lines. In the images on the right, users at multiple
locations are using data lines whose use has been purchased from a
telephone company. This was a common method before people began to rely
on the Internet for transmission of public and private data.
Note
that the general concept of passing data in this image is
not really different from using Internet connections, except that there
is a guarantee of quality of service and bandwidth available on a
leased line. The text remarks that electronic vaulting will probably be
slower than local solutions due to the WAN links involved in it. Leased
lines with sufficient bandwidth can overcome this problem.
A less detailed but more robust solution is Remote Journaling, discussed on page 105. This solution is a transaction
recording and copying system, so it tracks transactions on your system,
but it does not copy your entire database. Restoring to a
lost state using the journal would also require a reference copy of the
databases and systems whose transactions were saved in the journal.
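To see why a journal alone is not enough, consider this minimal sketch of a restore. Everything in it is hypothetical: the journal format (operation, key, value), the account names, and the replay function are invented for the example, and a real journaling product will have its own format.

```python
def replay(reference_copy, journal):
    """Apply journaled transactions, oldest first, to a copy of the reference data."""
    state = dict(reference_copy)  # leave the reference copy itself untouched
    for operation, key, value in journal:
        if operation == "set":
            state[key] = value
        elif operation == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return state


# The reference copy is the last known good copy of the data.
reference = {"acct-1": 100, "acct-2": 50}

# The journal holds only the transactions recorded since that copy was made.
journal = [
    ("set", "acct-1", 80),
    ("set", "acct-3", 20),
    ("delete", "acct-2", None),
]

print(replay(reference, journal))  # {'acct-1': 80, 'acct-3': 20}
```

Without the reference copy there is nothing to apply the transactions to, which is the point made in the paragraph above.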
A more complete solution is discussed on page 106. Database (Databank) Shadowing records transactions, and it also keeps a copy of the relevant data, so it is more like a live copy of your operational system.
The text briefly discusses NAS and SAN, two technologies that provide more storage solutions to network users. Network Attached Storage
is typically provided by a device that is added to the network, or by a
dedicated server that provides storage space to users. Standard network
protocols are used to read and write to the new storage space, and
access can be granted by normal means. There is typically some latency
in these systems.
Storage Area Network technology requires direct
connection to a dedicated storage network, typically through wide
bandwidth connections. Only users connected to the SAN can use it.
The author closes this section of the chapter with a
discussion of virtual servers. A virtual machine is like a program that
runs on an actual machine. The virtual machine can run any operating
system that the hardware of the actual (host) machine can support. The
attraction to doing this is that you can run several virtual machines
on one well equipped host machine, and if any of them go down, they can
be brought back up very quickly without having to worry about the
damage to the OS that might have happened on a dedicated server.
Virtual machines typically run in a memory management environment
provided by one of the three products listed on page 109:
- Microsoft's Virtual Server
- VMware's VMware Server
- Oracle VM VirtualBox