ITS 4350 - Disaster Recovery
Chapter 3, Contingency Strategies for IR/DR/BC
Objectives:
This lesson is about chapter 3. The chapter is divided into two
parts, each with its own objective. Objectives important to this
lesson:
- Data and application resumption
- Site resumption
Concepts:
Chapter 3
The scenario that opens chapter 3 tells a convincing story.
A mail room employee spills coffee, mops up the mess, and breaks
open an envelope filled with white powder. At this moment, we
readers should pause to express our gratitude that someone at this
company anticipated this event, planned actions to take, trained
staff, and made everyone familiar enough with the plan that its
execution began correctly and continued without a reported glitch.
The title of the chapter is explained briefly with a review of
earlier material. Incident Response (IR) is what we do when
something happens. Business Continuity (BC) is what we do to keep
the business operating while we can't operate the way we normally
do. Disaster Recovery (DR) is what we do to return to normal
operations.
The first half of the chapter is about data and application
resumption, methods that help us when we lose or lose access to
data and applications. On page 93, the primary topic is backup and
retention strategy: what kind of backup do we make, where do we
keep it, and how long will we keep it? The text suggests you
should find out if there are legal requirements you must follow
before you set a policy that winds up being useless.
In the example table on page 93, there are three levels of
priority for data: low, moderate, and high. Each higher level
represents data that is more important to the organization, whose
loss would be more damaging, and whose replacement is needed
sooner than data on lower levels.
Priority | Backup/Recovery
Low. We might need it eventually. | Tape backup, and put it where we can find it when we have time, at a cold site.
Moderate. We need it, but we can bring it back in a little while. | Optical disc, copied over a WAN, stored in a warm site if the boss cares, a cold one if not.
High. We need this right away. | Live copy (mirror) system, advanced RAID, kept at a hot site.
In case you are not aware, a cold
site may just be office space with a potential for computers and
data. A warm site has
computers that need to be brought up and loaded from recent
backups. A hot site has
computers that are running, with our data already available on
them.
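The example table and the site definitions above can be read as a simple lookup from data priority to a backup medium and a recovery site. The sketch below is only an illustration: the names `Tier`, `TIERS`, and `plan_for` are made up for these notes, and the moderate tier is assumed to use a warm site (the table allows a cold one as well).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the example priority table (hypothetical structure)."""
    medium: str
    site: str


# Values paraphrase the example table; the moderate tier assumes a warm site.
TIERS = {
    "low": Tier("tape backup", "cold"),
    "moderate": Tier("optical disc copied over a WAN", "warm"),
    "high": Tier("live copy (mirror) with advanced RAID", "hot"),
}


def plan_for(priority: str) -> Tier:
    """Look up the backup medium and site type for a priority level."""
    return TIERS[priority.lower()]


print(plan_for("High").site)  # hot
```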
The text considers online backups and cloud services on page 77.
It points out that free services usually have no guarantees that
they will be available, and they often have file space limits. Cloud storage is simply
storing data on someone else's computer, accessed across the
Internet or through a dedicated data line. Cloud
computing is different: it means that you are running an
application on someone else's computer. For example, you may be
running virtual machines on computers owned by Amazon. That could
be part of a business continuity plan, or it could be a regular
part of your business operation.
Regarding cloud storage, the text lists three categories that may
be helpful:
- public cloud - service is available to the public over
Internet connections
- community cloud - a shared solution using common equipment
that is not accessible by the public; example: a shared cloud
for local and county government offices, funded by each of them
for their common use
- private cloud - a solution that is run by and for a particular
organization, but accessible without members having to be on the
corporate network to use it
The text discusses modified backup
strategies, and most seem to be layered concepts, such as
backups going to a local cluster of drives before being sent to tape
or cloud storage. This leads to the discussion of some classic
backup strategies. Each strategy is the same regardless of
the medium being used. It
doesn't matter whether we are using tape, disc, hard drive, or
external storage. First some terms, and the names of the strategies:
- Target - the device, volume, folder, or group
of files being backed up; the source of the material in a backup
operation
- Archive bit - a one-bit file attribute that
is turned ON when the file is changed;
it is used to flag files that have changed since the last
backup; most backup programs look for files
whose archive bits are set to ON, copy those files, then reset
the archive bits (turn them OFF) on the target
files
- Full - a backup of all files
in the target; sets the archive bit of each file to OFF
once the backup is made; your text assumes you know what a full
backup is
- Incremental - a backup of target
files that are new or changed since the last
backup; depends on the fact that programs that change files
typically set the archive bit to ON when a
change is made; sets archive bit to OFF for
all files it copies
- Differential - a backup of all files
new or changed since the last Full
backup; copies all files whose archive bit is set to ON;
does not change the archive bit of files it
copies because they will be copied again in the next
differential backup
- Copy - like a Full backup, but it does
not change the archive bits of files it copies.
This is typically not part of a standard backup strategy, but an
option to work around the system.
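To see how the archive bit drives each backup type, here is a minimal simulation of the definitions above. It is a sketch, not how any particular backup program works: the function name `run_backup` and the dictionary of file names are assumptions made for illustration, with `True` standing for an archive bit that is ON.

```python
def run_backup(files, kind):
    """Return the file names one backup run copies, and update archive bits.

    files maps a file name to its archive bit (True means ON).
    kind is "full", "incremental", "differential", or "copy".
    """
    if kind in ("full", "copy"):
        copied = sorted(files)  # everything in the target
    elif kind in ("incremental", "differential"):
        copied = sorted(name for name, bit in files.items() if bit)
    else:
        raise ValueError(f"unknown backup kind: {kind}")

    # Only Full and Incremental turn the archive bit OFF on what they copy.
    if kind in ("full", "incremental"):
        for name in copied:
            files[name] = False
    return copied


files = {"a.txt": True, "b.txt": True, "c.txt": True}
monday = run_backup(files, "full")             # all three files, bits cleared
files["a.txt"] = True                          # a.txt is edited
tuesday = run_backup(files, "differential")    # ['a.txt'], bit stays ON
files["b.txt"] = True                          # b.txt is edited
wednesday = run_backup(files, "differential")  # ['a.txt', 'b.txt']
thursday = run_backup(files, "incremental")    # ['a.txt', 'b.txt'], bits cleared
files["c.txt"] = True                          # c.txt is edited
friday = run_backup(files, "incremental")      # ['c.txt']
```

Notice that the Differential runs keep growing because they never clear the bit, while the Incremental run on Friday copies only what changed since Thursday.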
Assume we use a "tape drive" to make backups. In a Full
backup strategy, the entire target is backed up to tape every time
we make a backup tape. This strategy consumes the most time
and the most tapes to carry out a backup. To restore,
we simply restore the most recent tape(s). This is the least
time consuming strategy for restoring, but the most
time consuming for creating backups.
The second method, Incremental backup, means that we
start with a Full backup of the target, and then each
successive backup tape we create only backs up the elements that
are new or changed since the last backup was created. This
means that successive backups will not always be the same length.
Because each tape holds only the most recent changes, this is the
least time consuming backup, but the most time consuming restore.
To restore, we must first restore the last Full backup made, and
then restore EVERY tape made since then, in order, to ensure
getting all changes.
The third strategy, Differential backup, also starts
with a Full backup tape. Then each successive tape made
will contain all the files changed since the last Full backup
was made. This means that we will have to restore only one
or two tapes in a restore operation. If the last tape made was a
Full tape, we restore only that one. If the last tape made was a
Differential tape, we restore the last Full tape, then the last
Differential tape.
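The restore rules for the Incremental and Differential strategies can be written as a short function that picks which tapes to restore. This is a sketch under the assumption that a cycle uses only one kind of backup after each Full tape, as described above; the name `restore_chain` and the list-of-strings history are made up for illustration.

```python
def restore_chain(history):
    """Return the positions of the tapes to restore, oldest first.

    history lists the kind of each tape in the order the tapes were made,
    for example ["full", "incremental", "incremental"].
    """
    if "full" not in history:
        raise ValueError("a restore needs at least one Full tape")

    # Position of the most recent Full tape.
    last_full = len(history) - 1 - history[::-1].index("full")
    later = history[last_full + 1:]

    if not later:
        return [last_full]                           # Full tape only
    if set(later) == {"incremental"}:
        return list(range(last_full, len(history)))  # Full plus EVERY tape since
    if set(later) == {"differential"}:
        return [last_full, len(history) - 1]         # Full plus the last tape
    raise ValueError("mixed cycles are outside the scope of this sketch")


print(restore_chain(["full", "incremental", "incremental", "incremental"]))
# [0, 1, 2, 3]
print(restore_chain(["full", "differential", "differential", "differential"]))
# [0, 3]
```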
The fourth strategy, Copy, is no different from
Full in terms of backup or restore time, assuming it is a full
copy. In both Incremental and Differential backup strategies, you
will typically use a rotation schedule. For example,
you could have a one week cycle. Once a week, you make a Full
backup, then every day after that you make the other kind you have
chosen to use: Incremental or Differential.
To keep them straight in your mind, remember these facts:
Backup type | What does it back up? | What does it do to the archive bit?
Full | copies everything | Resets all archive bits in the target set.
Incremental | everything different from the last backup | Resets the archive bits of the target files it copies.
Differential | copies everything "different from Full" (different from the last Full backup) | Does not reset any archive bits.
Copy | makes a Full or selected items backup | Does not reset any archive bits.
The time required to create backups should be
considered along with the time to restore a backup. When
you consider the two concepts as two sides of the answer to a
question (What method should I use?), the answer may be the most
common choice: Differential. It is the best compromise in
terms of backup time versus restore time. Note also that all
standard methods require a full backup on a regular cycle. The
recommendation is usually to run a Full backup weekly.
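A little arithmetic makes the compromise visible. The sketch below compares one weekly cycle (a Full backup followed by six daily backups) under some simplifying assumptions that are not from the text: the target stays the same size, each day changes a fixed amount of data, and each day's changes touch different files. The function name `weekly_cost` and the numbers are illustrative only.

```python
def weekly_cost(strategy, full_gb, daily_change_gb, days=6):
    """Return (GB written during the cycle, tapes read to restore on the last day)."""
    if strategy == "full":
        written = full_gb * (days + 1)  # a Full backup every day
        tapes = 1
    elif strategy == "incremental":
        written = full_gb + daily_change_gb * days  # each day: that day's changes
        tapes = 1 + days                            # Full plus every tape since
    elif strategy == "differential":
        # Day n copies everything changed since the Full: n days of changes.
        written = full_gb + daily_change_gb * sum(range(1, days + 1))
        tapes = 2                                   # Full plus the last tape
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return written, tapes


for name in ("full", "incremental", "differential"):
    print(name, weekly_cost(name, full_gb=100, daily_change_gb=5))
# full (700, 1)
# incremental (130, 7)
# differential (205, 2)
```

Under these assumptions, Differential writes far less than daily Full backups and restores from far fewer tapes than Incremental, which is why it is often seen as the middle choice.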
The text discusses fault tolerance, by which it means the ability
of a system to tolerate the failure of a part. In particular, it
is concerned about the failure of a hard drive that holds
important data. Eventually, all hard drives fail. Systems that
provide tolerance for this kind of event typically use a form of
RAID, which allows a system to continue running in most cases.
RAID has been defined several ways. One common meaning is
Redundant Array of Independent Drives. The word "independent"
seems unnecessary, and is in fact misleading: hard drives set up
in a RAID array perform functions that relate to each other.
Several kinds of RAID exist to provide for redundant storage of
data or to provide for a means to recover lost data. The text
lists several types and discusses a few. Follow the link below to
a nice summary of RAID level features not listed in these notes,
as well as helpful animations to show how they work. Note that RAID
0 does not provide fault tolerance,
the ability to survive a device failure. It only improves read and
write times.
RAID levels and features:
- RAID 0: Disk striping - writes to multiple disks, but does not
provide fault tolerance. Performance is increased, because each
successive block of data in a stream is written to the next device
in the array. Failure of one device will affect all data, so
striping actually decreases fault tolerance.
- RAID 1: Mirroring and Duplexing - provides fault tolerance by
writing the same data to two drives. Two mirrored drives use the
same controller card. Two duplexed drives each have their own
controller card. Aside from that difference, mirroring and
duplexing are the same: two drives are set up so that each is a
copy of the other. If one fails, the other is available.
- RAID 2: Disk striping
with parity. Not widely used. Neither is RAID 3 or RAID 4.
- RAID 5: Parity saved
separately from data - Provides fault tolerance by a different
method. Data is striped
across several drives, but parity data for
each stripe is saved on a drive that does not hold data for that
stripe. Workstations cannot use this method. It is only
supported by server operating systems.
- RAID 0+1: Striping and Mirroring - uses a striped array like
RAID 0, but mirrors the striped array onto another array, kind of
like RAID 1
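The notes above do not say how RAID 5 parity is calculated. The usual approach is a byte-by-byte XOR of the data blocks in a stripe, and that is what this sketch assumes. The function name `xor_blocks` and the four-byte blocks are made up for illustration; real arrays work on much larger blocks and rotate the parity block among the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks on three drives, parity on a fourth drive.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# The drive holding the second block fails. XOR of everything that
# survived (the other data blocks plus the parity block) rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'EFGH'
```

This works because XOR-ing a value with itself cancels it out, so the surviving data blocks drop out of the parity and only the missing block is left. It also shows why this scheme survives the loss of one drive per stripe, but not two.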
The chapter mentions that backups for database
systems are more complex, partly due to needing to take
them down for the backup, and partly due to needing to use a
specialized backup program that maintains the relations between
data elements. For a database that is in constant use, a lock and
copy method is not practical. It is better to pursue a live copy
method, such as the continuous database protection scheme
mentioned on page 88.
Backup plans should be part of normal operations, but recovery
plans are typically part of contingency planning. Since recovery
cannot be done if backups were never done, the text includes both
concepts in this chapter.
Another new concept is on page 86. Electronic vaulting
is off-site storage of large volumes of your data. It may use old
fashioned leased or dedicated data lines. In the images on the
right, users at multiple locations are using data lines whose use
has been purchased from a telephone company. This was a common
method before people began to rely on the Internet for
transmission of public and private data.
Note that the general concept of passing data in this image is
not really different from using Internet connections, except that
there is a guarantee of quality of service and bandwidth available
on a leased line. The text remarks that electronic vaulting will
probably be slower than local solutions due to the WAN links
involved in it. Leased lines with sufficient bandwidth can
overcome this problem.
A less detailed but more robust solution is Remote
Journaling, discussed on page 87. This solution is a transaction recording and
copying system, so it tracks transactions on your system, but it
does not copy your entire database. Restoring to a lost state
using the journal would also require a reference copy of the
databases and systems whose transactions were saved in the
journal.
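The idea that a journal is only useful together with a reference copy can be shown in a few lines. This is a sketch with a made-up transaction format, `(operation, key, value)` tuples, and a dictionary standing in for the database; real journaling products use their own formats.

```python
def replay(reference_copy, journal):
    """Apply journaled transactions, in order, to a copy of the reference data."""
    state = dict(reference_copy)  # leave the reference copy untouched
    for operation, key, value in journal:
        if operation == "set":
            state[key] = value
        elif operation == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return state


# The reference copy is the last known good backup of the data.
reference = {"acct-1": 100, "acct-2": 50}

# The journal holds the transactions recorded since that backup.
journal = [
    ("set", "acct-1", 80),
    ("set", "acct-3", 20),
    ("delete", "acct-2", None),
]

restored = replay(reference, journal)
print(restored)  # {'acct-1': 80, 'acct-3': 20}
```

Without the reference copy, the journal alone would tell us that acct-1 was set to 80, but nothing about records that were never touched after the backup.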
A more complete solution is discussed on page 88. Database
(Databank) Shadowing records transactions, and it also keeps a copy
of the relevant data, so it is more like a live copy of your
operational system.
The text briefly discusses NAS and SAN, two technologies that
provide more storage solutions to network users. Network
Attached Storage is typically provided by a device that
is added to the network, or by a dedicated server that provides
storage space to users. Standard network protocols are used to
read and write to the new storage space, and access can be granted
by normal means. There is typically some latency in these systems.
Storage Area Network technology requires direct connection to a
dedicated storage network, typically through wide bandwidth
connections. Only users connected to the SAN can use it.
The author closes this section of the chapter with a discussion
of virtual servers. A virtual machine is like a program that runs
on an actual machine. The virtual machine can run any operating
system that the hardware of the actual (host) machine can support.
The attraction to doing this is that you can run several virtual
machines on one well equipped host machine, and if any of them goes
down, it can be brought back up very quickly without having to
worry about the damage to the OS that might have happened on a
dedicated server. Virtual machines typically run in a memory
management environment provided by one of the products listed on
page 91:
- Microsoft's Hyper-V Virtual Server
- VMware's vSphere/ESXi
- Oracle VM VirtualBox
- Citrix XenServer