Chapter 9, Disaster Recovery: Preparation and Implementation
Objectives:
This lesson is about chapter 9. Objectives important to this
lesson:
Disaster recovery
Maintenance
Investigations
Concepts:
Chapter 9
The major concepts in this chapter were covered, in large part,
in the notes for last week. Let's start with some background that
tries to put recovery operations in perspective.
Here are some alarming statistics from a similar text:
90% of companies having a data center disruption lasting 10
days or more went into bankruptcy
40% of companies that have disasters shut down and never
reopen
30% of companies that have disasters fail within two years
The three statistics above should tell you to have a plan in
place that will not let a disaster close your company.
Disasters can be classified based on some of their
characteristics. Here are two points of view:
natural disaster or man-made disaster - Is the
disaster caused by nature, or by an act of a human being?
Consider the list in table 9-1. Most of the disasters listed are
clearly natural (e.g. earthquake, tornado, tsunami) while others
require more detail to determine their class. Can a fire be
caused by a person? How about a flood? How about other kinds of
mayhem? (Cue videos...)
Mayhem comes in both classifications: natural and man-made. The
question is what can we do to avoid or minimize the mayhem?
Another point of view is time-based. Does the disaster have a
rapid onset or a slow onset? Is it more like
a storm or more like global warming? The answer can lead to different
types of responses, and should suggest different sorts of
causes.
What should be crossing your mind now is why the category you
place a disaster in should matter. Are the plans in the red
books or the blue books, depending on the classification? If that
is how your organization works, go for it.
The text often offers the thrilling advice to back up your data.
Yawn, and sip some more caffeine: it's a necessary part of business.
There are a lot of solutions from a lot of vendors. This is a
lecture from an industry professional with ten reasons to do it
better than you are. Others did not, and it was not pretty.
The text changes topics to explain why backups of data are
needed. Can't we just save everything in the cloud? "The cloud" is
just somebody else's server, and it may be exposed to the same
threats as our own equipment. Even the natural disasters that
might happen to us should put fear into the best of us.
Natural disasters can disrupt any business, including data
centers and shrimp boat businesses.
The text continues with some remarks that relate to business
continuity, which means that we keep running the business even
though we have had a problem, a setback,
or a disaster. We should consider the author's
remarks about the effects of a business disruption that might be
reduced by detailed planning and action
based on such planning.
Some definitions from the text may be helpful in understanding
the point of the chapter.
business continuity - continuation
of operations and services despite a disruptive event
continuity of operation - same
as business continuity
risk assessment - analyzing risks,
their effects, and what we can do to reduce
the probability of their occurrence and the severity of their effects
planning and testing - identification
of risks and threats, creating plans to deal
with them, and conducting tests of those plans
business impact analysis - identifying
mission critical business functions to prioritize
our continuity plans
IT contingency plan - a plan to continue
to provide services in case a particular
incident disrupts normal service; a separate
plan will be needed for each type of incident that can occur
disaster recovery plan - restoring
services provided by the enterprise to their standard state;
this is not just about IT services
As we saw in the chapter about the CSIRT, the author thinks we
should form a team for Disaster Recovery. I wonder if he has
considered the possibility that it is the same
team? In a small company, you should expect that this is so. You
should form a group to handle the concerns of each
item in the BIA,
since those are the elements that you decided were important at
the beginning of the planning process. If you need a separate
group for each BIA element, you probably have a very complex
organization. If one team can do the whole thing, good for them.
You are better served if all the teams (however many there are) cooperate and communicate
their intentions to each other. They are all building (or
rebuilding) the same company. Management oversight should be quite
thorough because the end result of this process is our new,
improved organization.
You may want to look over NIST SP 800-34,
Revision 1. This may be the first time we have been
referred to this document, but it seems very much like the one
that was used as a plan for the CSIRT and its duties. Get
support to make the plan, consult the BIA, propose and add
controls to reduce the need for the plan, plan how to end the
disaster and repair the damage, write out the plan, test the plan,
then review and maintain the plan. Familiar enough?
Forensics
The last chapter did not say enough about forensics, so let's
have a bit more from another text. A forensic
investigation is typically one that concerns a crime. This section
is about computer forensics, investigations into
crimes that involve computers and other information system
equipment. From one point of view, there are four aspects of an
investigation:
secure the scene - The team that does this
may be called an Incident Response Team or a Forensics Response
Team, or another title that means the same thing. They are
responsible for taking possession of devices that might hold any
data that might contain evidence of the crime being
investigated. In addition, they should photograph the scene,
document their observations, and record interviews with
witnesses. (Pick your favorite police procedural and study it a
bit.)
preserve and collect the evidence - This
aspect is closely related to the first, in that the response
team may have to take images of data in RAM that would be lost
if not recorded before the power is turned off. Note this Order
of Volatility, which indicates the order in which to
capture data from a running system:
Register, cache, and peripheral memory first
Random Access Memory (RAM) second
Network state third
Running processes fourth
establish (and maintain) the
chain of custody - There must be continuous
documentation of who has had access to seized devices and data,
who has done what with them, and to whom they are turned over at each
change in custody.
examine for legal evidence - Although other
discussions have used the word "evidence" several times, this
one brings up the point that not everything you find is actually
legal evidence. Only things that indicate or prove a crime was
or was not committed can be considered as evidence that will be
presented in court.
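Two of the aspects above, the Order of Volatility and the chain of custody, are easy to picture as data. What follows is only a sketch, not a real forensic tool: the names VOLATILITY_RANK, EvidenceItem, seize, and the layout of a custody entry are all made up for illustration, and the SHA-256 hash is an assumption added here (the notes above do not mention hashing) as one common way to show that a copy has not changed since it was collected.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Order of Volatility from the list above: lower rank = capture sooner.
# The dictionary keys are hypothetical labels, not standard names.
VOLATILITY_RANK = {
    "registers_cache": 1,
    "ram": 2,
    "network_state": 3,
    "running_processes": 4,
}


def capture_order(sources):
    """Return the given sources sorted from most to least volatile."""
    return sorted(sources, key=lambda name: VOLATILITY_RANK[name])


@dataclass
class EvidenceItem:
    """One seized item and its custody history (hypothetical layout)."""
    label: str
    sha256: str
    custody: list = field(default_factory=list)

    def transfer(self, from_person, to_person, action):
        """Append a custody entry; earlier entries are never edited."""
        self.custody.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "from": from_person,
            "to": to_person,
            "action": action,
        })

    def still_matches(self, data):
        """True if the data hashes to the value recorded at seizure."""
        return hashlib.sha256(data).hexdigest() == self.sha256


def seize(label, data, collected_by):
    """Record the hash at collection time and open the custody log."""
    item = EvidenceItem(label, hashlib.sha256(data).hexdigest())
    item.transfer("scene", collected_by, "collected")
    return item


# Decide what to capture first on a running system.
print(capture_order(["running_processes", "ram", "network_state", "registers_cache"]))
# → ['registers_cache', 'ram', 'network_state', 'running_processes']

# Collect something, hand it over, and confirm it is unchanged.
image = b"example RAM image bytes"
item = seize("RAM image, workstation 1", image, "responder A")
item.transfer("responder A", "examiner B", "handed over for analysis")
for entry in item.custody:
    print(entry["from"], "->", entry["to"], ":", entry["action"])
print(item.still_matches(image))
```

The custody log is append-only on purpose: a gap or an edited entry is exactly what the other side's lawyer will ask about.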
Several texts elaborate on memory and storage locations that
should be examined for meaningful data. What you can expect to
find there may surprise you:
Windows page file - This is a hidden
file, typically on the boot drive, that Windows uses to store
"memory pages" that it thinks you are not using presently, like
memory devoted to an application that is minimized. The file is
probably in the root of the drive, and is probably called pagefile.sys.
You should expect to see pieces of any file that the
computer worked on, especially if it was minimized while the
user worked on something else.
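To get a feel for what "pieces of any file" looks like in practice, here is a minimal sketch of pulling readable text out of a block of raw bytes, which is roughly what an examiner's string search does with a copy of a page file. The function name printable_strings and the sample bytes are made up for illustration. It assumes you are working on a copy taken from a forensic image, and it only finds plain ASCII text; Windows also keeps a lot of text as UTF-16, which this sketch would miss.

```python
import re


def printable_strings(blob: bytes, min_len: int = 6):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(blob):
        yield match.start(), match.group().decode("ascii")


# Made-up sample: readable fragments separated by binary junk.
sample = (
    b"\x00\x01"
    + b"password=hunter2"
    + b"\xff\x00\x10"
    + b"abc"                      # too short to be reported
    + b"\x00"
    + b"Dear Bob, the report"
    + b"\x00"
)

for offset, text in printable_strings(sample):
    print(offset, text)
```

The minimum length is a judgment call: set it too low and you drown in noise, too high and you miss short items like PINs.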
RAM slack - This will take a
minute. When Windows saves files, it saves to sectors
(track sectors) on a drive. Sectors are logically arranged in clusters,
which are the smallest storage area a file system can use. The
number of sectors in a cluster varies depending on the way a
drive was formatted. When a file is saved, it will take a
certain number of clusters to hold it, but the file itself may
not actually fill the last sector used in the last cluster used
to store it. When this happens, older
versions of Windows (before NT) did something you may never have
heard about. They filled the last sector used
for the file with data pulled randomly from RAM.
This data is called RAM slack, a copy of a
piece of RAM that has been stored in the slack space
at the end of a sector. Why did it do this? Windows just worked
that way: it had to fill the rest of the sector. You never knew
what you'd find in it. Since NT, the RAM slack space has been
filled with zeros, so this is less of a problem as time goes by,
except for stand-alone, legacy systems that use older versions
of Windows.
In the simplified illustration below, a file has been saved to
two four-sector clusters, but it only fills six and a half
sectors. The cluster marked in cyan
is full: all four sectors have
been used by the file. The second cluster is
not full. The second half of the seventh sector (item F)
is filled with RAM slack. The eighth
sector has not been used at all, but we will cover that in a
minute.
Note: for many years, a track sector has held a
standard 512 bytes regardless of what device
it was on. As of January 2011, this was no
longer true. A device using Advanced Format on
a system that understands it may use sectors that hold
4096 (4K) bytes. This leads to lots more room
in a RAM slack situation. If a computer using such a device is
running an older OS, the 512 byte limit for sectors still
applies.
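The arithmetic behind that note is simple enough to check. This is a small sketch (the function name ram_slack_bytes is made up) that works out how many bytes are left over in the last sector a file touches, for the classic 512-byte sector and for a 4096-byte Advanced Format sector. The file size is the six and a half 512-byte sectors from the illustration, which comes to 3328 bytes.

```python
def ram_slack_bytes(file_size: int, sector_size: int) -> int:
    """Bytes left unused in the last sector the file touches."""
    remainder = file_size % sector_size
    return 0 if remainder == 0 else sector_size - remainder


file_size = 3328  # six and a half 512-byte sectors
for sector_size in (512, 4096):
    print(sector_size, ram_slack_bytes(file_size, sector_size))
# → 512 256
# → 4096 768
```

In the worst case the leftover space is one byte less than a full sector, so the ceiling goes from 511 bytes to 4095 bytes when the sector size grows.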
Drive slack - So, if the
cluster holds a specific number of sectors, what if the file
only used some of those sectors when it was
saved? Does Windows fill the rest of those sectors, too? No, but
something else interesting happens. If anything was ever
written to those sectors before, it remains there undisturbed
until there is a need to write to them. This means that some sectors
at the end of a cluster may hold old data that the
user thought was deleted. The data held in those sectors is
called drive slack. You never
know what might be in it.
In the illustration below, the last sector of the second cluster
(item G) is Drive slack. The new file has not
overwritten whatever was in that sector already.
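The example in the illustration can be worked out with a few lines of arithmetic. This is a sketch under the same assumptions as the illustration (512-byte sectors, four sectors per cluster, a file of six and a half sectors); the function name slack_layout and the dictionary keys are made up for this example.

```python
def slack_layout(file_size: int, sector_size: int = 512, sectors_per_cluster: int = 4) -> dict:
    """Split the space allocated to a file into file data, RAM slack, and drive slack."""
    cluster_size = sector_size * sectors_per_cluster
    clusters = -(-file_size // cluster_size)        # round up to whole clusters
    sectors_touched = -(-file_size // sector_size)  # round up to whole sectors
    allocated = clusters * cluster_size
    return {
        "clusters": clusters,
        "sectors_touched": sectors_touched,
        # Rest of the last sector the file wrote into.
        "ram_slack": sectors_touched * sector_size - file_size,
        # Whole sectors in the last cluster that the file never wrote to.
        "drive_slack": allocated - sectors_touched * sector_size,
    }


print(slack_layout(3328))
# → {'clusters': 2, 'sectors_touched': 7, 'ram_slack': 256, 'drive_slack': 512}
```

The 512 bytes of drive slack are the untouched eighth sector, and the 256 bytes of RAM slack are the second half of the seventh. An examiner cares about both, because neither holds the file the user thinks is there.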