1. Understand the business - what do they do, how do they do it, and how do they control their operations?
2. Identify possible frauds - where are controls weak? What kinds of fraud could be attempted?
3. Identify symptoms of possible frauds - how would each of the possible frauds appear in the data? What would we see in employee behavior data? (Are we tracking that?)
4. Gather data with technology - what are the expected anomalies? What can we find that is unexpected? Run queries on databases to extract relevant data.
5. Analyze data - find indicators of fraud and other problems, then investigate to confirm them.
6. Investigate symptoms - if fraud is found, go to 8 (follow up); if not, go to 7 (create new controls to detect and correct symptoms).
The process is a cycle: repeat it on a regular schedule, and again whenever needed.
The text briefly discusses three software packages used in data audits. It mentions two others. This is a moving target, in that software changes regularly, and products come and go in every category. When choosing a product, you should look at reviews of currently available software, and you should consult reputable sources, like business partners and trade associations, to find out what products are recommended by people you respect.
The text continues with a discussion of access to data. It includes the idea that an analyst should be happy to have read-only access to data, because analysis can still be done, and it prevents accusations that the analyst changed the data.
In the section on analysis, the text introduces a theory called Benford's Law. Frank Benford, a physicist, examined a theory proposed by Simon Newcomb, an astronomer. The theory says that the first digits of actual numbers in a set will most often follow a nonrandom distribution. This distribution, shown in the graphic on the right, says that there is about a 30% chance that the first digit in a number will be a 1, an 18% chance that it will be a 2, and so on down the indicated curve. This is counterintuitive. You might think that digits in real, naturally occurring numbers would be more random. You can follow the link above to a Wikipedia article about the subject.
The text tells us that financial data often follow Benford's Law, and
that examination of financial data sets usually shows this to be true,
except where numbers are assigned
or where fraud is taking place.
In the image above, we see seven data sets examined for the applicability of Benford's Law. Benford's Law is shown as the black curve. All of the data sets in this example seem to fit, with the exception of lottery numbers, which are meant to be completely random. Lottery numbers fit the horizontal line predicting the same probability for each digit. You can click the image above to follow a link to another article about this concept.
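The curve described above comes from a simple formula: under Benford's Law, the probability of leading digit d is log10(1 + 1/d), which gives about 30.1% for 1 and 17.6% for 2. As a minimal sketch of how an auditor might compare a data set against that expectation, here is a stdlib-only Python example; the invoice amounts are invented for illustration, not real audit data:

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected first-digit probability under Benford's Law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / digit)

def first_digit_frequencies(values):
    """Observed frequency of each leading digit (1-9) in a list of numbers."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    total = len(digits)
    counts = Counter(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Hypothetical invoice amounts -- illustrative only.
invoices = [1023.50, 187.20, 2450.00, 132.99, 98.10,
            1740.00, 310.75, 115.00, 2210.40, 149.95]

expected = {d: benford_expected(d) for d in range(1, 10)}
observed = first_digit_frequencies(invoices)
for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

A real audit would use far more records and a statistical test (such as chi-squared) rather than eyeballing the two columns, but the comparison idea is the same.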
The text discusses outliers,
which are values that do not seem to belong to their data set. The text
proposes increasingly higher numbers for the cost of a broom ($10, $25,
$100, $1500), asking us where we would call for an investigation of such
charges. In order to answer the question, we should be asking what the
average cost of a broom is in the locality where we buy them. If we had
the prices for brooms from a couple of dozen vendors, we could run a z-score
analysis of each price to show whether a specific price is
well outside the data set it is supposed to belong in. This type of analysis
assumes that most elements in the data set will fit a bell curve, one
in which there are more data points in the middle of a range, and fewer
at each end. A z-score tells us how many standard deviations away from
the mean a value is. Obviously, this only applies to data expected to
fit under a bell curve.
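The z-score computation above is short enough to sketch directly. This is a minimal Python example using only the standard library; the broom prices are invented for illustration, and the |z| > 3 cutoff is one common convention, not a rule from the text:

```python
from statistics import mean, stdev

def z_scores(prices):
    """Z-score of each price: how many standard deviations it lies from the mean."""
    m, s = mean(prices), stdev(prices)
    return [(p - m) / s for p in prices]

# Hypothetical broom prices from a couple of dozen vendors -- illustrative only.
broom_prices = [9.50, 10.00, 10.25, 9.75, 11.00, 10.50, 9.90, 10.10,
                10.75, 9.60, 10.30, 11.25, 9.80, 10.05, 10.40, 9.95,
                10.60, 10.20, 9.70, 10.90, 10.15, 9.85, 10.45, 1500.00]

# Flag any price more than 3 standard deviations from the mean.
flagged = [(p, z) for p, z in zip(broom_prices, z_scores(broom_prices))
           if abs(z) > 3]
print(flagged)
```

Only the $1,500 broom is flagged; the ordinary prices cluster tightly enough that even the $11.25 broom has a z-score near zero.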
The next concept is stratification, which means splitting a data set into separate tables of like records. The text explains that we could not get a meaningful z-score for brooms if the table in question also included the cost of uniforms, buildings, and parties. We need to examine just the data for like cases, in this example, the prices of brooms. The text warns that this method creates lots of tables from a data set. In the same part of the chapter, the author explains summarization, which is easier to understand. Summarization computes a representative statistic for each case in the tables formed by stratification, which is why the author explained that first. It could, for instance, generate an average price for each product we buy, regardless of the number we buy or the number of vendors we use.
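The two steps pair naturally in code: group the records into strata, then compute one statistic per stratum. Here is a minimal Python sketch; the purchase records are invented for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical purchase records: (product, unit_price) -- illustrative only.
purchases = [
    ("broom", 9.75), ("broom", 10.25), ("broom", 1500.00),
    ("uniform", 45.00), ("uniform", 52.50),
    ("party supplies", 310.00),
]

# Stratify: split the single data set into one table per product.
strata = defaultdict(list)
for product, price in purchases:
    strata[product].append(price)

# Summarize: one representative statistic (here, the average price) per stratum.
summary = {product: mean(prices) for product, prices in strata.items()}
print(summary)
```

Note how the $1,500 broom drags the broom average far above the typical price, which is exactly the kind of signal summarization is meant to surface.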
In the same section, the author introduces time trend analysis. Look
at the graph on page 185, which shows how much a specific employee spent
every two weeks. If a company has huge peaks in production at certain
times of the year (candy companies often do), then there may be higher
spending in those periods. The graph in the text shows a growing number
of dollars spent, increasing from late October through March. Are we making
jelly beans? In this case, there was no good reason for the purchases
other than to steal from the company.
Let's move ahead to financial statements, starting on page 187. The text lists several common statements that companies compile about their operations, as well as some analysis methods commonly used to examine such statements.