|
|
CIS 2210 - Database Management and Design
Chapter 13: Business Intelligence and Data Warehouses
Objectives:
This lesson discusses material from chapter 13. Objectives
important to this lesson:
- Business intelligence
- Decision support data
- Data warehouse
- Star schemas
- Online analytical processing (OLAP)
- SQL extensions for OLAP
Concepts:
Chapter 13
Business Intelligence
This chapter is about a Business Intelligence (BI), which we
are told means a system of tools and information that
helps make business decisions. This is a benefit that has been promised
in most of the chapters in this book, starting with the first one.
On page 592, the text provides a graphic of a BI framework,
which is explained briefly as being people and processes,
using technology to work with external and operational
(internal) data to facilitate and enhance the management
and governance of a business. Some of the items in the
illustration carry over into the next concept, the BI architectural
components.
The text lists six components of the BI architecture:
- ETL tools - Tools for Extraction, Transformation, and
Loading of data.
- Data stores - Large data stores are often referred to as
data warehouses, which will contain lots of data to analyze.
- Query and reporting tools - Tools that we find in any good
database to generate information and reports on the data it holds.
- Data visualization - This simply means tools that present
data in graphic ways, like charts and graphs, reports and dashboards.
- Data monitoring and alerting - Tools that allow the system
to call attention to meaningful events.
- Data
analytics - Tools that help users pick the right tools to build and
operate business models that show where the business has been and where
it might go next. Once you have a idea about what to monitor, analytics
can be used to explain a situation, good or bad, and to predict what
may happen with or without change.
Page 594 presents a table of tool types, descriptions of what
they do, and a list of vendors who supply tools like that.
On page 595, the text tells us that the usual information
system (it means a DBMS) is concerned with operational
data, information that assists and records the functions performed by
employees of an organization. This is what every example database in
the text has been about. The example systems hold information about
transactions, inventory, personnel, customers, vendors, and other
critical entities associated with the organization.
A BI system uses operational data, but it requires more, so
that it can focus on strategical
and tactical information. A
critical component is master data
management (MDM),
which holds processes that support generally accepted best practices
for business. The outputs of this subsystem can be used to make governance
decisions, which are decisions about the rules used to run an
organization. The MDM processes will look specifically at key
performance indicators (KPIs).
The text provides a list of commonly used KPIs. Organizations tend to
use specific KPIs that make sense for the kind of business that they do.
- General KPIs are profits, sales, product turnover, product
recalls, sales from promotions, and sales by each unit, employee or
area.
- Financial KPIs are earnings per share of stock, profit,
sales by each employee, percentage of sales not yet received, and
assets.
- Human resources KPIs are number of applicants per job
opening, employee turnover, and employee retention.
- Education KPIs are related mostly to schools: graduation
rates, incoming students, student retention rates, publications by
staff, and staff evaluations.
As the text points out, the appropriate KPIs for an
organization can't be defined before defining what that organization
does, and what its goals are. Goals and KPIs should both be expressed
as things to be completed within a specific time limit. This is the T
in SMART objectives.
Follow the link in the previous sentence to get a quick lesson on SMART
goals and objectives. They make sense whether you are trying to make
plans for a business or for your own life and happiness.
The text gives us a few pages about the evolution of Business
Intelligence systems, but we can skip over it without missing any
necessary information. It is more useful to move to page 601, where the
text briefly discusses some newer versions of BI systems.
- SSD storage devices
improve search time on any DBMS, and that goes for BI systems as well.
- BI appliances are
devices that can be mounted on your network, can access existing data
warehouses, and can be lower cost solutions to adding BI to your
organization.
- BI as a service is
a cloud based service that lets you buy/lease as much as you need on a
vendor's system, from a vendor's system developer, and from the
software packages available from the cloud vendor.
- Big Data analytics
is an approach that is discussed in the next chapter. It uses social
media, which may be applicable to your organization, or it may be a
solution for someone else, not for you.
- Personal analytics - OLAP
systems are Online Analytical Processing systems. These are mentioned
earlier in the chapter as forerunners to BI systems. The text tells
us that mobile BI systems access
processing in the cloud to allow data gathering and decision making
closer to the customer.
Decision Support Data
The next section begins with a warning that a BI system
depends greatly on the operational data that it is given. We are
reminded that operational data is stored in a typical database. That
storage is different from a decision
support database in several ways:
- Time span -
Decision support data must cover longer time frames than operational
data. It is more strategic. Operational data is about specific points
in time, and decision data is about history and projections into the
future.
- Granularity -
Decisions vary from highly granular through highly aggregated. A
decision maker needs to be able to start at any level and move to a
more atomic or more aggregated level, drilling down or drilling up
through the data as needed to do so.
- Dimensionality -
The text explains that this relates to examining the data across
different parameters, not just across time. You may want to examine
your data as it relates to any size area, to customers, to products,
and more, depending on the dimensions stored in your database.
- Currency and volume
- Operational data is about keeping the individual data up to date, but
decision support data is about large numbers of transactions done for
long periods.
- Table nature -
Operation data is typically stored in normalized relations, which
results in many tables. Decision support data is stored in aggregated
results from those normalized tables.
- Data models -
Operational data models use the normalized tables noted above, and have
lots of detail. Decision support models rely on querying that data to
create the tables they really want.
More characteristics of the differences between operational
and decision support data are shown in the table at the top of page
605, Table 13.5. On page 606, the text shows us two tables that
illustrate a difference that causes a need for a decision support
system. The first one shows sales figures for each of ten years at a
single department in a single store. This data is easy to read. The
second table shows some of the data for the same years from each of two
stores, each of which has two departments in it. This small change
makes the table much longer, and makes it much harder to spot trends.
That is what a decision support system is for, especially in more
complex organizations.
Data Warehouse
The next part of the chapter talks about the meaning of a data
warehouse. In short, it is a huge collection of data for a complex
organization that covers many entities over long periods of time.
- It has to be integrated.
In this context that means that data about various entities that are
similar is stored in the same way, so it can be compared from one place
to another, and from one time to another. For a given attribute, the
same values are used across all parts of the organization, in order to
make comparisons possible.
- It has to be subject
oriented.
Data elements are collected and organized according to topics of
interest to the decision makers and to the processes in the BI system.
- It has to be time variant.
This means it has to continue to exist over long periods, so we can
examine the history and trends of what happened in the company.
- It has to be nonvolatile.
This means that we have to be certain of the information, and that it
is not subject to revision or change. We need to know the truth about
what happened, so we can compare any truth to any other.
Table 13.8 on page 609 summarized the text's discussion of
these characteristics.
Earlier in the text, a line implied that a data warehouse and a data mart were the same thing. This
is clarified on page 610, where we learn that a data mart is a limited data warehouse, about a single subject
such as one store in a chain, or one division in a large company. It
may be easier to create a BI system from a data mart, but the danger of
doing so is its loss of the larger scope: you won't be able to compare
the subject to similar subjects across the organization.
Star Schemas
A star schema is a different way to look at your data that is
useful in moving from a relational database to a BI database.

In the example above from Wikipedia, you see one Fact table in the center, with three
Dimension tables arranged
in a star around it. That is the basic system.
- Facts - objective
numeric values about some entity. In the example above, you see four
values that apply to sales in our company.
- Dimensions - These
are items in other tables that the elements in the fact table relate
to. We ask questions like "are the sales higher or lower on specific
days at specific stores for any kind of product?"
- Attributes - As you
can see in the example, a dimension table has attributes that may be of
importance in making decisions about our company.
- Attribute hierarchies
- The truth to be gained from the data iis likely to be found in
combinations of the attributes across the dimensions. In each
dimension, we expect to find what we see in the example above, an
hierarchy of data that goes from the general to the specific in that
dimension. The illustration on page 615 shows the idea of considering
specific combinations of attributes in each dimension. Moving up and
down the hierarchies can lead us to new information we can use to make
choices about what to do next.
- What is not obvious from the example is that the Facts table will be very large,
containing all the numeric data from all the Dimension table
comparisons, in all their logical combinations. This is accomplished by
making the primary key of the Facts table a composite of all the
primary keys from the Dimension tables.
The more stars (Dimension tables) you have around
the Fact table, the more combinations
of attributes this creates in the Facts table. The text illustrates an
example with three Dimension tables and only three attributes to care
about in each one. That is not realistic. Real star schemas will have
many potential stars and lots of attributes in each of them, leading to
multidimensional geometry. Take a look at the increasingly complex
geometry on this page in Wikipedia. It is a lot more
digestible than most of the material I found in a quick search for a
lesson on multidimensional geometries. (You can click the image on the
right instead of the link. You go to the same page.)
The text offers some advice for improving performance in star
schemas:
- Normalize your
dimensional tables.
- Denormalize your
fact tables. They are result tables, so normalizing them is not
practical.
- Create more fact tables to represent different
aggregations. This means new schemas.
- Partition and replicate tables. They get big
enough to want to do this.
For further reference, take a look at this Microsoft
article that describes star schemas and snowflake schemas.
Online Analytical Processing (OLAP)
On page 621, we start a new topic that covers a concept
mentioned earlier: Online Analytical Processing (OLAP). We are told
that it concerns three main ideas, which continue to relate to the
material above.
Multidimensional data
analysis - Having led us into multiple Dimensions, the text
offers hope that we can analyze the data held in them. Users of the
database will have different interests, causing them to want different
views of the data available. The data are more meaningful when
presented in different ways, when combined from different dimensions in
different aggregations, and when viewed through PKIs that interest the
user requesting the data.
- Advanced database support
- OLAP tools need to access data from diifferent kinds (disparate)
systems. A useful feature mentioned in the text is the ability to query
the system in business terms, what developers have typically called
using "natural language". Well, it is natural to the speaker of that
language. The idea is that the user can request data in terms the user
knows that have been defined in the database as parameters it can
actually use.
- Easy-to-use
end-user interfaces - We always have to simplify reality for the
users. It is often best if they do not know how the database actually
functions. You may wish you were insulated from it as well, at this
point.
The text goes on to describe the intensive processing that
OLAP requires. This is not directly related to the material you are
expected to absorb from the chapter, so let's stop the presentation of
new material here.
|