|
|
CIS 2210 - Database Management and Design
Chapter 1: Database Systems
Objectives:
This lesson discusses material from chapter 1. Objectives
important to this lesson:
- Data and information
- Types of databases
- The importance of good design
- Evolution from file data systems
- Problems with file data systems
- Components of a database system
- Functions of a DBMS
Concepts:
Introduction
The chapter begins with a discussion of two important terms:
data and information. When these terms are used in database
discussions, they usually have two different meanings.
- Data means a collection of facts, like
customer names, a company's gross sales, the average temperatures for a
series of dates at some location, or the price of a share of stock on
specific dates. Data needs to be collected, stored, and
retrieved to have any value.
- Information is created by examining data, processing
it, so that an analyst can see the importance of related data.
When did we have larger or smaller income? What were
the changes related to? Are our business processes profitable
as they are, or should they be changed? For information to be useful,
it needs to be true (accurate), it needs to matter to the customer
(relevant), and it needs to lead to making better choices about
business practices (decision making).
One of the objectives of having a database system is to
provide a tool that takes us from having only data to having both data
and information from that data. A database system is defined by the
text as a "shared, integrated computer structure that stores a
collection of data and metadata. Metadata is information about the
regular data: what kind of data is it, where did it come from, who is
allowed to update it, and more information about the management of the
data itself.
Most modern databases are constructed and used inside a
database management system (DBMS), which includes a collection of
programs to manage and manipulate the data, and to control who is
allowed to access, change, and delete specific data. A DBMS should be
able to provide several standard features to its users and
administrators:
- provide one interface for users, and another for
administrators
- allow data to be shared or restricted
- present users with custom views of the data they are
allowed to use
- accept requests from users and from applications to perform
allowed operations on the data
- provide a simpler method of requesting data than the actual
programming language that can be used by advanced users/administrators
Types of Databases
The text provides a list of types of databases, distinguished
by their use, location, or intended purpose:
- single-user databases can be used by one user at a
time; an example would be a database that runs on a single workstation
- multiuser databases can be used by multiple users at
the same time; some can be used by the members of a workgroup, other
can be used by many users in multiple locations
- centralized databases are located on a centrally
located server, like databases that are run from a single mainframe
- distributed databases are located in multiple
locations, like the Active Directory databases in Windows networks
- cloud databases are run from cloud based data
servers, which can simple mean that they are housed on someone else's
network or servers
- general-purpose databases are designed to be useful
for many kinds of businesses or services
- discipline-specific databases are designed for a
single kind of use, such as hospital or medical office databases
- operational databases are designed to support the
functions of a particular business or user
- analytical databases are designed to support
decision making from analysis of historical data
- data warehouse databases also support decision making
- online analytical processing (OLAP) systems
provide tools that are meant to be used with data warehouse databases;
a particular brand of such tools is Business Intelligence
Databases can also be categorized by the way they treat or
store the data that they use:
- unstructured data is kept in whatever state is is
collected in
- structured data is formatted to fit the needs of the
database
- semistructured data is a compromise between the two
methods above
- extensible markup language (XML) is a way of
representing data that is more useful to web based tools
Good Design
The text mentions some goals that should be pursued when
designing a database:
- focus on the structure that will store and manage the data
- make sure the system makes management easy
- make sure that information generated by the system has value
and is accurate
- make sure that errors can be traced easily in the system
Evolution of Database Systems
The
most basic kind of database is simply a
list of data. Lists become tables when we track several kinds of
information in simple data files. The text reminds us briefly that
there were databases long before there were computers. At one point
they were paper files in filing cabinets. Specialists in using such
systems could create meaningful reports, but such things were produced
much more slowly than even simple computer systems can do.
Paper systems evolved to be files stored and maintained in
computers. Spreadsheet programs were, and can still be, used as good
tools to create what are often called flat file databases. They handle
simple data needs well, but fall short of the capabilities of more full
featured DBMS databases that are common now.
Flat file databases store many information items redundantly.
For instance, each order placed by a given customer features that
customer's name and ID number. Needed information, of course, but it
would take much less storage space to use a database instead that can
store customer information one, and use it as needed. Each order
presents the need to enter this same information in flat files,
presenting the opportunity for error, which would cause data to be left
out of reports that might be generated based on matching any one
variation of the customer's name. A database allows us to store the
customer's information in one place. This information is then accessed
each time it is needed, providing for consistency.
Shortcomings
of spreadsheets are described:
- spreadsheets have limited security
- spreadsheets have limited ability to allow multiple users
to access or update the data
- spreadsheets become hard to manage when very large
File based systems suffer from redundancy in one system, and
even worse redundancy when multiple departments of a company each have
their own file based system.
Basic Database Terms
On the other hand, the use of tables, with or without
spreadsheets, has led to some commonly understood database terms:
- entity - something that you want to store data
about, such as customers, accounts, employees, inventory, places, or
events
- attribute - a characteristic of an entity; also
called a field, column, or property;
in spreadsheets, attributes are typically stored in columns of data
- record - all the data about one instance of an
entity, such as the attributes of one order, one student, or one
customer in a database
- relationship - an association between entities in
your database. Customers are associated with their sales reps, in that
each customer has one and only one sales rep. This is a relationship.
From the point of view of the sales rep, this is a one to many
relationship, since each sales rep has many customers,
but each customer has only one sales rep. Relationships can be one to
one, one to many, many to one, or many to many.
- data file - all data in a database is stored in
files, including lists of entities, their attributes, and the
relationships between entities
File based systems were useful in the days before computers
made other choices possible. Even while they were the only choice,
their points of failure were recognized. Most of those shortcomings
were present even when computers were used with a file based system:
- it could take a long time to create, learn, and use one
- it always took a long time to harvest information from such
systems
- maintenance of file systems was difficult
- the only security on the old systems was physical security,
due to the physical nature of the paper records; the nature of the
records also meant anyone using the system had to be present at the
site where it was kept
- when file systems are used on computers, there is still
minimal security on them, and sharing such a system means copying the
entire file structure and its contents to another location
- using such a system typically requires a lot of programming
to make it work as a system
File systems do not support features that are commonly found
in DBMS systems:
- Structural dependence means that accessing the data
in a file depends on the structure of the file itself. File systems
typically require their data files to be structured in the same way,
and a change in structure would require a change in all the programs
your staff have created that use those data files.
- Structural independence means that a file's
structure can be changed, or a new file can have a different structure,
and neither of those things require changing the programs that use the
data in those files. This is generally true when using DBMS systems.
- Data independence means that we can change the
structure of the database, and add functions to the DBMS, without
having to reenter all the data. This is a characteristic of DBMS
systems.
- Data dependence means that we have to reenter data
for those kinds of changes, which is common for file systems.
The thing that allows DBMS systems to function more freely is
the fact that they look for data according to logical names and
locations, instead of looking for data by physical names.
It is important to control redundancy and other database
problems by having and maintaining a good design. The text lists three
kinds of anomalies that indicate that a design change is needed. Each
of them is encountered with a change to the system's data:
- update anomalies
- insertion anomalies
- deletion anomalies
Next, the text reintroduces the idea of a database
management system (DBMS). A DBMS is any software solution
that allows users to store and retrieve data, and produces reports
needed by the users. Users interact with the DBMS directly or through
an interface of some sort. Users typically can add, delete, and modify
records through a DBMS. For example, users of a sales database should
be able to ask the DBMS to retrieve all orders for a specific part, or
all commission payments to specific sales people. Making such a request
is called running a query. (Query is another word for question.)
Some examples of DBMSs are Access, Oracle, DB2, and SQL Server.
Advantages and Disadvantages of Database Processing
'This is a standard list of virtues of database
systems:
- Getting more information from the same amount of data
- You need to understand what the authorr means by those two terms. Data
may be considered to be raw facts. Information is
what we get when we process data to gain knowledge. A DBMS may help us
see a relation between data elements that we did not think were related.
- Sharing data - A DBMS will allow more users to
access the stored data than a spreadsheet system can.
- Balancing conflicting requirements - One of the jobs
of a DBA is to impose structure on the database so
that the company can get the most use out of it. Serving the needs of
the greatest number of users (the company as a whole) is usually the
goal.
- Controlling redundancy - Redundancy means saving the
same thing in several places. Saving a data element in only one place
conserves storage space, and makes updating the system much simpler.
Redundancy cannot be eliminated, but it can be reduced. A data
anomaly is a problem that occurs with data. One kind of data
anomaly happens when redundant data does not match, which will cause
data processing to fail.
- Facilitating consistency -
Consistency is a direct result of reducing redundancy. When a data
element is stored in fewer places, it is less likely to be stored
incorrectly, or fail to be updated.
- Improving integrity - Integrity constraints
are rules for databases, such as a rule that a
certain field must contain only numeric data, or a field used in a
query must contain a data element that is in the database (think of a
drop down list). DBMSs allow integrity constraints.
- Expanding security - Security can be the assignment
of passwords for DBMS users, or the assignment of different rights to
various users or groups. A DBMS will typically include security
components for all of its data structures.
- Increasing productivity - In the old days, a
programmer would have to write a program for every query a user wanted
to run on a database. A DBMS will allow that programmer to work on
something else, once the users understand how to use the DBMS. Advanced
users may use the same query language that administrators use to
retrieve data or create reports. This is often some version of SQL,
Structured Query Language.
- Providing data independence - This means that we can
change the structure of the database, and add functions to the DBMS,
without having to reenter all the data.
A short list of disadvantages to using a DBMS:
- Larger file size - The DBMS itself is probably a
larger program file than the simpler system we are converting from.
This may mean upgrading some hardware for a company that is changing to
a DBMS.
- Increased complexity - Since the DBMS is capable of
doing a great deal, it has many features that might be confusing to new
users. (Maybe they should read this book.)
- Greater impact of failure - If we centralize all of
the company's data in one system, the impact of the failure of this
system is greater than the failure of a single system in a company
where each user has part of the data. This means we need to practice
making backups on a regular basis.
- More difficult recovery - Restoration after a data
disaster is more difficult than with a simple spreadsheet system. It is
not impossible, just harder. Think of it like this: If a storm blows
down a house and a tent, it is more difficult to rebuild the house than
it is to put up a new tent. However, when it is done the house will be
a house again, and the tent is still only a tent.
- Vendor dependence - You may be committed to
continuing to use one vendor's products because all of your data is in
one of their DBMSs. This problem occurs when there is too much
dependence on one product.
- Version changes - When there is a major upgrade to a
DBMS product, you may have to schedule downtime for the system while
upgrades take place.
|