CIS 2210 - Database Management and Design

Chapter 1: Database Systems

Objectives:

This lesson discusses material from chapter 1. Objectives important to this lesson:

  1. Data and information
  2. Types of databases
  3. The importance of good design
  4. Evolution from file data systems
  5. Problems with file data systems
  6. Components of a database system
  7. Functions of a DBMS
Concepts:

Introduction

The chapter begins with a discussion of two important terms: data and information. When these terms are used in database discussions, they usually have two different meanings.

  • Data means a collection of facts, like customer names, a company's gross sales, the average temperatures for a series of dates at some location, or the price of a share of stock on specific dates. Data needs to be collected, stored, and retrieved to have any value.
  • Information is created by examining data, processing it, so that an analyst can see the importance of related data. When did we have larger or smaller income? What were the changes related to? Are our business processes profitable as they are, or should they be changed? For information to be useful, it needs to be true (accurate), it needs to matter to the customer (relevant), and it needs to lead to making better choices about business practices (decision making).

One of the objectives of having a database system is to provide a tool that takes us from having only data to having both data and information from that data. A database system is defined by the text as a "shared, integrated computer structure that stores a collection of data and metadata. Metadata is information about the regular data: what kind of data is it, where did it come from, who is allowed to update it, and more information about the management of the data itself.

Most modern databases are constructed and used inside a database management system (DBMS), which includes a collection of programs to manage and manipulate the data, and to control who is allowed to access, change, and delete specific data. A DBMS should be able to provide several standard features to its users and administrators:

  • provide one interface for users, and another for administrators
  • allow data to be shared or restricted
  • present users with custom views of the data they are allowed to use
  • accept requests from users and from applications to perform allowed operations on the data
  • provide a simpler method of requesting data than the actual programming language that can be used by advanced users/administrators

Types of Databases

The text provides a list of types of databases, distinguished by their use, location, or intended purpose:

  • single-user databases can be used by one user at a time; an example would be a database that runs on a single workstation
  • multiuser databases can be used by multiple users at the same time; some can be used by the members of a workgroup, other can be used by many users in multiple locations

  • centralized databases are located on a centrally located server, like databases that are run from a single mainframe
  • distributed databases are located in multiple locations, like the Active Directory databases in Windows networks
  • cloud databases are run from cloud based data servers, which can simple mean that they are housed on someone else's network or servers

  • general-purpose databases are designed to be useful for many kinds of businesses or services
  • discipline-specific databases are designed for a single kind of use, such as hospital or medical office databases
  • operational databases are designed to support the functions of a particular business or user

  • analytical databases are designed to support decision making from analysis of historical data
  • data warehouse databases also support decision making
  • online analytical processing (OLAP) systems provide tools that are meant to be used with data warehouse databases; a particular brand of such tools is Business Intelligence

Databases can also be categorized by the way they treat or store the data that they use:

  • unstructured data is kept in whatever state is is collected in
  • structured data is formatted to fit the needs of the database
  • semistructured data is a compromise between the two methods above
  • extensible markup language (XML) is a way of representing data that is more useful to web based tools

Good Design

The text mentions some goals that should be pursued when designing a database:

  • focus on the structure that will store and manage the data
  • make sure the system makes management easy
  • make sure that information generated by the system has value and is accurate
  • make sure that errors can be traced easily in the system

Evolution of Database Systems

image of a shopping listThe most basic kind of database is simply a list of data. Lists become tables when we track several kinds of information in simple data files. The text reminds us briefly that there were databases long before there were computers. At one point they were paper files in filing cabinets. Specialists in using such systems could create meaningful reports, but such things were produced much more slowly than even simple computer systems can do.

Paper systems evolved to be files stored and maintained in computers. Spreadsheet programs were, and can still be, used as good tools to create what are often called flat file databases. They handle simple data needs well, but fall short of the capabilities of more full featured DBMS databases that are common now.

Flat file databases store many information items redundantly. For instance, each order placed by a given customer features that customer's name and ID number. Needed information, of course, but it would take much less storage space to use a database instead that can store customer information one, and use it as needed. Each order presents the need to enter this same information in flat files, presenting the opportunity for error, which would cause data to be left out of reports that might be generated based on matching any one variation of the customer's name. A database allows us to store the customer's information in one place. This information is then accessed each time it is needed, providing for consistency.


spreadsheet as a file databaseShortcomings of spreadsheets are described:

  • spreadsheets have limited security
  • spreadsheets have limited ability to allow multiple users to access or update the data
  • spreadsheets become hard to manage when very large

File based systems suffer from redundancy in one system, and even worse redundancy when multiple departments of a company each have their own file based system.

Basic Database Terms

On the other hand, the use of tables, with or without spreadsheets, has led to some commonly understood database terms:

  • entity - something that you want to store data about, such as customers, accounts, employees, inventory, places, or events
  • attribute - a characteristic of an entity; also called a field, column, or property; in spreadsheets, attributes are typically stored in columns of data
  • record - all the data about one instance of an entity, such as the attributes of one order, one student, or one customer in a database
  • relationship - an association between entities in your database. Customers are associated with their sales reps, in that each customer has one and only one sales rep. This is a relationship. From the point of view of the sales rep, this is a one to many relationship, since each sales rep has many customers, but each customer has only one sales rep. Relationships can be one to one, one to many, many to one, or many to many.
  • data file - all data in a database is stored in files, including lists of entities, their attributes, and the relationships between entities

File based systems were useful in the days before computers made other choices possible. Even while they were the only choice, their points of failure were recognized. Most of those shortcomings were present even when computers were used with a file based system:

  • it could take a long time to create, learn, and use one
  • it always took a long time to harvest information from such systems
  • maintenance of file systems was difficult
  • the only security on the old systems was physical security, due to the physical nature of the paper records; the nature of the records also meant anyone using the system had to be present at the site where it was kept
  • when file systems are used on computers, there is still minimal security on them, and sharing such a system means copying the entire file structure and its contents to another location
  • using such a system typically requires a lot of programming to make it work as a system

File systems do not support features that are commonly found in DBMS systems:

  • Structural dependence means that accessing the data in a file depends on the structure of the file itself. File systems typically require their data files to be structured in the same way, and a change in structure would require a change in all the programs your staff have created that use those data files.
  • Structural independence means that a file's structure can be changed, or a new file can have a different structure, and neither of those things require changing the programs that use the data in those files. This is generally true when using DBMS systems.
  • Data independence means that we can change the structure of the database, and add functions to the DBMS, without having to reenter all the data. This is a characteristic of DBMS systems.
  • Data dependence means that we have to reenter data for those kinds of changes, which is common for file systems.

The thing that allows DBMS systems to function more freely is the fact that they look for data according to logical names and locations, instead of looking for data by physical names.

It is important to control redundancy and other database problems by having and maintaining a good design. The text lists three kinds of anomalies that indicate that a design change is needed. Each of them is encountered with a change to the system's data:

  • update anomalies
  • insertion anomalies
  • deletion anomalies

Next, the text reintroduces the idea of a database management system (DBMS). A DBMS is any software solution that allows users to store and retrieve data, and produces reports needed by the users. Users interact with the DBMS directly or through an interface of some sort. Users typically can add, delete, and modify records through a DBMS. For example, users of a sales database should be able to ask the DBMS to retrieve all orders for a specific part, or all commission payments to specific sales people. Making such a request is called running a query. (Query is another word for question.)

Some examples of DBMSs are Access, Oracle, DB2, and SQL Server.

Advantages and Disadvantages of Database Processing

'This is a standard list of virtues of database systems:

  1. Getting more information from the same amount of data - You need to understand what the authorr means by those two terms. Data may be considered to be raw facts. Information is what we get when we process data to gain knowledge. A DBMS may help us see a relation between data elements that we did not think were related.
  2. Sharing data - A DBMS will allow more users to access the stored data than a spreadsheet system can.
  3. Balancing conflicting requirements - One of the jobs of a DBA is to impose structure on the database so that the company can get the most use out of it. Serving the needs of the greatest number of users (the company as a whole) is usually the goal.
  4. Controlling redundancy - Redundancy means saving the same thing in several places. Saving a data element in only one place conserves storage space, and makes updating the system much simpler. Redundancy cannot be eliminated, but it can be reduced. A data anomaly is a problem that occurs with data. One kind of data anomaly happens when redundant data does not match, which will cause data processing to fail.
  5. Facilitating consistency - Consistency is a direct result of reducing redundancy. When a data element is stored in fewer places, it is less likely to be stored incorrectly, or fail to be updated.
  6. Improving integrity - Integrity constraints are rules for databases, such as a rule that a certain field must contain only numeric data, or a field used in a query must contain a data element that is in the database (think of a drop down list). DBMSs allow integrity constraints.
  7. Expanding security - Security can be the assignment of passwords for DBMS users, or the assignment of different rights to various users or groups. A DBMS will typically include security components for all of its data structures.
  8. Increasing productivity - In the old days, a programmer would have to write a program for every query a user wanted to run on a database. A DBMS will allow that programmer to work on something else, once the users understand how to use the DBMS. Advanced users may use the same query language that administrators use to retrieve data or create reports. This is often some version of SQL, Structured Query Language.
  9. Providing data independence - This means that we can change the structure of the database, and add functions to the DBMS, without having to reenter all the data.

A short list of disadvantages to using a DBMS:

  1. Larger file size - The DBMS itself is probably a larger program file than the simpler system we are converting from. This may mean upgrading some hardware for a company that is changing to a DBMS.
  2. Increased complexity - Since the DBMS is capable of doing a great deal, it has many features that might be confusing to new users. (Maybe they should read this book.)
  3. Greater impact of failure - If we centralize all of the company's data in one system, the impact of the failure of this system is greater than the failure of a single system in a company where each user has part of the data. This means we need to practice making backups on a regular basis.
  4. More difficult recovery - Restoration after a data disaster is more difficult than with a simple spreadsheet system. It is not impossible, just harder. Think of it like this: If a storm blows down a house and a tent, it is more difficult to rebuild the house than it is to put up a new tent. However, when it is done the house will be a house again, and the tent is still only a tent.
  5. Vendor dependence - You may be committed to continuing to use one vendor's products because all of your data is in one of their DBMSs. This problem occurs when there is too much dependence on one product.
  6. Version changes - When there is a major upgrade to a DBMS product, you may have to schedule downtime for the system while upgrades take place.