CIS 331: Database Management Using SQL

Chapter 2: The Relational Model 1: Introduction, QBE, and Relational Algebra

Objectives:

This chapter introduces the student to the relations, relational algebra, and several functions common to queries. The objectives important to this chapter are:

1. Describe the relational model
2. Understand Query-By-Example (QBE)
3. Use criteria in QBE
4. Create calculated columns in QBE
5. Calculate statistics in QBE
6. Sort data in QBE
7. Join tables in QBE
8. Update data using QBE
9. Understand relational algebra
Relational Model

The chapter begins with a description of relational databases. A relational database can be considered as a collection of tables. For some reason, the tables are called relations, hence the name relational database. The information about each entity in this kind of database is stored in a table dedicated to that entity. The attributes of that entity are represented as columns in the table, and the rows represent unique instances (cases) of the entity. In other words, in a table about students, each column represents a characteristic common to students, and each row represents an individual student.
Student Number Last Name First Name Grade
1100123 Jones Shane B
1100136 Smith Kim A

Rows can also be called records or tuples. (No one likes that last word.) Columns represent characteristics, also called attributes or fields.

Relationships between entities are shown by common columns in two or more tables. Our student table could be linked to a table of majors by having a column from the majors table in each table. In the majors table, each major would appear once. In the student table, each major might appear any number of times. This puts majors in a one-to-many relationship to students.
1100123 Jones Shane B 1
1100136 Smith Kim A 3

Major Description
1 Accounting
2 Computer Science
3 Nursing

The text presents a formal definition of a relation, which has several parts. A relation is a two dimensional table that meets these rules:

1. Each entry in the table contains one and only one value. No cell may contain more than one value.
2. Each column represents one attribute of the entity, and must have a unique name. Each column name is used only once in the table.
3. All values in a column are values of the attribute that column is about.
4. The order of columns in the table is not important.
5. Each row is unique in some way.
6. The order of the rows is not important.

The statements about row and column order being unimportant do not mean that you can't sort the table. They mean that sorting the table does not affect the data stored in the table. If you sort a table, you sort entire rows, so the information in any row stays with the rest of the information in that row. Think of it like this: a row is a record. You want each record to stay intact when the table is sorted. Would you want information from someone else's record swapped with your record? (No, you wouldn't.)

If a relation does not meet the first rule, that relation is unnormalized. The text does not explain at this point that we will be putting relations into normal form. That comes up soon.

A formal notation for tables is presented. Start with the name of the table, follow it with a list of the attributes in parentheses, separated by commas. This notation helps analyze the structure of the table without being distracted by the data elements.

Student (StudentNum, LastName, FirstName, Major, Address, City, State, Zip)

When writing about columns in a database, it is possible to make a mistake about what is meant. Since relations are linked by having the same column in two relations, it is necessary to have a notation that tells us which column in which table is meant. The notation is the table name, followed by a period, followed by the column name. (To a programmer, this notation means object.attribute.)

Student.Major

Tables may have any number of columns. In most cases, the entries in one column will uniquely identify each row. That column is the primary key of that table. Sometimes the primary key is not just one column, but a combination of two or more. In the notation above, primary keys would be shown by underlining them.

Student (StudentNum, LastName, FirstName, Major, Address, City, State, Zip)

QBE

The text presents a series of examples of using Query by Example with an Access database. In general, a query is a question that you ask the DBMS. Whatever DBMS you use, there will be a structured way to make a query. QBE allows you to make a query by filling in a grid or template. The variations presented in the text:

• simple queries - You specify which table to use, and what fields to report from the table
• criteria - (criteria is plural, criterion is singular) you not only specify which field to search on, but specify a value that must be matched. Multiple criteria may be used, or just one.
• compound criteria - It is common for QBE templates to allow you to use multiple rows and columns for criteria. If you put multiple criteria in one row, all criteria in that row must be matched. This is the same as a Boolean AND condition. If you put criteria in multiple rows, each row becomes one acceptable set of criteria, and the rows can be considered to be separated by the Boolean operator OR.
• computed fields - If you have fields containing numeric data, you can perform math function on those fields. Standard computer symbols are used for math operations. (+ for addition, - for subtraction, * for multiplication, and / for division)
• calculated statistics - The text give an example of counting the number of accounts assigned to a sales rep. We are not asking for each row that matches, we just want to know how many there are. The query uses a reserved word that works in this DBMS (count) as well as specifying which sales rep to match.
• grouping and sorting - Output from queries can be made to meet particular needs. Grouping would put all records that match each sales rep together, allowing you to create a sub-report for each sales rep, or a summary line for each of them. Sorting would arrange the output as desired. It is common to be able to perform a primary and secondary sort.
• joining tables - Some information must be obtained from multiple tables, such as a report showing the full name of customers and sales reps. The tables in question must be related by a common field. A query that runs on two tables is said to join the tables.
• joining multiple tables (more than two) - An extension of joining two tables, each table must be related to at least one of the other tables.
• update query - This is essentially a find and replace command. It could be used to update or delete records.
• make-table query - This results in a new table with elements copied from existing tables.
Relational Algebra

Relational algebra, as stated in the text, is not used in current DBMS systems. It is a theoretical system of getting data from databases. It is presented here to aid your understanding of SQL.

Commands and syntax in relational algebra:

• SELECT - This command is followed by the name of a data file. WHERE is followed by a field name and a criterion. GIVING is followed by the name of a file to create with the result.
Example:
SELECT Student WHERE StudentNum=102477
GIVING SelectedStudent

• PROJECT - This command is followed by the name of a data file. OVER is followed by a parenthetical list of fields to copy from the file. GIVING is followed by the name of a file to create with the result.
Example:
PROJECT Student OVER (StudentNum, LastName, FirstName)
GIVING StudentNamesNumbers

• JOIN - This command is typically followed by two data file names. WHERE is followed by a statement of which fields in each file match. GIVING is followed by the name of a file to create containing the result.
Example: (assume the Student table above, and a Major table that lists majors, advisors, and department office numbers)
JOIN Student Major
WHERE Student.Major=Major.Major
GIVING StudentMajor

Relational algebra follows set theory (from mathematics).

A union of two tables is the result of taking each unique row from each table, and each row that matches from each table. (Note that this is not possible unless the tables have the same number of columns, and the columns contain the same kind of data. The columns in each original table do not have to have the same names.) In the diagram here, the union of tables A and B is represented by the red, blue, and yellow areas: the data that is unique to each table, and the data that is common to both.

The intersection of two tables is the set of all rows that appear in both tables. This is the result of the INTERSECT command in relational algebra. In the illustration here, the intersection of tables A and B is represented by the blue area: the data that appears in both tables.

The difference of two tables is the set of all rows that are unique to the first table named in the equation. This is the result of the SUBTRACT command in relational algebra. In the illustration here, if we execute the command table A SUBTRACT table B, the result is the red area: the portion of table A that does not intersect with table B.