This text presents a challenge to the practiced reader. Normally you look for the important information in a text which may stand out because of the font or organization of the paragraphs. The author seems to have taken an different approach in the first chapter, where we see that every idea has a new paragraph with only a few sentences, and every paragraph has a new section heading. Too much emphasis?
Let's try to figure out what might be important in chapter 1.
Oracle database architecture
Oracle is a DBMS (Database Management System), also called an RDBMS (Relational DBMS). An Oracle DBMS includes data files (tables), control files, and log files. It is used by two components in the simplest installation: the client and the server. You typically install the server portion on an actual application server on a network, and you install the client software on a workstation. When you use the student version of the product, you will probably install both parts on the same computer. This is fine for learning the product, but not very functional for an enterprise.
In an enterprise environment, we would expect to see the three components listed on page 1-3:
When the DBMS runs, it is running an instance of the database on the server. Think of an instance as one copy of the program running in memory. An instance implies a copy of the RDBMS, memory allocated to the instance, and access to data files. A powerful server can run multiple instances of the RDBMS at the same time. This is similar to a user running multiple instances of a browser or a word processor on one computer. A server can be:
In each case above, the various instances are called into existence by a client making a request to connect to the database. A user typically runs an application on a workstation, which creates a user process, which serves as the user's connection to a server process whose purpose is to support this connection.
Physical database structures
The text discusses five types of physical files that you may find in an Oracle database:
Logical database structures
Oracle stores data in blocks.
There are typically several blocks in an extent.
There could be several extents for an object as it grows. All of an
object's extents are contained in its segment.
Segments usually have a one to one correspondence with schema
There could be any number of schema objects in a tablespace.
Each schema is owned by a
user, and contains all the database objects they have created.
Schemas and schema objects
As noted above, a schema in Oracle relates to the collection of database objects a user has created. Several types are listed in the text:
The text considers several features of Oracle through page 1-9. These features are not unique to Oracle, so we do not need to stress them.
Relational models and normalization
The text spends several pages on general database design and normalization, which I hope you have seen before. In case you have not, here are some thoughts about those ideas from another text:
Normalization is the process of making your data tables comply with several rules. Compliance with each successive rule takes the tables to the next level of normalization. The levels are called normal forms, specifically first, second, third, and fourth normal form. The forms can also be referred to by the symbols 1NF, 2NF, 3NF, and 4NF.
The first objective of the chapter is to define functional dependence. A column is functionally dependent on another column if the value found in the second column determines the value found in the first column. Suppose we had a database about customers who live in either Canada or the United States. Let's have a column in the customer table called State/Province, and another column called Country. If the value in State/Province for a customer is Michigan, the value in Country for that customer will be United States. If the value in State/Province is Ontario, the value in Country will be Canada. So the Country column is functionally dependent on State/Province. Likewise, the State/Province column functionally determines Country. Note that the relationship does not have to flow both ways: knowing a customer's country of residence will not determine their state of residence.
In small tables, it is tempting to find dependencies that may not be real. The text suggests that since we have only three sales people in the Rep table, we could conclude that RepNum determines LastName, and LastName determines RepNum. The key is to ask about the future, and to imagine what can and may happen. Hiring a new sales person with the same last name as a current sales rep shows that RepNum will determine LastName, but LastName will not determine any other column.
RepNum, in fact, determines every other field in the Rep table, since each sales person has a unique number. This makes RepNum a key for the Rep table. It is in fact the primary key for this table.
The rule for primary keys states that a column is the primary key if two things are true:
This means that a primary key must determine all other columns. It also means that if the primary key consists of more than one column, no portion of the primary key may act as a primary key. If the subset portion of the rule is violated, it means that you have not identified the correct primary key.
Candidate keys are those columns that could be chosen as the primary key. It is possible that more than one candidate key exists in a table, for instance, the book suggests that a table that holds both employee ID numbers and Social Security numbers could use either column as a primary key. Any candidate key that is not chosen to be the primary key is designated an alternate key.
The definition of first normal form says that a table may not contain repeating groups. This was included in earlier chapters as the statement that cells in a table may only contain single values. A standard cure for repeating groups split the records with repeating groups into multiple records.
It may also be necessary to expand the primary key. The second problem with repeating groups is that the column representing the primary key does not determine the value of the field containing the repeating groups. This becomes obvious when you expand such a row, making it multiple rows containing the same value for the primary key. Assume the following table:
The Class and Room columns contain repeating values. The Instructor column is meant to be the primary key, but it does not meet the rule. We can fix this by expanding the primary key, adding a column for the second part of the new primary key, and splitting the values of the repeating groups into separate rows.
In the expanded table, the new primary key is the combination of the Instructor and Day columns. The combination determines the contents of the remaining columns. This table is now in first normal form.
Second normal form says that a table is in first normal form, and it has a new quality: No column that is not part of the primary key is determined by a subset of the primary key. Let's say that another way: if the primary key is made of more than one column, no column in the table can be determined by a mere portion of primary key. In the example above, the rule requires that Class and Room cannot be determined by Instructor or Day alone. This rule is met, so this table is in second normal form.
Note that since this rule describes tables whose primary key is a combination of columns, any table in first normal form that has a single column as its primary key is automatically in second normal form.
So, how do you find and fix tables that are not in 2NF? You examine tables for dependencies. Which columns are determined by which other columns? How much data in the table is repeated from one row to another? You do the same thing when imposing Third normal form, so let's think about it for a minute.
Third normal form lets us use another term from the text. A determinant is a column (or set of columns) that determines the contents of another column. A table is in third normal form if the only determinants in it are candidate keys. (Remember that the primary key is also a candidate key.) The procedure to cure a violation of this rule is simple: we copy the non-candidate determinant into a new table and move the columns that it determines to the new table. The non-candidate determinant becomes the primary key of the new table. It continues to exist in the old table, relating the two tables.
Problems are created when the correction procedures are not done correctly. Students should examine the examples in the text. Note that when the solution is to decompose a table into two or more tables, each resultant table must be in 3NF when you are done.
Consider this example from About.com:
Like the example in the text, there is repeated data that could be a problem when updating the database. Assume that we have lots of customers in each of a dozen cities. There is the potential for entering an error each time the same data element is entered. We can simplify data entry by removing the columns that are dependent on a single nonkey column and creating a new table. Since ZIP determines City and State, we remove the City and State columns from the table, and create a new table containing ZIP, City, and State.
This allows us to simplify the original table:
The two resultant tables are related on the ZIP column. (Using color theory, I have related the blue table above to the yellow table below on the green column.)
When we place our data tables in 3NF, we avoid some classic problems found in tables that are not in first, second, or third normal form:
The discussion of fourth normal form requires another term. If each value of column A has several associated values of column B, then B is multidependent on A. This is also called a multivalued dependency. Fourth normal form requires that we have third normal form, and no multivalued dependencies.
To avoid the multidependency problem, the text shows us a new method for attaining 1NF. In the first examples, we removed repeating groups and expanded the primary key. That will not work if we have multiple repeating groups. In the text example, we have a Faculty table whose primary key is FacultyNum. It includes FacultyName, and two repeating groups: StudentNum, StudentName, and CommitteeCode, CommitteeName. The idea is that each faculty member will belong to several committees, and will be the adviser for several students. In addition, a student may have several advisers due to multiple majors.
The method to resolve the problems of such a table is to spin off several tables. Solution: remove each repeating group to a new separate table, copy the primary key of the original table to the new table, and establish a new primary key for that table consisting of the copied key column and the key for the repeating group.
This results in simpler tables:
Continue with the rest of the chapter and complete the assignment below.