CIS 2210 - Database Management and Design
Chapter 6: Normalization
This lesson discusses material from chapter 6.
Objectives important to this lesson:
- What is normalization?
- Normal forms
- Transforming relations to higher normal forms
- Using normalization with ER modeling
The chapter begins with several bullet points about normalization:
- definition: The text says it is the process of changing data structures to minimize data redundancy. It is more than that, but controlling redundancy is a primary goal.
- benefit: Normalization reduces the number of data anomalies by controlling how tables are related.
- control: Tables are adapted from their initial states to meet specific standards (1NF, 2NF, 3NF, etc.)
- benefit: Each successive level of normalization is an improvement of the lower numbered forms.
- economy: Once you achieve 3NF, you may already be in 4NF.
- reversal: Tables can be denormalized, reduced to a lower level of normalization, to increase performance in the DBMS.
- benefit: Using normalization in the design process improves the table designs, their attributes, and their relationships.
The text presents an overview of a normalization process:
- Each table is about a single subject (entity).
- We will reduce the number of places individual data items are stored.
- All attributes in a table that are not part of the primary
key should be dependent on the primary key. If not, they should be
moved to another table.
- Inserting, updating, and deleting rows from the table will not cause anomalies.
- All tables are raised to 3NF or higher.
- The process addresses one table at a time, but considers
all tables in a relationship and breaks out attributes into new tables
Table 6.2 offers a summary of five normal forms, but
it uses language that has not been discussed yet. Let's cover some
of the necessary terms.
- The summary for 1NF
says the data is in a table, there are no repeating groups, and the
Primary Key has been identified. This is accurate, but it leaves out
some of the other features of 1NF.
- The summary for 2NF says that our table is in 1NF, and it has no partial dependencies. This means that if your are using a composite primary key, there is no column in the table that can be determined by a portion of the primary key. If there was such a column, it should be moved
to a new table, along with a copy of the part of the primary key that
determines the dependent column. Remember that if your primary key is a
single attribute, and your table is in 1NF, you can't be in violation
- The summary for 3NF says the table is in 2NF, and it has no transitive dependencies. This says that you can't have a column in the table that is determined by another column, unless the determinant column is the primary key. This is a lot like the 2NF problem, but this time we are looking for determinants that are not part of the primary key.
- The summary for Boyce-Codd normal form
says that every determinant in the table is a candidate key. Another
way to say it is that we don't have any attributes that depend on other
attributes unless those attributes are candidate keys. This was
previously taught as the second half of 3NF, so we should be there
already if we are paying attention. If our table has this problem, we
resolve it by the same method used in the two bullets above.
- The summary for 4NF
is quite murky. We can say that 4NF requires that there be no
dependencies on combinations of non-key fields. This is like an
expanded version of Boyce-Codd, but it involves composite determinants.
More terminology that is related to dependence:
- functional dependence
- an attribute is functionally dependentt on another attribute (its
determinant) if each value in the determinant has a 1:1 correspondence
with a specific value in the dependent attribute; if we know a value in
the determinant, and we know a corresponding value in the dependent, we
can be sure that additional rows having the same value in the
determinant will have the same corresponding value in the dependent
- fully functional dependence
- this is the same condition as functionnal dependence, except that the
dependent is determined by a composite of multiple attributes
- transitive dependence
- as noted above, a transitive dependencce is when an attribute is
dependent on a determinant that is not a primary key or a candidate key
The text offers some examples of normalizing tables, most of
which make sense. We will talk about them in class and in discussion
I would like to observe that dependence is a tricky thing.
Some fields are obviously dependent on other fields. Others are
dependent on a primary key only because they are part of the record
that a specific value in the primary key identifies. There is nothing
about my ID number that predicts my name, my address, or any other
attribute of mine. The value in my address field is determined by the fact that it is mine and it belongs in my record. Dependency can simply mean that a data value is in its proper record.
Denormalization can be
considered as leaving one form of normalization to use a simpler form.
You may have noticed (if not, notice it now) that moving to each higher
level of normalization may mean adding more tables to the database.
Queries that use those tables will have to open each table that is
needed for each query, slowing the processes that are needed to
complete it. In order to speed up a system, you may want to consider
skipping a level or two of normalization. In table 6.6, we see some
examples of "problems" that happen when a table is not normalized as
much as possible. The table provides reasons for choosing the
denormalized version, and suggestions to handle the less normalized
The chapter ends with a check list of advice about conducting and completing the data modeling portion of a project:
- business rules - document and verify all; make sure they are clear and simple; make sure all rules are valid and current
- naming conventions for entities, attributes, and relationships
- entity concerns
- attribute concerns
- relationship concerns
- ER model concerns