CIS 2210 - Database Management and Design
Chapter 6: Normalization
This lesson discusses material from chapter 6.
Objectives important to this lesson:
- What is normalization?
- Normal forms
- Transforming relations to higher normal forms
- Using normalization with ER modeling
The chapter begins with several bullet points about normalization:
- definition: The text says it is the process of changing data structures to minimize data redundancy. It is more than that, but controlling redundancy is a primary goal.
- benefit: Normalization reduces the number of data anomalies by controlling how tables are related.
- control: Tables are adapted from their initial states to meet specific standards (1NF, 2NF, 3NF, etc.)
- benefit: Each successive level of normalization is an improvement of the lower numbered forms.
- economy: Once you achieve 3NF, you may already be in 4NF.
- reversal: Tables can be denormalized,
reduced to a lower level of normalization, to increase performance in
the DBMS. This is not intuitive. It is sometimes done because it is
easier for the DBMS to handle fewer tables that are not as normalized
as they could be.
- Benefit: Using normalization in the design process improves the table
designs, their attributes,
and their relationships.
The text presents an overview of a normalization process:
- Each table is about a single subject (entity).
- We will reduce the number of places individual data items are stored.
- All attributes in a table that are not part of the primary key should
be dependent on the primary key. If not, the dependent items should
be moved to another table, and the column that determines them should
be copied to the new table as its primary key.
- Inserting, updating, and deleting rows from the table will not cause anomalies.
- All tables are raised to 3NF or higher.
- The process addresses one table at a time, but considers
all tables in a relationship and breaks out attributes into new tables
Table 6.2 offers a summary of five normal forms, but
it uses language that has not been discussed yet. Let's cover some
of the necessary terms.
- The summary for 1NF
says the data is in a table, there are no repeating groups, and the
Primary Key has been identified. This is accurate, but it leaves out
some of the other features of 1NF.
- The summary for 2NF says that
our table is in 1NF, and it has no partial
dependencies. This means that if you are using a composite
primary key, there is no column in the table that can be determined
by a portion of the primary
key. If there was such a column, it should be moved
to a new table, along with a copy of the part of the primary key that
determines the dependent column. Remember that if your primary key is
a single attribute, and your table is in 1NF, you can't be in
violation of 2NF.
- The summary for 3NF says the
table is in 2NF, and it has no transitive
dependencies. This says that you can't have a column in the table
that is determined by another column, unless the determinant column
is the primary key. (Alternate
keys are still okay.) This is a lot like the 2NF problem, but this time
we are looking for determinants that are not part of the primary key.
- The summary for Boyce-Codd normal form
says that every determinant in the table is a candidate key. Another
way to say it is that we don't have any attributes that depend on other
attributes unless those attributes are candidate keys. This was
previously taught as the second half of 3NF, so we should be there
already if we are paying attention. If our table has this problem, we
resolve it by the same method used in the two bullets above.
- The summary for 4NF is quite
murky. We can say that 4NF requires that there be no dependencies on
combinations of non-key fields. This is like an expanded version of
Boyce-Codd, but it involves composite non-key determinants.
More terminology that is related to dependence:
- functional dependence
- an attribute is functionally dependentt on another attribute (its
determinant) if each value in the determinant has a 1:1 correspondence
with a specific value in the dependent attribute; if we know a value in
the determinant, and we know a corresponding value in the dependent, we
can be sure that additional rows having the same value in the
determinant will have the same corresponding value in the dependent
- fully functional dependence
- this is the same condition as functionnal dependence, except that the
dependent is determined by a composite of multiple attributes
- transitive dependence
- as noted above, a transitive dependencce is when an attribute is
dependent on a determinant that is not a primary key or a candidate key
The text offers some examples of normalizing tables, most of
which make sense. We will talk about them in class and in discussion
I would like to observe that dependence is a tricky thing.
There are really two meanings.
- Some fields are obviously dependent on other fields. The
value of a field can predict what value must occur in another field.
This is one meaning of dependency and determinance.
- Other fields are
dependent on a primary key only because they are part of the record
that a specific value in the primary key identifies. There is nothing
about my ID number that predicts my name, my address, or any other
attribute of mine. The value in my address field is determined by the fact that it is mine
and it belongs in my record. Dependency and determinance can simply
mean that a data value is in its proper record. This meaning is less
dramatic than the first meaning, but it is the more common meaning for
data in relations: what are the values that are part of the record we
are talking about.
Denormalization can be
considered as leaving one form of normalization to use a simpler form.
You may have noticed (if not, notice it now) that moving to each higher
level of normalization may mean adding more tables to the database.
Queries that use those tables will have to open each table that is
needed for each query, slowing the processes that are needed to
complete it. In order to speed up a system, you may want to consider
skipping a level or two of normalization. In table 6.6 (or nearby if the book has changed again), we see some
examples of "problems" that happen when a table is not normalized as
much as possible. The table provides reasons for choosing the
denormalized version, and suggestions to handle the less normalized
The chapter ends with a check list of advice about conducting and completing the data modeling portion of a project:
- business rules - document and verify all; make sure they are clear and simple; make sure all rules are valid and current
- naming conventions for entities, attributes, and relationships
- entity concerns
- attribute concerns
- relationship concerns
- ER model concerns