|
|
CIS 2210 - Database Management and Design
Chapter 4: Entity Relationship (ER) Modeling
Chapter 5: Advanced Data Modeling
Objectives:
This lesson discusses material from chapters 4 and 5.
Objectives important to this lesson:
- Entity relationship components
- Defining and refining relationships
- Database design with ERD components
- Reconciling design conflicts
- ERD diagrams
- Extended entity relationship model
- Entity clusters
- Selecting good primary keys
- Special cases
Concepts:
Chapter 4
The chapter begins with some review, and throws in some new
terms:
- A relational database has three main components - entities, attributes, and relationships.
- Entities are concepts, and instances are cases/examples of those concepts.
- Attributes may be required (these have a NOT NULL
constraint) or optional.
- Attributes should
have a defined domain of
values.
- The text uses the word identifier
as a synonym for a key. We have
talked about composite keys,
so be aware that there are also composite
identifiers.
- Composite attributes
are attributes that we may/should consider dividing into more
attributes, such as the parts of a name or the parts of an address.
- Simple attributes
are attributes that do not have subcomponents.
- Derived attributes are calculated from other
attributes. For example, a person's current age can be calculated from
their date of birth and the current date with simple arithmetic.
When a derived attribute is likely to change often, it
should be recalculated with each use. When it is unlikely
to ever change, it can be calculated once, then stored
as a value, instead of as a formula, for its next use. When it
is possible that it should change due to new data being available, the
designer must make a judgment on how to handle it.
The text returns to the Entity Relationship Diagram,
and teaches us more terms about it.
- We can refer to two entities that are linked by a
relationship as participants in the relationship.
- Connectivity is the word used to refer to the kind
of relationship that exists, the verb that causes the linkage.
- Cardinality refers to maximum and minimum values
that may apply to the relationship. For example, an instructor may be
limited to teach no more than some number of classes, or no fewer than
some number, or both. These numbers define the cardinality of the
relationship. If the instructor must teach at least one, and no
more than four classes, the cardinality of the relationship would
be (1, 4), written beside the Class entity. If each class must
have no less than one, and no more than one instructor, we
would write (1, 1) beside the Instructor entity.
Some entities will exist in a database regardless of the
existence of others. These are called strong or regular
entities, and they have existence independence. Other entities
will not exist unless another entity exists to create them. These
are called existence dependent entities.
- In the database for a school, Course would be an example of
a strong entity that would exist as a concept in the school catalog.
This entity has existence independence.
- A Section would be an entity that exists only in relation to
a course, and only when a section of a particular course was scheduled
to be held. This entity has existence dependence.
- As an example, the school always has the potential to hold
a section of a particular class if that class is an instance of a
defined course. The course definition must exist first, or the section
will not/can not exist. This places the Course relation in a parent
role regarding the Section relation.
Relationships can be called strong or weak,
depending on the linkage between the relations (tables). The text
defines this situation for relations in a parent-child relationship.
- From the perspective of a parent relation, the linkage
to its child relation is strong if the child relation
contains a component of the parent relation's primary key in the
child's primary key. For this to be true, one or both of the relations
will need to have a composite key. A strong relationship is also called
an identifying relationship.
- If this condition does not exist, the tables may
still be related as parent and child, but the relationship is weak.
A weak relationship is also called a non-identifying
relationship.
- In an ERD, a strong relationship is shown
as a solid line, and a weak relationship is shown as a dashed
line.
The text makes the issue of strong or weak harder to remember
by discussing the fact that entities can be strong or weak
as well.
- A weak entity has a strong relationship to
its parent, and it is existence-dependent on its parent.
- A strong entity is one that does not meet both
of the criteria above.
Since you are trying to learn this symbol system, take a look
at the
one page reference posted by Vivek Chawla. This is a
useful diagram that we will discuss in class.
- Note the Crow's foot notation that looks like two pipe
characters: ||. This means one and only one. In the
reference sheet, it shows that for every student there must be one and
only one student ID.
- The symbol that looks like a one just means one.
- The symbol that looks like a circle means zero is a
possibility.
- The symbol that looks most like a crow's foot (or
most birds' feet) means many.
- The symbols are typically combined to give you a
visual statement about the kind of relationship they depict.
- When symbols appear adjacent to an entity object,
they mean that this is the relationship from the far object to the
adjacent object. For instance, in the relationship between prescription
and patient, the symbol for one and only one appears adjacent
to the patient object. A prescription is written for/given
to one and only one patient.
With the basic tools for an ERD diagram somewhat established, the text
suggests that we should use a process for creating them.
- Create a document describing the operations of the
client/organization/organizational unit.
- Create business rules from the operations document.
- Use the business rules to identify entities and
relationships.
- Draw a first draft of an ERD.
- Determine attributes and primary keys for the entities.
- Review the prototype ERD, and revise it as needed.
The text walks through the creation of several ERDs. Each shows a portion
of how the organization works. It is a little like the classic poem, The
Blind Men and the Elephant. When you work with people who have no grasp
of the big picture, you have misunderstandings and needless battles.

One should take this as a cautionary note. If you don't try to develop
a sense of the big picture of the organization, you are quite likely to
capture only snapshots of processes that do not depict the whole story.
Your product will be flawed, and you will be unhappy because your customers
will be unhappy.
The chapter closes with an observation that
the database will be slower as more and more relationships are
established. It suggests that we may have to enable faster processing
by leaving out relationships that are not necessary to the business
processes.
Chapter 5
Having explained a bit about the Entity Relationship (ER)
model, chapter 5 turns to the Extended Entity Relationship (EER)
model. It includes the concepts of supertypes and subtypes, which
are a lot like classes and subclasses in object oriented programming.
Unless you have entities that are essentially similar, and that also
have specific differences from each other, you do not need to use this
concept.
- entity supertype - an entity that is constructed
with attributes that will be common to all of the subtypes
related to it
- entity subtype - an entity that is based on a
supertype, but is different from the supertype because
of its specialized attributes and purpose
The text presents the example of a supertype called Employee,
which will have all the attributes that are common to various
categories of employees in an organization. Subtypes of this entity
might include managers, salaried staff, commissioned staff, part time
staff, full time staff, probationary staff, or whatever categories the
organization actually uses. It may be obvious that each subtype
is related to one supertype, and each supertype
may be related to no subtypes, one subtype, or many subtypes. A
supertype with no subtypes may exist as preparation for the future.
Supertypes and related subtypes support specialization
hierarchy, which is another set of features that entity types may
have.
- inheritance from the supertype to a subtype
- subtypes have the same primary key as their supertype
- if a supertype has relationships, the subtypes inherit
those as well
- subtype discriminator - This is an attribute in
a supertype that tells us which subtype an instance in the
supertype is related to
- subtypes can be disjoint or overlapping
- Disjoint subtypes are also called non-overlapping.
They contain attributes that can be one of two (or more) values, but
cannot be a combination of those values, such as an employee
being hourly or salaried, but never being both.
- Overlapping subtypes contain contain a
combination of attribute states which prevent there being one attribute
to represent the state. For example, an employee who has been promoted
will be in a probationary status, but will also be either
hourly or salaried. In this case the employee may be hourly, hourly and
probationary, salaried, or salaried and probationary. These are
overlapping states and they are said to be non-unique because of the
overlap.
The text offers an example of an employee of a school who may be an administrator,
a professor, both, or neither. It may be best
to express these states in two logical fields, showing the states of
each as True or False. The text refers to the two fields (Administrator
and Professor) as discriminator attributes.
- Completeness - A supertype may have a completeness
constraint, a requirement that each instance must be a member of at
least one subtype. When this is so, the constraint is total
completeness. When some instances are allowed to not
be members of a subtype, that state is partial completeness.
- To make things more fun, it is possible that a
specialization may be overlapping or disjoint, as well as having total
or partial completeness. Table 5.2 in the text summarizes the
features of each combination.
The text differentiates between two procedures that you may
follow when identifying specializations.
- You may find that you learn about a supertype
first, then learn about the subtypes that occur in the data.
When this happens, you are following the specialization
process, going down from the top of the hierarchy to the bottom,
proceeding from the general to the specific.
- On the other hand, you may learn about the subtypes first,
then discover that a supertype should (and does) exist that covers
all of those types. In this case, you are following the generalization
process, starting at the bottom of the hierarchy and going up, proceeding
from the specific cases to the general case.
It may be useful to take a less granular approach in a
diagram, which can be done using entity clusters, in which
several entities are represented by one virtual entity. In
figure 5.5 in the text, the school database diagram has used a Location
entity to represent both the Room and Building entities.
We
can say that the Location entity has those two entities as its
attributes. The text warns us that trying to use a cluster object as an
entity can produce loss of keys, changes in inheritance, and loss of
relationships. It may simplify a diagram to use cluster objects, but
you should not depend on them in the DBMS.
The text returns to the subject of keys, to show us
some general principles about them. Each relation needs a primary key.
The text mentions that a good design often uses a natural key,
a natural identifier that is recognized by system users as part
of their existing vocabulary. This is good advice for all attributes.
Using terms that the users already know will make the system more
accessible to them.
The text offers five characteristics of a good
primary key:
- non-intelligent - This only means that a good key
has no semantic, informational meaning. An ID number is only that, it
is not imbued with a dictionary definition, a contextual meaning, or an
association to history or feelings that may be associated with a name,
a nickname, or any other label that has intrinsic meaning.
- no change over time - Good ID numbers do not need to
change when a person, place, or thing changes names, ownership, value,
or any other characteristic that might be associated with an entity.
For example, I have had the same checking account number for many
years, despite my bank having been merged repeatedly into larger and
larger bank entities.
- preferably single attribute - As we discussed last week, a
good key should be a single attribute whenever possible. It is less
likely to cause problems. It is impossible to violate the rules of Second
Normal Form if your primary key is not a composite key. When forming
a strong relation, as noted above, it is easier to copy a single attribute
key to a related table than to copy a composite key to it.
- preferably numeric - As noted above, numbers do not
typically have semantic value, and there is an infinite supply of
unique instances of them. Auto-incrementation of record numbers is a
function of most DBMSs.
- security compliant - A good key is an attribute that is not
valuable in and of itself, as a Social Security number would be. This
is one of the reasons we do not use Social Security numbers as keys
in relations. That, and it is against the law.
There will be times when you will need to create a key for a data table. When there is an entity, there are instances, and the instances are different from each other, it should be possible to determine an existing key. When it is not, create what we will call a surrogate key.
Add an attribute to the table that has as many characteristics of a
good key as possible. This will typically be a numeric key, created by
numbering the rows in the data table. The text offers some advice about
surrogate keys:
- create a surrogate key when there is no natural key in the data
- create a surrogate key when the best candidate key is a long, complicated composite
- create a surrogate key when the best candidate has embedded semantic content that distracts the users
- when creating a surrogate key, make sure it performs
correctly and meets necessary constraints (not null, unique, and
anything else you need)
In table 5.4, we see an example of a table in need of a key.
It is a list of events, each having a date, a start time, an end time,
a room, an event name, and count of guests. As you can see by examining
the table, or by just imagining it, every column may contain data that
is not unique. Adding an event number (or an event ID) field would
solve the problem, and allow this table to function as a relation.
The text continues with a discussion of linking relations by adding the primary key of one relation to the attributes of the other relation. Since it is the primary key of the first table, we refer to it as a foreign key in the second table. The text mentions that a similar method is to add the primary key of each table to the attributes of the other table. Since there is a choice to be made in the first method, some advice is offered:
- When one side of the relationship is mandatory in the system, and the other is optional, add the key of the mandatory entity to the optional entity as a foreign key.
- When both sides of a relationship are optional, one of the keys may cause fewer nulls (no match) in the other table. If this is so, make that key the foreign key in the other table.
- When both sides are mandatory,
make sure that the two entities should
be two, not one entity. If they should be separate, use the advice in
the bullet items above.
Sometimes, you will need to create new entities/tables to meet a need
for historical data. Consider the example of a an Employee
table that includes an attribute for salary.
Salaries/hourly rates tend to change over time. This field will need an
update now and then. However, the employee, the business office, and the
various tax agencies will want to know when
the change was made. The text proposes a new
table that contains Employee Number
(the primary key of the Employee table), Salary
Change Date (when the change is/was effective), and Salary
Amount (the new value). In this data table, the history of salary
changes for the employee are available. Can you see any other data we
might want to include in this table?
In similar examples, the text shows us that new entities could be created
to maintain history on managers of departments or teams, and on job history
for employees.
The chapter ends with some troubleshooting examples that we will use as material on our discussion board this week.
|