CIS 2210 - Database Management and Design
Chapter 4: Entity Relationship (ER) Modeling
Chapter 5: Advanced Data Modeling
This lesson discusses material from chapters 4 and 5.
Objectives important to this lesson:
- Entity relationship components
- Defining and refining relationships
- Database design with ERD components
- Reconciling design conflicts
- ERD diagrams
- Extended entity relationship model
- Entity clusters
- Selecting good primary keys
- Special cases
The chapter begins with some review, and throws in some new
- A relational database has three main components - entities, attributes, and relationships.
- Entities are concepts, and instances are cases/examples of those concepts.
- Attributes may be required (these have a NOT NULL
constraint) or optional.
- Attributes should
have a defined domain of
- The text uses the word identifier
as a synonym for a key. We had
talked about composite keys,
so there are also composite
- Composite attributes
are attributes that we may/should consider dividing into more
attributes, such as the parts of a name or the parts of an address.
- Simple attributes
are attributes that do not have subcomponents.
- Derived attributes are calculated from other
attributes. For example, a person's current age can be calculated from
their date of birth and the the current date with simple arithmetic.
When a derived attribute is likely to change often, it
should be recalculated with each use. When it is unlikely
to ever change, it can be calculated once, then stored
as a value, instead of as a formula, for its next use. When it
is possible that it should change due to new data being available, the
designer must make a judgment on how to handle it.
The text returns to the Entity Relationship Diagram,
and teaches us more terms about it.
- We can refer to two entities that are linked by a
relationship as participants in the relationship.
- Connectivity is the word used to refer to the kind
of relationship that exists, the verb that causes the linkage.
- Cardinality refers to maximum and minimum values
that may apply to the relationship. For example, an instructor may be
limited to teach no more than some number of classes, or no fewer than
some number, or both. These numbers define the cardinality of the
relationship. If the instructor must teach at least one, and no
more than four classes, the cardinality of the relationship would
be (1, 4), written beside the Class entity. If each class must
have no less than one, and no more than one instructor, we
would write (1, 1) beside the Instructor entity.
Some entities will exist in a database regardless
of the existence of others. These are called strong or regular
entities, and they have existence independence. Other entities
will not exist unless anther entity exists to create them.
These are called existence dependent entities.
- In the database for a school, Course would be an
example of a strong entity that would exist as a concept in the school
catalog. This entity has existence independence.
- A Section would be an entity that exists only in
relation to a course, and only when a section of a particular course
was scheduled to be held. This entity has existence dependence.
- As an example, the school always has the potential to hold
a section of a particular class if that class is an instance of a
defined course. The course definition must exist first, or the section
will not/can not exist. This places the Course relation in a parent
role regarding the Section relation.
Relationships can be called strong or weak,
depending on the linkage between the relations (tables). The text
defines this situation for relations in a parent-child relationship.
- From the perspective of a parent relation,
the linkage to a child relation is strong if the child
relation contains a component of the parent relation's primary key
in the child's primary key. For this to be true, one or both of the
relations will need to have a composite key. A strong relationship is
also called an identifying relationship.
- If this condition does not exist, the tables may
still be related as parent and child, but the relationship is weak.
A weak relationship is also called a non-identifying
- In an ERD, a strong relationship is shown
as a solid line, and a weak relationship is shown as a dashed
The text makes the issue of strong or weak harder to remember
by discussing the fact that entities can be strong or weak
- A weak entity has a strong relationship to
its parent, and it is existence-dependent on its parent.
- A strong entity is one that does not meet both
of the criteria above.
Since you are trying to learn this symbol system, take a look
one page reference posted by Vivek Chawla. This is a
useful diagram that we will discuss in class.
- Note the Crow's foot notation that looks like two pipe
characters: ||. This means one and only one. In the
reference sheet, it shows that for every student there must be one and
only one student ID.
- The symbol that looks like a one just means one.
- The symbol that looks like a circle is means zero
is a possibility.
- The symbol that looks most like a crow's foot (or
most birds' feet) means many.
- The symbols are typically combined to give you a
visual statement about the kind of relationship they depict.
- When symbols appear adjacent to an entity object,
they mean that this is the relationship from the far object to the
adjacent object. For instance, in the relationship between prescription
and patient, the symbol for one and only one appears adjacent
to the patient object. A prescription is written for/given
to one and only one patient.
With the basic tools for an ERD diagram somewhat established,
the text suggest we should use a process for creating them.
- Create a document describing the operations of the
- Create business rules from the operations document.
- Use the business rules to identify entities and
- Draw a first draft of an ERD.
- Determine attributes and primary keys for the entities.
- Review the prototype ERD, and revise it as needed
The text walks through the creation of several ERDs. Each
shows a portion of how the organization works. It is a little like the
classic poem, The Blind Men and the Elephant.
One should take this as a cautionary
note. If you don't try to develop a sense of the big picture of the
organization, you are quite likely to capture only snapshots of
processes that do not depict the whole story.
The chapter closes with an observation that
the database will be slower as more and more relationships are
established. It suggests that we may have to enable faster processing
by leaving out relationships that are not necessary to the business
Having explained a bit about the Entity Relationship (ER)
model, chapter 5 turns to the Extended Entity Relationship (EER)
model. It includes the concepts of supertypes and subtypes, which
are a lot like classes and subclasses in object oriented programming.
Unless you have entities that are essentially similar, and that also
have specific differences from each other, you do not need to use this
- entity supertype - an entity that is constructed
with attributes that will be common to all of the subtypes
related to it
- entity subtype - an entity that is based on a
supertype, but is different from the supertype because
of its specialized attributes and purpose
The text presents the example of a supertype called Employee,
which will have all the attributes that are common to various
categories of employees in an organization. Subtypes of this entity
might include managers, salaried staff, commissioned staff, part time
staff, full time staff, probationary staff, or whatever categories the
organization actually uses. It may be obvious that each subtype
is related to one supertype, and each supertype
may be related to no subtypes, one subtype, or many subtypes. A
supertype with no subtypes may exist as preparation for the future.
Supertypes and related subtypes support specialization
hierarchy, which is another set of features that entity types may
- inheritance from the supertype to a subtype
- subtypes have the same primary key as their supertype
- if a supertype has relationships, the subtypes inherit
those as well
- subtype discriminator - This is an attribute in
a supertype that tells us which subtype an instance in the
supertype is related to.
- subtypes can be disjoint or overlapping
- Disjoint subtypes are also called non-overlapping.
They contain attributes that can be one of two (or more) values, but
cannot be a combination of those values, such as an employee
being hourly or salaried, but never being both.
- Overlapping subtypes contain contain a
combination of attribute states which prevent there being one attribute
to represent the state. For example, an employee who has been promoted
will be in a probationary status, but will also be either
hourly or salaried. In this case the employee may be hourly, hourly and
probationary, salaried, or salaried and probationary. These are
overlapping states and they are said to be non-unique because of the
The text offers an example of an employee of a school who may be an administrator,
a professor, both, or neither. It may be best
to express these states in two logical fields, showing the states of
each as True or False. The text refers to the two fields (Administrator
and Professor) as discriminator attributes.
- Completeness - A supertype may have a completeness
constraint, a requirement that each instance must be a member of at
least one subtype. When this is so, the constraint is total
completeness. When some instances are allowed to not
be members of a subtype, that state is partial completeness.
- To make things more fun, it is possible that a
specialization may be overlapping or disjoint, as well as having total
or partial completeness. Table 5.2 in the text summarizes the
features of each combination.
The text differentiates between two procedures that you may
follow when identifying specializations.
- You may find that you learn about a supertype
first, then learn about the subtypes that occur in the data.
When this happens, you are following the specialization
process, going down from the top of the hierarchy to the bottom,
proceeding from the general to the specific.
- On the other hand, you may learn about the subtypes
first, then discover that a supertype should (and does) exist
the cover all of those types. In this case, you are following the generalization
process, starting at the bottom of the hierarchy and going up,
proceeding from the specific cases to the general case.
It may be useful to take a less granular approach in a
diagram, which can be done using entity clusters, in which
several entities are represented by one virtual entity. In
figure 5.5 in the text, the school database diagram has used a Location
entity to represent both the Room and Building entities.
can say that the Location entity has those two entities as its
attributes. The text warns us that trying to use a cluster object as an
entity can produce loss of keys, changes in inheritance, and loss of
relationships. It may simplify a diagram to use cluster objects, but
you should not depend on them in the DBMS.
The text returns to the subject of keys, to show us
some general principles about them. Each relation needs a primary key.
The text mentions that a good design often uses a natural key,
a natural identifier that is recognized by system users as part
of their existing vocabulary. This is good advice for all attributes.
Using terms that the users already know will make the system more
accessible to them.
The text offers five characteristics of a good
- non-intelligent - This only means that a good key
has no semantic, informational meaning. Perhaps non-semantic would have been a better label. An ID number is only that, it
is not imbued with a dictionary definition, a contextual meaning, or an
association to history or feelings that may be associated with a name,
a nickname, or any other label that has intrinsic meaning.
- no change over time - Good ID numbers do not need to
change when a person, place, or thing changes names, ownership, value,
or any other characteristic that might be associated with an entity.
For example, I have had the same checking account number for many
years, despite my bank having been merged repeatedly into larger and
larger bank entities.
- preferably single attribute - As we discussed last
week, a good key should be a single attribute whenever possible. It is
less likely to cause problems. It is impossible to violate the rules of
Second Normal Form is your primary key is not a composite key. When
forming a strong relation, as noted above, it is easier to copy a
single attribute key to related table than to copy a composite key to
- preferably numeric - As noted above, numbers do not
typically have semantic value, and there is an infinite supply of
unique instances of them. Auto-incrementation of record numbers is a
function of most DBMSs.
- security compliant - A good key is an attribute that
is not valuable in and of itself, as a Social Security number would be.
This is one of the reasons we do not use Social Security numbers as
keys in relations.
There will be times when you will need to create a key for a data table. When there is an entity, there are instances, and the instances are different from each other, so it should be possible to determine an existing key. When it is not, create what we will call a surrogate key.
Add an attribute to the table that has as many characteristics of a
good key as possible. This will typically be a numeric key, created by
numbering the rows in the data table. The text offers some advice about
- create a surrogate key when there is no natural key in the data
- create a surrogate key when the best candidate key is a long, complicated composite
- create a surrogate key when the best candidate has embedded semantic content that distracts the users
- when creating a surrogate key, make sure it performs
correctly and meets necessary constraints (not null, unique, and
anything else you need)
In table 5.4, we see an example of a table in need of a key.
It is a list of events, each having a date, a start time, an end time,
a room designator, an event name, and a count of expected guests. As you can see by examining
the table, or just by imagining it, every column may contain data that
is not unique. Adding an event number (or an event ID) field would
solve the problem, and allow this table to function as a relation.
The text continues with a discussion of linking relations by adding the primary key of one relation to the attributes of the other relation. Since it is the primary key of the first table, we refer to it as a foreign key in the second table. The text mentions that a similar method is to add the primary key of each table to the attributes of the other table. Since there is a choice to be made in the first method, some advice is offered:
- When one side of the relationship is mandatory in the system (must be 1 or more), and the other is optional (may be 0), add the key of the mandatory entity to the optional entity as a foreign key.
- When both sides of a relationship are optional, one of the keys may cause fewer nulls (no match) in the other table. If this is so, make that key the foreign key in the other table.
- When both sides are mandatory, make sure that the two entities should
be two, not one entity. If they should be separate, use the advice in
the bullet item above. If not, you should look at combining the two
Sometimes, you will need to create new entities/tables to meet a need for historical data. Consider the example of a an Employee table that includes an attribute for salary.
Salaries/hourly rates tend to change over time. This field will need an
update now and then. However, the employee, the business office, and
the various tax agencies will want to know when the change was made. The text proposes a new table that contains Employee Number (the primary key of the Employee table), Salary Start Date (when the change is/was effective), and Salary Amount
(the new value). In this data table, the history of salary changes for
the employee are available. Can you see any other data we might want to
include in this table?
In similar examples, the text shows us that new entities could
be created to maintain history on managers of departments or teams, and
job history for all employees
The chapter ends with some troubleshooting example that we will use as material on our discussion board this week.