CIS 2210 - Database Management and Design

Chapter 4: Entity Relationship (ER) Modeling
Chapter 5: Advanced Data Modeling

Objectives:

This lesson discusses material from chapters 4 and 5. Objectives important to this lesson:

  1. Entity relationship components
  2. Defining and refining relationships
  3. Database design with ERD components
  4. Reconciling design conflicts
  5. ERD diagrams

  6. Extended entity relationship model
  7. Entity clusters
  8. Selecting good primary keys
  9. Special cases
Concepts:

Chapter 4

The chapter begins with some review, and throws in some new terms:

  • A relational database has three main components - entities, attributes, and relationships.
  • Entities are concepts, and instances are cases/examples of those concepts.
  • Attributes may be required (these have a NOT NULL constraint) or optional.
  • Attributes should have a defined domain of values.
  • The text uses the word identifier as a synonym for a key. We have talked about composite keys, so be aware that there are also composite identifiers.
  • Composite attributes are attributes that we may/should consider dividing into more attributes, such as the parts of a name or the parts of an address.
  • Simple attributes are attributes that do not have subcomponents.
  • Derived attributes are calculated from other attributes. For example, a person's current age can be calculated from their date of birth and the current date with simple arithmetic. When a derived attribute is likely to change often, it should be recalculated with each use. When it is unlikely to ever change, it can be calculated once, then stored as a value, instead of as a formula, for its next use. When it is possible that it should change due to new data being available, the designer must make a judgment on how to handle it.

The text returns to the Entity Relationship Diagram, and teaches us more terms about it.

  • We can refer to two entities that are linked by a relationship as participants in the relationship.
  • Connectivity is the word used to refer to the kind of relationship that exists, the verb that causes the linkage.
  • Cardinality refers to maximum and minimum values that may apply to the relationship. For example, an instructor may be limited to teach no more than some number of classes, or no fewer than some number, or both. These numbers define the cardinality of the relationship. If the instructor must teach at least one, and no more than four classes, the cardinality of the relationship would be (1, 4), written beside the Class entity. If each class must have no less than one, and no more than one instructor, we would write (1, 1) beside the Instructor entity.

Some entities will exist in a database regardless of the existence of others. These are called strong or regular entities, and they have existence independence. Other entities will not exist unless another entity exists to create them. These are called existence dependent entities.

  • In the database for a school, Course would be an example of a strong entity that would exist as a concept in the school catalog. This entity has existence independence.
  • A Section would be an entity that exists only in relation to a course, and only when a section of a particular course was scheduled to be held. This entity has existence dependence.
  • As an example, the school always has the potential to hold a section of a particular class if that class is an instance of a defined course. The course definition must exist first, or the section will not/can not exist. This places the Course relation in a parent role regarding the Section relation.

Relationships can be called strong or weak, depending on the linkage between the relations (tables). The text defines this situation for relations in a parent-child relationship.

  • From the perspective of a parent relation, the linkage to its child relation is strong if the child relation contains a component of the parent relation's primary key in the child's primary key. For this to be true, one or both of the relations will need to have a composite key. A strong relationship is also called an identifying relationship.
  • If this condition does not exist, the tables may still be related as parent and child, but the relationship is weak. A weak relationship is also called a non-identifying relationship.
  • In an ERD, a strong relationship is shown as a solid line, and a weak relationship is shown as a dashed line.

The text makes the issue of strong or weak harder to remember by discussing the fact that entities can be strong or weak as well.

  • A weak entity has a strong relationship to its parent, and it is existence-dependent on its parent.
  • A strong entity is one that does not meet both of the criteria above.

Since you are trying to learn this symbol system, take a look at the one page reference posted by Vivek Chawla. This is a useful diagram that we will discuss in class.

  • Note the Crow's foot notation that looks like two pipe characters: ||. This means one and only one. In the reference sheet, it shows that for every student there must be one and only one student ID.
  • The symbol that looks like a one just means one.
  • The symbol that looks like a circle means zero is a possibility.
  • The symbol that looks most like a crow's foot (or most birds' feet) means many.
  • The symbols are typically combined to give you a visual statement about the kind of relationship they depict.
  • When symbols appear adjacent to an entity object, they mean that this is the relationship from the far object to the adjacent object. For instance, in the relationship between prescription and patient, the symbol for one and only one appears adjacent to the patient object. A prescription is written for/given to one and only one patient.

With the basic tools for an ERD diagram somewhat established, the text suggests that we should use a process for creating them.

  1. Create a document describing the operations of the client/organization/organizational unit.
  2. Create business rules from the operations document.
  3. Use the business rules to identify entities and relationships.
  4. Draw a first draft of an ERD.
  5. Determine attributes and primary keys for the entities.
  6. Review the prototype ERD, and revise it as needed.

The text walks through the creation of several ERDs. Each shows a portion of how the organization works. It is a little like the classic poem, The Blind Men and the Elephant. When you work with people who have no grasp of the big picture, you have misunderstandings and needless battles.

Blind Men and the Elephant video

One should take this as a cautionary note. If you don't try to develop a sense of the big picture of the organization, you are quite likely to capture only snapshots of processes that do not depict the whole story. Your product will be flawed, and you will be unhappy because your customers will be unhappy.

The chapter closes with an observation that the database will be slower as more and more relationships are established. It suggests that we may have to enable faster processing by leaving out relationships that are not necessary to the business processes.

Chapter 5

Having explained a bit about the Entity Relationship (ER) model, chapter 5 turns to the Extended Entity Relationship (EER) model. It includes the concepts of supertypes and subtypes, which are a lot like classes and subclasses in object oriented programming. Unless you have entities that are essentially similar, and that also have specific differences from each other, you do not need to use this concept.

  • entity supertype - an entity that is constructed with attributes that will be common to all of the subtypes related to it
  • entity subtype - an entity that is based on a supertype, but is different from the supertype because of its specialized attributes and purpose

The text presents the example of a supertype called Employee, which will have all the attributes that are common to various categories of employees in an organization. Subtypes of this entity might include managers, salaried staff, commissioned staff, part time staff, full time staff, probationary staff, or whatever categories the organization actually uses. It may be obvious that each subtype is related to one supertype, and each supertype may be related to no subtypes, one subtype, or many subtypes. A supertype with no subtypes may exist as preparation for the future.

Supertypes and related subtypes support specialization hierarchy, which is another set of features that entity types may have.

  • inheritance from the supertype to a subtype
    • subtypes have the same primary key as their supertype
    • if a supertype has relationships, the subtypes inherit those as well
  • subtype discriminator - This is an attribute in a supertype that tells us which subtype an instance in the supertype is related to
  • subtypes can be disjoint or overlapping
    • Disjoint subtypes are also called non-overlapping. They contain attributes that can be one of two (or more) values, but cannot be a combination of those values, such as an employee being hourly or salaried, but never being both.
    • Overlapping subtypes contain contain a combination of attribute states which prevent there being one attribute to represent the state. For example, an employee who has been promoted will be in a probationary status, but will also be either hourly or salaried. In this case the employee may be hourly, hourly and probationary, salaried, or salaried and probationary. These are overlapping states and they are said to be non-unique because of the overlap.
      The text offers an example of an employee of a school who may be an administrator, a professor, both, or neither. It may be best to express these states in two logical fields, showing the states of each as True or False. The text refers to the two fields (Administrator and Professor) as discriminator attributes.
    • Completeness - A supertype may have a completeness constraint, a requirement that each instance must be a member of at least one subtype. When this is so, the constraint is total completeness. When some instances are allowed to not be members of a subtype, that state is partial completeness.
    • To make things more fun, it is possible that a specialization may be overlapping or disjoint, as well as having total or partial completeness. Table 5.2 in the text summarizes the features of each combination.

The text differentiates between two procedures that you may follow when identifying specializations.

  • You may find that you learn about a supertype first, then learn about the subtypes that occur in the data. When this happens, you are following the specialization process, going down from the top of the hierarchy to the bottom, proceeding from the general to the specific.
  • On the other hand, you may learn about the subtypes first, then discover that a supertype should (and does) exist that covers all of those types. In this case, you are following the generalization process, starting at the bottom of the hierarchy and going up, proceeding from the specific cases to the general case.

It may be useful to take a less granular approach in a diagram, which can be done using entity clusters, in which several entities are represented by one virtual entity. In figure 5.5 in the text, the school database diagram has used a Location entity to represent both the Room and Building entities. We can say that the Location entity has those two entities as its attributes. The text warns us that trying to use a cluster object as an entity can produce loss of keys, changes in inheritance, and loss of relationships. It may simplify a diagram to use cluster objects, but you should not depend on them in the DBMS.

The text returns to the subject of keys, to show us some general principles about them. Each relation needs a primary key. The text mentions that a good design often uses a natural key, a natural identifier that is recognized by system users as part of their existing vocabulary. This is good advice for all attributes. Using terms that the users already know will make the system more accessible to them.

The text offers five characteristics of a good primary key:

  • non-intelligent - This only means that a good key has no semantic, informational meaning. An ID number is only that, it is not imbued with a dictionary definition, a contextual meaning, or an association to history or feelings that may be associated with a name, a nickname, or any other label that has intrinsic meaning.
  • no change over time - Good ID numbers do not need to change when a person, place, or thing changes names, ownership, value, or any other characteristic that might be associated with an entity. For example, I have had the same checking account number for many years, despite my bank having been merged repeatedly into larger and larger bank entities.
  • preferably single attribute - As we discussed last week, a good key should be a single attribute whenever possible. It is less likely to cause problems. It is impossible to violate the rules of Second Normal Form if your primary key is not a composite key. When forming a strong relation, as noted above, it is easier to copy a single attribute key to a related table than to copy a composite key to it.
  • preferably numeric - As noted above, numbers do not typically have semantic value, and there is an infinite supply of unique instances of them. Auto-incrementation of record numbers is a function of most DBMSs.
  • security compliant - A good key is an attribute that is not valuable in and of itself, as a Social Security number would be. This is one of the reasons we do not use Social Security numbers as keys in relations. That, and it is against the law.

There will be times when you will need to create a key for a data table. When there is an entity, there are instances, and the instances are different from each other, it should be possible to determine an existing key. When it is not, create what we will call a surrogate key. Add an attribute to the table that has as many characteristics of a good key as possible. This will typically be a numeric key, created by numbering the rows in the data table. The text offers some advice about surrogate keys:

  • create a surrogate key when there is no natural key in the data
  • create a surrogate key when the best candidate key is a long, complicated composite
  • create a surrogate key when the best candidate has embedded semantic content that distracts the users
  • when creating a surrogate key, make sure it performs correctly and meets necessary constraints (not null, unique, and anything else you need)

In table 5.4, we see an example of a table in need of a key. It is a list of events, each having a date, a start time, an end time, a room, an event name, and count of guests. As you can see by examining the table, or by just imagining it, every column may contain data that is not unique. Adding an event number (or an event ID) field would solve the problem, and allow this table to function as a relation.

The text continues with a discussion of linking relations by adding the primary key of one relation to the attributes of the other relation. Since it is the primary key of the first table, we refer to it as a foreign key in the second table. The text mentions that a similar method is to add the primary key of each table to the attributes of the other table. Since there is a choice to be made in the first method, some advice is offered:

  • When one side of the relationship is mandatory in the system, and the other is optional, add the key of the mandatory entity to the optional entity as a foreign key.
  • When both sides of a relationship are optional, one of the keys may cause fewer nulls (no match) in the other table. If this is so, make that key the foreign key in the other table.
  • When both sides are mandatory, make sure that the two entities should be two, not one entity. If they should be separate, use the advice in the bullet items above.

Sometimes, you will need to create new entities/tables to meet a need for historical data.  Consider the example of a an Employee table that includes an attribute for salary. Salaries/hourly rates tend to change over time. This field will need an update now and then. However, the employee, the business office, and the various tax agencies will want to know when the change was made. The text proposes a new table that contains Employee Number (the primary key of the Employee table), Salary Change Date (when the change is/was effective), and Salary Amount (the new value). In this data table, the history of salary changes for the employee are available. Can you see any other data we might want to include in this table?

In similar examples, the text shows us that new entities could be created to maintain history on managers of departments or teams, and on job history for employees.

The chapter ends with some troubleshooting examples that we will use as material on our discussion board this week.