CIS 251 - System Development Methods

Chapter 5: Data and Process Modeling


This lesson discusses material from chapter 5. Objectives important to this lesson:

  1. Data and process modeling concepts
  2. Data flow diagrams
  3. Level and balance data flow diagrams
  4. Data dictionaries
  5. Decision tables and trees
  6. Logical and physical models
  7. Assignments for the week

The chapter jumps immediately to objective six in the introduction by discussing logical and physical models. The terms are a bit misleading, so a definition is required.

  • A logical model is about what the system must do. It is still on the requirements side of the project.
  • A physical model is about how the system will meet the requirements, and how it will be constructed.

The text moves ahead to discuss two styles of data flow diagrams. The two styles are similar, with minor differences in the symbols they use for elements in the diagram. Follow the links below for a more formal discussion of each type of diagram.

  • Gane-Sarson DFDs use
    • rounded rectangles for processes
    • squares with drop shadows for entities
    • arrows for data flows
    • shallow rectangles that are open on the right end for data stores
  • Yourdon DFDs use
    • circles for processes
    • squares or rectangles (no drop shadows) for entities
    • arrows (that may be curved) for data flows
    • shallow rectangles that are open on the right end (or both ends) for data stores

The text only uses Gane-Sarson DFDs on the pages that follow.

The text discusses the meaning of each symbol. It remarks that a process symbol on a DFD may be called a black box because we are told on the diagram what it does, but not how it is done. This diagram is meant to show what happens, not how it happens. That makes this a logical diagram. A process must do something with the data it receives, or there is no need for the process to exist in the system. Typically, a process symbol represents code that a programmer must add to the system.

A data flow symbol shows how data flows into and out of processes. It must show direction, which is typically one way. If, however, a process pulls data from a data store, and writes back to it when done, there will be two data flow symbols between those objects, one flowing into the process, and one flowing into the data store. (See figure 5-7 on page 184.) This leads to the discussion on page 183 of commonly seen errors when DFDs are not done correctly. Consider the three errors in figure 5-6 on page 183:

  • The first process shown has two data flows coming out of it. That could be correct, but there is no data flow entering it. This is illogical. There is no trigger event for the process to send its data to either destination. The text calls this a spontaneous generation error.

  • The second example shows two data flows entering a process, but no data flow leaving it. This is possible, but also illogical. What happens to the data? How is the output of the process accessed? The text calls this a black hole error: data in, nothing out.

  • The third example shows a data flow called Date of Birth flowing into a process called Calculate Grade. (I'm sorry, children. Typing those in all caps like the text makes me ill.) The Calculate Grade process then sends data to a Final Grade data flow. Really? You mean I could calculate your final grades for this class if all I know is your birth dates? Is it a miracle? Is it a really bad way to hand out grades? No, the text calls it a gray hole, a process that receives insufficient data for its purpose. What's worse, some poor programmer might be told to write the program to do it, even though it is nonsense.
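The first two errors above are mechanical enough that a short script can spot them. Here is a minimal sketch in Python, assuming we describe a DFD as a list of (source, destination, label) data flows. The process and entity names other than Calculate Grade are made up for illustration, and the Student entity at either end of the grade flows is an assumption, not something taken from figure 5-6.

```python
from collections import defaultdict

# Each data flow is (source, destination, label). All names are hypothetical.
flows = [
    ("Create Invoice", "Customer", "Invoice"),       # process with outputs only
    ("Student", "Record Attendance", "Attendance"),  # process with inputs only
    ("Student", "Calculate Grade", "Date of Birth"),
    ("Calculate Grade", "Student", "Final Grade"),
]
processes = {"Create Invoice", "Record Attendance", "Calculate Grade"}


def find_flow_errors(processes, flows):
    """Return {process: error name} for processes missing inputs or outputs."""
    inputs = defaultdict(list)
    outputs = defaultdict(list)
    for source, destination, label in flows:
        outputs[source].append(label)
        inputs[destination].append(label)

    errors = {}
    for name in sorted(processes):
        if not inputs[name] and not outputs[name]:
            errors[name] = "unconnected"
        elif not inputs[name]:
            errors[name] = "spontaneous generation"
        elif not outputs[name]:
            errors[name] = "black hole"
    return errors


print(find_flow_errors(processes, flows))
# {'Create Invoice': 'spontaneous generation', 'Record Attendance': 'black hole'}
```

Notice that Calculate Grade passes this check. A gray hole has inputs and outputs, so counting arrows will not catch it; a person has to ask whether the inputs are actually sufficient to produce the outputs.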

A data store represents a file, folder, drive, or other means of storing information. It does no good if it is not connected to a process by a data flow. Multiple processes might access a data store, to read or write to it. If the system accesses data that changes only with versions of the program, such as tax rates, the data store holding that data may not have input data flows, and may only be connected to processes that read it.

An entity symbol typically represents users of a system, as shown in the examples on page 185. Note that entities may not connect directly to each other or to data stores. Entities should be connected to processes by data flows.
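The connection rule for entities can be written as a one-line check. This is a sketch, assuming the common DFD convention that every data flow must have a process on at least one end. That convention covers the two cases named above (entity to entity, entity to data store); it also rejects a data store connected directly to another data store, which is an assumption on my part here rather than something stated in these notes. The element names are hypothetical.

```python
# Element kinds in a DFD; the element names below are hypothetical examples.
kinds = {
    "Customer": "entity",
    "Bank": "entity",
    "Process Order": "process",
    "Orders": "data store",
}


def is_valid_flow(source, destination, kinds):
    """A flow is acceptable here only if at least one end is a process."""
    return "process" in (kinds[source], kinds[destination])


print(is_valid_flow("Customer", "Process Order", kinds))  # True
print(is_valid_flow("Customer", "Bank", kinds))           # False
print(is_valid_flow("Customer", "Orders", kinds))         # False
```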

Review the information in figure 5-11, which shows three correct and three incorrect ways to show data flowing on a DFD. These are six rules that will keep you from making rookie mistakes on a DFD.

The text goes on for several pages, revealing that DFDs can be drawn at several levels. They start at a very high level (called a context diagram), and each high-level process can be broken down into lower-level diagrams that show what happens inside it. Got that? Good, let's move on.

On page 197, the text brings up the idea of a data dictionary. A data dictionary holds the defining information about a database. It may be a system or a formal document that explains each data element in our system, including the kind of variable used to collect that element, the data type used to store it, and the length constraints on the element. With regard to a DFD, a data dictionary should also hold formal information about data flows, data stores, processes, and entities. Databases are typically stored and viewed as records, which are structures of data. A customer record, for example, might hold the customer's name, ID, credit card data, billing and shipping addresses, current account status, and order history.
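To make the idea concrete, here is a minimal sketch of what a few data dictionary entries for the customer record might look like in Python. The field names, data types, and lengths are all assumptions chosen for illustration; a real project would take them from its own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One data dictionary entry describing a single data element."""
    name: str
    data_type: str
    max_length: int
    description: str


# Hypothetical entries for part of a customer record.
customer_record = [
    DataElement("customer_id", "integer", 10, "Unique customer identifier"),
    DataElement("customer_name", "text", 50, "Full name of the customer"),
    DataElement("account_status", "text", 10, "Current account status"),
]


def check_length(element, value):
    """Return True if the value fits the element's length constraint."""
    return len(str(value)) <= element.max_length


print(check_length(customer_record[0], 12345))      # True
print(check_length(customer_record[2], "SUSPENDED-PENDING-REVIEW"))  # False
```

The point is not the code itself but that every element gets the same formal description, so analysts and programmers are reading from one definition.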

On page 204, the text introduces the three classic structures of programming. It is good to know them if you are going to describe processes (programs).

  • sequence - programs must work in some sequence of steps, as must systems; some processes do not terminate, but continue to run while others run
  • selection - processes can make choices about what to do based on data or user input (which is another kind of data)
  • iteration - iterations, or loops, are used in programs to repeat processes as necessary, with the same, new, or modified data
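All three structures can show up in even a tiny process. The sketch below is a made-up example (the function name, the cent-based prices, and the 10% member discount are assumptions for illustration), with a comment marking where each structure appears.

```python
def total_order_cents(prices_in_cents, is_member):
    """Total an order in cents, applying a hypothetical 10% member discount."""
    # Sequence: these steps run one after another, top to bottom.
    total = 0

    # Iteration: repeat the same step for each price in the order.
    for price in prices_in_cents:
        total += price

    # Selection: choose what to do based on data.
    if is_member:
        total -= total // 10

    return total


print(total_order_cents([1000, 2000], True))   # 2700
print(total_order_cents([1000, 2000], False))  # 3000
```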

When describing a business process for the data dictionary, or for the programmers who must follow it, the systems analyst producing that description may need to use some form of English that describes the logic of the process. The text compares the structured English used by an analyst to the pseudocode used by a programmer. Both are only planning notations, not meant to be sufficient to carry out the tasks of the system without development.

Starting on page 206, the text discusses decision tables, which can be used to represent business rules. A programmer must be able to read tables like these and to translate them into code that processes in the system will use to make selection decisions. We will discuss some of these examples in class.
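As a sketch of what that translation can look like, a decision table maps each combination of conditions to an action, which fits naturally into a Python dictionary. The business rule below (credit status and stock availability deciding what happens to an order) is a hypothetical example, not one of the tables from the text.

```python
# Hypothetical decision table: (credit_ok, in_stock) -> action.
# Each key is one column of the table; each value is the action for that column.
DECISION_TABLE = {
    (True, True): "ship order",
    (True, False): "backorder",
    (False, True): "reject order",
    (False, False): "reject order",
}


def decide(credit_ok, in_stock):
    """Look up the action for one combination of conditions."""
    return DECISION_TABLE[(credit_ok, in_stock)]


print(decide(True, False))   # backorder
print(decide(False, True))   # reject order
```

Keeping the rules in a table like this, rather than scattering them through nested if statements, makes it easier to compare the code against the analyst's original table.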

On page 210, the text shows us a decision tree, which is another way to display business rules. The choice to use decision tables or trees may rest with the project lead or may depend on the preferences of the audience. Whichever method is used, take care not to omit a possible combination of factors, or you will have a system that produces unexpected results or breaks.
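Checking for an omitted combination is easy to automate when the conditions are yes/no questions. Here is a minimal sketch that lists every combination a table fails to cover. The two-condition table and its actions are hypothetical, and one rule has been left out on purpose.

```python
from itertools import product


def missing_combinations(table, condition_count):
    """Return the True/False combinations that the table does not cover."""
    all_cases = product([True, False], repeat=condition_count)
    return [case for case in all_cases if case not in table]


# Hypothetical table with two yes/no conditions and one rule forgotten.
incomplete_table = {
    (True, True): "ship order",
    (True, False): "backorder",
    (False, True): "reject order",
}

print(missing_combinations(incomplete_table, 2))  # [(False, False)]
```

With n yes/no conditions there are 2 to the power n combinations, so a table with three conditions needs eight columns, and a tree needs eight leaves, unless some rules are deliberately combined.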

The chapter ends with advice that it is a good idea to produce a logical model of both the current and new systems using the tools in this chapter, and to produce a physical model of the old and new systems when conducting the design phase of the project.

Turning to the last Toolkit chapter, this set of pages is most useful to students not familiar with searching for information on the Internet. Please browse through it, if you have not, and let's share any questions in class.