CS 231 - Micro Electronics

Chapter 3: Data Representation


Objectives:

This lesson discusses several problems with representing, storing, and manipulating data. Objectives important to this lesson:

  1. Numbering systems used in computing
  2. Comparing different data representation methods
  3. Representing non-numeric data
  4. Common data structures
Concepts:

The chapter begins with a discussion of several data representation types the average person has probably seen, and a few  less common ones. You may recall from some early lesson that the numbering system used in English uses Arabic numerals, such as 1234, and so on. Your teacher may have called them Arabic numerals to differentiate them from Roman numerals, such as IIIIIIIV, and so on. We are exposed, as well, to the Mandarin and Russian words for computer (diÓn nǎo, and компьютер, which is pronounced the same as the English word). This is to show us that humans already have multiple methods for data representation even before we consider handing that data to a computer.

In a computer system, we typically have to represent all numerals and characters as items in a numbered list. Once such a list is created, each numeral and character can be represented simply by its number, which is then converted to a symbol that can be printed on paper or sent to a screen for a user to read.

When we are dealing with numbers themselves, we can give the actual number to the computer, where it can be manipulated as needed. The text continues with a discussion of several number systems that are commonly used with computers:

  • decimal - the number system based on ten digits (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9)
  • binary - the number system based on two digits (0 and 1)
  • hexadecimal - the number system based on sixteen digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F)

Most people are familiar with the decimal system, also called base 10. Computers are more in tune with the binary system, also called base 2. This is because computers are composed of many circuits that can only be switched on or off, which corresponds well to a number system that has only two digits.

The text explains that the binary, decimal, and hexadecimal systems all use a positional notation. Each notation works because of two things: the value of a digit (numeral), and the value of a position. Consider this decimal number: 327. What does it mean in decimal notation? Three times one hundred, plus two times ten, plus seven times one. You have that drummed into you so early in school you may never think of it any more. The value of a number is the value of each digittimes the value of its positionadded together.

Value of a Decimal Number
Position: Hundreds Position: Tens Position: Ones
Value of digit times value of position:300 Value of digit times value of position:20 Value of digit times value of position:7
3 2 7

Each position in the base 10 system is worth ten times the value of the position to its right: the position values are based on exponential powers of ten. The binary system is based on exponential powers of two, and the hexadecimal system is based on exponential powers of sixteen. The text reminds us that we have a decimal point in the base ten system for numbers like 2.5, which is equal to 2 times 1, plus 5 times 1/10. (One tenth, by the way, is 10-1, one hundredth is 10-2, and so on.) It would be silly to call the dot a decimal point in base 2 or base 16, so we refer to it by a more generic term: radix point. The numerals to the left of the radix point are the integer portion of a number, the numerals to the right of the radix point are the fractional part of the number. (The text also points out that some countries use a comma instead of a period for the radix point symbol.)

The text presents a method to convert a decimal number to a binary number. It is a bit too short to understand. Let me present a method that has worked for other classes.

In binary, there are only two digits: 1 and 0. The value of a binary number is the sum of the values of the positions that hold a 1. Let's look at how that works.

The largest number that can be stored in one byte is 255. This is the sum of the values of all eight positions in the byte. Converting a binary number to decimal is simple: add the values of all the positions that hold a one. The only trick is to know the value of each position. They are all powers of two. Start on the right with 1, and double the value to get the value of the next position to the left: 2, 4, 8, 16, 32, 64, and 128. If that isn't working for you, listen to Danny Kaye sing, and pay attention to the children in the background. Or if that's too old, try Paul McCartney.

Values of Positions in a Byte
Bit position: 7 6 5 4 3 2 1 0
Value of Position (if a 1 is in it): 128 64 32 16 8 4 2 1

How do you convert a decimal number (less than 256) to binary? (For bigger numbers, you need more than one byte, and larger powers of two.) You do a series of subtraction problems, one for each position in the byte, starting from the left. For example, let's convert 175 to binary.

  1. Ask yourself this question for each bit position: Can I subtract the value of this bit position from the current number? You must be able to do it without getting a negative result. 
    In this case, can you subtract 128 from 175
    Yes, you can. So you write a one in the 128 bit position, and do the math: 175 - 128 = 47. Perform the next test on 47.
  2. Can you subtract 64, the next position value, from 47
    No, so you write a zero in the 64 bit position.
  3. Can you subtract 32 from 47
    Yes, so write a one in the 32 bit position, and do the math: 47 - 32 = 15.
  4. Can you subtract 16 from 15
    No, so you write a zero in the 16 bit position.
  5. Can you subtract 8 from 15
    Yes, so write a one in the 8 bit position, and do the math: 15 - 8 = 7.
  6. Can you subtract 4 from 7
    Yes, so write a one in the 4 bit position, and do the math: 7 - 4 = 3.
  7. Can you subtract 2 from 3
    Yes, so write a one in the 2 bit position, and do the math: 3 - 2 = 1.
  8. When you have 1 left, write a one in the 1 bit position. This will always be done for odd numbers. 
    If there is no remainder at any of the steps, write a zero in each of the remaining bit positions.
Conversion to Binary
Bit position: 128 64 32 16 8 4 2 1
Conversion of 175 (above) 1 0 1 0 1 1 1 1

The text spends a page on adding two binary numbers. It resembles decimal addition in that we carry values to the left when they are too large to fit in the current position. For example, in any given column:

  • 0 + 0 = 0
  • 0 + 1 = 1
  • 1 + 1 = 10, so we write down the 0 and carry the 1, which becomes an addend for the next column on the left.
  • Since it could happen, 1 + 1 + 1 = 11, so we would write down the one on the right, and carry the 1 on the left to the next column.

The text moves on to hexadecimal notation, which it recommends for its compactness. Let's take an example. The largest binary number we can store in 8 bits is 1111 1111. I put a space in that string of characters to make it more readable. That same number, expressed in decimal notation, is 255. In hexadecimal notation, the number is FF. As numbers get even larger, there is a definite economy to writing them in hex.

Values of Positions in a Hexadecimal Number
Power of 16: 7 6 5 4 3 2 1 0
Value of Position : 268,435,456 16,777,216 1,048,576 65,536 4,096 256 16 1

As you can see, the value of each new position to the left becomes huge very quickly. 

The text mentions that it is sometimes difficult to know what base a number is written in, so some conventions are presented:

  • if a number is shown with no base designator, the custom is to assume it is in base 10
  • if a number is written with a subscript, that is typically the base (e.g. 10012, 100116)
  • if a number is written with a B at the end, we presume it is a binary number (e.g. 1001B)
  • if a number is written with an H at the end, we presume it is a hex number (e.g. 1001H)
  • if a number is written with a 0x prefix, we presume it is a hex number (e.g. 0x1001)

The text mentions that some systems have historically used octal (base 8) notation. It took fewer characters than binary notation, but more characters than decimal notation.

The text lists several goals of notation systems, which are in conflict with each other. You should be aware of them, but you should also know that when you are building a system, you don't get to pick these qualities as though they were items in a buffet line. You will need to work with the parameters of the system or language you are using to create your new system.

  • compactness - how many characters does it take to store a large number?
  • range - what is the largest and smallest number that can be expressed in a given memory space?
  • accuracy - can you express quantities like one third that have non-terminating decimal equivalents?
  • ease of manipulation - can it be used by your computer for arithmetic manipulation?
  • standardization - is the notation generally understood by most devices we will need to communicate with?

The text also mentions that CPUs typically support multiple data types, as do high level languages. Each may have multiple variants, but you should have a general idea about each of these types. On page 89, the text refers to them as primitive data types or machine data types:

  • integer - whole number that do not have fractional portions; may come in signed and unsigned types: unsigned integers are assumed to be positive
  • real number - numbers that may have fractional portions
  • character - symbols used to portray the characters used in human languages
    • EBCDIC - used on IBM mainframes in the 1960s (8 bits per symbol)
    • ASCII (IA5) - used on <span "="" style="font-weight: bold;">personal computers since the 1970s (7 bits per symbol); American Standard Code for Information Interchange is very close to International Alphabet 5: follow the two links for this bullet to note the insignificant differences, and note that there are other similar code sets for different countries
    • Unicode - created in the late 1980s and early 1990s by a consortium of vendors to represent more characters (16 bits per symbol); later it was expanded to use four bytes for some characters, so it has the capacity to store the characters for all current languages and some dead languages; Unicode includes ASCII as the first 256 characters in its system
  • Control codes - on page 83, the text mentions that some device control characters are typically included in language character sets: note the ones on pages 83 and 84 for ASCII, such as the codes for backspaceacknowledgenegative acknowledgecarriage return, and escape
  • Boolean - logical characters to be evaluated by Boolean logic operators like greater thanless than, and equals
  • memory address - primary memory (storage) on a system is managed by assigning an address to each memory location, as systems have grown over time, addressing schemes have changed to allow the addressing of larger and larger installed RAM

On page 89, the text explains that it called the items listed above primitive data types because there are more complex data types called data structures. A data structure is a collection of primitive types that are organized for a specific purpose. The fields in a database that hold multiple kinds of information in each record are an example of a data structure. The text tells us that files are also data structures, and so are strings of characters stored in an array. If that last word is strange to you, hold on for a moment.

We have already discussed memory addresses. In a related note, the text mentions pointers. A pointer is a variable in a programming language that holds a memory address. In other words, it points to a memory location. Why should we care? It is a useful tool in programming. For example, if we know that we are using a section of memory to hold records in a database, and each record is 512 bytes wide, we can set a pointer to the address of a particular record, and then increment the value of the pointer by some multiple of 512 bytes to cause it to point to a different record in that database. The book does not say so, but if we declare that the pointer points to a data structure of a particular size (like 512 bytes), we would only have to increment the pointer by some integer value (like 6) to move it ahead by that many data structure locations. This is a time saver for a programmer who does not know what the database administrator will actually create as the data structure.

An array (I told you we would come back to it) is a data structure of memory locations that are meant to hold the same kind of data, such as an array of integers, or an array of characters. This is a simpler structure than the database example which might have character fields, integer fields, logical fields, and more data types in each record.

The text explains that an array is given a name, and each item in the array can be referred to by that name and an index value, which is that item's position in the array. For example, we might make a character array of twenty six items to hold the English alphabet. We can call the array alphabet, and store the letter 'a' at the location alphabet[0]. Why not [1]? Arrays typically start at location 0, not 1. If we wanted to do so, we could declare the array to be of size 27, and only use index values [1] through [26], but that is not how programmers are taught to think.

On page 92, the text tells us that an array of characters is called a string. This is true, but that word (string) is also used to refer to any related input from a user.

It may not be obvious to you that an array is always stored in a series of contiguous memory locations, locations that follow in order, unbroken by other uses. This is not always possible. A program might need to reserve memory for an array of such a size that there is no sequence of memory locations of that size currently available. When this is true, it may be necessary to use a linked list instead of an array, as illustrated on page 93. For each item stored in a linked list, we also store a pointer that holds the address of the next item in the list. In this way, we are able to have a structure that functions like an array, even though the elements of the list are stored in discontinuous memory locations.

The text describes another reason to use a linked list instead of an array. It is difficult to expand the size of an array if the memory locations immediately following the array are occupied and the contents of those locations cannot be easily moved. It can also be difficult to reduce the size of an array. These problems do not apply to linked lists. The pointers associated with each element can be repointed as needed, which allows us to shorten the list by jumping over locations no longer needed, and to lengthen the list by pointing to new available locations in memory.

A disadvantage of a linked list should be clear from the illustrations in the text. If we read a list by learning where the next element is stored and jumping to it, we have a problem if we want to go backwards in the list. A standard linked list does not hold a pointer for the previous element. A doubly linked list includes two pointers for all elements except the first and the last. One pointer holds the location of the next item in the list, and the other holds the location of the previous item. Obviously, there is no pointer for a previous item associated with the first item in a list, and there is no pointer for a next item associated with the last item in a list.

The text continues the discussion of structures on page 96 by explaining that a record is a structure composed of other structures. In our database example, you should now see that this is so. If a string is a structure, and an array is a structure, then a record is a structure that can contain multiple instances of those kinds of structures and other kinds as well.

The text is a bit misleading when it explains that a collection of records, saved in secondary storage, is a file. This is true, but it is not an exhaustive definition of a file. This is really a better definition of a database file, as opposed to any other kind of file.

On page 97, we extend the definition of a structure again. This time, we are told that another kind of structure is called a class. The new characteristic of a class is that it can include methods, which are programs that can manipulate the data that is held in the class. For example, a class that contains customer records may include methods that can be used to update those records.

The last new term in the chapter is object, which is defined as an instance of a variable or a class. Once a programmer defines a class, an object can be created that belongs to the class. That object would then contain all the data structures and methods included in the class definition.


Chapter 3: Assignment 1 (Research Problem)

Due: Week Two 
Points: 20

 
Object-oriented programming has been adopted widely because of its capability to reuse code.  Most application development software provides class libraries and extensive support for complex data structures, including linked lists.  Investigate one of these libraries, such as the Microsoft Foundation Class (www.microsoft.com) or the Java 2 Platform (http://java.sun.com) application programming interface (API).  What data structures are supported by the library?  What types of data are recommended for use with each data structure object?  Which classes contain which data structures, and what methods does the library provide?  NOTE:  Be sure to cite all sources using APA format.

 

Chapter 3: Assignment 2

Complete Problems and Exercises #1 and #2 on page 103 of your textbook Systems Architecture. 
(Each exercise is worth 15 points.)
Complete Review Questions 1, 2, 6, 8, 9, 10, 11, 12, 13, and 14 on pages 102 and 103 of your textbook. 
(Each question is worth 3 points.)