Concepts:The chapter begins with a discussion of several data representation types the average person has probably seen, and a few less common ones. You may recall from some early lesson that the numbering system used in English uses Arabic numerals, such as 1, 2, 3, 4, and so on. Your teacher may have called them Arabic numerals to differentiate them from Roman numerals, such as I, II, III, IV, and so on. We are exposed, as well, to the Mandarin and Russian words for computer (diàn nǎo, and компьютер, which is pronounced the same as the English word). This is to show us that humans already have multiple methods for data representation even before we consider handing that data to a computer. In a computer system, we typically have to represent all numerals and characters as items in a numbered list. Once such a list is created, each numeral and character can be represented simply by its number, which is then converted to a symbol that can be printed on paper or sent to a screen for a user to read. When we are dealing with numbers themselves, we can give the actual number to the computer, where it can be manipulated as needed. The text continues with a discussion of several number systems that are commonly used with computers:
Most people are familiar with the decimal system, also called base 10. Computers are more in tune with the binary system, also called base 2. This is because computers are composed of many circuits that can only be switched on or off, which corresponds well to a number system that has only two digits. The text explains that the binary, decimal, and hexadecimal systems all use a positional notation. Each notation works because of two things: the value of a digit (numeral), and the value of a position. Consider this decimal number: 327. What does it mean in decimal notation? Three times one hundred, plus two times ten, plus seven times one. You have that drummed into you so early in school you may never think of it any more. The value of a number is the value of each digit, times the value of its position, added together.
Each position in the base 10 system is worth ten times the value of the position to its right: the position values are based on exponential powers of ten. The binary system is based on exponential powers of two, and the hexadecimal system is based on exponential powers of sixteen. The text reminds us that we have a decimal point in the base ten system for numbers like 2.5, which is equal to 2 times 1, plus 5 times 1/10. (One tenth, by the way, is 10-1, one hundredth is 10-2, and so on.) It would be silly to call the dot a decimal point in base 2 or base 16, so we refer to it by a more generic term: radix point. The numerals to the left of the radix point are the integer portion of a number, the numerals to the right of the radix point are the fractional part of the number. (The text also points out that some countries use a comma instead of a period for the radix point symbol.) The text presents a method to convert a decimal number to a binary number. It is a bit too short to understand. Let me present a method that has worked for other classes. In binary, there are only two digits: 1 and 0. The value of a binary number is the sum of the values of the positions that hold a 1. Let's look at how that works. The largest number that can be stored in one byte is 255. This is the sum of the values of all eight positions in the byte. Converting a binary number to decimal is simple: add the values of all the positions that hold a one. The only trick is to know the value of each position. They are all powers of two. Start on the right with 1, and double the value to get the value of the next position to the left: 2, 4, 8, 16, 32, 64, and 128. If that isn't working for you, listen to Danny Kaye sing, and pay attention to the children in the background. Or if that's too old, try Paul McCartney.
How do you convert a decimal number (less than 256) to binary? (For bigger numbers, you need more than one byte, and larger powers of two.) You do a series of subtraction problems, one for each position in the byte, starting from the left. For example, let's convert 175 to binary.
The text spends a page on adding two binary numbers. It resembles decimal addition in that we carry values to the left when they are too large to fit in the current position. For example, in any given column:
The text moves on to hexadecimal notation, which it recommends for its compactness. Let's take an example. The largest binary number we can store in 8 bits is 1111 1111. I put a space in that string of characters to make it more readable. That same number, expressed in decimal notation, is 255. In hexadecimal notation, the number is FF. As numbers get even larger, there is a definite economy to writing them in hex.
As you can see, the value of each new position to the left becomes huge very quickly. The text mentions that it is sometimes difficult to know what base a number is written in, so some conventions are presented:
The text mentions that some systems have historically used octal (base 8) notation. It took fewer characters than binary notation, but more characters than decimal notation. The text lists several goals of notation systems, which are in conflict with each other. You should be aware of them, but you should also know that when you are building a system, you don't get to pick these qualities as though they were items in a buffet line. You will need to work with the parameters of the system or language you are using to create your new system.
The text also mentions that CPUs typically support multiple data types, as do high level languages. Each may have multiple variants, but you should have a general idea about each of these types. On page 89, the text refers to them as primitive data types or machine data types:
On page 89, the text explains that it called the items listed above primitive data types because there are more complex data types called data structures. A data structure is a collection of primitive types that are organized for a specific purpose. The fields in a database that hold multiple kinds of information in each record are an example of a data structure. The text tells us that files are also data structures, and so are strings of characters stored in an array. If that last word is strange to you, hold on for a moment. We have already discussed memory addresses. In a related note, the text mentions pointers. A pointer is a variable in a programming language that holds a memory address. In other words, it points to a memory location. Why should we care? It is a useful tool in programming. For example, if we know that we are using a section of memory to hold records in a database, and each record is 512 bytes wide, we can set a pointer to the address of a particular record, and then increment the value of the pointer by some multiple of 512 bytes to cause it to point to a different record in that database. The book does not say so, but if we declare that the pointer points to a data structure of a particular size (like 512 bytes), we would only have to increment the pointer by some integer value (like 6) to move it ahead by that many data structure locations. This is a time saver for a programmer who does not know what the database administrator will actually create as the data structure. An array (I told you we would come back to it) is a data structure of memory locations that are meant to hold the same kind of data, such as an array of integers, or an array of characters. This is a simpler structure than the database example which might have character fields, integer fields, logical fields, and more data types in each record. The text explains that an array is given a name, and each item in the array can be referred to by that name and an index value, which is that item's position in the array. For example, we might make a character array of twenty six items to hold the English alphabet. We can call the array alphabet, and store the letter 'a' at the location alphabet[0]. Why not [1]? Arrays typically start at location 0, not 1. If we wanted to do so, we could declare the array to be of size 27, and only use index values [1] through [26], but that is not how programmers are taught to think. On page 92, the text tells us that an array of characters is called a string. This is true, but that word (string) is also used to refer to any related input from a user. It may not be obvious to you that an array is always stored in a series of contiguous memory locations, locations that follow in order, unbroken by other uses. This is not always possible. A program might need to reserve memory for an array of such a size that there is no sequence of memory locations of that size currently available. When this is true, it may be necessary to use a linked list instead of an array, as illustrated on page 93. For each item stored in a linked list, we also store a pointer that holds the address of the next item in the list. In this way, we are able to have a structure that functions like an array, even though the elements of the list are stored in discontinuous memory locations. The text describes another reason to use a linked list instead of an array. It is difficult to expand the size of an array if the memory locations immediately following the array are occupied and the contents of those locations cannot be easily moved. It can also be difficult to reduce the size of an array. These problems do not apply to linked lists. The pointers associated with each element can be repointed as needed, which allows us to shorten the list by jumping over locations no longer needed, and to lengthen the list by pointing to new available locations in memory. A disadvantage of a linked list should be clear from the illustrations in the text. If we read a list by learning where the next element is stored and jumping to it, we have a problem if we want to go backwards in the list. A standard linked list does not hold a pointer for the previous element. A doubly linked list includes two pointers for all elements except the first and the last. One pointer holds the location of the next item in the list, and the other holds the location of the previous item. Obviously, there is no pointer for a previous item associated with the first item in a list, and there is no pointer for a next item associated with the last item in a list. The text continues the discussion of structures on page 96 by explaining that a record is a structure composed of other structures. In our database example, you should now see that this is so. If a string is a structure, and an array is a structure, then a record is a structure that can contain multiple instances of those kinds of structures and other kinds as well. The text is a bit misleading when it explains that a collection of records, saved in secondary storage, is a file. This is true, but it is not an exhaustive definition of a file. This is really a better definition of a database file, as opposed to any other kind of file. On page 97, we extend the definition of a structure again. This time, we are told that another kind of structure is called a class. The new characteristic of a class is that it can include methods, which are programs that can manipulate the data that is held in the class. For example, a class that contains customer records may include methods that can be used to update those records. The last new term in the chapter is object, which is defined as an instance of a variable or a class. Once a programmer defines a class, an object can be created that belongs to the class. That object would then contain all the data structures and methods included in the class definition.
|