This short chapter introduces concepts about code systems used to transmit data. The objectives important to this chapter are:
Concepts: The most basic idea about computers in this chapter is that they are binary devices, essentially collections of switches that may be off or on. A bit is a binary digit, and it can represent only those two states. To communicate more complicated information than off and on, we need codes: sets of symbols that can be transmitted across data lines. The evolution of data codes is discussed starting on page 250. Samuel F. B. Morse created Morse code, which was useful for telegraphs and human operators but is not very good for machine transmission. Computer systems do better when the code used meets the four criteria listed on page 251:
The number of bits in a symbol, assuming each bit is just on or off, determines how many unique symbols your code system can represent. The chart on page 251 summarizes this: if each bit has two possible states, then two raised to the power of the number of bits gives the number of permutations (unique possibilities) possible in that system. Your book refers to these unique combinations as code points.
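The book does not use code, but a quick Python sketch (my choice of language, not the book's) shows how fast the totals in the page 251 chart grow:

    # An n-bit code has 2**n distinct code points (unique bit patterns).
    for bits in range(1, 9):
        print(f"{bits} bits -> {2 ** bits} code points")

Note that five bits already yields 32 code points, which sets up the next observation.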
At this point, some students will observe that the English alphabet has only twenty-six letters, so five bits might be enough. It would be, if we did not care about numerals, punctuation, capital and lowercase letters, and the concepts on page 252. Three types of character assignments (meanings for code points) are listed:
On page 253, the idea of escape codes is introduced and compared to the Shift key on a typewriter. In the same way that a Shift key changes the meaning of the next symbol, an escape code changes the meaning of symbols in a code system. This effectively gives two meanings to most symbols without having to double the size of the code table. The drawback is that the software reading the code has to watch for such characters.

Specific codes are discussed beginning on page 254, the first being the real Baudot code, invented by Émile Baudot, and the second being the code commonly called Baudot, actually invented by Donald Murray. The book apologizes, then proceeds to discuss Murray's code using the common misnomer for it, Baudot. I will refer to Murray's International Alphabet No. 2 as Baudot here. Baudot code, as generally used, is a five-bit code with no parity. It is still used in teletype, telegraph, and telex equipment, even though it is outmoded.

ASCII is a major code system. It is the American Standard Code for Information Interchange, developed by the American National Standards Institute, a U.S. standards organization. It works very well for English but has some drawbacks. ASCII is a seven-bit code, which becomes eight bits if a parity bit is used or if the extended version is used. Seven bits gives us 128 characters, illustrated on page 256. Using the chart on that page, find the capital "A". The seven bits for this symbol are 100 (from the column heading) and 0001 (from the row heading). This is the binary equivalent of the decimal number 65. Most common English letters and symbols are represented here, but not all. A second chart of 128 more characters forms the extended ASCII table, used with eight-bit ASCII. Notice that the bits in this chart are numbered, and that they are numbered in descending order from left to right. For the letter "A", bit 7 is a 1, bit 6 is a 0, and bit 5 is a 0 (the "100" noted above). When using extended ASCII, every symbol has eight bits; every symbol in the chart on page 256 would have a 0 as its eighth bit.

Another major code is shown on page 257. It was invented by IBM for its mainframe systems and has the horrible name EBCDIC, which stands for Extended Binary Coded Decimal Interchange Code. (Can you tell that the IBM marketing department was not its best department?) To be incredibly different, EBCDIC is an eight-bit code, and its bits are numbered in the reverse order from ASCII numbering. This system is not used on any personal computer, only on mainframes.

Unicode is discussed on page 256. This is the biggest one yet. Unicode was created by a consortium of computer and software companies, and it is a sixteen-bit system. This gives us 65,536 possible symbols. Why so many? Earth is a large planet in terms of languages. A set of 128 or 256 symbols is nothing compared to the number of symbols needed for Roman alphabets, Cyrillic alphabets, Asian alphabets, symbolic languages that do not even use true alphabets, and so on. Sixteen bits is about enough to cover the needs of a computer system capable of communicating with anyone on the planet.

On page 259, the author discusses control characters and lists several common ones. The names of these characters are often acronyms, such as SOH for Start of Heading and EOT for End of Transmission (not to be confused with ETX, End of Text). Some are just abbreviations, like ACK for Acknowledgment and NAK for Negative Acknowledgment.
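To make the Shift-key analogy from page 253 concrete, here is a toy decoder in the spirit of Baudot's letters/figures shifts. The shift values and the tables are invented for this sketch; they are not the real ITA2 assignments:

    # Toy shift-code decoder: the same code point means different things
    # depending on which "case" the most recent shift character selected.
    LTRS, FIGS = 0x1F, 0x1B          # invented shift values for this sketch
    letters = {0x01: "A", 0x02: "B"} # invented letter-case meanings
    figures = {0x01: "1", 0x02: "2"} # invented figure-case meanings

    def decode(stream):
        table = letters              # assume letters until a shift arrives
        out = []
        for code in stream:
            if code == LTRS:
                table = letters      # shift changes the meaning of what follows
            elif code == FIGS:
                table = figures
            else:
                out.append(table[code])
        return "".join(out)

    print(decode([0x01, 0x02, FIGS, 0x01, LTRS, 0x02]))  # prints AB1B

Notice the drawback the book mentions: the reader must track state, so a lost shift character garbles everything that follows it.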
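You can also check the "A" arithmetic from the page 256 chart in a couple of lines of Python:

    # "A" is column bits 100 plus row bits 0001, i.e. binary 1000001.
    print(ord("A"))                  # 65, the decimal equivalent
    print(format(ord("A"), "07b"))   # 1000001: bit 7 is 1, bits 6 and 5 are 0
    print(format(ord("A"), "08b"))   # 01000001: the eighth bit is 0, as it is
                                     # in extended ASCII for every page-256 symbol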
The concept of code efficiency is discussed briefly on page 260. This strikes me as a bit of an accountant's ruse. The author explains that an efficient code is one that spends most of its bits passing information, not processing overhead like error trapping. (By that measure, seven data bits plus one parity bit is 7/8, or 87.5 percent, efficient.) As far as it goes, he is correct, but I wonder how "efficient" a code should be considered if we have to keep retransmitting messages over and over because it has no error traps.

When passing information from one computer system to another, one or both must often convert the messages from one code system to another. For instance, sending a message from an EBCDIC terminal to an ASCII PC requires at least one conversion, and in ongoing transactions it is likely that many conversions must take place. It is part of the job of network software to work out which side will translate, one or both, and what common languages each side speaks.
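Here is a sketch of one side doing that translation, using Python's cp037 codec. Note the assumption: cp037 is one common EBCDIC variant, and real equipment may use a different EBCDIC code page.

    # Translate EBCDIC bytes to ASCII bytes by way of Python's text strings.
    ebcdic_bytes = "HELLO".encode("cp037")     # what the EBCDIC terminal sends
    print(ebcdic_bytes.hex())                  # c8c5d3d3d6 -- not ASCII values
    ascii_bytes = ebcdic_bytes.decode("cp037").encode("ascii")
    print(ascii_bytes)                         # b'HELLO', which the PC can use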
Data Compression (or Compaction) is the next topic. Three main schemes for sending fewer bits across wires are listed on page 262:
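To give the flavor of compression, here is a minimal run-length encoding sketch. RLE is a classic scheme of this kind, though I am not claiming it is one of the book's three:

    # Run-length encoding: send (count, character) pairs instead of runs.
    def rle(text):
        pairs, i = [], 0
        while i < len(text):
            j = i
            while j < len(text) and text[j] == text[i]:
                j += 1               # extend the run of identical characters
            pairs.append((j - i, text[i]))
            i = j
        return pairs

    print(rle("AAAABBBCD"))  # [(4, 'A'), (3, 'B'), (1, 'C'), (1, 'D')]

Nine characters become four pairs, and the savings grow with longer runs.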