CS 1110 - Introduction to Programming

Starting Out with Python
Chapter 9, Dictionaries and Sets

Objectives:

This lesson covers chapter 9, which discusses more ways to organize data. Objectives important to this lesson:

  1. Dictionaries
  2. Sets
  3. Serializing objects
Concepts:
Chapter 9
Dictionaries

The chapter begins with a definition that is not very enlightening. We are told that a dictionary is a collection of data, typically pairs of data, in what we might imagine as a table. The two parts of each entry in the table are the key and the value. The key is a unique identifier for its value. A dictionary cannot have duplicate keys, but separate keys may have the same value. The text gives us an example of employee IDs (keys) that each match only one value (the employee's name, in this case).

To create a dictionary, we usually assign it to a name. If we stay with the first example in the text, this dictionary is called phonebook. We begin by stating something like this:

phonebook = {'Doug':'555-3684', 'Morris':'555-6677','Julie':'555-5854'}

In this example, the name (phonebook) is assigned a series of key-value pairs that are enclosed by a set of curly braces. Each key-value pair is two strings, separated by a colon. Pairs (elements) are separated by commas. In each key-value pair, the key comes first. When you construct a dictionary, you are allowed to use any data object type as a value, but keys must be immutable types, such as numbers, strings, and tuples.
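To see the key rule in action, the short sketch below uses tuples as keys, which works, and then tries a list as a key, which Python rejects with a TypeError because lists are mutable. The dictionary name coordinates and its contents are made up for illustration.

```python
# A hypothetical dictionary whose keys are tuples (immutable, so allowed).
coordinates = {(0, 0): 'origin', (3, 4): 'point A'}
print(coordinates[(3, 4)])          # point A

# A list is mutable, so Python will not accept it as a key.
list_key_rejected = False
try:
    coordinates[[1, 2]] = 'point B'
except TypeError:
    list_key_rejected = True
print('List key rejected:', list_key_rejected)   # List key rejected: True
```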

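Elements in a dictionary can be entered in any order. Entering the name of the dictionary at the interactive prompt causes the dictionary to be displayed. A dictionary is not sorted by its keys, however: in Python 3.7 and later, the elements are displayed in the order in which they were inserted (in earlier versions, the order was not guaranteed).

phonebook
{'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}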
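If you do want the entries in alphabetical order, the built-in sorted() function returns a new, sorted list of the keys without changing the dictionary. A minimal sketch, assuming Python 3.7 or later for the insertion-order behavior:

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

# The dictionary keeps its insertion order.
print(list(phonebook))      # ['Doug', 'Morris', 'Julie']

# sorted() builds a new, alphabetized list of the keys.
for name in sorted(phonebook):
    print(name, phonebook[name])
# Doug 555-3684
# Julie 555-5854
# Morris 555-6677
```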

Dictionary elements are retrieved by key, not by a numeric index as with a list. To retrieve Doug's value from this dictionary, the reference would be phonebook['Doug']. Note that this reference retrieves only the value associated with the supplied key, not the whole element. The text warns us that trying to retrieve a value for a key that does not exist in the dictionary will cause a KeyError exception. Keys are case sensitive, so it is important to reference them with whatever capitalization was used when they were stored.
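A quick sketch of both warnings: a lookup returns only the value, and a key with different capitalization counts as a missing key, so it raises KeyError.

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

print(phonebook['Doug'])       # 555-3684 (only the value, not the whole element)

# 'doug' is not the same key as 'Doug', so this lookup raises KeyError.
try:
    print(phonebook['doug'])
except KeyError as missing:
    print('KeyError for key', missing)
```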

Since looking up a key that is not in a dictionary raises an exception, and we want to avoid exceptions, we can first check for the existence of a key in a dictionary with the in operator. Example:

if 'Doug' in phonebook:
    print(phonebook['Doug'])

The text also shows us the not in operator, which is True when a key does not exist in the dictionary. Either way, checking for a key first lets us run an instruction that uses that key only when it is safe to do so.
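Here is the not in form as a small sketch; the variable name holds a made-up lookup value.

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}
name = 'Steve'    # hypothetical name to look up

if name not in phonebook:
    print(name, 'is not in the phonebook.')
else:
    print(name, phonebook[name])   # safe: the key is known to exist
```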

The text shows us that we can update the value of an existing key or add a new key-value element to the dictionary with the same notation:

phonebook['Steve'] = '555-7738' # this code would add an element to the dictionary
phonebook['Doug'] = '555-8635' # this code would update Doug's phone number value because that key already exists
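Putting both statements together shows the difference: a new key adds an element at the end, while an existing key keeps its place and only its value changes. This sketch assumes Python 3.7 or later, where display order follows insertion order.

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

phonebook['Steve'] = '555-7738'   # new key: a new element is added at the end
phonebook['Doug'] = '555-8635'    # existing key: only the value is replaced

print(phonebook)
# {'Doug': '555-8635', 'Morris': '555-6677', 'Julie': '555-5854', 'Steve': '555-7738'}
```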

We can delete an element from a dictionary with the del statement:

del phonebook['Steve'] # this would delete the 'Steve' element (the key and its value) from the dictionary

Checking for the existence of a key first with the in operator, before using the del command, would avoid a KeyError exception that would be generated if the key does not exist in the named dictionary.

The len() function will return the number of elements in a dictionary, if you provide the name of a dictionary as the argument to len(). Example:

length_phonebook = len(phonebook)
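The check-before-delete pattern and len() can be combined in a small sketch. The helper name remove_entry is hypothetical, not from the text.

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

def remove_entry(book, name):
    """Delete name from book if it is there; return True if something was deleted."""
    if name in book:
        del book[name]    # removes both the key and its value
        return True
    return False          # skipping del here avoids a KeyError

print(remove_entry(phonebook, 'Morris'))   # True
print(remove_entry(phonebook, 'Steve'))    # False
print(len(phonebook))                      # 2
```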

In the examples so far, we have only used strings as keys and values. The text offers an example dictionary that has strings as keys, and lists as values:

test_scores = { 'Kayla' : [ 88, 92, 100], 'Luis' : [95, 74, 81], 'Sophie' : [72, 88, 91], 'Ethan' : [70, 75, 78]  }
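Because each value in test_scores is a list, a lookup by key hands back the whole list, and ordinary list operations work on it. A short sketch (the averaging loop is an illustration, not from the text):

```python
test_scores = {'Kayla': [88, 92, 100], 'Luis': [95, 74, 81],
               'Sophie': [72, 88, 91], 'Ethan': [70, 75, 78]}

kayla_scores = test_scores['Kayla']   # the whole list of three scores
print(kayla_scores[0])                # 88 (a list index applied to the value)

# Average each student's three scores.
for name in test_scores:
    scores = test_scores[name]
    print(name, round(sum(scores) / len(scores), 1))
```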

The dictionary test_scores has four elements, each of which has a list of three numbers as its value. Even though the keys and values are of different types, this dictionary still has a consistent structure: every key is a string, and every value is a list of three scores. The text makes itself confusing by giving us an example in which:

  • the first element has a string for a key, and a number for its value { 'abc' : 1 }
  • the second element has a number for its key, and a string for its value { 999 : 'yada yada'}
  • the third element has a tuple for a key, and a list for a value { (3, 6, 9) : [3, 6, 9] }
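The three elements listed above can all live in a single dictionary, since every key is immutable. A sketch (the dictionary name mixed_up is made up):

```python
# The three example elements, combined in one dictionary.
mixed_up = {'abc': 1, 999: 'yada yada', (3, 6, 9): [3, 6, 9]}

print(mixed_up['abc'])          # 1
print(mixed_up[999])            # yada yada
print(mixed_up[(3, 6, 9)])      # [3, 6, 9]
```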

Note that each of the example elements has continued to use an immutable form for its key. The next example in this section, as the book states, is more practical. It gives us a dictionary that pairs three labels about an employee with three values for that employee. It is more like a database record, except that it holds the field names along with the data elements.

The text shows us that the phonebook dictionary could have been created as an empty dictionary structure by putting nothing between a pair of curly braces.

phonebook = {}

The virtue of doing so is that several lines of code can be entered to populate this dictionary, one by one, without knowing the entire sequence when we start. The text mentions another method to create an empty dictionary which is a bit longer.

phonebook = dict()

This method calls the dict() function, which returns a new dictionary that we assign to the desired name. The dictionary is empty because we did not pass an argument to dict().
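Both ways of starting empty give the same result, and the dictionary can then be filled one entry at a time. In this sketch the second name, contacts, is hypothetical.

```python
phonebook = {}        # empty dictionary from a pair of curly braces
contacts = dict()     # empty dictionary from dict() (hypothetical name)
print(len(phonebook), len(contacts))   # 0 0

# Fill in phonebook one entry at a time, as the data arrives.
phonebook['Doug'] = '555-3684'
phonebook['Morris'] = '555-6677'
phonebook['Julie'] = '555-5854'
print(len(phonebook))                  # 3
```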

More practical information is in the example of using a for-in loop with a dictionary. The name of a dictionary can be used as the iterable for the loop.

for var1 in phonebook:
    print(var1)

This will cause the variable in the control statement (var1) to be assigned each key in the dictionary in turn, and print() displays that key on each pass through the loop. The second example in this section also looks up the value associated with the key in each iteration of the loop by reading it from the dictionary.

for var1 in phonebook:
    print(var1, phonebook[var1])

This notation is useful in that we can learn the names of the keys in the dictionary and learn their values, just from knowing the name of the dictionary itself. The chapter continues with several short discussions of some dictionary methods:

  • dictionary.clear() - Removes all entries from the dictionary.
  • dictionary.get(key, default) - Returns the value associated with the specified key, or the specified default value if the key is not found. This makes it more exception resistant than accessing the value with square-bracket notation, which raises a KeyError for a missing key.
  • dictionary.items() - Returns the elements of the dictionary as a sequence of tuples, each tuple holding one key and its value. The text presents a for-in loop customized for this data.
  • dictionary.keys() - Returns a sequence of only the keys in the dictionary.
  • dictionary.values() - Returns a sequence of only the values in the dictionary.
  • dictionary.pop(key, default) - This is like a search-and-delete function for a dictionary. It looks for the specified key. If the key is found, its associated value is returned, and the key and its value are removed from the dictionary. If the key is not found, the default value is returned (if no default is given, pop() raises a KeyError).
  • dictionary.popitem() - This one is weird. It removes a key-value pair from the dictionary and returns it as a tuple. The text describes the pair as arbitrary; in Python 3.7 and later it is always the most recently added pair. This seems pointless until you consider the spotlight section in the chapter about simulating a deck of cards.

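Suddenly it seems clear: cards drawn from the deck dictionary will be transferred to player dictionaries. You could use a method shown above to add them where they are supposed to go. The key to doing this properly is to use a bit of random selection, since popitem() by itself removes cards in a predictable order. The random.shuffle() function may be the best choice, but it works on lists, not dictionaries, so it would be applied to a list of the deck's keys.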
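One way to add the random selection is sketched below. The deck layout (card names mapped to point values, with aces worth 1 and face cards worth 10) and the function names create_deck() and deal_cards() are assumptions for illustration, not necessarily the text's program. Because random.shuffle() only works on lists, the sketch shuffles a list of the deck's keys, then uses pop() to move each chosen card from the deck into a hand dictionary.

```python
import random

def create_deck():
    """Build a hypothetical 52-card deck: card name -> point value."""
    ranks = ['Ace', '2', '3', '4', '5', '6', '7', '8', '9', '10',
             'Jack', 'Queen', 'King']
    values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
    suits = ['Spades', 'Hearts', 'Clubs', 'Diamonds']
    deck = {}
    for suit in suits:
        for rank, value in zip(ranks, values):
            deck[rank + ' of ' + suit] = value
    return deck

def deal_cards(deck, number):
    """Move `number` randomly chosen cards from deck into a new hand."""
    names = list(deck)          # random.shuffle() needs a list, not a dict
    random.shuffle(names)
    hand = {}
    for name in names[:number]:
        hand[name] = deck.pop(name)   # remove from the deck, add to the hand
    return hand

deck = create_deck()
hand = deal_cards(deck, 5)
print(hand)
print(len(deck))   # 47 cards left
```

An alternative design would pick one card at a time with random.choice(list(deck)); shuffling once is simpler when several cards are dealt together.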

The chapter continues with several sample programs that demonstrate the use of these methods.
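The methods described above can be tried together on the phonebook dictionary. A minimal sketch, assuming Python 3.7 or later (so popitem() removes the most recently added pair):

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

# get(): no KeyError when the key is missing
print(phonebook.get('Doug', 'Not found'))    # 555-3684
print(phonebook.get('Steve', 'Not found'))   # Not found

# items(): each element as a (key, value) tuple, unpacked in the loop
for name, number in phonebook.items():
    print(name, number)

print(list(phonebook.keys()))     # ['Doug', 'Morris', 'Julie']
print(list(phonebook.values()))   # ['555-3684', '555-6677', '555-5854']

# pop(): return the value for a key and remove that element
number = phonebook.pop('Morris', 'Not found')
print(number)                     # 555-6677

# popitem(): remove and return the most recently added pair
last = phonebook.popitem()
print(last)                       # ('Julie', '555-5854')
print(phonebook)                  # {'Doug': '555-3684'}

phonebook.clear()
print(phonebook)                  # {}
```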

Sets

The text changes topics to discuss sets. Sets have different properties compared to sequences such as lists and tuples. The text lists a few:

  • a set is a collection of unique values; there must be no duplication of any value in the set
  • sets are unordered: elements are not stored in any particular order, so they cannot be accessed by index
  • elements in a set may be of different data types

Sets are created with the built-in set() function. If it is called with no arguments, the set that is created will be empty. You may also pass the set() function an iterable sequence, such as a string, a list, or a tuple. One rule to remember is that you can only pass one argument to the set() function, so whatever you pass, it has to be one thing. What happens next depends on exactly what you pass:

  • Pass a single string, and the set will include one copy of each unique character in that string. Duplicate characters would not be included in the set.
  • Pass a tuple or a list of strings, and the set will include a copy of each unique string that was in the tuple/list. This is a way to place words in the set. When you pass a sequence of objects to the set() function, remember to enclose the sequence in the proper containing markers, all inside the parentheses that the set() function expects. For example, passing a list of strings:
    set_of_strings = set( ['one', 'two', 'three'] )
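The three ways of calling set() can be compared side by side. Because sets are unordered, this sketch wraps them in sorted() only so the printed results appear in a predictable order; the variable names are made up.

```python
empty = set()                                   # no argument: an empty set
letters = set('banana')                         # one copy of each character
words = set(['one', 'two', 'three', 'two'])     # the second 'two' is dropped

print(len(empty))         # 0
print(sorted(letters))    # ['a', 'b', 'n']
print(sorted(words))      # ['one', 'three', 'two']

# Passing two arguments is an error: set() accepts at most one.
try:
    set('one', 'two')
except TypeError:
    print('set() takes only one argument')
```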

The text describes a function and some methods that work with sets:

  • len(set_name) - Returns the number of elements in the specified set.
  • set_name.clear() - Removes all entries from the set. Oddly, if you ask for a display of an empty set, the interpreter shows set() rather than {}, because {} already means an empty dictionary.
  • set_name.update(sequence) - Adds each element of the sequence to the specified set, unless it is already an element of the set. Another set may be passed as the argument instead of a sequence.
  • set_name.add(element) - Adds the element to the specified set as a new item, unless it is already in the set.
  • set_name.discard(element) - Removes the specified element from the set if it is found in it. The remove() method may also be used, but that method raises a KeyError exception if the element is not found. The discard() method does not raise such exceptions.
  • string.startswith(substring) - Not a set method: this is a string method that returns True if the string whose method you are calling begins with the substring.
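Here is a short sketch exercising these set methods in order; the set of color names is made up for illustration.

```python
colors = set(['red', 'green'])

colors.add('blue')                      # add one element
colors.add('red')                       # already present, so nothing changes
colors.update(['yellow', 'green'])      # add each element of a list
colors.update(set(['purple']))          # a set works as the argument too
size_after_adding = len(colors)
print(size_after_adding)                # 5

colors.discard('orange')                # not present: no error
colors.discard('green')                 # present: removed
try:
    colors.remove('orange')             # remove() raises KeyError instead
except KeyError:
    print('orange was not in the set')

colors.clear()
print(colors)                           # set()

# startswith() belongs to strings, not sets:
print('blueberry'.startswith('blue'))   # True
```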

Just as you can do with a dictionary, you can use the name of a set as the iterable in a for-in loop. The loop processes each item in the set until it runs out of items.

You can check for the presence or the absence of an item in a set using the notation if x in set or if x not in set, respectively.
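A sketch combining a loop over a set with membership tests; the word being checked is an arbitrary example.

```python
vowels = set('aeiou')
word = 'dictionary'

# The loop visits every element of the set once, in no guaranteed order.
for vowel in vowels:
    print(vowel, 'appears', word.count(vowel), 'time(s) in', word)

# Membership tests
if 'i' in vowels:
    print("'i' is a vowel")
if 'y' not in vowels:
    print("'y' is not in the set of vowels")
```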

The text discusses some processes that are more particular to sets. Some will be familiar from math classes you may remember.

  • Union - The union of two sets is the result of combining one set with the other. This operation can be done with the union method like this:
    set3 = set1.union(set2)
    This can also be done using the pipe character as the union operator, like this:
    set3 = set1 | set2
    Remember that there can be no repeated items in a set, so any repeats from one set to the other would be discarded in the resulting union set.
  • Intersection - The intersection of two sets is the set of all elements that they have in common. As with unions, there are two notations to find intersections:
    set3 = set1.intersection(set2)  # the intersection() method
    set3 = set1 & set2  # the single ampersand as the intersection operator
  • Difference - The difference of two sets depends on which one is mentioned first in the operation. The difference between set1 and set2 is defined as the elements that appear in set1 but not in set2, so reversing the operands generally gives a different answer. As usual, there are two notations, but this time the cloud edition of the book has the wrong notation for the second one:
    set3 = set1.difference(set2) # the difference() method
    set3 = set1 - set2 # the minus sign is the difference operator, just like in math class
  • Symmetric Difference - You may not have run into this one before. It is defined as the set of elements that appear in one set or the other, but not both. This set will be the same regardless of the order of the operands.
    set3 = set1.symmetric_difference(set2) # the symmetric_difference() method
    set3 = set1 ^ set2 # the caret sign is the symmetric_difference operator
  • The text defines two more terms that are mostly descriptive. A subset is a set whose elements are all found in another set. For example, the set of vowels is a subset of the alphabet. A superset is a set that includes every element of another set. In the same example, the alphabet is a superset of the set of vowels. You can test for these relationships between two sets with the issubset() and issuperset() methods. These methods do not return sets; they return True or False. Examples:

    set1.issubset(set2) # tests set1 being a subset of set2
    set1 <= set2 # tests set1 being a subset of set2

    set1.issuperset(set2) # tests set1 being a superset of set2
    set1 >= set2 # tests set1 being a superset of set2
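The operations above can be checked with two small sets of numbers. Since sets are unordered, this sketch wraps results in sorted() only to print them predictably; the vowels-and-alphabet pair follows the text's subset example.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5, 6])

print(sorted(set1 | set2))    # [1, 2, 3, 4, 5, 6]   union
print(sorted(set1 & set2))    # [3, 4]               intersection
print(sorted(set1 - set2))    # [1, 2]               difference
print(sorted(set2 - set1))    # [5, 6]               reversed operands
print(sorted(set1 ^ set2))    # [1, 2, 5, 6]         symmetric difference

# Each operator has a matching method that gives the same result.
print(set1.union(set2) == (set1 | set2))                                  # True
print(set1.symmetric_difference(set2) == set2.symmetric_difference(set1)) # True

vowels = set('aeiou')
alphabet = set('abcdefghijklmnopqrstuvwxyz')
print(vowels.issubset(alphabet), vowels <= alphabet)       # True True
print(alphabet.issuperset(vowels), alphabet >= vowels)     # True True
print(alphabet.issubset(vowels))                           # False
```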

Serializing Objects

The last section of the chapter addresses saving data objects, like dictionaries and sets, to files. Surprisingly, you don't want to just save them as text. You want to save them in binary files, using functions from a module named pickle. Converting an object into a stream of bytes that can be stored in a file is called serializing; Python calls this process pickling, for some unstated whimsical reason.

The functions the text shows us how to use are pickle.dump() and pickle.load(). First, we see how to send output to a file:

  • Make sure you import pickle at the top of a program that will use its functions.
  • Use a command like the example in the text to open a file for binary writing:
    outputfile = open('mydata.dat', 'wb')
  • Write an object to the data file with the dump() function, which serializes (pickles) the object and writes the resulting bytes to the file:
    pickle.dump(object, outputfile)
  • Close the output file when you have finished writing to it:
    outputfile.close()

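To read a pickled file and store the data in a new object:

  • Open the file for binary reading:
    inputfile = open('mydata.dat', 'rb')
  • Read the next object from the file with the load() function:
    object = pickle.load(inputfile)
  • Close the input file when you have finished reading from it:
    inputfile.close()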
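The write and read steps can be combined into one round trip. This sketch makes a few assumptions beyond the text: the helper names save_objects() and load_objects() are hypothetical, it uses with statements (which close the file automatically, the same job as calling close()), and it writes mydata.dat into a temporary folder so no file is left behind. Several objects can be dumped into one file and are loaded back in the same order.

```python
import os
import pickle
import tempfile

def save_objects(filename, *objects):
    """Pickle each object, in order, into one binary file."""
    with open(filename, 'wb') as outputfile:      # 'wb' = write binary
        for obj in objects:
            pickle.dump(obj, outputfile)

def load_objects(filename, count):
    """Unpickle `count` objects from filename, in the order they were saved."""
    with open(filename, 'rb') as inputfile:       # 'rb' = read binary
        return [pickle.load(inputfile) for _ in range(count)]

phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}
vowels = set('aeiou')

# Work in a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'mydata.dat')
    save_objects(path, phonebook, vowels)
    restored_phonebook, restored_vowels = load_objects(path, 2)

print(restored_phonebook == phonebook)   # True
print(restored_vowels == vowels)         # True
```

If a program does not know how many objects a file holds, it can keep calling pickle.load() until it raises EOFError, which signals that the end of the file has been reached.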

The chapter ends with examples of files that use pickle to store and retrieve data.


Assignments

Assignments for these chapters will be found in Blackboard. We will explore that in class.