CS 1110 - Introduction to Programming
Starting Out with Python
Chapter 9, Dictionaries and Sets
Objectives:
This lesson covers chapter 9, which discusses more ways to
organize data. Objectives important to this lesson:
- Dictionaries
- Sets
- Serializing objects
Concepts:
Chapter 9
Dictionaries
The chapter begins with a definition that is not very
enlightening. We are told that a dictionary is a collection of data,
typically pairs of data, in what we might imagine as a table. The two
parts of each entry in the table are the key and the value. The key is a unique identifier for its value. A
dictionary cannot have duplicate keys, but separate keys may have the
same value. The text gives us an example of employee IDs (keys) that
each match only one value (the employee's name, in this case).
The notation for a dictionary begins with a name for it. If we stay with the
first example in the text, this dictionary is called phonebook. We begin by stating
something like this:
phonebook = {'Doug':'555-3684', 'Morris':'555-6677', 'Julie':'555-5854'}
In this example, the name (phonebook) is assigned a series of key-value pairs that are enclosed by
a set of curly braces. Each
key-value pair is two strings,
separated by a colon. Pairs
(elements) are separated by commas.
In each key-value pair, the key comes first. When you construct a
dictionary, you are allowed to use any
data object type as a value,
but keys must be immutable types, such as numbers,
strings, and tuples.
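As a minimal sketch of the immutable-key rule, the example below uses tuples as keys and then tries a list as a key. The grid dictionary and its contents are invented for illustration and do not come from the text.

```python
# Tuples are immutable, so they can serve as keys (hypothetical grid data)
grid = {(0, 0): 'origin', (1, 2): 'point A'}
print(grid[(1, 2)])

# Lists are mutable, so Python rejects them as keys with a TypeError
try:
    grid[[3, 4]] = 'point B'
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
    print('A list cannot be used as a dictionary key.')
```

The values in a dictionary have no such restriction, which is why a list is acceptable as a value but not as a key.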
Elements in a dictionary can be entered in any order. As the text shows us,
entering the name of the dictionary on a command line causes the
dictionary to be displayed. In Python 3.7 and later, the elements are
displayed in the order in which they were added. Older versions displayed
them in an arbitrary order that might not match the order of entry.
phonebook
{'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}
Keys, not indices, are used to retrieve dictionary elements, because a
dictionary maps keys to values and is not a sequence with numbered
positions. To retrieve Doug's
value from this dictionary, the reference would be phonebook['Doug']. Note that this
reference retrieves only the value
associated with the supplied key, not the whole element. The text warns
us that trying to retrieve a value for a key that does not exist in the dictionary will
cause a KeyError exception.
Keys are case sensitive, so it is important to reference them with
whatever capitalization was used when they were stored.
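A short sketch of both points, using the phonebook dictionary from the text. The try/except block is one possible way to show the KeyError without stopping the program, and the lookup_failed variable is only there for illustration.

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

# Retrieve a value by its key
doug_number = phonebook['Doug']
print(doug_number)

# Keys are case sensitive: 'doug' is not the same key as 'Doug'
try:
    print(phonebook['doug'])
    lookup_failed = False
except KeyError:
    lookup_failed = True
    print('No entry for that key.')
```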
Since we can cause exceptions by attempting to access values
for keys that are not in a dictionary, and we want to avoid exceptions,
we can check for the existence of a key in a dictionary with the in operator. Example:
if 'Doug' in phonebook:
    print(phonebook['Doug'])
The text shows us that we can use the not in operator if we expect that
the key we are referencing does not exist in the dictionary. As you
might imagine, we can check for a key being in a dictionary first, and
if it is found, we can execute an instruction that uses that key safely.
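The not in test might look like the sketch below. The name being looked up and the message variable are assumptions made for the example.

```python
phonebook = {'Doug': '555-3684', 'Julie': '555-5854'}

name = 'Steve'   # hypothetical key to look up
if name not in phonebook:
    message = name + ' is not in the phonebook.'
else:
    # Safe to use the key here, because we know it exists
    message = name + ': ' + phonebook[name]
print(message)
```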
The text shows us that we can update
the value of an existing key or add a new key-value element to the
dictionary with the same notation:
phonebook['Steve'] = '555-7738'   # this code would add an element to the dictionary
phonebook['Doug'] = '555-8635'    # this code would update Doug's phone number value because that key already exists
We can delete an element from a dictionary by using the del
command:
del phonebook['Steve']   # this would delete the 'Steve' element (the key and its value) from the dictionary
Checking for the existence
of a key first with the in operator,
before using the del command, would avoid a KeyError exception that
would be generated if the key does not exist in the named dictionary.
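Adding, updating, and safely deleting can be combined in one short sketch. The name 'Chris' is a made-up key that was never added, included only to show that the guard prevents an exception.

```python
phonebook = {'Doug': '555-3684', 'Julie': '555-5854'}

phonebook['Steve'] = '555-7738'   # new key: adds an element
phonebook['Doug'] = '555-8635'    # existing key: replaces the value

# Guard each deletion so a missing key does not raise KeyError
for name in ['Steve', 'Chris']:   # 'Chris' was never added
    if name in phonebook:
        del phonebook[name]

print(phonebook)
```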
The len() function will return the number of elements in a
dictionary, if you provide the name of a dictionary as the argument to
len(). Example:
length_phonebook = len(phonebook)
In the examples so far, we have only used strings as keys and
values. The text offers an example dictionary that has strings as keys,
and lists as values:
test_scores = { 'Kayla' : [88, 92, 100], 'Luis' : [95, 74, 81], 'Sophie' : [72, 88, 91], 'Ethan' : [70, 75, 78] }
The dictionary test_scores
has four elements, each of
which has a list of three numbers
as its value. Even though the
keys and values are of different types, this dictionary is still
consistent: every element pairs a string key with a list value. The text
then gives us a more confusing example, a dictionary in
which:
- the first element has a string for a key, and a number for
its value { 'abc' : 1 }
- the second element has a number for its key, and a string
for its value { 999 : 'yada yada'}
- the third element has a tuple for a key, and a list for a
value { (3, 6, 9) : [3, 6, 9] }
Note that each of the example elements has continued to use an
immutable form for its key. The next example in this section, as the
book states, is more practical. It gives us a dictionary that pairs
three labels about an employee with three values for that employee. It
is more like a database record, except that it holds the field names
along with the data elements.
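A record-style dictionary of this kind might look like the following sketch. The field names and the employee data are invented for illustration and are not the values used in the text.

```python
# Field names are the keys; the values here are invented for illustration
employee = {'name': 'Kevin Smith', 'id': 12345, 'payrate': 25.75}

# Each field is read by its label instead of by a position
summary = employee['name'] + ' (ID ' + str(employee['id']) + ')'
print(summary)
print('Pay rate:', employee['payrate'])
```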
The text shows us that the phonebook dictionary could have
been created as an empty dictionary structure by putting nothing
between a pair of curly braces.
phonebook = {}
The virtue of doing so is that several lines of code can be
entered to populate this dictionary, one by one, without knowing the
entire sequence when we start. The text mentions another method to
create an empty dictionary which is a bit longer.
phonebook = dict()
This method uses the dict() function, which creates a dictionary
that is then assigned to the desired name. The dictionary is
empty because we did not pass an argument to the function.
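Starting from an empty dictionary and filling it one element at a time could be sketched like this. The entries list stands in for data that arrives while the program runs and is an assumption for the example.

```python
phonebook = {}          # phonebook = dict() gives the same empty dictionary

# Hypothetical entries that arrive one at a time
entries = [('Doug', '555-3684'), ('Morris', '555-6677'), ('Julie', '555-5854')]
for name, number in entries:
    phonebook[name] = number

print(phonebook)
```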
More practical information is in the example of using a for-in
loop with a dictionary. The name of a dictionary
can be used as the iterable for the loop.
for var1 in phonebook:
    print(var1)
This will cause the variable in the control statement (var1)
to be assigned the key of each dictionary entry in turn,
and the loop prints that key on each pass. The second example in
this section accesses the value associated with the key in each
iteration of the loop by reading it from the dictionary.
for var1 in phonebook:
    print(var1, phonebook[var1])
This notation is useful in that we can learn the names of the
keys in the dictionary and learn their values, just from knowing the
name of the dictionary itself. The chapter continues with several short
discussions of some dictionary methods:
Dictionary Method | Description
dictionary.clear() | Removes all entries from the dictionary.
dictionary.get(key, default) | This method returns the value associated with a specified key, but it returns the specified default value if the key is not found. This makes it more exception resistant than just accessing the value with the dictionary[key] notation.
dictionary.items() | This one returns a view of the dictionary that can be looped over like a sequence of tuples, each tuple holding the key and the value of one of the elements in the dictionary. The text presents a for-in loop customized for this data.
dictionary.keys() | This method returns a view of only the keys in the dictionary, which can be looped over like a sequence.
dictionary.values() | This method returns a view of only the values in the dictionary, which can be looped over like a sequence.
dictionary.pop(key, default) | This is like a search and delete function for a dictionary. It looks for the specified key. If the key is found, its associated value is returned, and the key and its value are removed from the dictionary. If it is not found, the default value is returned.
dictionary.popitem() | This one is weird. It removes a key-value pair from the dictionary and returns the deleted pair as a tuple. The text describes the pair as arbitrary; in Python 3.7 and later it is the most recently added pair. This seems pointless until you consider the spotlight section in the chapter about simulating a deck of cards. Suddenly it seems clear: cards drawn from the deck dictionary will be transferred to player dictionaries. You could use a method shown above to add them where they are supposed to go. The key to doing this properly is to use a bit of random selection. The random.shuffle() function may be the best choice if it is applied to a list of the deck's keys, since it shuffles lists rather than dictionaries.
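The methods in the table can be tried out with a short sketch like the one below. It reuses the phonebook data from the text. The variable names and the 'Entry not found' default are assumptions, and the comment on popitem() assumes Python 3.7 or later.

```python
phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

# get() returns the default instead of raising KeyError
missing = phonebook.get('Steve', 'Entry not found')

# items() yields (key, value) tuples that a for-in loop can unpack
for name, number in phonebook.items():
    print(name, number)

names = list(phonebook.keys())       # just the keys
numbers = list(phonebook.values())   # just the values

# pop() returns the value and removes the element
removed_number = phonebook.pop('Morris', 'Entry not found')
not_removed = phonebook.pop('Morris', 'Entry not found')   # key is gone now

# popitem() removes one pair and returns it as a tuple
# (the most recently added pair in Python 3.7 and later)
pair = phonebook.popitem()

phonebook.clear()                    # the dictionary is now empty
print(phonebook)
```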
The chapter continues with several sample programs that
demonstrate the use of these methods.
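The card-dealing idea could be sketched as follows. This is not the program from the text's spotlight section. The six-card deck, the deal() function, and the point values are all invented for illustration, and the approach shuffles a list of the keys because random.shuffle() works on lists rather than dictionaries.

```python
import random

# A small hypothetical deck: card names are keys, point values are values
deck = {'Ace of Spades': 1, '2 of Spades': 2, '3 of Spades': 3,
        'Jack of Spades': 10, 'Queen of Spades': 10, 'King of Spades': 10}

def deal(deck, hand, count):
    """Move count randomly chosen cards from deck into hand."""
    card_names = list(deck.keys())
    random.shuffle(card_names)        # shuffle the list of keys in place
    for card in card_names[:count]:
        hand[card] = deck.pop(card)   # pop() returns the value and removes the card

player_hand = {}
deal(deck, player_hand, 2)
print(player_hand)
print(len(deck), 'cards left in the deck')
```

Using pop() for the transfer means a card can never be in the deck and in a hand at the same time.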
Sets
The text changes topics to discuss sets. Sets have
different properties from sequences such as lists and strings. The text lists a few:
- a set is a collection of unique values; there must
be no duplication of any value in the set
- sets are unordered, so the order in which elements
are stored is unimportant
- elements in a set may be of different data
types
Sets are created with the built-in set() function. If it is
called with no arguments, the set that is created will be empty.
You may also pass the set() function an iterable sequence, such as a string,
a list, or a tuple. One rule to remember is that you
can only pass one argument to the set() function, so whatever
you pass, it has to be one thing. What happens next depends on exactly
what you pass:
- Pass a single string, and the set will include one copy of
each unique character in that string. Duplicate characters would not be
included in the set.
- Pass a tuple or a list of strings, and the set will include
a copy of each unique string that was in the tuple/list. This is a way
to place words in the set. When you pass a sequence of objects to the
set() function, remember to enclose the sequence in the proper
containing markers, all inside the parentheses that the set() function
expects. Example, passing a list of strings:
set_of_strings = set( ['one', 'two', 'three'] )
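The three ways of calling set() described above might be compared with a sketch like this one. The sample string and the word list are assumptions chosen to include duplicates.

```python
letters = set('mississippi')                  # one copy of each unique character
words = set(['one', 'two', 'three', 'two'])   # the repeated 'two' is stored once
empty = set()                                 # no argument: an empty set

print(sorted(letters))   # sorted() only makes the display order predictable
print(len(words))
print(empty)

# set('one', 'two') would raise a TypeError, because set() accepts one argument
```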
The text describes a function and some methods that work with
sets:
Functions and Set Methods | Description
len(set_name) | Returns the number of elements in the specified set.
set_name.clear() | Removes all entries from the set. Oddly, if you ask for a display of the elements of an empty set, the interpreter will put the set() function on the screen.
set_name.update(sequence) | This method passes the sequence to the specified set, and the elements in the sequence are added to the set unless they are already elements in it. The name of a set may be passed as an argument instead of a sequence.
set_name.add(element) | This method passes the element to the specified set as a new item. The element is added to the set unless it is already in the set.
set_name.discard(element) | This method removes the specified element from the set if it is found in it. The remove() method may also be used, but that method raises a KeyError exception if the element is not found. The discard() method does not raise such exceptions.
startswith(substring) | Returns True if the substring is found at the beginning of the string whose method you are calling.
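A brief sketch of the set methods in the table follows. The colors set and its elements are made up for the example.

```python
colors = set(['red', 'green'])

colors.add('blue')                   # new element is added
colors.add('red')                    # already present, so nothing changes
colors.update(['green', 'yellow'])   # only 'yellow' is new
size_after_adds = len(colors)

colors.discard('purple')             # not in the set, and no exception

try:
    colors.remove('purple')          # remove() raises KeyError for a missing element
    remove_failed = False
except KeyError:
    remove_failed = True

colors.clear()
print(colors)                        # an empty set is displayed as set()
```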
Just as you can do with a dictionary, you
can use the name of a set as the iterable in a for-in loop. The loop
would process each item in the set until it runs out of items.
You can check for the presence or the absence of an item in a set using the notation if x in set or if x not in set, respectively.
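Looping over a set and testing membership could be sketched as below. The vowels set and the found list are assumptions for the example. Because sets are unordered, the loop may visit the elements in any order.

```python
vowels = set('aeiou')

# Visit every element; the order is not guaranteed
found = []
for letter in vowels:
    found.append(letter)
print(sorted(found))

if 'e' in vowels:
    print('e is a vowel')
if 'z' not in vowels:
    print('z is not a vowel')
```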
The text discusses some processes that are more particular to sets. Some will be familiar from math classes you may remember.
- Union - The union of two sets is the result of combining one set with the other. This operation can be done with the union method like this:
set3 = set1.union(set2)
This can also be done using the pipe character as the union operator, like this:
set3 = set1 | set2
Remember that there can be no repeated items in a set, so any repeats
from one set to the other would be discarded in the resulting union set.
- Intersection - The intersection of two sets is the set of all elements that they have in common. As with unions, there are two notations to find intersections:
set3 = set1.intersection(set2) # the intersection() method
set3 = set1 & set2 # the single ampersand as the intersection operator
- Difference - The difference of two sets depends a lot on which one is mentioned first in the operation. The difference between set1 and set2 is defined as the elements that appear in set1 that do not appear in set2.
As you should see, reversing the expression would probably result in a
different answer. As usual, there are two notations, but this time the
cloud edition of the book has the wrong notation for the second one:
set3 = set1.difference(set2) # the difference() method
set3 = set1 - set2 # the minus sign is the difference operator, just like in math class
- Symmetric Difference - You may not have run into this one before. It is defined as the set of elements that appear in one set or the other, but not both. This set will be the same, regardless of the order of the operands.
set3 = set1.symmetric_difference(set2) # the symmetric_difference() method
set3 = set1 ^ set2 # the caret sign is the symmetric_difference operator
- The text defines two more terms that are mostly
descriptive. A subset is a set that is entirely found in another set.
For example, the set of vowels is a subset of the alphabet. A superset is a set that includes every element of another set. In the same example, the alphabet is a superset of the set of vowels. You can test for these relationships between two sets with the issuperset() and issubset() methods. These methods do not return sets, they only return True or False. Examples:
set1.issubset(set2) # tests set1 being a subset of set2
set1 <= set2 # tests set1 being a subset of set2
set1.issuperset(set2) # tests set1 being a superset of set2
set1 >= set2 # tests set1 being a superset of set2
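The operations above can be compared side by side in a small sketch. The contents of set1 and set2 are invented so that the two sets overlap, and the result names are only for illustration.

```python
set1 = set([1, 2, 3, 4])
set2 = set([3, 4, 5, 6])

union = set1 | set2             # everything in either set
common = set1 & set2            # only the shared elements
only_in_1 = set1 - set2         # in set1 but not in set2
only_in_2 = set2 - set1         # reversing the operands changes the answer
either_not_both = set1 ^ set2   # same result in either order

print(sorted(union), sorted(common))
print(sorted(only_in_1), sorted(only_in_2), sorted(either_not_both))

# Subset and superset tests return True or False, not sets
vowels = set('aeiou')
alphabet = set('abcdefghijklmnopqrstuvwxyz')
print(vowels.issubset(alphabet), alphabet >= vowels)
```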
Serializing Objects
The last section of the chapter addresses saving data objects,
like dictionaries and sets, to files. Surprisingly, you don't want to
just save them as text. You want to save them in binary files with
functions that are stored in a module named pickle. The process of preparing objects for storage is called pickling for some unstated whimsical reason.
The functions the text shows us how to use are pickle.dump() and pickle.load(). First, we see how to send output to a file:
- Make sure you import pickle at the top of a program that will use its functions.
- Use a command like the example in the text to open a file for binary writing:
outputfile = open('mydata.dat', 'wb')
- Write an object to the data file with the dump() function, which does a serial conversion of the object first:
pickle.dump(object, outputfile)
- Close the output file when you have finished writing to it:
outputfile.close()
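To read a pickled file and store the data in a new object:
- Open the file for binary reading:
inputfile = open('mydata.dat', 'rb')
- Read the object with the load() function, passing it the file object that was just opened:
object = pickle.load(inputfile)
- Close the input file when you have finished reading from it:
inputfile.close()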
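The writing and reading steps can be put together in one round trip, sketched below. The use of a temporary folder is an assumption made so the example leaves no file behind. The text's own examples write mydata.dat to the current folder.

```python
import os
import pickle
import tempfile

phonebook = {'Doug': '555-3684', 'Morris': '555-6677', 'Julie': '555-5854'}

# A temporary folder keeps this sketch from leaving files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'mydata.dat')

    outputfile = open(path, 'wb')        # open for binary writing
    pickle.dump(phonebook, outputfile)   # pickle the dictionary into the file
    outputfile.close()

    inputfile = open(path, 'rb')         # open for binary reading
    restored = pickle.load(inputfile)    # rebuild the object from the file
    inputfile.close()

print(restored == phonebook)   # same contents
print(restored is phonebook)   # but a separate object
```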
The chapter ends with examples of files that use pickle to store and retrieve data.