
Dictionaries and files

24 February, 2015 - 09:17

One of the common uses of a dictionary is to count the occurrences of words in a file of written text. Let’s start with a very simple file of words taken from the text of Romeo and Juliet, courtesy of http://shakespeare.mit.edu/Tragedy/romeoandjuliet/romeo_juliet.2.2.html.

For the first set of examples, we will use a shortened and simplified version of the text with no punctuation. Later we will work with the text of the scene with punctuation included.

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
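If you would rather not download the file, a minimal sketch like the one below writes these four lines to romeo.txt in the current directory. It assumes the file holds exactly these lines, one per line; the name SAMPLE_LINES is hypothetical.

```python
# Hypothetical helper script: create romeo.txt locally, one sample line per
# line of the file, so the counting program has something to read.
SAMPLE_LINES = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]

with open("romeo.txt", "w") as out:
    for line in SAMPLE_LINES:
        out.write(line + "\n")  # newline keeps each verse on its own line
```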

We will write a Python program that reads through the lines of the file, breaks each line into a list of words, and then loops through the words in that line, counting each one with a dictionary.

You will see that we have two for loops. The outer loop reads the lines of the file, and the inner loop iterates through each of the words on that particular line. This is an example of a pattern called nested loops, because one loop is placed inside the other: the outer loop contains the inner loop.

Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating “more quickly” and the outer loop as iterating more slowly.
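To see the "fast" inner loop and the "slow" outer loop in action, the sketch below records the order in which the two nested loops visit each word. The two short strings stand in for lines of a file and are hypothetical sample data; enumerate() is used only to number the lines.

```python
# Two hypothetical "lines" standing in for the lines of a file.
lines = ["But soft what light", "It is the east"]

visits = []  # (line number, word) pairs in the order the loops reach them
for line_no, line in enumerate(lines, start=1):  # outer loop: one pass per line
    for word in line.split():                     # inner loop: every word on that line
        visits.append((line_no, word))

for line_no, word in visits:
    print("%d %s" % (line_no, word))
```

The line number stays at 1 until all four words of the first line have been visited, and only then does the outer loop advance to line 2.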

The combination of the two nested loops ensures that we will count every word on every line of the input file.

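fname = raw_input('Enter the file name: ')
try:
    fhand = open(fname)
except IOError:
    # Only file-opening errors are caught here; other errors still surface.
    print 'File cannot be opened:', fname
    exit()

counts = dict()
for line in fhand:              # outer loop: one line of the file at a time
    words = line.split()
    for word in words:          # inner loop: each word on the current line
        if word not in counts:
            counts[word] = 1    # first time we see this word
        else:
            counts[word] += 1   # seen before: add one to its count

print counts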
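The if/else test in the inner loop can be collapsed with the dictionary's get method, which returns a default value (here 0) when a key is missing. The sketch below is a swapped-in, more compact variant of the same counting loop, not the program above; it is wrapped in a hypothetical count_words function so that it accepts either an open file handle or a plain list of strings.

```python
def count_words(lines):
    """Count how often each whitespace-separated word appears in lines.

    lines may be an open file handle or any iterable of strings.
    """
    counts = dict()
    for line in lines:                # outer loop: each line
        for word in line.split():     # inner loop: each word on the line
            # get() returns 0 for an unseen word, replacing the if/else test
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical sample input; with a real file one could instead write:
#     with open('romeo.txt') as fhand:
#         counts = count_words(fhand)
sample = [
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
]
counts = count_words(sample)
print(counts)
```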

When we run the program, we see a raw dump of all the counts in unsorted hash order. (The romeo.txt file is available at www.py4inf.com/code/romeo.txt.)

python count1.py
Enter the file name: romeo.txt
{'and': 3, 'envious': 1, 'already': 1, 'fair': 1,
'is': 3, 'through': 1, 'pale': 1, 'yonder': 1,
'what': 1, 'sun': 2, 'Who': 1, 'But': 1, 'moon': 1,
'window': 1, 'sick': 1, 'east': 1, 'breaks': 1,
'grief': 1, 'with': 1, 'light': 1, 'It': 1, 'Arise': 1,
'kill': 1, 'the': 3, 'soft': 1, 'Juliet': 1}
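Because the dump comes out in whatever order the dictionary's hash table stores its keys, one small improvement is to print one word per line in alphabetical order. The sketch below does this with the built-in sorted(); the counts dictionary is copied from the romeo.txt output, and this is only one possible way to tidy the output.

```python
# Counts copied from the romeo.txt run of count1.py.
counts = {'and': 3, 'envious': 1, 'already': 1, 'fair': 1,
          'is': 3, 'through': 1, 'pale': 1, 'yonder': 1,
          'what': 1, 'sun': 2, 'Who': 1, 'But': 1, 'moon': 1,
          'window': 1, 'sick': 1, 'east': 1, 'breaks': 1,
          'grief': 1, 'with': 1, 'light': 1, 'It': 1, 'Arise': 1,
          'kill': 1, 'the': 3, 'soft': 1, 'Juliet': 1}

for word in sorted(counts):  # sorted() over a dict yields its keys, ascending
    print("%s %d" % (word, counts[word]))
```

Since sorted() compares strings character by character, the capitalized words (Arise, But, It, Juliet, Who) are listed before all of the lowercase ones.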

Scanning through this dictionary by eye to find the most common words and their counts is inconvenient, so we need to add some more Python code to produce output that is more helpful.
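One way to get more helpful output, sketched here as an assumption rather than the approach this text necessarily takes, is to sort the (word, count) pairs by descending count and keep the first few. The helper most_common is hypothetical; the four lines of romeo.txt are inlined so the sketch runs on its own.

```python
# The four lines of romeo.txt, inlined so no file is needed.
lines = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]

counts = dict()
for line in lines:
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1


def most_common(counts, n):
    """Return the n (word, count) pairs with the highest counts.

    Ties are broken alphabetically so the result is deterministic.
    """
    # Negating the count sorts largest first; the word breaks ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


for word, count in most_common(counts, 4):
    print("%s %d" % (word, count))
# and 3
# is 3
# the 3
# sun 2
```

The standard library's collections.Counter provides a most_common method that does a similar job, though it does not order tied counts alphabetically.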