For the exercises in this chapter we need a list of English words. There are lots of word lists available on the Web, but the one most suitable for our purpose is one of the word lists collected and contributed to the public domain by Grady Ward as part of the Moby lexicon project (see http://wikipedia.org/wiki/Moby_Project). It is a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. In the Moby collection, the filename is 113809of.fic; you can download a copy, with the simpler name words.txt, from http://thinkpython.com/code/words.txt.
This file is in plain text, so you can open it with a text editor, but you can also read it from Python. The built-in function open takes the name of the file as a parameter and returns a file object you can use to read the file.
>>> fin = open('words.txt')
>>> print fin
<open file 'words.txt', mode 'r' at 0xb7f4b380>
fin is a common name for a file object used for input. Mode 'r' indicates that this file is open for reading (as opposed to 'w' for writing).
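As a minimal sketch of the two modes, the following writes a small file with mode 'w' and then reads its first line back with mode 'r'. The file name sample_words.txt, its contents, and the use of a temporary directory are assumptions made for illustration; they are not part of the word list described above. print is called with parentheses so that the sketch also runs under Python 3.

```python
import os
import tempfile

# Hypothetical file name, created in a throwaway directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')

# Mode 'w' opens the file for writing, creating it if it does not exist.
fout = open(path, 'w')
fout.write('aa\naah\naahed\n')
fout.close()  # closing ensures the text has been written to disk

# Mode 'r' opens the file for reading; it is also the default mode.
fin = open(path, 'r')
first_line = fin.readline()
fin.close()

print(repr(first_line))
```

Because 'r' is the default, open(path) and open(path, 'r') behave the same way here.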
The file object provides several methods for reading, including readline, which reads characters from the file until it gets to a newline and returns the result as a string:
>>> fin.readline()
'aa\r\n'
The first word in this particular list is “aa,” which is a kind of lava. The sequence \r\n represents two whitespace characters, a carriage return and a newline, that separate this word from the next.
The file object keeps track of where it is in the file, so if you call readline again, you get the next word:
>>> fin.readline()
'aah\r\n'
The next word is “aah,” which is a perfectly legitimate word, so stop looking at me like that. Or, if it’s the whitespace that’s bothering you, we can get rid of it with the string method strip:
>>> line = fin.readline()
>>> word = line.strip()
>>> print word
aahed
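The interplay of readline and strip can be sketched without downloading the full word list. The example below assumes a hypothetical three-word file, sample_words.txt, written to a temporary directory; the variable names are likewise chosen for illustration. It also shows two details not spelled out above: readline returns an empty string once the file is exhausted, and strip removes whitespace only from the ends of a string.

```python
import os
import tempfile

# Hypothetical three-word file standing in for words.txt.
path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')
fout = open(path, 'w')
fout.write('aa\naah\naahed\n')
fout.close()

fin = open(path)
# Each call picks up where the previous one stopped.
first = fin.readline().strip()
second = fin.readline().strip()
third = fin.readline().strip()
# No lines are left, so readline returns the empty string.
leftover = fin.readline()
fin.close()

# strip removes whitespace from both ends, but not from the middle.
padded = '  aa  aah \r\n'.strip()

print(first)
print(second)
print(third)
print(repr(leftover))
print(repr(padded))
```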
You can also use a file object as part of a for loop. This program reads words.txt and prints each word, one per line:
fin = open('words.txt')
for line in fin:
    word = line.strip()
    print word
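The same loop pattern can do more than print. As a rough sketch, the following counts the words in a file instead of displaying them. It assumes a small hypothetical file, sample_words.txt, created in a temporary directory so that the example runs without the real word list; the counter name count is also an assumption.

```python
import os
import tempfile

# Hypothetical stand-in for words.txt, one word per line.
path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')
fout = open(path, 'w')
fout.write('aa\naah\naahed\n')
fout.close()

count = 0
fin = open(path)
for line in fin:
    word = line.strip()
    # Skip any blank lines so that only real words are counted.
    if word != '':
        count = count + 1
fin.close()

print(count)
```

Replacing path with 'words.txt' should apply the same loop to the downloaded list.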
Exercise 9.1. Write a program that reads words.txt and prints only the words with more than 20 characters (not counting whitespace).