Lecture
Review of Homework 3
Learning Objectives
- Constants: True, False, None
- Data types: bool, tuple, dict
- Concepts: mutability, side-effects, exceptions, advanced functions, text corpora, conditional frequency distributions
- Tools: NLTK
Reading
Constants
Python has several special constant values (“constant” meaning they have predefined, unchangeable values). For present purposes, we only care about True, False, and None.
The Dive Into Python book has a good and concise description of these:
Another thing to add about None is that it is often used as a placeholder for optional arguments, and it is the return value of a function with no return statement or an empty return statement. For example:
>>> def func(x=None):
...     print('this function prints', x, 'but returns None')
...
>>> x = func()
this function prints None but returns None
>>> print(x)
None
>>> x = func(5)
this function prints 5 but returns None
>>> print(x)
None
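The placeholder use of None can be sketched as follows. This is a minimal illustration, not code from the reading: the names add_word and log_word are made up for the example. Using None as the default and building the list inside the function gives each call its own list, which ties in with this week's topics of mutability and side-effects.

```python
def add_word(word, words=None):
    """Append word to words, creating a new list when none is given."""
    # None stands in for "no list supplied"; a fresh list is built per call.
    if words is None:
        words = []
    words.append(word)
    return words


def log_word(word):
    """Print a word; with no return statement, the call evaluates to None."""
    print('logging', word)


first = add_word('python')
second = add_word('nltk')
result = log_word('corpus')

print(first)           # ['python']
print(second)          # ['nltk']
print(result is None)  # True
```

Note the test `words is None` rather than `not words`: an empty list passed in by the caller is falsy too, but it should still be used rather than replaced.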
Data Types
The Python Tutorial has some good entries on tuple and dict:
In addition, bool is the type of the constants True and False. In practice the explicit use of the bool() function is rarely necessary as it is implicit in an if statement, but it can be useful in interactive sessions for determining the boolean value of objects:
>>> bool() # bool of nothing is False
False
>>> bool(True) # these are almost tautological...
True
>>> bool(False)
False
>>> bool(0) # 0 is the only False-valued integer
False
>>> bool(-1) # all other integers are True
True
>>> bool(99999)
True
>>> bool('') # the empty string is the only False string
False
>>> bool('foo') # all others are True
True
>>> bool('False') # even deceptive ones
True
>>> bool([]) # empty containers (list, tuple, set, dict, etc.) are False
False
>>> bool([1, 2, 3]) # all others are True
True
>>> bool([[]]) # even if their contents would be False
True
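To see the implicit conversion at work, the sketch below lets an if statement decide truthiness without calling bool() and then compares the outcome with bool(). The function name describe and the sample values are chosen just for this illustration.

```python
def describe(value):
    """Report whether value counts as true in an if statement."""
    # No explicit bool() call is needed: if evaluates truthiness itself.
    if value:
        return 'truthy'
    return 'falsy'


samples = [0, -1, '', 'False', [], [[]], None]
for sample in samples:
    # The if-based answer and bool() always agree.
    print(repr(sample), describe(sample), bool(sample))
```

This is why code such as `if words:` is usually preferred over `if bool(words):` or `if len(words) > 0:`.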
Functions
For further topics on functions, see the Python Tutorial’s section on default and keyword arguments:
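As a rough illustration of default and keyword arguments, consider the sketch below. The tokenize function and its parameters are hypothetical, invented for this example; only str.lower() and str.split() are standard Python.

```python
def tokenize(text, sep=None, lowercase=False, limit=None):
    """Split text into tokens; every argument after text is optional."""
    if lowercase:
        text = text.lower()
    tokens = text.split(sep)  # sep=None splits on runs of whitespace
    if limit is not None:
        tokens = tokens[:limit]
    return tokens


# Defaults apply to any argument that is left out.
print(tokenize('The quick brown Fox'))
# Keyword arguments name the parameter they set...
print(tokenize('The quick brown Fox', lowercase=True))
print(tokenize('a,b,c', sep=','))
# ...so they can be given in any order.
print(tokenize('The quick brown Fox', limit=2, lowercase=True))
```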
Text Corpora
The NLTK provides interfaces to a variety of common and freely available corpora. Read chapter 2 section 1 of the NLTK book to get an overview. You don’t need to follow all the code examples, but try to be able to answer questions like these:
- What is a text corpus?
- What kinds of corpora does the NLTK provide?
- What are things you can do with a text corpus?
- What is the difference between an annotated and unannotated corpus?
- How might some text corpus structures be more or less appropriate for certain NLP tasks?
You don’t have to read all the sections in 2.1. Focus on these for now:
Conditional Frequency Distribution
Earlier we discussed frequency distributions and used the NLTK’s FreqDist class. Now we will introduce conditional frequency distributions. Please read the following:
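To make the idea concrete, a conditional frequency distribution can be thought of as one frequency distribution per condition. The sketch below builds one in plain Python with the collections module rather than with the NLTK's ConditionalFreqDist class, so it only approximates what the NLTK does. The (genre, word) pairs are made-up toy data, not drawn from any corpus.

```python
from collections import Counter, defaultdict

# Toy (condition, sample) pairs; invented for illustration.
pairs = [
    ('news', 'the'), ('news', 'market'), ('news', 'the'),
    ('romance', 'the'), ('romance', 'heart'), ('romance', 'heart'),
    ('romance', 'heart'),
]

# One Counter (a frequency distribution) per condition.
cfd = defaultdict(Counter)
for condition, sample in pairs:
    cfd[condition][sample] += 1

# Look up the most frequent sample under each condition.
for condition in sorted(cfd):
    print(condition, cfd[condition].most_common(1))
```

The NLTK class is built from the same kind of (condition, sample) pairs, so the reading should feel familiar after this.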
Testing Your Knowledge
Dictionaries
Get a feel for Python’s dict type by creating and inspecting some dictionaries. Use help(dict) in Python to browse the available methods (ignore the ones that start with __ for now). Try to read a list of words and create a dictionary mapping each letter to the set of words starting with the letter. For example:
>>> def letter_lookup(words):
...     # your code here
...
>>> d = letter_lookup('python programming provides endless possibilities'.split())
>>> d['p']
{'python', 'provides', 'programming', 'possibilities'}
>>> d['e']
{'endless'}
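For the warm-up part of this exercise (creating and inspecting dictionaries), a session might start along these lines. This is only a sketch with made-up data, and it deliberately does not solve letter_lookup. The lexicon dictionary and its (length, tag) tuples are assumptions for the example.

```python
# Made-up data: each word maps to a (length, part-of-speech guess) tuple.
lexicon = {'python': (6, 'noun'), 'endless': (7, 'adjective')}
lexicon['provides'] = (8, 'verb')  # assigning to a new key adds an entry

print(sorted(lexicon.keys()))
print('python' in lexicon)         # membership tests look at the keys
print(lexicon.get('missing'))      # get() returns None for absent keys

# items() yields (key, value) tuples, which can be unpacked in the loop.
for word, (length, tag) in sorted(lexicon.items()):
    print(word, length, tag)
```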
Text Corpora and Conditional Frequencies
- Use the NLTK’s Brown corpus and ConditionalFreqDist class to find the most frequent words for each genre of the Brown corpus.
- Try again but first filter out stopwords.