Lecture
Review of basic concepts: OS, command line interface, folder/directory structure, programs, scripts, Python, virtual environment, package manager
Review of Homework 1
Getting Started with Python (Lists): Slides from Object-Oriented Programming in Python (Goldwasser and Letscher)
Installing the Natural Language Toolkit (NLTK)
Learning Objectives
- Data types: int, float, str, list, set
- Concepts: assignment, functions, types-vs-tokens, tokenization, normalization, frequency distributions, unit tests
- Tools: notebooks, NLTK
Additional Readings
The readings for this week come from the official Python tutorial. The topic is “Using Python as a Calculator”, but it is a good introduction to numbers, strings, and lists.
Additionally, please read the section on sets (only this section, not the rest of the chapter):
It helps to play with a Python interpreter while reading. Open up Visual Studio Code’s terminal and start Python (e.g., run python3 or py at the command prompt), then try out the examples for yourself.
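As a warm-up, here is a minimal sketch of the kind of statements the tutorial covers. The variable names and values are made up for illustration and do not come from the tutorial. You can type the lines one at a time at the interpreter prompt or save them as a script.

```python
# Numbers: Python works like a calculator
quotient = 17 / 3     # true division always gives a float
whole = 17 // 3       # floor division discards the fractional part
rest = 17 % 3         # remainder of the division

# Strings: indexing and slicing
word = "Python"
first = word[0]       # first character
tail = word[2:]       # everything from index 2 to the end

# Lists: ordered, mutable sequences
squares = [1, 4, 9]
squares.append(16)    # add an item to the end of the list

# Sets: unordered collections with no duplicates
letters = set("banana")

print(quotient, whole, rest)
print(first, tail)
print(squares)
print(sorted(letters))  # sorted() gives a stable order for printing
```

Because sets are unordered, the example sorts the set before printing so the output looks the same on every run.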
Testing Your Knowledge
There are two methods not mentioned in the tutorial:
str.split()
– splits a string on whitespace and returns a list of substrings
    >>> "one two two".split()
    ['one', 'two', 'two']
list.count(x)
– returns the number of times that x occurs in a sequence (e.g., a list or a string)
    >>> ['one', 'two', 'two'].count('one')
    1
    >>> ['one', 'two', 'two'].count('two')
    2
    >>> 'one two two'.count('o')
    3
Given the following string:
s = ('There are seven days, there are seven days, '
     'there are seven days in a week. '
     'Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday')
Try to answer the following questions:
- How many times does the word “day” occur in the string?
- How many times do the tokens “day”, “days”, and “days,” (note the comma) occur in the list of tokens (use split())?
- How many tokens are there in total?
- Find the relative frequency of the token “are” (the number of times it occurs divided by the total number of tokens)
- What is the set of unique words?
- What is the set of unique letters?
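If you are unsure how to start, the sketch below applies the same techniques to a different, made-up string, so it does not give away the answers. The string and all variable names are hypothetical. Notice how counting a substring in the string differs from counting a token in the list returned by split().

```python
text = "the cat sat on the mat, the end"   # hypothetical sample string

# Counting on the string matches substrings anywhere: cat, sat, mat
substring_hits = text.count("at")

# Splitting on whitespace keeps punctuation attached to tokens
tokens = text.split()

# Counting on the list matches whole tokens only
mat_hits = tokens.count("mat")         # the token is 'mat,' so no match
mat_comma_hits = tokens.count("mat,")  # matches the token with its comma

# Total number of tokens and a relative frequency
total_tokens = len(tokens)
the_rel_freq = tokens.count("the") / total_tokens

# Unique tokens and unique characters (spaces and commas included)
unique_tokens = set(tokens)
unique_chars = set(text)

print(substring_hits, mat_hits, mat_comma_hits)
print(total_tokens, the_rel_freq)
print(sorted(unique_tokens))
print(sorted(unique_chars))
```

The set of characters here includes the space and the comma. If you want letters only, you would need to filter them, for example with str.isalpha().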