Lecture
File structure and command line operations
Also read this for a more in-depth guide to file structure and command line operations. And read this for a bit more history.
Review of Homework 1
Getting Started with Python (Lists): Slides from Object-Oriented Programming in Python (Goldwasser and Letscher)
Learning Objectives
- Data types: int, float, str, list, set
- Concepts: assignment, functions, types-vs-tokens, tokenization, normalization, frequency distributions, unit tests
- Tools: notebooks, NLTK
(color key: Python/Programming NLP/CL Software Engineering)
Additional Readings
The readings for this week come from the official Python tutorial. The topic is “Using Python as a Calculator”, but it is a good introduction to numbers, strings, and lists.
Additionally, please read the section on sets (only this section, not the rest of the chapter):
It helps to play with a Python interpreter while reading. Open up Visual Studio Code’s terminal and start Python (e.g., run python3 or py at the command prompt), then try out the examples for yourself.
Testing Your Knowledge
There are two methods not mentioned in the tutorial:
- str.split() – splits a string on whitespace and returns a list of substrings

      >>> "one two two".split()
      ['one', 'two', 'two']

- list.count(x) – returns the number of times that x occurs in a sequence (e.g., a list or a string)

      >>> ['one', 'two', 'two'].count('one')
      1
      >>> ['one', 'two', 'two'].count('two')
      2
      >>> 'one two two'.count('o')
      3
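One point worth seeing in action is that count() behaves differently on a string and on a list: on a string it counts substring matches, while on a list it counts whole elements. The short sketch below illustrates this with a made-up string (the variable names are only for illustration and are not part of the tutorial):

```python
# A made-up example string; "twofold" contains "two" as a substring.
text = "one two twofold"

# Splitting on whitespace gives a list of tokens.
tokens = text.split()

# str.count looks for the substring anywhere in the string,
# so it also matches the "two" inside "twofold".
substring_hits = text.count("two")

# list.count only matches elements that are exactly equal to "two".
token_hits = tokens.count("two")

print(tokens)
print(substring_hits, token_hits)
```

Keeping this distinction in mind helps when comparing counts taken from a raw string with counts taken from its list of tokens.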
Given the following string:
s = ('There are seven days, there are seven days, '
     'there are seven days in a week. '
     'Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday')
Try to answer the following questions:
- How many times does the word “day” occur in the string?
- How many times do the tokens “day”, “days”, and “days,” (note the comma) occur in the list of tokens (use split())?
- How many tokens are there in total?
- Find the relative frequency of the token “are” (number of times it occurs over the count of all tokens)
- What is the set of unique words?
- What is the set of unique letters?
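If you are unsure how to start, here is a minimal sketch of the same kinds of operations applied to a different, made-up sentence (so it does not give away the answers). The variable names are hypothetical, and treating only the space character as a non-letter is an assumption that works for this sample because it contains no punctuation.

```python
# A made-up sample sentence (not the string from the exercise).
sample = "the cat saw the dog and the dog saw the cat"

# Tokenize on whitespace and count the tokens.
tokens = sample.split()
total = len(tokens)

# Relative frequency: occurrences of one token over the count of all tokens.
the_count = tokens.count("the")
relative_frequency = the_count / total

# A set keeps one copy of each distinct item.
unique_tokens = set(tokens)

# Iterating over a string yields characters; remove the space to keep letters.
unique_letters = set(sample) - {" "}

print(total, the_count, relative_frequency)
print(sorted(unique_tokens))
print(sorted(unique_letters))
```

For the exercise string you would also need to decide how to handle capital letters and punctuation, since “There” and “there”, or “days” and “days,”, are different tokens unless you normalize them.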