Lecture
Review of basic concepts: OS, command line interface, folder/directory structure, programs, scripts, Python, virtual environment, package manager
Review of Homework 1
Getting Started with Python (Lists): Slides from Object-Oriented Programming in Python (Goldwasser and Letscher)
Installing the Natural Language Toolkit (NLTK)
Learning Objectives
- Data types: int, float, str, list, set
- Concepts: assignment, functions, types-vs-tokens, tokenization, normalization, frequency distributions, unit tests
- Tools: notebooks, NLTK
(color key: Python/Programming, NLP/CL, Software Engineering)
Additional Readings
The readings for this week come from the official Python tutorial. The topic is “Using Python as a Calculator”, but it is a good introduction to numbers, strings, and lists.
Additionally, please read the section on sets (only this section, not the rest of the chapter):
It helps to play with a Python interpreter while reading. Open up Visual Studio Code’s terminal and start Python (e.g., run python3 or py at the command prompt), then try out the examples for yourself.
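As a starting point, a short script along the following lines touches each of the data types named in the learning objectives. The values are made up for illustration and do not come from the tutorial; the same expressions can be typed one at a time at the interpreter prompt.

```python
# Numbers: int and float
print(7 // 2)          # floor division keeps an int: 3
print(7 / 2)           # true division always gives a float: 3.5

# Strings: indexing and slicing
word = "Python"
print(word[0], word[-1])     # first and last character: P n
print(word[:2] + word[2:])   # two slices rebuild the string: Python

# Lists: ordered, mutable, duplicates allowed
days = ["Mon", "Tue", "Tue"]
days.append("Wed")
print(len(days))       # 4

# Sets: unordered, duplicates dropped
unique_days = set(days)
print(len(unique_days))        # 3
print("Tue" in unique_days)    # True
```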
Testing Your Knowledge
There are two methods not mentioned in the tutorial:
- str.split() – splits a string on whitespace and returns a list of substrings
  >>> "one two two".split()
  ['one', 'two', 'two']
- list.count(x) – returns the number of times that x occurs in a sequence (e.g., a list or a string)
  >>> ['one', 'two', 'two'].count('one')
  1
  >>> ['one', 'two', 'two'].count('two')
  2
  >>> 'one two two'.count('o')
  3
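Counting in a string and counting in a list behave differently, which matters for the questions in this section. The sketch below uses a made-up sentence (not the one from the exercise) to show that count() on a string matches substrings anywhere, while count() on a list of tokens only matches whole elements.

```python
text = "the cat sat on the mat"
tokens = text.split()

# On a string, count() finds the substring anywhere it appears
substring_hits = text.count("at")     # inside "cat", "sat", and "mat"

# On a list, count() only matches elements that are exactly equal
token_hits = tokens.count("at")       # no token is exactly "at"

print(substring_hits)        # 3
print(token_hits)            # 0
print(tokens.count("the"))   # 2
```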
Given the following string:
s = ('There are seven days, there are seven days, '
     'there are seven days in a week. '
     'Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday')
Try to answer the following questions:
- How many times does the word “day” occur in the string?
- How many times do the tokens “day”, “days”, and “days,” (note the comma) occur in the list of tokens (use split())?
- How many tokens are there in total?
- Find the relative frequency of the token “are” (number of times it occurs over the count of all tokens)
- What is the set of unique words?
- What is the set of unique letters?
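The ideas behind these questions (tokens versus types, normalization, relative frequency) can be sketched on a different, made-up sentence, so the questions above remain an exercise. This is only one possible approach: collections.Counter from the standard library stands in for a frequency distribution, and "normalization" here is assumed to mean lowercasing and stripping punctuation from the edges of each token. The assert statements at the end act as very small unit tests.

```python
from collections import Counter
import string

sentence = "The dog saw the other dog, and the dog barked."

# Tokenization: split on whitespace (punctuation stays attached)
tokens = sentence.split()

# Normalization (one possible choice): lowercase, strip edge punctuation
normalized = [t.lower().strip(string.punctuation) for t in tokens]

# Frequency distribution: maps each type to its number of tokens
freq = Counter(normalized)

# Types are the unique tokens; a set drops the repeats
types = set(normalized)

# Relative frequency: count of one type over the count of all tokens
rel_freq_the = freq["the"] / len(normalized)

# Unique letters of a single word, again using a set
letters = set("banana")

print(len(tokens), len(types))     # 10 6
print(freq["dog"], rel_freq_the)   # 3 0.3

# Minimal unit tests
assert tokens.count("dog") == 2    # "dog," with the comma is a different raw token
assert freq["dog"] == 3            # normalization merges "dog," with "dog"
assert letters == {"b", "a", "n"}
```

Comparing tokens.count("dog") with freq["dog"] shows why normalization changes the counts: before it, “dog,” and “dog” are different tokens.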
A basic introduction to Artificial Intelligence
The last few years have seen increasing interest in the field of “Artificial Intelligence” or “A.I.”, but many people do not really know what it is and ascribe some form of intelligence to automated processes. To get an overview of what current systems do, you can watch this brief explainer.
This course is currently oriented toward teaching the basics of programming in Python, but given that Python is one of the most popular coding languages for training machine learning models that underpin the recent advancements in AI, it is worth being aware of AI capabilities. This is particularly relevant for linguistics, since it is the manipulation of language by AI systems that has captured the attention and interest of the world.
Current systems like ChatGPT have been trained to respond with language that a human would plausibly use. But this use of language does not correspond particularly well to the ability to “reason” about the world. At the same time, the fact that language is strongly correlated (for many people) with intelligence shows how important the study of language is to our understanding of human intelligence.
The basics you are learning here are important to understanding the logic of the computer and the tools and techniques you need to get a computer to perform tasks automatically. As you develop your knowledge of both programming and linguistics, you may also think of ways to contribute to the development of Artificial Intelligence (or Automated Inference) systems.