Traditionally, linguistic analysis was done largely by hand, but computer-based methods and tools are increasingly used in contemporary research. This course provides an introduction to skills and resources that help the linguist perform fast, flexible, and accurate quantitative analyses. Students will learn a programming language (Python) along with techniques for processing human language data. No previous programming experience is required: you will learn the basics of programming and computational linguistics along with some good software engineering practices.
Schedule
Week | Date | Topic | Notes |
1 | 13 Jan | What is Computational Linguistics? Why do it? Why use Python? Computer Science basics | Setup, VS_Code |
2 | 20 Jan | Basic Types and Data Structures; Using Python to Count Things; Lists; What is AI? | PyT 3.1; NLTK 1; What is AI? |
3 | 27 Jan | Assignment, Expressions, and Control; Strings | HBL week |
4 | 3 Feb | Text Corpora and Conditional Frequencies | |
5 | 10 Feb | Lexical Resources and WordNet | |
6 | 17 Feb | Processing Raw Text | |
7 | 24 Feb | Mid-review; Working with Software Projects | |
– | 3 Mar | Recess | |
8 | 10 Mar | Algorithmic Thinking and Regular Expressions | |
9 | 17 Mar | N-Grams and Collocations | |
10 | 24 Mar | Part-of-speech Tagging | |
– | 31 Mar | Hari Raya | |
11 | 7 Apr | Classification | |
12 | 14 Apr | Ethics, Language Models, and Software Libraries | |
13 | 21 Apr | Review and Final Quiz | |
Course Pages
- Environment Setup – instructions for setting up your computer for HG2051
- Using Visual Studio Code – how to get, complete, and submit assignments
- Glossary – definitions of some technical terms
Grading Criteria
This course is graded with continuous assessment as follows:
- Homework (autograded)
- Project 1 – 30%
- Project 2 – 30%
- Mid-term Quiz – 15%
- Final Quiz – 15%
- Participation – 10%
You may also get 1–5% extra credit (not exceeding 100% in the course) by submitting a contribution (e.g., code or documentation) to an open-source project. Contact me for details.
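As a rough illustration of how these weights and the extra-credit cap combine, here is a minimal Python sketch. It is not the official grade calculation: the names `WEIGHTS` and `course_grade` are hypothetical, each component score is assumed to be on a 0–100 scale, and homework is left out because no percentage is listed for it above.

```python
# Weights as listed under Grading Criteria (percent of the course grade).
# Homework has no listed percentage, so it is not included here (assumption).
WEIGHTS = {
    "project_1": 30,
    "project_2": 30,
    "midterm_quiz": 15,
    "final_quiz": 15,
    "participation": 10,
}


def course_grade(scores, extra_credit=0.0):
    """Combine component scores (each 0-100) into a course percentage.

    extra_credit is given in percentage points; the total is capped at 100.
    """
    weighted = sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)
    return min(100.0, weighted + extra_credit)
```

For example, a student scoring 80 on every component with 3 points of extra credit would end up at 83 under these assumptions, while a student already at 100 would stay at 100.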
Resources
- Python – https://www.python.org/
  - The Python Tutorial (official docs) – https://docs.python.org/3/tutorial/
  - The Python Standard Library (official docs) – https://docs.python.org/3/library/index.html
  - Learn Python in 10 Minutes (quick guide) – https://www.stavros.io/tutorials/python/
  - Learn Python the Hard Way (beginner’s guide) – https://learnpythonthehardway.org/book/
  - Dive Into Python 3 (free ebook) – https://diveintopython3.problemsolving.io/
- Git – https://git-scm.com/
  - Official documentation (manuals, cheat sheets, videos) – https://git-scm.com/doc
- GitHub – https://github.com/
  - GitHub Guides – https://guides.github.com/
- Visual Studio Code – https://code.visualstudio.com/
  - Documentation – https://code.visualstudio.com/docs
  - Using Python in VS Code – https://code.visualstudio.com/docs/python/python-tutorial
  - Working with Jupyter Notebooks in Visual Studio Code – https://code.visualstudio.com/docs/python/jupyter-support
- NLTK – http://www.nltk.org/
  - Natural Language Processing with Python (free ebook) – http://www.nltk.org/book/
- StackOverflow (popular programming Q&A site) – https://stackoverflow.com/
Acknowledgments
The majority of the content for this course has been borrowed (with permission) from Michael Wayne Goodman and Francis Bond, who taught the course in previous years. Below are some archives of the previous courses: