HG2051 – Language and the Computer

Hiram Ring

Traditionally, linguistic analysis was done largely by hand, but computer-based methods and tools are increasingly used in contemporary research. This course provides an introduction to skills and resources that help the linguist perform fast, flexible, and accurate quantitative analyses. Students will learn a programming language (Python) along with techniques for processing human language data. No previous programming experience is required: you will learn the basics of programming and computational linguistics along with some good software engineering practices.

Schedule

| Week | Date | Topic | Notes |
| 1 | | What is Computational Linguistics? Why do it? Why use Python? | |
| | | Computer Science basics | |
| 2 | | Basic Types and Data Structures; Using Python to Count Things; Lists | |
| | | What is AI? | |
| 3 | | Assignment, Expressions, and Control; Strings | |
| 4 | | Text Corpora and Conditional Frequencies [Student Union Day] | |
| 5 | | Lexical Resources and WordNet | |
| 6 | | Processing Raw Text | |
| 7 | | Mid-review; Working with Software Projects | |
| | 3 Oct | Recess | |
| 8 | | Algorithmic Thinking and Regular Expressions | |
| 9 | 17 Oct | N-Grams and Collocations | |
| 10 | | Part-of-speech Tagging [HBL week] | |
| 11 | | Classification | |
| 12 | | Ethics, Language Models, and Software Libraries | |
| 13 | 14 Nov | Review and Final Quiz | Coding challenge (Quiz 2) |

Course Pages

Grading Criteria

This course is graded with continuous assessment as follows:

You may also get 1–5% extra credit (not exceeding 100% in the course) by submitting a contribution (e.g., code or documentation) to an open-source project. Contact me for details.

Resources

Acknowledgments

Much of the content for this course has been borrowed (with permission) from Michael Wayne Goodman and Francis Bond, who taught the course in previous years. Below are some archives of the previous courses: