HG2051 – Language and the Computer (AY22-23)

Hiram Ring <hiram.ring@ntu.edu.sg>

Thursdays, 12:30-15:20

TR+65 (SS1-B1-03, South Spine)

Traditionally, linguistic analysis was done largely by hand, but computer-based methods and tools are increasingly common in contemporary research. This course introduces skills and resources that help the linguist perform fast, flexible, and accurate quantitative analyses. Students will learn a programming language (Python) along with techniques for processing human language data. No previous programming experience is required: we will teach you the basics of programming and computational linguistics, along with some good software engineering practices.

Schedule

| Week | Date | Topic | Notes |
|------|--------|-------|-------|
| 1 | 12 Jan | What is Computational Linguistics? Why do it? Why use Python? | |
| 2 | 19 Jan | Basic Types and Data Structures; Using Python to Count Things; Lists | |
| 3 | 26 Jan | Assignment, Expressions, and Control; Strings | |
| 4 | 02 Feb | Text Corpora and Conditional Frequencies | notebook |
| 5 | 09 Feb | Lexical Resources and WordNet | notebook |
| 6 | 16 Feb | Processing Raw Text | notebook |
| 7 | 23 Feb | Mid-review; Working with Software Projects | Midterm Quiz |
| | 02 Mar | Recess | |
| 8 | 09 Mar | Regular Expressions and Algorithmic Thinking | |
| 9 | 16 Mar | N-Grams and Collocations | |
| 10 | 23 Mar | Part-of-speech Tagging | |
| 11 | 30 Mar | Classification | Project 1 due |
| 12 | 06 Apr | Ethics, Language Models, and Software Libraries | |
| 13 | 13 Apr | Review and Final Quiz | Final Quiz (in-class) |
| 14 | 21 Apr | No class | Project 2 due |

Course Pages

Grading Criteria

This course is graded with continuous assessment as follows:

You may also get 1–5% extra credit (not exceeding 100% in the course) by submitting a contribution (e.g., code or documentation) to an open-source project. Contact me for details.

Resources

Acknowledgments

The majority of the content for this course has been borrowed (with permission) from Michael Wayne Goodman and Francis Bond, who taught the course in previous years.