Week 11

Learning Objectives


Reading

This week covers an introduction to machine learning. Last week we looked at part-of-speech taggers trained on examples of gold-standard POS-tagged words; these were a very basic form of supervised classification based on statistical inference. Machine learning also uses statistical inference, but it often builds multi-dimensional models from a variety of features of data instances instead of just counting them. For example, instead of simply counting to find the most frequent part-of-speech tag for a word, we could consider additional features such as whether the word is capitalized, whether it appears after the word “to”, or the domain of the sentence (e.g., “news”, if it is known). Using all of these features as conditions in a conditional frequency distribution would make it too sparse to be useful (the conditions are so discriminating that the resulting model would not generalize to unseen data), so instead a classifier combines the features and weights their contributions in a way that does not overfit to the training data.
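To make the idea concrete, here is a minimal sketch of how the features mentioned above (capitalization, following “to”, domain) could be gathered into a feature dictionary for a single word in context; this is the kind of representation a supervised classifier consumes. The function name `pos_features` and the exact feature set are illustrative assumptions, not part of the reading.

```python
def pos_features(sentence, i, domain=None):
    """Return a feature dict for the word at position i in sentence.

    Hypothetical feature set: each key/value pair is one feature a
    classifier could weight, instead of using the whole combination
    as a single (very sparse) condition.
    """
    word = sentence[i]
    features = {
        "word": word.lower(),
        "is_capitalized": word[0].isupper(),
        # True when the previous word is "to" (e.g. infinitival "to run")
        "after_to": i > 0 and sentence[i - 1].lower() == "to",
        # short suffixes often signal part of speech ("-ly", "-ed", ...)
        "suffix2": word[-2:],
    }
    if domain is not None:
        # e.g. "news", only included when the domain is known
        features["domain"] = domain
    return features

sent = ["They", "want", "to", "run", "fast"]
print(pos_features(sent, 3, domain="news"))
```

Because each feature is a separate key, a classifier can learn a weight for each one independently, rather than needing to have seen every combination of feature values in the training data.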

Additional Reading

Testing Your Knowledge

Questions

Practical Work