HG2051 – Week 10

Learning Objectives

Concepts: pos tagging baseline systems backoff automatic evaluation statistical inference

(color key: Python/Programming NLP/CL Software Engineering)

Reading

Additional Reading

Cross-validation

K-fold cross validation is a common practice in NLP:

In K-fold cross-validation, the original sample is randomly partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds then can be averaged (or otherwise combined)

E.g., 5-fold: split into A, B, C, D, E
- Fold 1: Train on A, B, C, D — test on E
- Fold 2: Train on A, B, C, E — test on D
- Fold 3: Train on A, B, D, E — test on C
- Fold 4: Train on A, C, D, E — test on B
- Fold 5: Train on B, C, D, E — test on A
NLTK 5.6 – Transformation-based Tagging

Testing Your Knowledge

Questions

Q: What do you think makes a good baseline system?
Q: What is the purpose of having backoff taggers? Or, in what situations is a backoff tagger used?
Q: Why is an n-gram tagger conditioned on the previous tags and not on the previous words?

Practical Work

Train a bigram tagger with no backoff tagger, and run it on some of the training data. Next, run it on some new data. What happens to the performance of the tagger? Why?

TRAIN (input is tagged sentences):
bigram_tagger = nltk.BigramTagger(train_sents)
TAG (input is an untagged sentence):
bigram_tagger.tag(unseen_sent)
EVALUATE (input is tagged sentences):
bigram_tagger.evaluate(test_sents)

Now add a backoff tagger (your choice) and try it again (build then evaluate). What happens to the performance of the tagger this time and why?

backoff = ???

bigram_taggert2 = nltk.BigramTagger(train_sents, backoff=backoff_tagger)
Think about how we could deal with sparseness by defining an ‘unknown’ word. What would you have to do to the training and test data?