HG2051 – Week 5

Learning Objectives

(color key: Python/Programming NLP/CL Software Engineering)

Peter Norvig’s Spelling Corrector – demonstrates something cool you can do with just a little bit of Python
ARPABET – the phonemic transcription system used by the CMU Pronouncing Dictionary
WordNet
- Documentation of relations: https://globalwordnet.github.io/gwadoc
- Browse the modern English WordNet: https://en-word.net/
- The Open Multilingual Wordnet: http://compling.hss.ntu.edu.sg/omw/

Use the nltk.corpus.words wordlist to estimate the following for several text corpora:
- what percentage of the text’s vocabularly are not in the wordlist?
- what percentage of the wordlist are present in the text?
Use the ARPABET transcriptions in the nltk.corpus.cmudict corpus to devise a function for identifying rhyming words (how they are identified is up to you). What are the largest clusters of rhyming words?
Use nltk.corpus.wordnet to explore word relations.
- What are the hyponyms of student?
- Use the lowest_common_hypernyms() method on synsets to find what is the shared hypernym of student and professor. How about professor and lecturer?