Final Review
Final Topics
- Python/Programming
- re module (re.match(), re.search(), re.fullmatch(), re.sub())
- Built-in Functions (map(), filter())
- NLP/CL
- Stemming and Lemmatization
- Segmentation and Tokenization
- N-grams and Collocations
- Part-of-Speech Tags and Tagging Methods
- Statistical Inference
- Automatic Evaluation
- Backoff Methods
- Baseline Systems
- Machine Learning
- Supervised vs Unsupervised Learning
- Linguistic Features for Machine Learning
- Classification
- Decision Trees
- Entropy
- Ethics
- Software Engineering
- Regular Expressions
- Higher-order Functions
- Recursive Functions
Learning Objectives
Python/Programming
This category is for programming concepts, techniques, and structures as used in Python.
The re Module
re.search()
re.match()
re.fullmatch()
re.sub()
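The four re functions differ mainly in where a match must occur. The minimal sketch below contrasts them on one sample sentence. The sample text and patterns are only illustrative.

```python
import re

text = "The cat sat on the mat."

# re.search(): scan the whole string and return the first match found anywhere.
found = re.search(r"cat", text)
print(found.group())  # cat

# re.match(): succeed only if the pattern matches at the very start of the string.
print(re.match(r"cat", text))  # None, because the string starts with "The"
print(re.match(r"The", text).group())  # The

# re.fullmatch(): succeed only if the pattern matches the entire string.
print(re.fullmatch(r"\w+", "cat") is not None)      # True
print(re.fullmatch(r"\w+", "cat sat") is not None)  # False

# re.sub(): replace every non-overlapping match with a replacement string.
print(re.sub(r"[cm]at", "dog", text))  # The dog sat on the dog.
```

re.search(), re.match(), and re.fullmatch() return a match object or None, so check the result before calling .group() on it.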
Built-in Functions
map()
filter()
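map() and filter() are higher-order functions: each takes another function as an argument. This short sketch uses a made-up token list to show both, along with the equivalent list comprehensions.

```python
words = ["The", "cat", "sat", "on", "the", "mat"]

# map(): apply a function to every item; in Python 3 it returns a lazy iterator,
# so wrap it in list() to see all the results at once.
lowered = list(map(str.lower, words))
print(lowered)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']

# filter(): keep only the items for which the function returns a truthy value.
long_words = list(filter(lambda w: len(w) > 2, lowered))
print(long_words)  # ['the', 'cat', 'sat', 'the', 'mat']

# The same results written as list comprehensions.
print(lowered == [w.lower() for w in words])               # True
print(long_words == [w for w in lowered if len(w) > 2])    # True
```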
NLP/CL
This category is for concepts and techniques related to natural language processing or computational linguistics.
String Manipulation
Stemming and Lemmatization
- What is the difference?
- When would you use one or the other?
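The contrast can be made concrete with two toy functions. A stemmer chops off suffixes by rule, and its output need not be a real word. A lemmatizer maps a word to its dictionary form, which requires a lexicon and sometimes the part of speech (e.g., "better" to "good" only as an adjective). Both functions, the suffix list, and the lemma table below are hypothetical simplifications. Real tools, such as NLTK's PorterStemmer and WordNetLemmatizer, are far more thorough.

```python
# Hypothetical suffix list for a crude rule-based stemmer; "ies" must be
# checked before "s" so the longer suffix wins.
SUFFIXES = ["ies", "ing", "ed", "s"]

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical, tiny lemma dictionary standing in for a real lexicon.
LEMMAS = {"studies": "study", "studying": "study", "better": "good", "mice": "mouse"}

def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)

for w in ["studies", "studying", "mice", "better"]:
    print(w, toy_stem(w), toy_lemmatize(w))
# studies stud study
# studying study study
# mice mice mouse
# better better good
```

Stemming is cheap and often good enough for tasks like search indexing. Lemmatization is preferable when you need real word forms or when irregular forms ("mice", "better") matter.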
Segmentation and Tokenization
- What is the difference? (i.e., when is segmentation not tokenization?)
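One example of segmentation that is not tokenization is splitting a text into sentences. Tokenization instead splits a sentence into word and punctuation tokens. The sketch below uses naive regular expressions for both steps, and the patterns are assumptions for illustration only. For instance, the sentence splitter would wrongly split after abbreviations like "Dr.". NLTK's sent_tokenize and word_tokenize handle such cases better.

```python
import re

def segment_sentences(text):
    """Naive sentence segmentation: split after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: runs of word characters/apostrophes, or single punctuation marks."""
    return re.findall(r"[\w']+|[^\w\s]", sentence)

text = "The cat sat. It's happy! Was it tired?"
sentences = segment_sentences(text)
print(sentences)               # ['The cat sat.', "It's happy!", 'Was it tired?']
print(tokenize(sentences[1]))  # ["It's", 'happy', '!']
```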
N-grams
- What is an n-gram?
- Why do we pad sequences when computing n-grams?
- How do you get n-grams of a sequence (using NLTK or just Python)?
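In plain Python, the n-grams of a sequence can be built by zipping together n copies of it, each shifted by one position. Padding with n-1 boundary symbols on each side lets the first and last words appear in as many n-grams as the words in the middle, and lets a model learn which words start and end a sequence. The symbols "<s>" and "</s>" are a common convention assumed here. NLTK offers the same operation as nltk.util.ngrams.

```python
def pad(tokens, n, left="<s>", right="</s>"):
    """Add n-1 boundary symbols to each side of the sequence."""
    return [left] * (n - 1) + list(tokens) + [right] * (n - 1)

def ngrams(tokens, n):
    """Return the n-grams of a sequence as a list of tuples."""
    tokens = list(tokens)
    # zip n copies of the sequence, each shifted one position further
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = ["the", "cat", "sat"]
print(ngrams(tokens, 2))
# [('the', 'cat'), ('cat', 'sat')]
print(ngrams(pad(tokens, 2), 2))
# [('<s>', 'the'), ('the', 'cat'), ('cat', 'sat'), ('sat', '</s>')]
```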
Collocations
- What are collocations?
- What are they good for?
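One common way to score candidate collocations is pointwise mutual information (PMI). PMI compares how often two words occur together with how often they would co-occur by chance: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below scores adjacent word pairs using simple relative-frequency estimates. The min_count parameter is an assumption added to show the usual frequency filter. NLTK provides similar functionality through BigramCollocationFinder and BigramAssocMeasures.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=1):
    """PMI for each adjacent word pair seen at least min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = "new york is big and new york is old".split()
scores = pmi_scores(tokens)
print(round(scores[("new", "york")], 2))  # 2.34
print(round(scores[("big", "and")], 2))   # 3.34
print(sorted(pmi_scores(tokens, min_count=2)))  # [('new', 'york'), ('york', 'is')]
```

On this toy text, the one-off pair ("big", "and") outscores "new york", which shows why PMI is usually combined with a minimum frequency.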
N-gram Language Models
- What can you do with an n-gram language model?
- What is the formula for a bigram model? (at least, in general terms)
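In general terms, a bigram model estimates P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}) from counts. It then scores a sentence as the product of these probabilities over all adjacent pairs, including the padding symbols. The minimal sketch below uses maximum-likelihood estimates with no smoothing, so any unseen bigram makes the whole sentence probability 0. The function names and toy training data are hypothetical.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count histories and bigrams over padded sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        history_counts.update(padded[:-1])  # every token that precedes another
        bigram_counts.update(zip(padded, padded[1:]))
    return history_counts, bigram_counts

def bigram_prob(prev, word, history_counts, bigram_counts):
    """MLE estimate of P(word | prev) = C(prev word) / C(prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

def sentence_prob(sent, history_counts, bigram_counts):
    """Product of bigram probabilities over the padded sentence."""
    padded = ["<s>"] + sent + ["</s>"]
    p = 1.0
    for prev, word in zip(padded, padded[1:]):
        p *= bigram_prob(prev, word, history_counts, bigram_counts)
    return p

hist, bi = train_bigram_model([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(bigram_prob("the", "cat", hist, bi))             # 0.5
print(sentence_prob(["the", "cat", "sat"], hist, bi))  # 0.5
print(sentence_prob(["the", "cat", "ran"], hist, bi))  # 0.0
```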
Part-of-Speech Tags
- What are tag sets? Why are there different ones?
- Do all languages use the same tag set?
Tagging Methods
Machine Learning and Classification
Statistical Inference
- What are the basic requirements for statistical inference?
- Create a basic model for predicting, e.g., gender, part-of-speech, next words, etc.
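As one example of a basic model, the sketch below predicts gender from a name's last letter. For each last letter, it predicts the label seen most often with that letter in labeled training data. This assumes the last letter is an informative feature. The names, labels, and function names are hypothetical, and the default argument covers letters never seen in training.

```python
from collections import Counter, defaultdict

def train_last_letter_model(labeled_names):
    """Map each final letter to its most frequent label in the training data."""
    counts = defaultdict(Counter)
    for name, label in labeled_names:
        counts[name[-1].lower()][label] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def predict(model, name, default):
    """Predict the label for the name's last letter, or the default if unseen."""
    return model.get(name[-1].lower(), default)

training = [("Anna", "female"), ("Maria", "female"), ("Emma", "female"),
            ("John", "male"), ("Kevin", "male"), ("Mark", "male")]
model = train_last_letter_model(training)
print(predict(model, "Julia", "unknown"))  # female
print(predict(model, "Sam", "unknown"))    # unknown
```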
Automatic Evaluation
- Why do we use automatic evaluation?
- How would you evaluate, e.g., a part-of-speech tagger?
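A part-of-speech tagger is commonly evaluated by tagging held-out sentences whose gold-standard tags are known and computing accuracy, the fraction of tokens tagged correctly. The minimal sketch below assumes the gold and predicted tag sequences are already aligned token by token. The tags are hypothetical Penn Treebank-style examples.

```python
def accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)

gold = ["DT", "NN", "VBD", "IN", "DT", "NN"]
predicted = ["DT", "NN", "NN", "IN", "DT", "NN"]
print(accuracy(gold, predicted))  # 0.8333333333333334
```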
Backoff Methods
- What makes a good backoff method?
- Why do we use backoff methods?
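A backoff chain tries a specific, high-precision method first and falls back to more general methods only when the specific one has no answer. The sketch below chains a lexicon lookup, a suffix rule, and a default tag. The lexicon, suffix rules, and backoff helper are hypothetical. NLTK's trainable taggers offer the same idea through their backoff parameter.

```python
LEXICON = {"the": "DT", "cat": "NN", "sat": "VBD"}  # hypothetical lookup table

def lookup_tagger(word):
    """Return the stored tag, or None for unknown words."""
    return LEXICON.get(word.lower())

def suffix_tagger(word):
    """Guess a tag from the suffix, or None if no rule applies."""
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ly"):
        return "RB"
    return None

def default_tagger(word):
    """Always answer with the single most likely tag."""
    return "NN"

def backoff(*taggers):
    """Return a tagger that uses the first non-None tag from the given taggers."""
    def tag(word):
        for tagger in taggers:
            result = tagger(word)
            if result is not None:
                return result
        return None
    return tag

tagger = backoff(lookup_tagger, suffix_tagger, default_tagger)
print([tagger(w) for w in ["The", "cat", "sat", "quietly", "purring", "dog"]])
# ['DT', 'NN', 'VBD', 'RB', 'VBG', 'NN']
```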
Baseline Systems
- What makes a good baseline system?
- Why do we use baseline systems?
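A very simple baseline is a majority-class classifier, which always predicts the most frequent label in the training data. Any real system should beat its accuracy. The sketch below assumes a plain list of training labels. The tag data is hypothetical.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a classifier that always predicts the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]

    def classify(_instance):
        return most_common_label

    return classify

train_tags = ["NN", "DT", "NN", "VBD", "NN", "IN"]
baseline = majority_baseline(train_tags)
print([baseline(w) for w in ["dog", "ran", "the"]])  # ['NN', 'NN', 'NN']
```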
Machine Learning
Supervised vs Unsupervised Learning
- What is necessary for supervised classification?
Linguistic Features for Machine Learning
- Give examples of linguistic features for tasks like POS-tagging, genre classification, etc.
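For POS tagging, useful features often describe the word's form and its context. The sketch below extracts a feature dictionary for one token in a sentence, the input format NLTK's classifiers expect. The particular feature names and choices are hypothetical examples, not a fixed standard.

```python
def pos_features(sentence, i):
    """Return a feature dictionary for the token at position i of a tokenized sentence."""
    word = sentence[i]
    return {
        "word": word.lower(),
        "suffix(2)": word[-2:].lower(),   # e.g. "ly", "ed", "ng"
        "suffix(3)": word[-3:].lower(),
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "prev_word": sentence[i - 1].lower() if i > 0 else "<START>",
    }

print(pos_features(["The", "cat", "sat"], 1))
# {'word': 'cat', 'suffix(2)': 'at', 'suffix(3)': 'cat', 'is_capitalized': False, 'has_digit': False, 'prev_word': 'the'}
```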
Classification
- What is the classification task?
- What are some examples?
Decision Trees
- In general terms, how is a decision tree created?
- How is it applied to classify some new instance?
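In general terms, a decision tree is built by repeatedly choosing the feature whose values best separate the training labels (for example, the one with the greatest reduction in entropy). The data is split on that feature until the leaves are (nearly) pure. To classify a new instance, start at the root and follow the branch that matches the instance's value for each tested feature until you reach a leaf. The sketch below shows only that classification step, using a small hand-built tree. The tree, its feature names, and the default label are hypothetical.

```python
# A leaf is a label (str); an internal node is (feature_name, {feature_value: subtree}).
tree = ("suffix(2)", {
    "ly": "RB",
    "ed": "VBD",
    "ng": ("prev_word", {"is": "VBG", "the": "NN"}),
})

def classify(tree, features, default="NN"):
    """Walk from the root to a leaf by following the instance's feature values."""
    while not isinstance(tree, str):
        feature, branches = tree
        value = features.get(feature)
        if value not in branches:
            return default  # no branch for this value: fall back to a default label
        tree = branches[value]
    return tree

print(classify(tree, {"suffix(2)": "ly"}))                     # RB
print(classify(tree, {"suffix(2)": "ng", "prev_word": "is"}))  # VBG
print(classify(tree, {"suffix(2)": "at"}))                     # NN
```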
Entropy
- What is the formula for entropy?
- What does it measure?
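The entropy of a label distribution is H(X) = -Σ p(x) log2 p(x). Measured in bits, it quantifies uncertainty. It is 0 when every item has the same label and highest when all labels are equally likely. The sketch below computes it from a list of labels, writing each term as p(x) log2(1/p(x)), which is equivalent and avoids printing -0.0.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the label distribution in a list."""
    counts = Counter(labels)
    total = len(labels)
    # sum of p(x) * log2(1 / p(x)), which equals -sum of p(x) * log2(p(x))
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy(["NN", "NN", "NN", "NN"]))  # 0.0
print(entropy(["NN", "NN", "VB", "VB"]))  # 1.0
print(entropy(["NN", "VB", "DT", "IN"]))  # 2.0
```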
Ethics
- Can you describe and give examples of the following kinds of problems?
- exclusion
- overgeneralization
- overexposure / underexposure
- dual-use
Software Engineering
This category is for practices of software engineering as well as programming concepts that are relevant to programming languages beyond Python.