Reading
Ethics in NLP
Computational linguistics and natural language processing have, over the past decade or two, been rapidly propelled from a somewhat niche academic interest into a huge industry with many consumer-facing products (e.g., Google Translate, Siri, autocomplete) and other applied systems (stock market prediction, risk assessment, hate speech detection, etc.). Unfortunately, this explosion in popularity did not coincide with an increased understanding of the ethical questions particular to the field. And as the recent release of ChatGPT has shown, this has not changed in the intervening years.
Please read the following short papers as a broad overview of the problems:
- Dirk Hovy and Shannon L. Spruit. The Social Impact of Natural Language Processing. In the Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016.
- Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Pages 610–623. 2021. https://doi.org/10.1145/3442188.3445922
Questions
Why, in the authors’ view, have there been so few discussions about ethics in NLP, and what situation has changed to make such discussions more urgent?
Give examples of ethical problems arising from each of the following sources:
- exclusion
- overgeneralization
- overexposure / underexposure
- dual-use
Additional Reading
- There are many links to papers and course websites in the ACL wiki on Ethics in NLP
Language Models
A “language model” is a model that assigns a probability to a sequence of words and/or predicts missing (e.g., next) words in a sequence. Language models are used for judging the fluency of machine translation outputs and for text generation. The traditional method of creating language models uses n-grams. For this, read the following sections from Jurafsky and Martin’s Speech and Language Processing:
- JM 3 – N-gram Language Models (just the introduction and Section 3.1 – N-grams)
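To make the idea concrete, here is a minimal bigram (2-gram) model sketch in Python, in the spirit of what the chapter covers. The toy corpus and function names are illustrative choices of mine, not taken from the reading; a real model would also need smoothing for unseen word pairs.

```python
from collections import defaultdict

# Toy corpus (illustrative only), with <s>/</s> marking sentence boundaries.
corpus = [
    ["<s>", "i", "like", "nlp", "</s>"],
    ["<s>", "i", "like", "pizza", "</s>"],
    ["<s>", "you", "like", "nlp", "</s>"],
]

# Count each bigram (prev, word) and each context word prev.
bigram_counts = defaultdict(int)
context_counts = defaultdict(int)
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

def sentence_prob(sentence):
    """Probability of a sentence as the product of its bigram probabilities."""
    p = 1.0
    for prev, word in zip(sentence, sentence[1:]):
        p *= bigram_prob(prev, word)
    return p

print(bigram_prob("i", "like"))    # 1.0 ("i" is always followed by "like")
print(bigram_prob("like", "nlp"))  # 0.666... ("like" precedes "nlp" 2 of 3 times)
```

The same counts support both uses mentioned above: `sentence_prob` scores fluency (higher-probability word sequences are more corpus-like), and sampling from `bigram_prob` given the previous word generates text.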
In recent years, “neural language models”, in particular transformer-based models like BERT and GPT-3, have completely altered the direction of academic research and industry applications of NLP. The quality of their generated text is so good that it is often indistinguishable from human-produced text, at least when the prompts or outputs are carefully selected. Advocates of these models, mesmerized by the “magic” of their uninterpretable performance, may claim that they “understand” or “comprehend” text. Critics point out that the models have only memorized and resynthesized linguistic form without any other signals (e.g., vision, sound, joint attention and social context, etc.) and so cannot possibly have any true understanding of the world. The Bender and Koller (2020) paper (see “Additional Reading” below) takes an in-depth look at the claims of machine comprehension (and don’t miss the humorous example outputs of GPT-2 in the appendix).
Additional Reading
- Wikipedia article on Language Model (a bit dry, but it’s a decent survey)
- Emily M. Bender and Alexander Koller. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
Software
The following are some additional libraries for Python that you may find useful for NLP:
- General-purpose, pretrained models:
- Useful for data science or for designing machine-learning experiments:
- If you are taking/plan to take a course in HPSG:
- Transformer models and tools available for experimentation:
There are many more. For example, here’s an “Awesome” list of NLP software for Python (and other languages): https://github.com/keon/awesome-nlp#user-content-python