Tokenization, Normalization, and Segmentation

utterance                disfluency                filled pause
lemma                    wordform                  word type
word token               dialects                  code switching
tokenization             word segmentation         case folding
lemmatization            stemming                  sentence segmentation

Evaluation

macroaveraging                multinomial
microaveraging                extrinsic evaluation
F1 measure                    intrinsic evaluation
precision                     training set
recall                        development set
F-measure                     test set
gold labels                   perplexity
contingency table             null hypothesis
multi-label                   bootstrap test

N-Grams and Smoothing

language model                          sparsity
n-gram                                  zeros
bigram                                  closed/open vocabulary
trigram                                 OOV word
chain rule                              Laplace smoothing
Markov assumption                       backoff
maximum likelihood estimation           discounting
normalize                               interpolation
relative frequency

Vector Semantics

vector semantics                   tf-idf algorithm
embeddings                         term frequency
term-document matrix               document frequency
vector space model                 idf
row vector                         co-occurrence
word-word matrix                   debiasing
cosine similarity

Word Sense Disambiguation

word sense                       lexical sample task
zeugma                           all-words task
WordNet                          semantic concordance
gloss                            most frequent sense
synset                           one sense per discourse
supersense                       word sense disambiguation

Part of Speech Tagging

part of speech         degree                  wh-pronoun
closed class           manner                  auxiliary verb
open class             temporal adverb         copula
function word          preposition             modal verb
noun                   particle                interjection
proper noun            phrasal verb            POS tagging
common noun            determiner              ambiguous / disambiguation
count noun             article                 accuracy
mass noun              conjunction             sequence model
verb                   complementizer          Markov chain
adjective              pronoun                 Markov assumption
adverb                 personal pronoun        Hidden Markov Model
locative               possessive pronoun      decoding
Viterbi algorithm      beam search             unknown words

Text Classification

text categorization            naive Bayes assumption
sentiment analysis             linear classifier
language id                    unknown words
authorship attribution         stop words
generative classifier          binary naive Bayes
discriminative classifier      sentiment lexicon
multinomial naive Bayes        hyperpartisan news
bag-of-words                   clickbait
prior probability              fake news
likelihood

NLP Exam 1 Topic List
