Bag-of-words: word counts, regardless of word order
Language models: machine learning algorithms trained on
language-specific datasets
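A minimal sketch of the bag-of-words idea, assuming naive lowercasing and
whitespace tokenization (real projects usually tokenize more carefully).
The function name bag_of_words and the two sample sentences are
illustrative only. Both sentences reduce to the same bag because word
order is discarded:

```python
from collections import Counter


def bag_of_words(text):
    """Map each word to its count, ignoring the order of words (hypothetical helper)."""
    # Assumption: lowercasing and splitting on whitespace is enough for this toy input.
    return Counter(text.lower().split())


a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

# The sentences mean different things, but their bags are identical.
same_bag = a == b
```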
Word counting
Methods
N-gram frequency (unigrams, bigrams, trigrams)
KWIC (keyword in context): a concordance listing each occurrence of a
word with the text around it
Collocations: words that appear near each other more often than chance
would predict
Topic modeling: sets of words that appear together across multiple
documents
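TF-IDF (term frequency-inverse document frequency): weights a word by
how often it appears in one document, discounted by how many documents
contain it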
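Three of the methods above can be sketched with the Python standard
library alone. This is an illustration under stated assumptions, not a
reference implementation: the helper names (tokenize, ngram_counts,
kwic, tf_idf), the two toy sentences, and the regex tokenizer are all
hypothetical, and the TF-IDF variant shown is the plain
tf * log(N / df) form without smoothing, which is one of several
variants in common use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 unigrams, n=2 bigrams, ...)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def tf_idf(docs):
    """Score each term in each tokenized document as tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


# Toy corpus, made up for illustration.
docs = [tokenize(t) for t in (
    "The cat sat on the mat.",
    "The dog sat on the log.",
)]

bigrams = ngram_counts(docs[0], 2)
concordance = kwic(docs[0], "sat")
scores = tf_idf(docs)
```

In this toy corpus, words shared by both documents ("the", "sat", "on")
score zero, while words unique to one document ("cat", "dog") score
above zero, which is the behavior TF-IDF is designed to produce.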
Examples
Fan Engagement Meter (Peter Decherney, James Fiumara, Scott Enderle;
Price Lab): quantifying text reuse in fan fiction archives from popular
film franchises
On The Books (UNC): identifying Jim Crow laws that have not been
repealed