Text Analysis

Alice McGrath

March 6, 2025

Agenda

  1. Final project proposal questions
  2. Text analysis examples
  3. Cultural analytics tutorial

Project proposal questions

Project Proposal instructions

Text Analysis Methods

Figure: n-gram chart for "War and Peace", from the Google Books Ngram Viewer

Natural Language Processing

  • Bag-of-words: word counts, regardless of word order
  • Language models: machine learning algorithms trained on language-specific datasets
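To make the bag-of-words idea concrete, here is a minimal sketch using only the Python standard library. The function names `tokenize` and `bag_of_words` and the simple regex tokenizer are assumptions made for illustration; real projects usually rely on a more careful tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative regex)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(text):
    """Count how often each token occurs, discarding word order."""
    return Counter(tokenize(text))

a = bag_of_words("The dog chased the cat.")
b = bag_of_words("The cat chased the dog.")

print(a.most_common(3))
print(a == b)  # True: same word counts, even though the sentences differ
```

The two sentences mean different things but produce identical bags, which is exactly the information a bag-of-words approach gives up.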

Word counting

Methods

  • N-gram frequency (unigrams, bigrams, trigrams)
  • KWIC: keyword in context (concordance)
  • Collocations: words that appear in close proximity
  • Topic modeling: sets of words that appear together across multiple documents
  • TF-IDF: term frequency-inverse document frequency
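The first two methods in the list can be sketched in a few lines of standard-library Python. The helper names `ngrams` and `kwic`, the `window` parameter, and the sample sentence are all hypothetical choices for this illustration, not part of any particular tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative regex)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

text = "It was the best of times, it was the worst of times."
tokens = tokenize(text)

# Bigram frequency: count each pair of adjacent words.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))

# Concordance: show each hit for "times" with two words on either side.
for left, kw, right in kwic(tokens, "times", window=2):
    print(f"{left:>15} [{kw}] {right}")
```

Changing `n` to 1 or 3 gives unigram or trigram counts; widening `window` gives longer concordance lines.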

Examples

  • Fan Engagement Meter - Peter Decherney, James Fiumara, Scott Enderle (Price Lab) - quantifying text reuse in fan fiction archives from popular film franchises
  • On The Books - UNC - identifying Jim Crow laws that have not been repealed

Language models

Methods

  • Named entity recognition/part-of-speech tagging
  • Word vectors/word embeddings
  • Sentiment analysis*
  • Text classification
  • Text prediction/generation

Examples

TF-IDF

TF-IDF with scikit-learn, from Intro to Cultural Analytics in Python.
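As a rough illustration of what a TF-IDF score measures, here is a hand-rolled sketch in plain Python using the textbook formula (term frequency times the log of inverse document frequency). This is not the tutorial's code: scikit-learn's `TfidfVectorizer` applies a smoothed IDF and normalizes each document vector by default, so its numbers will differ. The function name `tf_idf` and the three sample documents are made up for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative regex)."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical mini-corpus.
docs = [
    "the whale swam past the ship",
    "the ship sailed to the harbor",
    "the garden was full of roses",
]
scores = tf_idf(docs)

# Highest-scoring terms in the first document.
top = sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

A word that appears in every document, such as "the", scores zero, while a word found in only one document, such as "whale", scores highest. That is why TF-IDF is useful for surfacing the terms that distinguish one text from the rest of a collection.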

Other resources

Constellate tutorial on Significant Terms

Tools and resources