PyDistintoX

A Python tool for contrastive text analysis using 16 statistical distinctiveness measures (based on Pydistinto).

PyDistintoX compares two text corpora using measures such as TF-IDF, Zeta, Chi-squared, LLR, and Eta to quantify and visualize lexical distinctiveness.
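
To give a feel for what a distinctiveness measure computes, here is a minimal standalone sketch of a Zeta-style score in its common formulation: the share of target segments containing a word minus the share of comparison segments containing it. This is not PyDistintoX's API or its implementation. The function name `zeta_scores` and the toy data are hypothetical and for illustration only, and both corpora are assumed to be non-empty lists of tokenized segments.

```python
from typing import Dict, List


def zeta_scores(target: List[List[str]], comparison: List[List[str]]) -> Dict[str, float]:
    """Illustrative Zeta-style score (hypothetical helper, not the PyDistintoX API).

    For each word: (fraction of target segments containing it)
                 - (fraction of comparison segments containing it).
    Scores range from -1 (only in comparison) to +1 (only in target).
    """
    # Reduce each segment to its set of distinct tokens; only presence matters.
    target_sets = [set(doc) for doc in target]
    comparison_sets = [set(doc) for doc in comparison]
    vocab = set().union(*target_sets, *comparison_sets)

    scores = {}
    for word in vocab:
        p_target = sum(word in s for s in target_sets) / len(target_sets)
        p_comparison = sum(word in s for s in comparison_sets) / len(comparison_sets)
        scores[word] = p_target - p_comparison
    return scores


# Toy corpora: each inner list is one tokenized segment.
target = [["sea", "ship", "storm"], ["sea", "captain"]]
comparison = [["city", "street", "storm"], ["city", "sea"]]

scores = zeta_scores(target, comparison)
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:8s} {score:+.2f}")
```

Words with scores near +1 are characteristic of the target corpus, words near -1 of the comparison corpus, and words near 0 are shared by both.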

Usage

PyDistintoX can be used in two ways:

  • CLI — run analyses directly from the command line
  • Python library — import functions for custom workflows

If you already have tokenized/lemmatized data (e.g. from a gensim corpus) and want to skip NLP processing, see Pre-Parsed Data.

Installation

# CLI via uv (recommended)
uv tool install pydistintox

# or via pipx (installed with pip)
pip install pipx && pipx install pydistintox