PyDistintoX
A Python tool, based on Pydistinto, for contrastive text analysis using 16 statistical distinctiveness measures.
PyDistintoX compares two text corpora using measures such as TF-IDF, Zeta, Chi-squared, LLR, and Eta to quantify and visualize lexical distinctiveness.
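To give a feel for what such a measure computes, here is a minimal sketch of Zeta in its common document-proportion form: the share of target segments that contain a term minus the share of comparison segments that contain it. This is an illustration only, not PyDistintoX's API; the function names `document_proportion` and `zeta` and the toy segments are hypothetical, and the tool's own implementation may differ in segmentation and normalization.

```python
def document_proportion(term: str, segments: list[set[str]]) -> float:
    """Return the share of segments that contain the term at least once."""
    if not segments:
        raise ValueError("need at least one segment")
    return sum(term in segment for segment in segments) / len(segments)


def zeta(term: str, target: list[set[str]], comparison: list[set[str]]) -> float:
    """Zeta sketch: target document proportion minus comparison document proportion.

    Values range from -1 (term only in the comparison corpus)
    to +1 (term in every target segment and no comparison segment).
    """
    return document_proportion(term, target) - document_proportion(term, comparison)


# Hypothetical toy data: each set holds the distinct word types of one segment.
target = [{"whale", "sea", "ship"}, {"whale", "captain"}, {"sea", "storm"}]
comparison = [
    {"garden", "tea"},
    {"sea", "letter"},
    {"tea", "ball"},
    {"garden", "letter"},
]

scores = {term: zeta(term, target, comparison) for term in ("whale", "sea", "tea")}
for term, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{term:>6}  {score:+.3f}")
```

Positive scores mark terms that are characteristic of the target corpus, negative scores mark terms characteristic of the comparison corpus. Because the measure counts segments rather than raw occurrences, a term that is very frequent in a single segment does not dominate the result.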
Usage
PyDistintoX can be used in two ways:
- CLI — run analyses directly from the command line
- Python library — import functions for custom workflows
If you already have tokenized/lemmatized data (e.g. from a gensim corpus) and want to skip NLP processing, see Pre-Parsed Data.
Installation
# CLI via uv (recommended)
uv tool install pydistintox
# or via pipx (bootstrapped with pip)
pip install pipx && pipx install pydistintox