117 packages

Adaptive Experimentation & Bandits

Lightweight microframework for Bayesian bandits (Thompson Sampling) with support for contextual/restless/delayed rewards.

A/B testing experimentation Bayesian
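
As an illustration of the Thompson Sampling idea named in the entry above, here is a minimal Beta-Bernoulli sketch using only the Python standard library. It is not the listed package's API: the function name `thompson_sampling`, the simulated reward rates, and the uniform Beta(1, 1) prior are all assumptions made for the example.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling on simulated arms (hypothetical helper).

    true_rates: success probability of each simulated arm (assumed, for the demo).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta(1 + successes, 1 + failures) posterior.
        draws = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Play the arm whose posterior draw is largest.
        arm = max(range(n_arms), key=draws.__getitem__)
        # Simulate a Bernoulli reward and update that arm's counts.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.05, 0.10, 0.20])
print(pulls)
```

Because arms are chosen by sampling from the posterior, exploration fades on its own as evidence accumulates, so most pulls should end up on the arm with the highest simulated rate.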

Implements a wide range of contextual bandit algorithms (linear, tree-based, neural) and off-policy evaluation methods.

A/B testing experimentation machine learning

Production-ready, scikit-learn style library for contextual & stochastic bandits with parallelism and simulation tools.

A/B testing experimentation

Framework for **off-policy evaluation (OPE)** of bandit policies using logged data. Implements IPS, DR, and DM estimators.

A/B testing experimentation
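
The three estimators named in the entry above (IPS, DM, DR) can be sketched in a few lines for the simplest, context-free case. This is an illustrative sketch, not the listed framework's API: the function names, the `(action, reward, propensity)` log format, and the use of per-action sample means as the reward model are assumptions.

```python
def ips(logs, target_prob):
    """Inverse propensity scoring: reweight logged rewards by pi_e(a) / pi_b(a)."""
    return sum(target_prob[a] / p * r for a, r, p in logs) / len(logs)


def direct_method(logs, target_prob):
    """Direct method: plug a reward model into the target policy.

    The reward model here is just the per-action mean of logged rewards
    (an assumption for this sketch). Returns (value, reward_model).
    """
    totals, counts = {}, {}
    for a, r, _ in logs:
        totals[a] = totals.get(a, 0.0) + r
        counts[a] = counts.get(a, 0) + 1
    q = {a: totals[a] / counts[a] for a in totals}
    value = sum(prob * q.get(a, 0.0) for a, prob in target_prob.items())
    return value, q


def doubly_robust(logs, target_prob):
    """Doubly robust: DM estimate plus an IPS-weighted correction of its residuals."""
    dm_value, q = direct_method(logs, target_prob)
    correction = sum(
        target_prob[a] / p * (r - q[a]) for a, r, p in logs
    ) / len(logs)
    return dm_value + correction


# Toy log: (action, reward, probability the logging policy chose that action).
logs = [(0, 1.0, 0.5), (1, 0.0, 0.5), (0, 1.0, 0.5), (1, 1.0, 0.5)]
target_prob = {0: 0.8, 1: 0.2}  # hypothetical policy to evaluate

ips_value = ips(logs, target_prob)
dm_value, _ = direct_method(logs, target_prob)
dr_value = doubly_robust(logs, target_prob)
print(ips_value, dm_value, dr_value)
```

On this toy log all three estimates coincide at about 0.9. On real data they generally differ: IPS is unbiased but noisy when propensities are small, DM inherits any bias in the reward model, and DR remains consistent if either the propensities or the reward model are correct.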

Library for advanced bandit problems: X-armed bandits (continuous/structured action spaces) and online optimization.

A/B testing experimentation

Comprehensive research framework for single/multi-player MAB algorithms (stochastic, adversarial, contextual).

A/B testing experimentation

Bayesian Econometrics

High-level interface for building Bayesian GLMMs, built on top of PyMC. Uses formula syntax similar to R's `lme4`.

Bayesian inference

Bayesian Marketing Mix Modeling (see Marketing Mix Models section).

Bayesian inference

Probabilistic programming library built on JAX for scalable Bayesian inference, often faster than PyMC.

Bayesian inference

Flexible probabilistic programming library for Bayesian modeling and inference using MCMC algorithms (NUTS).

Bayesian inference

Causal Discovery & Graphical Models

Causal inference using graphical models (DAGs), including identification theory and effect estimation.

causal inference graphs

Implements algorithms for causal discovery (recovering causal graph structure) from observational data.

causal inference graphs

Uses Bayesian Networks for causal reasoning, combining ML with expert knowledge to model relationships.

causal inference graphs Bayesian

Specialized package for learning non-Gaussian linear causal models, implementing various versions of the LiNGAM algorithm including ICA-based methods.

causal inference graphs

Specialized package for causal inference in time series data implementing PCMCI, PCMCIplus, LPCMCI algorithms with conditional independence tests.

causal inference graphs

Comprehensive Python package that translates and extends the Java-based Tetrad toolkit of causal discovery algorithms.

causal inference graphs

End-to-end causal structure learning toolchain from Huawei Noah's Ark Lab, emphasizing gradient-based methods (NOTEARS, GOLEM) with GPU acceleration.

causal inference graphs

Python interface to the Tetrad Java library via JPype, providing direct access to Tetrad's causal discovery algorithms with efficient data translation.

causal inference graphs

Causal Inference & Matching

Implements classical causal inference methods like propensity score matching, inverse probability weighting, stratification.

causal inference matching

IBM-developed package that provides a scikit-learn-inspired API for causal inference with meta-algorithms supporting arbitrary machine learning models.

causal inference matching

Focuses on uplift modeling and heterogeneous treatment effect estimation using machine learning techniques.

causal inference matching

Implements Propensity Score Matching (PSM) and Coarsened Exact Matching (CEM) with ML flexibility for propensity score estimation.

causal inference matching

Python library for causal research that addresses the scarcity of real-world datasets with known causal relations. Provides fine-grained control over structural causal models.

causal inference matching

Developed by PyMC Labs, focuses specifically on causal inference in quasi-experimental settings. Specializes in scenarios where randomization is impossible or expensive.

causal inference matching

End-to-end framework for causal inference based on causal graphs (DAGs) and potential outcomes. Covers identification, estimation, refutation.

causal inference matching

Fast k-nearest-neighbor matching for large datasets using Facebook's FAISS library.

causal inference matching

Focuses on uplift modeling and estimating heterogeneous treatment effects using various ML-based methods.

causal inference matching

Core Libraries & Linear Models

Foundational ML library with regression models (incl. regularized), model selection, cross-validation, evaluation metrics.

regression linear models

Comprehensive library for estimating statistical models (OLS, GLM, etc.), conducting tests, and data exploration. Core tool.

regression linear models

Dimensionality Reduction

Specialized library for Exploratory (EFA) and Confirmatory (CFA) Factor Analysis with rotation options for interpretability.

machine learning dimensionality

Optimized, parallel implementation of t-distributed Stochastic Neighbor Embedding (t-SNE) for large datasets.

machine learning dimensionality

Fast and scalable implementation of Uniform Manifold Approximation and Projection (UMAP) for non-linear reduction.

machine learning dimensionality

Discrete Choice Models

Maximum likelihood estimation of parametric models, with strong support for complex discrete choice models.

discrete choice logit

Tools for estimating demand for differentiated products using the Berry-Levinsohn-Pakes (BLP) method.

discrete choice logit

Flexible implementation of conditional/multinomial logit models with utilities for data preparation.

discrete choice logit

Fast estimation of Multinomial Logit and Mixed Logit models, optimized for performance.

discrete choice logit

PyTorch framework for flexible estimation of complex discrete choice models, leveraging GPU acceleration.

discrete choice logit

Double/Debiased Machine Learning (DML)

Implements the double/debiased ML framework (Chernozhukov et al.) for estimating causal parameters (ATE, LATE, POM) with ML nuisances.

machine learning causal inference

Microsoft toolkit for estimating heterogeneous treatment effects using DML, causal forests, meta-learners, and orthogonal ML methods.

machine learning causal inference

Post-double-selection Lasso estimator for treatment effects with high-dimensional controls (Belloni, Chernozhukov & Hansen 2014).

machine learning causal inference

Debiased-Lasso detector of heterogeneous treatment effects in randomized experiments.

machine learning causal inference

Instrumental Variables (IV) & GMM

Lightweight package for setting up and estimating custom GMM models based on user-defined moment conditions.

IV GMM

Marketing Mix Models (MMM) & Business Analytics

Analyze customer lifetime value (CLV) using probabilistic models (BG/NBD, Pareto/NBD) to predict purchases.

marketing analytics

Lightweight Python library focused specifically on Marketing Mix Modeling implementation.

marketing analytics

Collection of Bayesian marketing models built with PyMC, including MMM, CLV, and attribution.

marketing analytics Bayesian

Python/Stan implementation of Bayesian Marketing Mix Models.

marketing analytics Bayesian

Natural Language Processing for Economics

Library focused on topic modeling (LDA, LSI) and document similarity analysis.

NLP text analysis

Access to thousands of pre-trained models for NLP tasks like text classification, summarization, embeddings, etc.

NLP text analysis

Industrial-strength NLP library for efficient text processing pipelines (NER, POS tagging, etc.).

NLP text analysis

Numerical Optimization & Computational Tools

High-performance numerical computing with autograd and XLA compilation on CPU/GPU/TPU.

optimization computation

Popular deep learning framework with flexible automatic differentiation.

optimization computation machine learning

Panel Data & Fixed Effects

Solves linear models with high-dimensional fixed effects, supporting robust variance calculation and IV.

panel data fixed effects

Estimation of fixed, random, pooled OLS models for panel data. Also Fama-MacBeth and between/first-difference estimators.

panel data fixed effects

Fast estimation of linear models with multiple high-dimensional fixed effects (like R's `fixest`). Supports OLS, IV, Poisson, robust/cluster SEs.

panel data fixed effects

Out-of-core regression (OLS/IV) for very large datasets using DuckDB aggregation. Handles data that doesn't fit in memory.

panel data fixed effects

Estimation of dynamic panel data models using Arellano-Bond (Difference GMM) and Blundell-Bond (System GMM). Includes Windmeijer correction & tests.

panel data fixed effects

Power Simulation & Design of Experiments

Bayesian Adaptive Design Optimization (ADO) for tuning experiments in real-time, with models for psychometric tasks.

power analysis experiments Bayesian

Parallel active learning library for adaptive function sampling/evaluation, with live plotting for monitoring.

power analysis experiments

Automates generation and optimization of designs, especially for mixed factor-level experiments; computes efficiency metrics.

power analysis experiments

Implements classical Design of Experiments: factorial (full/fractional), response surface (Box-Behnken, CCD), Latin Hypercube.

power analysis experiments

Program Evaluation Methods (DiD, SC, RDD)

Python port of Google's R package for estimating causal effects of interventions on time series using Bayesian structural time-series models.

DiD synthetic control RDD Bayesian

Implements modern difference-in-differences methods for staggered adoption designs (e.g., Callaway & Sant'Anna).

DiD synthetic control RDD

Implementation of synthetic control methods for comparative case studies when panel data is available.

DiD synthetic control RDD

Python adaptation of the R `did` package. Implements multi-period DiD with staggered treatment timing (Callaway & Sant'Anna).

DiD synthetic control RDD

Implements advanced synthetic control methods: forward DiD, cluster SC, factor models, and proximal SC. Designed for single-treated-unit settings.

DiD synthetic control RDD

Changes-in-Changes (CiC) estimator for distributional treatment effects (Athey & Imbens 2006).

DiD synthetic control RDD causal inference

Lee (2009) sample-selection bounds for treatment effects; trims the treated distribution to match selection rates.

DiD synthetic control RDD causal inference

Toolkit for sharp RDD analysis, including bandwidth calculation and estimation, integrating with pandas.

DiD synthetic control RDD

Comprehensive tools for Regression Discontinuity Designs (RDD), including optimal bandwidth selection, estimation, inference.

DiD synthetic control RDD

Quantile Regression & Distributional Methods

Fast quantile regression solver using interior point methods, supporting robust and clustered standard errors.

quantile regression

Recentered Influence-Function (RIF) regression for unconditional quantile & distributional effects (Firpo et al., 2009).

quantile regression

Scikit-learn compatible implementation of Quantile Regression Forests for non-parametric estimation.

quantile regression

Spatial Econometrics

The broader PySAL ecosystem contains many tools for spatial data handling, weights, visualization, and analysis.

spatial geography

The spatial regression `spreg` module of PySAL. Implements spatial lag, error, IV models, and diagnostics.

spatial geography

Standard Errors, Bootstrapping & Reporting

Curated list of quantitative finance libraries and resources (many statistical/TS tools overlap with econometrics).

bootstrap standard errors

Teaches software design principles for ML (modularity, abstraction, and reproducibility), going beyond ad hoc Jupyter workflows. Focuses on maintainable, production-quality ML code.

bootstrap standard errors

Modern introduction to causal inference methods (DiD, IV, RDD, Synth, ML-based) with Python code examples.

bootstrap standard errors

Practical guide by A. Turrell on using Python for modern econometric research, data analysis, and workflows.

bootstrap standard errors

Intermediate 5-course series by Andrew Ng covering deep neural networks, CNNs, RNNs, transformers, and real-world DL applications using TensorFlow.

bootstrap standard errors machine learning

Beginner-friendly 3-course series by Andrew Ng covering core ML methods (regression, classification, clustering, trees, NN) with hands-on projects.

bootstrap standard errors

Comprehensive intro notes by Kevin Sheppard covering Python basics, core libraries, and econometrics applications.

bootstrap standard errors

High-quality lecture series on quantitative economic modeling, computational tools, and economics using Python/Julia.

bootstrap standard errors

(`scipy.stats.bootstrap`) Computes bootstrap confidence intervals for various statistics using percentile, BCa methods.

bootstrap standard errors
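
A short usage sketch for `scipy.stats.bootstrap`, the function named in the entry above. The simulated sample and the choice of the mean as the statistic are assumptions for illustration; NumPy and SciPy must be installed.

```python
import numpy as np
from scipy.stats import bootstrap

# Simulated data (assumed for the demo): 200 draws from N(1, 2^2).
rng = np.random.default_rng(0)
sample = rng.normal(loc=1.0, scale=2.0, size=200)

# Data must be passed as a sequence of samples, hence the one-element tuple.
res = bootstrap(
    (sample,),
    np.mean,
    confidence_level=0.95,
    n_resamples=2000,
    method="BCa",  # bias-corrected and accelerated; "percentile" is also available
)

ci = res.confidence_interval
print(f"mean = {sample.mean():.3f}")
print(f"95% BCa CI = [{ci.low:.3f}, {ci.high:.3f}]")
print(f"bootstrap SE = {res.standard_error:.3f}")
```

The resampling is random, so the interval endpoints vary slightly from run to run unless a seed is supplied to `bootstrap` itself.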

Python port of R's stargazer for creating publication-quality regression tables (HTML, LaTeX) from `statsmodels` & `linearmodels` results.

bootstrap standard errors

Teaches essential developer tools often skipped in formal education: command line, Git, Vim, scripting, debugging, etc.

bootstrap standard errors

Fast implementation of various wild cluster bootstrap algorithms (WCR, WCU) for robust inference, especially with few clusters.

bootstrap standard errors

State Space & Volatility Models

Focuses on Kalman filters (standard, EKF, UKF) and smoothers with a clear, pedagogical implementation style.

volatility state space
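
To make the predict/update cycle of the standard (linear) Kalman filter from the entry above concrete, here is a minimal one-dimensional sketch for a random-walk state observed with noise. It is not the listed package's API and does not cover the EKF or UKF variants; the function name `kalman_filter_1d` and its parameters are assumptions.

```python
def kalman_filter_1d(observations, process_var, obs_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a local-level model (hypothetical helper).

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation: z_t = x_t + v_t,      v_t ~ N(0, obs_var)
    Returns a list of (filtered mean, filtered variance) pairs.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: the mean is unchanged under a random walk; uncertainty grows.
        p = p + process_var
        # Update: the Kalman gain weights the innovation (z - x).
        gain = p / (p + obs_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


estimates = kalman_filter_1d([1.0, 1.0, 1.0], process_var=0.0, obs_var=1.0)
for mean, var in estimates:
    print(round(mean, 4), round(var, 4))
```

With zero process noise the filter reduces to recursive averaging of the prior and the data: after three observations equal to 1.0 the filtered mean is 0.75 and the variance 0.25.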

Specialized package for estimating Dynamic Factor Models (DFM) using state-space methods and Kalman filtering.

volatility state space

Implements Kalman filter, smoother, and EM algorithm for parameter estimation, including support for missing values and UKF.

volatility state space

(See Bayesian) Bayesian state-space modeling using PyMC, integrating Kalman filtering within MCMC for parameter estimation.

volatility state space Bayesian

Efficient Bayesian estimation of stochastic volatility (SV) models using MCMC.

volatility state space Bayesian

Statistical Inference & Hypothesis Testing

User-friendly interface for common statistical tests (ANOVA, ANCOVA, t-tests, correlations, chi², reliability) built on pandas & scipy.

inference hypothesis testing

Part of the PyWhy ecosystem providing statistical methods specifically for causal applications, including various independence tests and power-divergence methods.

inference hypothesis testing

Foundational module within SciPy for a wide range of statistical functions, distributions, and hypothesis tests (t-tests, ANOVA, chi², KS, etc.).

inference hypothesis testing

Library focused on hypothesis testing: ANOVA/MANOVA, t-tests, chi-square, Fisher's exact, nonparametric tests (Mann-Whitney, Kruskal-Wallis, etc.).

inference hypothesis testing

Comprehensive library for survival analysis: Kaplan-Meier, Nelson-Aalen, Cox regression, AFT models, handling censored data.

inference hypothesis testing
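
The Kaplan-Meier estimator mentioned in the entry above is simple enough to sketch directly, which shows how censored observations enter the calculation. This standard-library sketch is not the listed library's API; the function name `kaplan_meier` and the toy data are assumptions.

```python
def kaplan_meier(durations, event_observed):
    """Kaplan-Meier survival curve (hypothetical helper).

    durations: follow-up time for each subject.
    event_observed: 1 if the event occurred at that time, 0 if censored.
    Returns a list of (event_time, estimated survival probability).
    """
    event_times = sorted({t for t, e in zip(durations, event_observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events occurring exactly at time t (censored subjects excluded).
        events = sum(
            1 for d, e in zip(durations, event_observed) if d == t and e
        )
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


durations = [1, 2, 2, 3, 4]
event_observed = [1, 1, 0, 1, 0]  # 0 marks a censored subject
curve = kaplan_meier(durations, event_observed)
print(curve)
```

Censored subjects never count as events, but they stay in the risk set until their censoring time, which is why the survival estimate falls to roughly 0.3 at time 3 rather than to the raw fraction of subjects without an event.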

Structural Econometrics & Estimation

Framework for describing and solving economic models (DSGE, OLG, etc.) using a declarative YAML-based format.

structural estimation

Toolkit for solving, simulating, and estimating models with heterogeneous agents (e.g., consumption-saving).

structural estimation

Core library for quantitative economics: dynamic programming, Markov chains, game theory, numerical methods.

structural estimation

Simulation and estimation of finite-horizon dynamic discrete choice (DDC) models (e.g., labor/education choice).

structural estimation

Synthetic Data Generation

Comprehensive library for generating synthetic tabular, relational, and time series data using various models.

synthetic data simulation

Port of the R package for generating synthetic populations based on sample survey data.

synthetic data simulation

Time Series Econometrics

Specialized library for modeling and forecasting conditional volatility using ARCH, GARCH, EGARCH, and related models.

time series econometrics

Broad toolkit for time series analysis, including multivariate analysis, detection (outliers, change points, trends), feature extraction.

time series econometrics

Community implementations of Jordà (2005) Local Projections for estimating impulse responses without VAR assumptions.

time series econometrics

Time Series Forecasting

Scalable time series forecasting using machine learning models (e.g., LightGBM, XGBoost) as regressors.

forecasting time series machine learning

Deep learning models (N-BEATS, N-HiTS, Transformers, RNNs) for time series forecasting, built on PyTorch Lightning.

forecasting time series machine learning

Forecasting procedure for time series with strong seasonality and trend components, developed by Facebook.

forecasting time series

Fast, scalable implementations of popular statistical forecasting models (ETS, ARIMA, Theta, etc.) optimized for performance.

forecasting time series

ARIMA modeling with automatic parameter selection (auto-ARIMA), similar to R's `forecast::auto.arima`.

forecasting time series

Unified framework for various time series tasks, including forecasting with classical, ML, and deep learning models.

forecasting time series machine learning

Tree & Ensemble Methods for Prediction

Gradient boosting library excelling with categorical features (minimal preprocessing needed). Robust against overfitting.

machine learning prediction

Fast, distributed gradient boosting (also supports RF). Known for speed, low memory usage, and handling large datasets.

machine learning prediction

Extends gradient boosting to probabilistic prediction, providing uncertainty estimates alongside point predictions. Built on scikit-learn.

machine learning prediction

(`RandomForestClassifier`/`Regressor`) Widely-used, versatile implementation of Random Forests. Easy API and parallel processing support.

machine learning prediction
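
A brief usage sketch for the `RandomForestRegressor` class named in the entry above, showing the fit/score API and the parallelism option. The synthetic dataset and hyperparameter values are assumptions chosen for illustration; scikit-learn must be installed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression problem (assumed for the demo): 5 features, 2 informative.
X, y = make_regression(
    n_samples=500, n_features=5, n_informative=2, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_jobs=-1 grows the trees in parallel on all available cores.
model = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # R^2 on held-out data
importances = model.feature_importances_
print(f"test R^2 = {r2:.3f}")
print("feature importances:", importances.round(3))
```

`RandomForestClassifier` follows the same pattern, with `score` returning accuracy instead of R^2.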

High-performance, optimized gradient boosting library (also supports RF). Known for speed, efficiency, and winning competitions.

machine learning prediction

GPU-accelerated implementation of Random Forests for significant speedups on large datasets. Scikit-learn compatible API.

machine learning prediction