PanelRegression
- class causalpy.experiments.panel_regression.PanelRegression
Panel regression with fixed effects estimation.
Enables panel-aware visualization and diagnostics, and supports two ways of handling fixed effects: unpooled dummy variables or the demeaned (within) transformation.
- Parameters:
  - data (DataFrame) – A pandas DataFrame with panel data. Each row is an observation of one unit in one time period.
  - formula (str) – A statistical model formula in patsy syntax. For the unpooled dummy-variable fixed-effects approach, include C(unit_var) (and optionally C(time_var)) in the formula. For the demeaned transformation, do NOT include those C(...) terms; the fixed effects are removed by the transformation before fitting.
  - unit_fe_variable (str) – Column name of the unit identifier (e.g., “state”, “id”, “country”).
  - time_fe_variable (str | None) – Column name of the time identifier (e.g., “year”, “wave”, “period”). If provided, time fixed effects are included. Default is None.
  - fe_method (Literal['dummies', 'demeaned']) – Method for handling fixed effects. Default is “dummies”.
    - “dummies”: unpooled dummy-variable fixed effects (C(unit) / C(time) in the formula). Yields an estimate for each individual unit effect, but adds N-1 dummy columns for N units. Best for small N.
    - “demeaned”: demeaned (within) transformation. Scales to large N, but does not directly estimate individual unit effects.
  - model (PyMCModel | RegressorMixin | None) – A PyMC (Bayesian) or scikit-learn (OLS) model. Although the signature defaults to None, a model must be provided.
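For OLS with unit fixed effects only, the two fe_method options should give the same slope for time-varying regressors (a consequence of the Frisch–Waugh–Lovell theorem), and the unit effects that “dummies” estimates directly can be recovered under “demeaned” from the group means of the original data. The sketch below checks this with plain NumPy/pandas least squares on a small made-up panel. It illustrates the algebra under those assumptions and is not CausalPy's implementation; Bayesian models will generally not match exactly because of their priors.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel: 5 units x 8 periods, one time-varying regressor x1.
rng = np.random.default_rng(0)
n_units, n_periods = 5, 8
units = [f"unit_{i}" for i in range(n_units)]
df = pd.DataFrame([(u, t) for u in units for t in range(n_periods)], columns=["unit", "time"])
alpha = df["unit"].map(dict(zip(units, [0.0, 1.0, -2.0, 3.5, 0.5])))  # true unit effects
df["x1"] = rng.normal(size=len(df)) + 0.5 * alpha  # x1 is correlated with the unit effect
df["y"] = alpha + 1.5 * df["x1"] + rng.normal(scale=0.1, size=len(df))

# "dummies"-style design: intercept + N-1 unit dummies (first unit is the baseline) + x1.
dummies = pd.get_dummies(df["unit"], drop_first=True, dtype=float)
X = np.column_stack([np.ones(len(df)), dummies.to_numpy(), df["x1"].to_numpy()])
coef = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0]
beta_dummy = coef[-1]
# Each unit effect = intercept + that unit's dummy coefficient (0 for the baseline unit).
alpha_dummy = coef[0] + np.concatenate([[0.0], coef[1:-1]])

# "demeaned"-style: subtract each unit's mean from y and x1, then OLS without intercept.
means = df.groupby("unit")[["y", "x1"]].mean()  # group means of the original data
dm = df[["y", "x1"]] - df.groupby("unit")[["y", "x1"]].transform("mean")
beta_within = np.linalg.lstsq(dm[["x1"]].to_numpy(), dm["y"].to_numpy(), rcond=None)[0][0]
# Post-hoc unit effects from the group means: alpha_i = mean(y_i) - beta * mean(x1_i).
alpha_within = (means["y"] - beta_within * means["x1"]).to_numpy()

print(dummies.shape[1])  # 4
print(np.isclose(beta_dummy, beta_within), np.allclose(alpha_dummy, alpha_within))  # True True
```

With N units the dummy design carries N-1 extra columns, while the demeaned regression keeps only the time-varying regressors, which is why “demeaned” scales better to large N.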
Examples
Small panel with dummy variables:
>>> import causalpy as cp
>>> import numpy as np
>>> import pandas as pd
>>> # Create small panel: 10 units, 20 time periods
>>> np.random.seed(42)
>>> units = [f"unit_{i}" for i in range(10)]
>>> periods = range(20)
>>> data = pd.DataFrame(
...     [
...         {
...             "unit": u,
...             "time": t,
...             "treatment": int(t >= 10 and u in units[:5]),
...             "x1": np.random.randn(),
...             "y": np.random.randn(),
...         }
...         for u in units
...         for t in periods
...     ]
... )
>>> result = cp.PanelRegression(
...     data=data,
...     formula="y ~ C(unit) + C(time) + treatment + x1",
...     unit_fe_variable="unit",
...     time_fe_variable="time",
...     fe_method="dummies",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={"random_seed": 42, "progressbar": False}
...     ),
... )
Large panel with demeaned transformation:
>>> # Create larger panel: 1000 units, 10 time periods
>>> np.random.seed(42)
>>> units = [f"unit_{i}" for i in range(1000)]
>>> periods = range(10)
>>> data = pd.DataFrame(
...     [
...         {
...             "unit": u,
...             "time": t,
...             # Treat only half the units: a treatment that varies only over
...             # time would be absorbed by the time fixed effects.
...             "treatment": int(t >= 5 and u in units[:500]),
...             "x1": np.random.randn(),
...             "y": np.random.randn(),
...         }
...         for u in units
...         for t in periods
...     ]
... )
>>> result = cp.PanelRegression(
...     data=data,
...     formula="y ~ treatment + x1",  # No C(unit) needed
...     unit_fe_variable="unit",
...     time_fe_variable="time",
...     fe_method="demeaned",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={"random_seed": 42, "progressbar": False}
...     ),
... )
Notes
The demeaned transformation (subtracting group means) removes time-invariant confounders, but it also drops time-invariant covariates from the model, because they are identically zero after demeaning. With the "dummies" approach (unpooled FE), individual unit effects can be read from the coefficients. With the demeaned approach, unit effects can be recovered post hoc from the stored group means (_group_means), which are always computed from the original (pre-demeaning) data.
This class does not yet implement hierarchical/partial-pooling fixed effects. Those semantics are intentionally kept out of scope so that fe_method="dummies" remains an accurate label for the current unpooled estimator.
Two-way fixed effects (unit + time) control for both unit-specific and time-specific unobserved heterogeneity. This is the standard regression setup for difference-in-differences estimation.
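With fe_method="demeaned" and both unit and time fixed effects, the data are demeaned by unit and then by time. The sketch below contrasts that single pass with repeating it until convergence (a plain alternating-projections scheme) on a small hypothetical panel with rows dropped at random. The helpers is_balanced, demean_once and demean_iterative are illustrative names written for this sketch, not CausalPy functions. On a balanced panel one pass already leaves every unit mean and time mean at zero; on an unbalanced panel only the iterated result does, and it then matches the residuals from regressing x on unit and time dummies.

```python
import numpy as np
import pandas as pd


def is_balanced(df: pd.DataFrame, unit: str, time: str) -> bool:
    """Return True if every unit is observed exactly once in every time period."""
    counts = df.groupby([unit, time]).size()
    return len(counts) == df[unit].nunique() * df[time].nunique() and bool((counts == 1).all())


def demean_once(s: pd.Series, df: pd.DataFrame, unit: str, time: str) -> pd.Series:
    """Single pass: subtract unit means, then subtract time means of the result."""
    s = s - s.groupby(df[unit]).transform("mean")
    return s - s.groupby(df[time]).transform("mean")


def demean_iterative(
    s: pd.Series, df: pd.DataFrame, unit: str, time: str, tol: float = 1e-10, max_iter: int = 10_000
) -> pd.Series:
    """Repeat the single pass until the unit means are also (numerically) zero."""
    for _ in range(max_iter):
        s = demean_once(s, df, unit, time)
        if s.groupby(df[unit]).mean().abs().max() < tol:
            break
    return s


# Hypothetical panel: 6 units x 5 periods, then drop 30% of rows to mimic attrition.
rng = np.random.default_rng(2)
full = pd.DataFrame([(u, t) for u in range(6) for t in range(5)], columns=["unit", "time"])
panel = full.sample(frac=0.7, random_state=2).reset_index(drop=True)
panel["x"] = rng.normal(size=len(panel)) + 0.3 * panel["unit"] + 0.2 * panel["time"]

print(is_balanced(full, "unit", "time"), is_balanced(panel, "unit", "time"))  # True False

once = demean_once(panel["x"], panel, "unit", "time")
iterated = demean_iterative(panel["x"], panel, "unit", "time")
# Largest leftover unit mean: typically nonzero after one pass, ~0 after iterating.
print(once.groupby(panel["unit"]).mean().abs().max())
print(iterated.groupby(panel["unit"]).mean().abs().max())
```

Checking is_balanced before choosing fe_method="demeaned" is a cheap way to tell whether the single-pass approximation applies exactly.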
Balanced vs unbalanced panels: A panel is balanced when every unit is observed in every time period; otherwise it is unbalanced (e.g. unit entry/exit, missing waves). When both unit and time fixed effects are requested with fe_method="demeaned", the sequential demeaning (first by unit, then by time) is algebraically equivalent to the standard two-way demeaned transformation only for balanced panels. For unbalanced panels, exact two-way demeaning requires alternating the two demeaning steps until convergence; the single-pass approximation used here may introduce small biases. Unbalanced panels are common in practice (e.g. firm or worker panels with attrition); for heavily unbalanced data, consider checking the sensitivity of the results or using dedicated FE packages that implement iterative two-way demeaning (e.g. reghdfe, pyfixest).
Methods
Run the experiment algorithm: fit the model.
PanelRegression.effect_summary(*[, window, ...]) – Generate a decision-ready summary of causal effects.
PanelRegression.fit(*args, **kwargs)
PanelRegression.generate_report(*[, ...]) – Generate a self-contained HTML report for this experiment.
PanelRegression.get_plot_data(*args, **kwargs) – Recover the data of an experiment along with the prediction and causal impact information.
PanelRegression.get_plot_data_bayesian(**kwargs) – Get plot data for a Bayesian model.
PanelRegression.get_plot_data_ols(**kwargs) – Get plot data for an OLS model.
Validate input parameters.
PanelRegression.plot(*args[, show, ...]) – Plot the model.
Plot coefficient estimates with credible/confidence intervals.
PanelRegression.plot_residuals([kind]) – Plot residual diagnostics.
PanelRegression.plot_trajectories([units, ...]) – Plot unit-level time series trajectories.
Plot distribution of unit fixed effects.
PanelRegression.print_coefficients([round_to]) – Ask the model to print its coefficients.
PanelRegression.set_maketables_options(*[, ...]) – Set optional maketables rendering options for this experiment.
PanelRegression.summary([round_to]) – Print a summary of the panel regression results.
Attributes
idata – Return the InferenceData object of the model.
supports_bayes
supports_ols
labels
data
- __init__(data, formula, unit_fe_variable, time_fe_variable=None, fe_method='dummies', model=None, **kwargs)
- classmethod __new__(*args, **kwargs)