Module 3.1: Causal Effect Estimation
What You'll Learn
- Difference between causal discovery and effect estimation
- Using the CausalEffects class
- Estimating direct, total, and mediated effects
- Answering "what if" intervention questions
Discovery vs. Effect Estimation
| Question | Tool | Output |
|---|---|---|
| "Does X cause Y?" | PCMCI (Discovery) | Graph (yes/no) |
| "How MUCH does X affect Y?" | CausalEffects | Number (effect size) |
| "What if we increase X by 10?" | CausalEffects | Predicted change in Y |
Discovery finds the structure. Effect estimation quantifies the strength.
Setup: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr
from tigramite.causal_effects import CausalEffects
from tigramite.toymodels import structural_causal_processes as toys
from sklearn.linear_model import LinearRegression
Create Data with Known Effects
# Create a system with KNOWN causal effects
np.random.seed(42)
def lin_f(x): return x
# True coefficients:
# X0(t-1) → X0(t) with strength 0.5
# X0(t-1) → X1(t) with strength 0.6 (the effect we want to estimate!)
# X1(t-1) → X1(t) with strength 0.4
# X1(t-1) → X2(t) with strength 0.7
true_links = {
    0: [((0, -1), 0.5, lin_f)],
    1: [((1, -1), 0.4, lin_f), ((0, -1), 0.6, lin_f)],  # X0 → X1 = 0.6
    2: [((2, -1), 0.3, lin_f), ((1, -1), 0.7, lin_f)],  # X1 → X2 = 0.7
}
T = 2000
data, _ = toys.structural_causal_process(true_links, T=T, seed=42)
var_names = ['Advertising', 'WebTraffic', 'Sales']
dataframe = pp.DataFrame(data, var_names=var_names)
# Scenario: Marketing Analysis
# - Advertising (X0) affects Web Traffic (X1)
# - Web Traffic (X1) affects Sales (X2)
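To make the generating equations concrete, the sketch below re-creates the same linear system by hand with NumPy instead of tigramite's toy-model helper. It assumes standard-normal noise and a short burn-in, and `simulate_scm` is a hypothetical helper name. Regressing each variable on its true lagged parents should then return coefficients close to the ones listed above.

```python
import numpy as np

def simulate_scm(T=5000, seed=42, burn_in=200):
    """Hand-rolled version of the toy system (assumes standard-normal noise)."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    noise = rng.standard_normal((n, 3))
    x = np.zeros((n, 3))
    for t in range(1, n):
        x[t, 0] = 0.5 * x[t - 1, 0] + noise[t, 0]                       # Advertising
        x[t, 1] = 0.4 * x[t - 1, 1] + 0.6 * x[t - 1, 0] + noise[t, 1]   # WebTraffic
        x[t, 2] = 0.3 * x[t - 1, 2] + 0.7 * x[t - 1, 1] + noise[t, 2]   # Sales
    return x[burn_in:]  # drop the transient so the series is close to stationary

sim = simulate_scm()
past, present = sim[:-1], sim[1:]

# Least-squares fit of each variable on its true lagged parents.
# No intercept is used because the process has zero mean.
traffic_coefs, *_ = np.linalg.lstsq(past[:, [1, 0]], present[:, 1], rcond=None)
sales_coefs, *_ = np.linalg.lstsq(past[:, [2, 1]], present[:, 2], rcond=None)

print("WebTraffic on [WebTraffic(t-1), Advertising(t-1)]:", traffic_coefs.round(2))
print("Sales on [Sales(t-1), WebTraffic(t-1)]:", sales_coefs.round(2))
```

This only works because the true parents are known here. With real data the parents are unknown, which is why the discovery step comes first.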
Step 1: Discover the Causal Graph
# First, discover the causal structure
parcorr = ParCorr(significance='analytic')
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=parcorr, verbosity=0)
results = pcmci.run_pcmciplus(tau_max=3, pc_alpha=0.05)
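# Print and visualize
# PCMCIplus returns an oriented graph; passing it in prints exactly the links it contains
pcmci.print_significant_links(
    p_matrix=results['p_matrix'],
    val_matrix=results['val_matrix'],
    graph=results['graph'],
    alpha_level=0.05  # same level as pc_alpha above
)
tp.plot_graph(
    graph=results['graph'],
    val_matrix=results['val_matrix'],
    var_names=var_names,
    figsize=(8, 5)
)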
plt.show()
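ParCorr decides whether a link exists with a partial-correlation test. The sketch below illustrates that idea with plain NumPy; it is not tigramite's implementation. `simulate_scm` and `partial_corr` are hypothetical helpers, and the noise is assumed to be standard normal. Advertising(t-2) and Sales(t) are clearly correlated, but the correlation should vanish once the true parents of Sales(t) are regressed out, so no direct Advertising → Sales link is needed.

```python
import numpy as np

def simulate_scm(T=20000, seed=0, burn_in=200):
    """Toy system from this module, simulated by hand (standard-normal noise assumed)."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    noise = rng.standard_normal((n, 3))
    x = np.zeros((n, 3))
    for t in range(1, n):
        x[t, 0] = 0.5 * x[t - 1, 0] + noise[t, 0]
        x[t, 1] = 0.4 * x[t - 1, 1] + 0.6 * x[t - 1, 0] + noise[t, 1]
        x[t, 2] = 0.3 * x[t - 1, 2] + 0.7 * x[t - 1, 1] + noise[t, 2]
    return x[burn_in:]

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing out the columns of z."""
    design = np.column_stack([np.ones(len(x)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

sim = simulate_scm()
sales_t = sim[2:, 2]                  # Sales(t)
advertising_t2 = sim[:-2, 0]          # Advertising(t-2)
sales_parents = sim[1:-1][:, [1, 2]]  # WebTraffic(t-1), Sales(t-1)

marginal = np.corrcoef(advertising_t2, sales_t)[0, 1]
partial = partial_corr(advertising_t2, sales_t, sales_parents)
print(f"marginal correlation: {marginal:.3f}")
print(f"partial correlation given the parents of Sales(t): {partial:.3f}")
```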
Step 2: Estimate Causal Effects
Now we'll quantify: "If we increase Advertising by 1 unit, how much does Web Traffic change?"
# Define the causal question
# X = cause (Advertising at lag 1)
# Y = effect (Web Traffic at current time)
X = [(0, -1)] # Advertising at t-1
Y = [(1, 0)] # Web Traffic at t
# Initialize CausalEffects with the discovered graph
# Note: CausalEffects takes no tau_max argument; lags are read from the graph, X and Y
causal_effects = CausalEffects(
    graph=results['graph'],
    graph_type='stationary_dag',  # Requires every link in the PCMCIplus graph to be oriented
    X=X,
    Y=Y,
    verbosity=0
)
# Fit the effect model using linear regression
causal_effects.fit_total_effect(
    dataframe=dataframe,
    estimator=LinearRegression(),
)
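# Estimate the effect of a 1-unit increase in Advertising.
# predict_total_effect returns the expected Web Traffic under do(Advertising = value),
# so the effect of a 1-unit increase is the difference between two interventions.
y_do_1 = causal_effects.predict_total_effect(intervention_data=np.array([[1.0]]))
y_do_0 = causal_effects.predict_total_effect(intervention_data=np.array([[0.0]]))
effect = y_do_1 - y_do_0
print(f"Estimated causal effect: {effect[0, 0]:.2f}")
# Expected result:
# Estimated causal effect: ~0.6
# True causal effect: 0.6
# Interpretation: Increasing Advertising by 1 unit
# causes Web Traffic to increase by ~0.6 units one time step later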
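The estimate relies on adjusting for confounders. As a rough NumPy illustration of why that matters, the sketch below compares a naive regression of Web Traffic(t) on Advertising(t-1) with one that also includes Web Traffic(t-1). That adjustment set is hand-picked for this toy graph and is not necessarily the set CausalEffects selects. `simulate_scm` is a hypothetical helper that assumes standard-normal noise.

```python
import numpy as np

def simulate_scm(T=20000, seed=0, burn_in=200):
    """Toy system from this module, simulated by hand (standard-normal noise assumed)."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    noise = rng.standard_normal((n, 3))
    x = np.zeros((n, 3))
    for t in range(1, n):
        x[t, 0] = 0.5 * x[t - 1, 0] + noise[t, 0]
        x[t, 1] = 0.4 * x[t - 1, 1] + 0.6 * x[t - 1, 0] + noise[t, 1]
        x[t, 2] = 0.3 * x[t - 1, 2] + 0.7 * x[t - 1, 1] + noise[t, 2]
    return x[burn_in:]

sim = simulate_scm()
traffic_t = sim[1:, 1]         # outcome: WebTraffic(t)
advertising_lag = sim[:-1, 0]  # cause: Advertising(t-1)
traffic_lag = sim[:-1, 1]      # adjustment variable: WebTraffic(t-1)
ones = np.ones(len(traffic_t))

# Naive regression: the back-door path
# Advertising(t-1) <- Advertising(t-2) -> WebTraffic(t-1) -> WebTraffic(t) stays open.
naive_design = np.column_stack([ones, advertising_lag])
naive_slope = np.linalg.lstsq(naive_design, traffic_t, rcond=None)[0][1]

# Adjusted regression: including WebTraffic(t-1) blocks that path.
adjusted_design = np.column_stack([ones, advertising_lag, traffic_lag])
adjusted_slope = np.linalg.lstsq(adjusted_design, traffic_t, rcond=None)[0][1]

print(f"naive slope:    {naive_slope:.3f}")
print(f"adjusted slope: {adjusted_slope:.3f}  (true effect: 0.6)")
```

In this system the naive slope overstates the effect, because past Advertising also raises past Web Traffic, which carries over into current Web Traffic.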
Total vs. Direct Effects
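Consider the path: Advertising → Web Traffic → Sales
- Direct effect: the effect along a single link, e.g. Advertising(t-1) → Web Traffic(t) with coefficient 0.6
- Indirect effect: the effect passed through a mediator, here Advertising(t-2) → Web Traffic(t-1) → Sales(t) with strength 0.6 × 0.7 = 0.42
- Total effect = sum over all causal paths from cause to effect. Advertising has no direct link to Sales, so its total effect on Sales is entirely indirect
Every link in this system has lag 1, so Advertising needs two time steps to reach Sales. The effect of Advertising(t-1) on Sales(t) is zero, which is why the cause is placed at lag 2.
# Estimate total effect of Advertising on Sales
X = [(0, -2)] # Advertising at t-2 (two lag-1 links away from Sales at t)
Y = [(2, 0)] # Sales at t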
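causal_effects_sales = CausalEffects(
    graph=results['graph'],
    graph_type='stationary_dag',
    X=X, Y=Y,
    verbosity=0  # no tau_max argument; lags are read from the graph, X and Y
)
causal_effects_sales.fit_total_effect(
    dataframe=dataframe,
    estimator=LinearRegression()
)
# Effect of a 1-unit increase = prediction under do(X = 1) minus prediction under do(X = 0)
effect_on_sales = (
    causal_effects_sales.predict_total_effect(intervention_data=np.array([[1.0]]))
    - causal_effects_sales.predict_total_effect(intervention_data=np.array([[0.0]]))
)
# The total effect sums over every causal path from Advertising to Sales in the graph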
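Because every link in this toy system has lag 1, the true total effect at lag h is the sum, over all directed paths of length h, of the product of the coefficients along each path. That sum equals an entry of the h-th power of the coefficient matrix. The sketch below uses the true coefficients, not estimates, as a ground-truth check; `total_effect` is a hypothetical helper.

```python
import numpy as np

# A[i, j] is the true coefficient of variable j at t-1 on variable i at t.
# Order: 0 = Advertising, 1 = WebTraffic, 2 = Sales.
A = np.array([
    [0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
])

def total_effect(cause, effect, lag):
    """Total effect of cause(t - lag) on effect(t) in a linear lag-1 system."""
    return np.linalg.matrix_power(A, lag)[effect, cause]

for lag in (1, 2, 3):
    print(f"Advertising(t-{lag}) -> Sales(t): {total_effect(0, 2, lag):.3f}")
```

At lag 1 there is no path from Advertising to Sales, so the effect is 0. At lag 2 the only path runs through Web Traffic (0.6 × 0.7 = 0.42). At lag 3 three paths contribute (0.21 + 0.168 + 0.126 = 0.504).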
Intervention Scenarios: "What If" Analysis
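# What if we increase advertising by different amounts?
interventions = np.linspace(-2, 2, 20).reshape(-1, 1)
# Reference level: predicted Web Traffic under do(Advertising = 0)
baseline = causal_effects.predict_total_effect(intervention_data=np.array([[0.0]]))[0, 0]
effects = []
for intervention in interventions:
    prediction = causal_effects.predict_total_effect(
        intervention_data=intervention.reshape(1, -1)
    )
    # Store the change relative to the reference level
    effects.append(prediction[0, 0] - baseline)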
# Plot intervention effects
plt.figure(figsize=(10, 5))
plt.plot(interventions, effects, 'b-', linewidth=2)
plt.xlabel('Change in Advertising')
plt.ylabel('Predicted Change in Web Traffic')
plt.title('Intervention Analysis: What If We Change Advertising?')
plt.show()
# This plot answers: 'If we change advertising by X, how does traffic change?'
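A "what if" answer is a difference between two interventional means. The sketch below illustrates this with the true structural equation for Web Traffic, which is available only because this is a toy system. It forces Advertising(t-1) to a chosen value while Web Traffic(t-1) keeps its observed distribution, since it is not downstream of the intervention. `simulate_scm` and `mean_traffic_under_do` are hypothetical helpers, and the noise is assumed to be standard normal.

```python
import numpy as np

def simulate_scm(T=20000, seed=0, burn_in=200):
    """Toy system from this module, simulated by hand (standard-normal noise assumed)."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    noise = rng.standard_normal((n, 3))
    x = np.zeros((n, 3))
    for t in range(1, n):
        x[t, 0] = 0.5 * x[t - 1, 0] + noise[t, 0]
        x[t, 1] = 0.4 * x[t - 1, 1] + 0.6 * x[t - 1, 0] + noise[t, 1]
        x[t, 2] = 0.3 * x[t - 1, 2] + 0.7 * x[t - 1, 1] + noise[t, 2]
    return x[burn_in:]

sim = simulate_scm()
traffic_lag = sim[:-1, 1]  # WebTraffic(t-1), unaffected by do(Advertising(t-1))
fresh_noise = np.random.default_rng(1).standard_normal(len(traffic_lag))

def mean_traffic_under_do(advertising_value):
    """Average WebTraffic(t) when Advertising(t-1) is forced to a fixed value."""
    return np.mean(0.4 * traffic_lag + 0.6 * advertising_value + fresh_noise)

baseline = mean_traffic_under_do(0.0)
deltas = np.linspace(-2, 2, 5)
changes = np.array([mean_traffic_under_do(d) - baseline for d in deltas])

for d, c in zip(deltas, changes):
    print(f"do(Advertising = {d:+.1f}): change in Web Traffic = {c:+.2f}")
```

Since the model is linear, the change is 0.6 times the size of the intervention, which is the straight line the plot above should show when the estimate is accurate.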
Key Takeaways
- CausalEffects quantifies HOW MUCH a cause affects an outcome
- Total effect includes all causal paths (direct + indirect)
- Intervention analysis answers "what if" questions
- Requires a causal graph, so run discovery first and then estimate effects
- fit_total_effect derives an adjustment set from the graph to remove confounding