# Module 1.3: Your First Tigramite Workflow

## What You'll Learn

- The 3-step causal discovery recipe
- How to generate toy data with known causal structure
- Running PCMCI and interpreting results
- Visualizing causal graphs

## The 3-Step Recipe

Every Tigramite analysis follows the same pattern:

1. **Data → DataFrame**: wrap your numpy array in Tigramite's `DataFrame` class
2. **Method + Test → PCMCI**: choose a conditional independence test and initialize PCMCI
3. **Run & Visualize**: execute the algorithm and plot the resulting causal graph

### Setup: Import Libraries
```python
# Core imports
import numpy as np
import matplotlib.pyplot as plt

# Tigramite imports
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr
from tigramite.toymodels import structural_causal_processes as toys
```
## Create Toy Data with Known Structure

To verify Tigramite works, we'll create synthetic data where we know the true causal structure. This is like a test: if Tigramite can recover what we built, we know it's working!
### Our Factory Scenario

Imagine a factory with 4 sensors measuring different things every hour:

- **X0: Machine Temperature** - tends to stay warm (depends on its past)
- **X1: Pressure** - affected by an external factor
- **X2: Quality Score** - what we care about! Affected by Temperature and Pressure
- **X3: External Factor** - something outside (like ambient conditions)

The true causal structure we'll build (all links act with a lag of 1 time step; Temperature, Pressure, and External each also depend on their own past):

```
External (X3) ──► Pressure (X1) ──► Quality (X2) ◄── Temperature (X0)
```
### Understanding the Code

The code uses a special format to define causal links:

- `lin_f` = a simple linear function (the effect is proportional to the cause)
- `((0, -1), 0.7, lin_f)` means: "variable 0 at lag 1 causes this variable with strength 0.7"
- The dictionary `{target: [list of causes]}` defines what causes each variable
```python
# Define the TRUE causal structure
np.random.seed(42)  # For reproducibility

# Linear function: output = input (simple proportional effect)
def lin_f(x):
    return x

# Define causal links
# Format: {target_variable: [((cause_var, -lag), strength, function), ...]}
true_links = {
    # Variable 0 (Temperature): caused by its own past
    0: [((0, -1), 0.7, lin_f)],
    # Variable 1 (Pressure): caused by its past AND the External factor
    1: [((1, -1), 0.6, lin_f),   # Pressure depends on past pressure
        ((3, -1), 0.5, lin_f)],  # Pressure depends on External
    # Variable 2 (Quality): caused by Temperature AND Pressure
    2: [((0, -1), 0.6, lin_f),   # Quality depends on Temperature
        ((1, -1), 0.5, lin_f)],  # Quality depends on Pressure
    # Variable 3 (External): caused by its own past only
    3: [((3, -1), 0.8, lin_f)]
}

# Generate synthetic data
T = 1000  # 1000 time points
data, _ = toys.structural_causal_process(true_links, T=T, seed=42)

print(f"Data shape: {data.shape}")
print(f" - {data.shape[0]} time points (rows)")
print(f" - {data.shape[1]} variables (columns)")
```
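To build intuition for what `structural_causal_process` is doing, here is a minimal numpy sketch that simulates the same four-variable system by hand, assuming unit-variance Gaussian noise. This is an illustration of the idea, not Tigramite's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 1000
X = np.zeros((T, 4))  # columns: Temperature, Pressure, Quality, External

for t in range(1, T):
    noise = rng.standard_normal(4)
    X[t, 0] = 0.7 * X[t - 1, 0] + noise[0]                      # Temperature
    X[t, 1] = 0.6 * X[t - 1, 1] + 0.5 * X[t - 1, 3] + noise[1]  # Pressure
    X[t, 2] = 0.6 * X[t - 1, 0] + 0.5 * X[t - 1, 1] + noise[2]  # Quality
    X[t, 3] = 0.8 * X[t - 1, 3] + noise[3]                      # External

# Quick sanity check: Temperature(t-1) and Quality(t) should be clearly
# positively correlated, since we built that link in with strength 0.6
r = np.corrcoef(X[:-1, 0], X[1:, 2])[0, 1]
print(f"corr(Temperature(t-1), Quality(t)) = {r:.2f}")
```

Note that plain lagged correlation only hints at the structure; it cannot distinguish direct from indirect effects, which is exactly why PCMCI tests *conditional* independence.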
### Visualize the Data

```python
# Let's visualize our data
var_names = ['Temperature', 'Pressure', 'Quality', 'External']

fig, axes = plt.subplots(4, 1, figsize=(12, 8), sharex=True)
colors = ['coral', 'steelblue', 'seagreen', 'purple']

for i, (ax, name, color) in enumerate(zip(axes, var_names, colors)):
    ax.plot(data[:200, i], color=color, linewidth=1)
    ax.set_ylabel(name)
    ax.grid(True, alpha=0.3)

axes[-1].set_xlabel('Time')
axes[0].set_title('Factory Sensor Data (first 200 time steps)')
plt.tight_layout()
plt.show()
```
## Step 1: Data → DataFrame

Wrap your numpy array in Tigramite's `DataFrame` class.

```python
# STEP 1: Create DataFrame
var_names = ['Temperature', 'Pressure', 'Quality', 'External']

dataframe = pp.DataFrame(
    data,                 # Your numpy array of shape (T, N)
    var_names=var_names,  # Optional: variable names
)

print("DataFrame created!")
print(f"Variables: {dataframe.var_names}")
```
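With real data it's worth a quick sanity check that the array has the orientation Tigramite expects (time along rows, variables along columns) and contains no NaNs. Here's a hypothetical helper for that, not part of Tigramite itself:

```python
import numpy as np

def check_array(data, n_vars):
    """Sanity-check a (T, N) array before handing it to pp.DataFrame."""
    assert data.ndim == 2, "expected a 2-D array of shape (T, N)"
    assert data.shape[1] == n_vars, (
        f"expected {n_vars} columns (variables), got {data.shape[1]} - "
        "is your array transposed?"
    )
    assert not np.isnan(data).any(), "NaNs present - handle missing values first"
    return True

demo = np.random.default_rng(0).standard_normal((500, 4))
print(check_array(demo, n_vars=4))  # True
```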
## Step 2: Method + Test → PCMCI

Choose your:

- **Independence test**: How to measure if two variables are related
- **Discovery method**: Algorithm to find the causal graph

For this example:

- **ParCorr**: Partial correlation (good for linear relationships)
- **PCMCI**: Standard algorithm (assumes no same-time causation)

```python
# STEP 2: Initialize method and test

# Independence test (ParCorr = partial correlation, for linear data)
parcorr = ParCorr(significance='analytic')

# Discovery method (PCMCI)
pcmci = PCMCI(
    dataframe=dataframe,    # Our data
    cond_ind_test=parcorr,  # How to test independence
    verbosity=1             # 0=quiet, 1=some output, 2=detailed
)
```
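To build intuition for what ParCorr measures, here is a from-scratch sketch of partial correlation: regress the conditioning variable out of both sides, then correlate the residuals. Tigramite's implementation is more general; this just shows the core idea:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    Z = np.column_stack([z, np.ones_like(z)])  # conditioner plus intercept
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Example: x and y look correlated only because both are driven by z
rng = np.random.default_rng(1)
z = rng.standard_normal(1000)
x = z + 0.3 * rng.standard_normal(1000)
y = z + 0.3 * rng.standard_normal(1000)

print(f"raw corr:     {np.corrcoef(x, y)[0, 1]:.2f}")  # high (confounded)
print(f"partial corr: {partial_corr(x, y, z):.2f}")    # near zero
```

This is exactly the distinction causal discovery needs: the raw correlation between x and y is spurious, and conditioning on the common driver z reveals that.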
## Step 3: Run & Visualize

Execute the algorithm with key parameters:

- `tau_max`: Maximum lag to search (how far back in time?)
- `pc_alpha`: Significance level for parent selection (`None` = auto-select)
- `alpha_level`: Final significance threshold for the graph

```python
# STEP 3: Run causal discovery
results = pcmci.run_pcmci(
    tau_max=5,        # Check up to 5 time lags
    pc_alpha=None,    # Auto-select significance level
    alpha_level=0.01  # Final threshold (p < 0.01 = significant)
)
```
## Understanding the Results

The results dictionary contains:

- `graph`: The causal graph (who causes whom)
- `val_matrix`: Strength of each link (test statistic values)
- `p_matrix`: P-values (lower = more confident)

```python
# Print significant links
pcmci.print_significant_links(
    p_matrix=results['p_matrix'],
    val_matrix=results['val_matrix'],
    alpha_level=0.01
)
```
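You can also threshold these arrays by hand: `p_matrix` and `val_matrix` are plain numpy arrays of shape (N, N, tau_max+1). A standalone sketch with a tiny made-up p_matrix (2 variables, lags 0 and 1) standing in for real results:

```python
import numpy as np

# Fake p-values for 2 variables and tau_max=1; by convention the entry
# [i, j, tau] refers to the candidate link X_i(t-tau) --> X_j(t)
p_matrix = np.array([
    [[1.00, 0.001],   # X0 -> X0 at lags 0, 1
     [1.00, 0.005]],  # X0 -> X1 at lags 0, 1
    [[1.00, 0.700],   # X1 -> X0 at lags 0, 1
     [1.00, 0.002]],  # X1 -> X1 at lags 0, 1
])

alpha_level = 0.01
significant = p_matrix <= alpha_level  # boolean mask, same shape

for i, j, tau in zip(*np.where(significant)):
    print(f"X{i}(t-{tau}) --> X{j}(t), p = {p_matrix[i, j, tau]:.3f}")
```

Here three links survive the threshold; the X1 → X0 candidate (p = 0.7) is dropped.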
### Visualize the Causal Graph

```python
# Process graph (summary view)
tp.plot_graph(
    graph=results['graph'],
    val_matrix=results['val_matrix'],
    var_names=var_names,
    link_colorbar_label='MCI strength',
    node_colorbar_label='Auto-MCI',
    figsize=(8, 6)
)
plt.show()
```
### Time Series Graph (Temporal View)

```python
# Time series graph (shows temporal structure)
tp.plot_time_series_graph(
    graph=results['graph'],
    val_matrix=results['val_matrix'],
    var_names=var_names,
    figsize=(12, 6),
    link_colorbar_label='MCI strength'
)
plt.show()
```
## Verify: Did We Recover the True Structure?

Let's compare what we found vs. what we know is true.

```python
# True links we created
print("TRUE CAUSAL LINKS (what we built):")
print("=" * 50)
true_edges = [
    ("Temperature(t-1)", "Temperature(t)", 0.7),
    ("Pressure(t-1)", "Pressure(t)", 0.6),
    ("External(t-1)", "Pressure(t)", 0.5),
    ("Temperature(t-1)", "Quality(t)", 0.6),
    ("Pressure(t-1)", "Quality(t)", 0.5),
    ("External(t-1)", "External(t)", 0.8),
]
for source, target, coeff in true_edges:
    print(f" {source} → {target} (strength: {coeff})")

print("\n" + "=" * 50)
print("DISCOVERED LINKS:")
print("=" * 50)

# Extract discovered links
# Convention: graph[i, j, tau] describes the link X_i(t-tau) --> X_j(t)
for j in range(4):            # target variable
    for i in range(4):        # source variable
        for tau in range(6):  # lag (0 .. tau_max)
            if results['graph'][i, j, tau] == '-->':
                strength = results['val_matrix'][i, j, tau]
                print(f" {var_names[i]}(t-{tau}) → {var_names[j]}(t) (strength: {strength:.3f})")
```
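Eyeballing the two printouts works for four variables, but you can also score the recovery automatically: represent each link as a `(source, target, lag)` tuple and compare sets. A standalone sketch, with hand-written sets standing in for the real extraction:

```python
# True links as (source, target, lag) tuples, taken from true_links above
true_set = {(0, 0, 1), (1, 1, 1), (3, 1, 1), (0, 2, 1), (1, 2, 1), (3, 3, 1)}

# Pretend these came from results['graph'] (with one made-up false positive)
found_set = {(0, 0, 1), (1, 1, 1), (3, 1, 1), (0, 2, 1), (1, 2, 1), (3, 3, 1),
             (2, 0, 2)}

recovered = true_set & found_set
false_pos = found_set - true_set
missed = true_set - found_set

print(f"recovered {len(recovered)}/{len(true_set)} true links")
print(f"false positives: {sorted(false_pos)}")
print(f"missed links:    {sorted(missed)}")
```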
## Complete Template

Here's the entire workflow in one block - copy this for your own analysis!

```python
# =============================================================================
# COMPLETE TIGRAMITE WORKFLOW TEMPLATE
# =============================================================================
import numpy as np
import matplotlib.pyplot as plt

from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr

# STEP 1: Load your data (shape: T rows, N columns)
data = np.loadtxt('your_data.csv', delimiter=',')
var_names = ['Var1', 'Var2', 'Var3']
dataframe = pp.DataFrame(data, var_names=var_names)

# STEP 2: Set up PCMCI
parcorr = ParCorr(significance='analytic')
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=parcorr, verbosity=0)

# STEP 3: Run causal discovery
results = pcmci.run_pcmci(tau_max=5, pc_alpha=None, alpha_level=0.05)

# STEP 4: Visualize
tp.plot_graph(graph=results['graph'], val_matrix=results['val_matrix'], var_names=var_names)
plt.show()

# Print results
pcmci.print_significant_links(p_matrix=results['p_matrix'], val_matrix=results['val_matrix'], alpha_level=0.05)
```
## Quick Quiz

**Q1: What does `tau_max=5` mean?**

Search for causal links up to 5 time steps in the past: links like X(t-1)→Y(t), X(t-2)→Y(t), and so on, up to X(t-5)→Y(t).

**Q2: Why do we use `significance='analytic'` for ParCorr?**

For partial correlation with Gaussian data, the null distribution is mathematically known (Student's t), so we can compute p-values analytically. This is faster than permutation tests.
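That analytic p-value can be sketched with the standard t-transform of a (partial) correlation coefficient. The degrees-of-freedom bookkeeping and edge cases in Tigramite's actual implementation may differ in details; this is just the textbook formula:

```python
import numpy as np
from scipy import stats

def parcorr_pvalue(r, n_samples, n_conditions):
    """Two-sided analytic p-value for a partial correlation r."""
    dof = n_samples - n_conditions - 2
    t = r * np.sqrt(dof / (1.0 - r**2))
    return 2 * stats.t.sf(abs(t), df=dof)

print(parcorr_pvalue(0.5, n_samples=100, n_conditions=0))   # tiny: clearly significant
print(parcorr_pvalue(0.05, n_samples=100, n_conditions=0))  # large: not significant
```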
**Q3: What's the difference between `pc_alpha` and `alpha_level`?**

- `pc_alpha`: Used during the PC (condition selection) phase to select potential causal parents
- `alpha_level`: Final threshold for what counts as a significant link in the output graph
## Key Takeaways

- **3-step recipe**: Data → `DataFrame`, Method + Test → `PCMCI`, Run & Visualize
- `DataFrame` wraps your numpy array with metadata
- `PCMCI` needs a conditional independence test (`ParCorr` for linear data)
- `tau_max` controls how far back in time to search
- Results contain `graph`, `val_matrix`, and `p_matrix`
- Always verify with toy data where you know the truth!