Module 1.3: Your First Tigramite Workflow

25 min · Prerequisites: Modules 1.1 and 1.2

What You'll Learn

  1. The 3-step causal discovery recipe
  2. How to generate toy data with known causal structure
  3. Running PCMCI and interpreting results
  4. Visualizing causal graphs

The 3-Step Recipe

Every Tigramite analysis follows the same pattern:

┌─────────────────────────────────────────┐
│ Step 1: DATA → DataFrame                │
│ Your numpy array goes into DataFrame    │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│ Step 2: METHOD + TEST → PCMCI           │
│ Choose discovery method and CI test     │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│ Step 3: RUN & VISUALIZE                 │
│ Execute and plot the causal graph       │
└─────────────────────────────────────────┘
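
In code, the same three steps look like this (a compact preview using the calls explained throughout this module; the required imports come in the next section):

# Preview: the whole workflow in a few lines (details below)
dataframe = pp.DataFrame(data, var_names=var_names)                    # Step 1: data → DataFrame
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())            # Step 2: method + CI test
results = pcmci.run_pcmci(tau_max=5, pc_alpha=None, alpha_level=0.01)  # Step 3: run ...
tp.plot_graph(graph=results['graph'], val_matrix=results['val_matrix'],
              var_names=var_names)                                     # ... and visualize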

Setup: Import Libraries

# Core imports
import numpy as np
import matplotlib.pyplot as plt

# Tigramite imports
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr
from tigramite.toymodels import structural_causal_processes as toys

Create Toy Data with Known Structure

To verify that Tigramite works, we'll create synthetic data with a known causal structure. This acts as a sanity check: if Tigramite recovers the links we built in, we know the workflow is working.

Our Factory Scenario

Imagine a factory with 4 sensors measuring different things every hour:

  • X0: Machine Temperature - tends to stay warm (depends on its past)
  • X1: Pressure - affected by an external factor
  • X2: Quality Score - what we care about! Affected by Temperature and Pressure
  • X3: External Factor - something outside (like ambient conditions)

The True Causal Structure We'll Build:

Temperature(t-1) → Temperature(t)   [stays warm]
Temperature(t-1) → Quality(t)       [heat affects quality]
External(t-1)    → Pressure(t)      [external affects pressure]
Pressure(t-1)    → Pressure(t)      [pressure is stable]
Pressure(t-1)    → Quality(t)       [pressure affects quality]
External(t-1)    → External(t)      [external is stable]
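
Written as structural equations, with coefficients matching the code below (each noise term is an independent random shock; Tigramite's toy-model generator uses standard Gaussian noise by default):

X0(t) = 0.7*X0(t-1) + noise                    (Temperature)
X1(t) = 0.6*X1(t-1) + 0.5*X3(t-1) + noise      (Pressure)
X2(t) = 0.6*X0(t-1) + 0.5*X1(t-1) + noise      (Quality)
X3(t) = 0.8*X3(t-1) + noise                    (External)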

Understanding the Code

The code uses a special format to define causal links:

  • lin_f = a simple linear function (the effect is proportional to the cause)
  • ((0, -1), 0.7, lin_f) means: "Variable 0 at lag 1 causes this variable with strength 0.7"
  • The dictionary {target: [list of causes]} defines what causes each variable
# Define the TRUE causal structure
np.random.seed(42)  # For reproducibility

# Linear function: output = input (simple proportional effect)
def lin_f(x):
    return x

# Define causal links
# Format: {target_variable: [((cause_var, -lag), strength, function), ...]}

true_links = {
    # Variable 0 (Temperature): caused by its own past
    0: [((0, -1), 0.7, lin_f)],

    # Variable 1 (Pressure): caused by its past AND External factor
    1: [((1, -1), 0.6, lin_f),    # Pressure depends on past pressure
        ((3, -1), 0.5, lin_f)],   # Pressure depends on External

    # Variable 2 (Quality): caused by Temperature AND Pressure
    2: [((0, -1), 0.6, lin_f),    # Quality depends on Temperature
        ((1, -1), 0.5, lin_f)],   # Quality depends on Pressure

    # Variable 3 (External): caused by its own past only
    3: [((3, -1), 0.8, lin_f)]
}

# Generate synthetic data
T = 1000  # 1000 time points
data, _ = toys.structural_causal_process(true_links, T=T, seed=42)

print(f"Data shape: {data.shape}")
print(f"  - {data.shape[0]} time points (rows)")
print(f"  - {data.shape[1]} variables (columns)")

Visualize the Data

# Let's visualize our data
var_names = ['Temperature', 'Pressure', 'Quality', 'External']

fig, axes = plt.subplots(4, 1, figsize=(12, 8), sharex=True)

colors = ['coral', 'steelblue', 'seagreen', 'purple']
for i, (ax, name, color) in enumerate(zip(axes, var_names, colors)):
    ax.plot(data[:200, i], color=color, linewidth=1)
    ax.set_ylabel(name)
    ax.grid(True, alpha=0.3)

axes[-1].set_xlabel('Time')
axes[0].set_title('Factory Sensor Data (first 200 time steps)')
plt.tight_layout()
plt.show()
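
Because we built the data ourselves, one of the planted dependencies is easy to spot directly. The following optional sketch (not part of the Tigramite workflow) plots Quality at time t against Temperature at time t-1; the upward trend reflects the 0.6 coefficient chosen above.

# Optional sanity plot: Temperature(t-1) vs Quality(t)
plt.figure(figsize=(5, 4))
plt.scatter(data[:-1, 0], data[1:, 2], s=5, alpha=0.5, color='seagreen')
plt.xlabel('Temperature(t-1)')
plt.ylabel('Quality(t)')
plt.title('Lag-1 dependence built into the data')
plt.tight_layout()
plt.show()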

Step 1: Data → DataFrame

Wrap your numpy array in Tigramite's DataFrame class.

# STEP 1: Create DataFrame
var_names = ['Temperature', 'Pressure', 'Quality', 'External']

dataframe = pp.DataFrame(
    data,                    # Your numpy array (T, N)
    var_names=var_names,     # Optional: variable names
)

print("DataFrame created!")
print(f"Variables: {dataframe.var_names}")

Step 2: Method + Test → PCMCI

Choose your:

  1. Independence test: How to decide whether two variables are related, after controlling for other variables (a conditional independence test)
  2. Discovery method: Algorithm to find the causal graph

For this example:

  • ParCorr: Partial correlation (good for linear relationships)
  • PCMCI: Standard algorithm for time-lagged links (it assumes no contemporaneous, same-time causation)
# STEP 2: Initialize method and test

# Independence test (ParCorr = partial correlation, for linear data)
parcorr = ParCorr(significance='analytic')

# Discovery method (PCMCI)
pcmci = PCMCI(
    dataframe=dataframe,     # Our data
    cond_ind_test=parcorr,   # How to test independence
    verbosity=1              # 0=quiet, 1=some output, 2=detailed
)
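
ParCorr assumes roughly linear relationships. If you suspect nonlinear dependencies, Tigramite ships alternative conditional independence tests that plug into PCMCI the same way; the sketch below uses the Tigramite 5 import layout shown above, which may differ in older versions.

# Sketch: nonlinear conditional independence tests (Tigramite 5 import paths)
from tigramite.independence_tests.gpdc import GPDC        # Gaussian process regression + distance correlation
from tigramite.independence_tests.cmiknn import CMIknn    # conditional mutual information via k-nearest neighbors

gpdc = GPDC()       # nonlinear, continuous data
cmiknn = CMIknn()   # fully nonparametric, but much slower

# Used exactly like ParCorr:
# pcmci = PCMCI(dataframe=dataframe, cond_ind_test=gpdc, verbosity=1)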

Step 3: Run & Visualize

Execute the algorithm with key parameters:

  • tau_max: Maximum lag to search (how far back in time?)
  • pc_alpha: Significance level for variable selection (None = auto-select)
  • alpha_level: Final significance threshold for the graph
# STEP 3: Run causal discovery
results = pcmci.run_pcmci(
    tau_max=5,           # Check up to 5 time lags
    pc_alpha=None,       # Auto-select significance level
    alpha_level=0.01     # Final threshold (p < 0.01 = significant)
)

Understanding the Results

The results dictionary contains:

  • graph: The causal graph (who causes whom, and at which lag)
  • val_matrix: Strength of each link (test statistic values)
  • p_matrix: P-values (lower = more confident)
# Print significant links
pcmci.print_significant_links(
    p_matrix=results['p_matrix'],
    val_matrix=results['val_matrix'],
    alpha_level=0.01
)
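
All three arrays have shape (N, N, tau_max + 1), and entry [i, j, tau] refers to the link from variable i at lag tau to variable j. A quick way to check this on our results:

# Inspect the result arrays
print(results['graph'].shape)           # (4, 4, 6): N x N x (tau_max + 1)
print(results['graph'][0, 2, 1])        # Temperature(t-1) → Quality(t): '-->' if recovered
print(results['p_matrix'][0, 2, 1])     # its p-value
print(results['val_matrix'][0, 2, 1])   # its MCI test statistic (partial correlation)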

Visualize the Causal Graph

# Process graph (summary view)
tp.plot_graph(
    graph=results['graph'],
    val_matrix=results['val_matrix'],
    var_names=var_names,
    link_colorbar_label='MCI strength',
    node_colorbar_label='Auto-MCI',
    figsize=(8, 6)
)
plt.show()

Time Series Graph (Temporal View)

# Time series graph (shows temporal structure)
tp.plot_time_series_graph(
    graph=results['graph'],
    val_matrix=results['val_matrix'],
    var_names=var_names,
    figsize=(12, 6),
    link_colorbar_label='MCI strength'
)
plt.show()

Verify: Did We Recover the True Structure?

Let's compare what we found vs. what we know is true.

# True links we created
print("TRUE CAUSAL LINKS (what we built):")
print("="*50)
true_edges = [
    ("Temperature(t-1)", "Temperature(t)", 0.7),
    ("Pressure(t-1)", "Pressure(t)", 0.6),
    ("External(t-1)", "Pressure(t)", 0.5),
    ("Temperature(t-1)", "Quality(t)", 0.6),
    ("Pressure(t-1)", "Quality(t)", 0.5),
    ("External(t-1)", "External(t)", 0.8),
]
for source, target, coeff in true_edges:
    print(f"  {source} → {target} (strength: {coeff})")

print("\n" + "="*50)
print("DISCOVERED LINKS:")
print("="*50)

# Extract discovered links (graph[i, j, tau] means: variable i at lag tau → variable j)
for j in range(4):          # target variable
    for i in range(4):      # source variable
        for tau in range(1, 6):   # lags 1..tau_max
            if results['graph'][i, j, tau] == '-->':
                strength = results['val_matrix'][i, j, tau]
                print(f"  {var_names[i]}(t-{tau}) → {var_names[j]}(t) (strength: {strength:.3f})")

Complete Template

Here's the entire workflow in one block - copy this for your own analysis!

# =============================================================================
# COMPLETE TIGRAMITE WORKFLOW TEMPLATE
# =============================================================================

import numpy as np
import matplotlib.pyplot as plt
from tigramite import data_processing as pp
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests.parcorr import ParCorr

# STEP 1: Load your data (shape: T rows, N columns)
data = np.loadtxt('your_data.csv', delimiter=',')
var_names = ['Var1', 'Var2', 'Var3']
dataframe = pp.DataFrame(data, var_names=var_names)

# STEP 2: Set up PCMCI
parcorr = ParCorr(significance='analytic')
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=parcorr, verbosity=0)

# STEP 3: Run causal discovery
results = pcmci.run_pcmci(tau_max=5, pc_alpha=None, alpha_level=0.05)

# STEP 4: Visualize
tp.plot_graph(graph=results['graph'], val_matrix=results['val_matrix'], var_names=var_names)
plt.show()

# Print results
pcmci.print_significant_links(p_matrix=results['p_matrix'], val_matrix=results['val_matrix'], alpha_level=0.05)

Quick Quiz

Q1: What does tau_max=5 mean?

Search for causal links up to 5 time steps in the past. Links like X(t-1)→Y(t), X(t-2)→Y(t), etc., up to X(t-5)→Y(t).

Q2: Why do we use significance='analytic' for ParCorr?

For partial correlation with Gaussian data, the null distribution is mathematically known (Student's t), so we can compute p-values analytically. This is faster than permutation tests.

Q3: What's the difference between pc_alpha and alpha_level?
  • pc_alpha: Used during the PC algorithm phase to select potential causal parents
  • alpha_level: Final threshold for what counts as a significant link in the output graph

Key Takeaways

  1. 3-Step Recipe: Data → DataFrame, Method + Test → PCMCI, then Run & Visualize
  2. DataFrame wraps your numpy array with metadata
  3. PCMCI needs a conditional independence test (ParCorr for linear data)
  4. tau_max controls how far back in time to search
  5. Results contain graph, val_matrix, and p_matrix
  6. Always verify with toy data where you know the truth!