Module 1.2: Core Concepts in Plain English

Time: 20 min | Prerequisites: Module 1.1

What You'll Learn

  1. The vocabulary of causal inference (DAG, node, edge, etc.)
  2. How to read a causal graph
  3. Special concepts for time series (lags, autocorrelation)
  4. Tigramite's notation system

Causal Graphs: The Family Tree of Variables

A causal graph is like a family tree, but for variables:

| Term | Meaning | Family Analogy |
|------|---------|----------------|
| Node | A variable (circle) | A person |
| Edge | A causal link (arrow) | Parent-child relationship |
| Parent | Direct cause | Your mom or dad |
| Child | Direct effect | Your son or daughter |
| Ancestor | Any cause upstream | Grandparents, great-grandparents |
| Descendant | Any effect downstream | Grandchildren |

DAG: Directed Acyclic Graph

A DAG is a causal graph with two rules:

  1. Directed: Arrows point one way (cause → effect)
  2. Acyclic: No loops (you can't be your own grandparent!)

Valid DAG:

    A → B → C
    ↓   ↓
    D → E

Invalid (has a cycle):

    A → B → C
    ↑       │
    └───────┘   ← Not allowed!
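
The "no loops" rule can be checked mechanically. The sketch below is a minimal illustration, not part of Tigramite: the helper `is_acyclic` and the edge lists are hypothetical names chosen for this example. It uses a standard technique (Kahn's algorithm: repeatedly remove nodes that have no incoming arrows) to test the chain A → B → C with and without the extra arrow C → A.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    children = defaultdict(list)
    n_parents = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        children[cause].append(effect)
        n_parents[effect] += 1
        nodes.update((cause, effect))

    # Start from the nodes that have no parents
    queue = deque(node for node in nodes if n_parents[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            n_parents[child] -= 1
            if n_parents[child] == 0:
                queue.append(child)

    # Nodes on a cycle never reach zero parents, so they are never removed
    return removed == len(nodes)

chain = [('A', 'B'), ('B', 'C')]
chain_with_loop = chain + [('C', 'A')]

print(is_acyclic(chain))            # True
print(is_acyclic(chain_with_loop))  # False
```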

Visualizing a DAG

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))

# Node positions
nodes = {
    'Temperature': (0.2, 0.7),
    'Pressure': (0.5, 0.7),
    'Humidity': (0.8, 0.7),
    'Quality': (0.5, 0.3)
}

# Draw nodes
for name, (x, y) in nodes.items():
    # Use facecolor/edgecolor: passing color= would override the edge color
    circle = plt.Circle((x, y), 0.08, facecolor='lightblue', edgecolor='steelblue', linewidth=2)
    ax.add_patch(circle)
    ax.text(x, y, name, ha='center', va='center', fontsize=10, fontweight='bold')

# Draw edges (arrows)
edges = [
    ('Temperature', 'Quality'),
    ('Pressure', 'Quality'),
    ('Humidity', 'Pressure'),
]

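radius = 0.08  # must match the circle radius used above
for start, end in edges:
    x1, y1 = nodes[start]
    x2, y2 = nodes[end]
    # Shorten each arrow so it starts and ends on the circle boundaries
    dist = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    dx, dy = (x2 - x1) / dist * radius, (y2 - y1) / dist * radius
    ax.annotate('', xy=(x2 - dx, y2 - dy), xytext=(x1 + dx, y1 + dy),
                arrowprops=dict(arrowstyle='->', color='coral', lw=2))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect('equal')  # keep the circles round
ax.axis('off')          # hide axis ticks, which carry no meaning here
ax.set_title('Example DAG: Factory Process', fontsize=14)
plt.show()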

# Reading this graph:
# - Temperature DIRECTLY causes Quality
# - Pressure DIRECTLY causes Quality
# - Humidity causes Quality INDIRECTLY (through Pressure)
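
The family-tree vocabulary maps directly onto code. As a rough sketch (the helper names `parents_of` and `ancestors_of` are made up for this example), the edge list from the factory graph above is enough to separate direct causes from indirect ones:

```python
edges = [
    ('Temperature', 'Quality'),
    ('Pressure', 'Quality'),
    ('Humidity', 'Pressure'),
]

def parents_of(node, edges):
    """Direct causes: nodes with an arrow pointing into `node`."""
    return {cause for cause, effect in edges if effect == node}

def ancestors_of(node, edges):
    """All upstream causes: parents, parents of parents, and so on."""
    found = set()
    to_visit = list(parents_of(node, edges))
    while to_visit:
        current = to_visit.pop()
        if current not in found:
            found.add(current)
            to_visit.extend(parents_of(current, edges))
    return found

quality_parents = parents_of('Quality', edges)
quality_ancestors = ancestors_of('Quality', edges)

print("Parents of Quality:  ", sorted(quality_parents))
print("Ancestors of Quality:", sorted(quality_ancestors))
print("Indirect causes only:", sorted(quality_ancestors - quality_parents))
```

Humidity shows up as an ancestor of Quality but not as a parent, which is the code equivalent of "causes Quality indirectly".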

The Big Three: Confounder, Mediator, Collider

These are the three fundamental patterns in causal graphs:

1. Confounder (Common Cause) - MOST IMPORTANT!

      C
     / \
    v   v
    A   B

  • C causes BOTH A and B
  • Creates spurious correlation between A and B
  • Example: Summer (C) causes both ice cream sales (A) and drowning (B)
  • Key insight: A and B look related, but neither causes the other!

2. Mediator (Chain)

A → M → B
  • A causes M, which causes B
  • M is the "middle step" in the causal path
  • Example: Smoking (A) → Tar in lungs (M) → Cancer (B)
  • Key insight: A affects B, but indirectly through M
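
A chain leaves a recognisable footprint in data: A and B are correlated, but the correlation disappears once the mediator M is held fixed. The simulation below is a sketch with made-up coefficients (0.8) and noise levels; `partial_corr` is a hypothetical helper that linearly regresses one variable out of two others using `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Chain: A → M → B (coefficients are arbitrary illustration values)
A = rng.standard_normal(n)
M = 0.8 * A + 0.5 * rng.standard_normal(n)
B = 0.8 * M + 0.5 * rng.standard_normal(n)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    x_resid = x - np.polyval(np.polyfit(z, x, 1), z)
    y_resid = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(x_resid, y_resid)[0, 1]

corr_ab = np.corrcoef(A, B)[0, 1]
corr_ab_given_m = partial_corr(A, B, M)

print(f"Correlation A-B:             {corr_ab:.3f}")
print(f"Partial correlation A-B | M: {corr_ab_given_m:.3f}")
```

The first number should be clearly positive and the second close to zero. Note that a confounder produces the same footprint (dependence that vanishes given the middle variable), so this test alone cannot tell a chain from a common cause.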

3. Collider (Common Effect)

    A   B
     \ /
      v
      C

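  • Both A and B cause C
  • A and B do not cause each other
  • Example: Both Talent (A) and Luck (B) cause Success (C)
  • Key insight: A and B are independent on their own, so knowing one tells you nothing about the other. But once you condition on C (for example, by looking only at successful people), A and B become spuriously related

Good news: You don't need to work these patterns out by hand! Tigramite uses conditional independence tests to infer which patterns are consistent with your data.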
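
One consequence of the collider pattern is easiest to see in simulated data: Talent and Luck are generated independently, yet among the most successful cases they appear negatively related. The numbers below are made up for illustration (unit-variance causes, a top-20% cutoff), and selecting on Success stands in for "conditioning on the collider".

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Collider: Talent → Success ← Luck (coefficients are arbitrary illustration values)
talent = rng.standard_normal(n)
luck = rng.standard_normal(n)
success = talent + luck + 0.5 * rng.standard_normal(n)

# Across everyone, the two causes are unrelated
corr_all = np.corrcoef(talent, luck)[0, 1]

# Looking only at the most successful 20% links them
cutoff = np.quantile(success, 0.8)
selected = success > cutoff
corr_selected = np.corrcoef(talent[selected], luck[selected])[0, 1]

print(f"Talent-Luck correlation, everyone:        {corr_all:.3f}")
print(f"Talent-Luck correlation, successful only: {corr_selected:.3f}")
```

Among the successful, low talent has to be compensated by high luck (and vice versa), which is where the negative correlation comes from.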

Time Series Graphs: Adding Time

In time series graphs, the same variable at different times is treated as separate nodes: X at time t-1 and X at time t each get their own node.

    Time:   t-2       t-1        t
           ┌───┐     ┌───┐     ┌───┐
    X:     │X₋₂│────▶│X₋₁│────▶│X₀ │
           └───┘     └───┘     └───┘
                       │
                       ▼
           ┌───┐     ┌───┐     ┌───┐
    Y:     │Y₋₂│────▶│Y₋₁│────▶│Y₀ │
           └───┘     └───┘     └───┘

Key Time Series Terms

| Term | Symbol | Meaning |
|------|--------|---------|
| Lag | τ (tau) | How many time steps back |
| Autocorrelation | X(t-1) → X(t) | Variable depends on its own past |
| Lagged effect | X(t-τ) → Y(t) | X at time t-τ causes Y at time t |
| Contemporaneous | X(t) ↔ Y(t) | Same-time relationship |
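
To make these terms concrete, the sketch below simulates a hypothetical process in which X depends on its own past at lag 1 and Y depends on X at lag 2. The coefficients (0.7, 0.6) and the helper `lagged_corr` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical process: X depends on its own past (lag 1),
# and Y depends on X two steps back (lag 2)
X = np.zeros(n)
Y = np.zeros(n)
for t in range(2, n):
    X[t] = 0.7 * X[t - 1] + rng.standard_normal()
    Y[t] = 0.6 * X[t - 2] + rng.standard_normal()

def lagged_corr(cause, effect, tau):
    """Correlation between cause(t - tau) and effect(t)."""
    if tau == 0:
        return np.corrcoef(cause, effect)[0, 1]
    return np.corrcoef(cause[:-tau], effect[tau:])[0, 1]

print("Autocorrelation of X:")
for tau in (1, 2, 3):
    print(f"  X(t-{tau}) vs X(t): {lagged_corr(X, X, tau):.3f}")

print("Lagged correlation X → Y:")
for tau in (0, 1, 2, 3):
    print(f"  X(t-{tau}) vs Y(t): {lagged_corr(X, Y, tau):.3f}")
```

The X → Y correlation should be largest at lag 2, but the neighbouring lags are far from zero because X is autocorrelated. Plain lagged correlation therefore cannot pinpoint the causal lag on its own.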

Tigramite's Notation: (variable, -lag)

Tigramite uses a simple tuple notation: (variable_index, -lag)

| Notation | Meaning |
|----------|---------|
| (0, 0) | Variable 0 at current time |
| (0, -1) | Variable 0, one step in the past |
| (1, -2) | Variable 1, two steps in the past |
| (2, -5) | Variable 2, five steps in the past |

Example

If we have daily measurements of the variables ['Temperature', 'Pressure', 'Quality']:

  • (0, -1) = Temperature yesterday
  • (1, -2) = Pressure 2 days ago
  • (2, 0) = Quality today

Code Example: Understanding Tigramite Links

var_names = ['Temperature', 'Pressure', 'Quality']

# For illustration, each link is written as: ((cause_var, -lag), target_var)
# i.e. Tigramite's (variable, -lag) tuple paired with the index of the variable it affects.

example_links = [
    ((0, -1), 2),  # Temperature at lag 1 → Quality
    ((1, -2), 2),  # Pressure at lag 2 → Quality
    ((0, -1), 0),  # Temperature autocorrelation
]

print("Example causal links:")
print("="*50)
for (cause_var, lag), target in example_links:
    cause_name = var_names[cause_var]
    target_name = var_names[target]
    print(f"({cause_var}, {lag}) → {target}")
    print(f"  Meaning: {cause_name} at lag {-lag} causes {target_name}")
    print()
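
Tigramite's documentation describes parents as a dictionary keyed by the target variable, of the form {target: [(variable, -lag), ...]}. As a minimal sketch (the variable `parents` and the grouping logic are this example's own, not a Tigramite call), the list above can be regrouped into that shape:

```python
from collections import defaultdict

var_names = ['Temperature', 'Pressure', 'Quality']

# Same illustrative links as above: ((cause_var, -lag), target_var)
example_links = [
    ((0, -1), 2),
    ((1, -2), 2),
    ((0, -1), 0),
]

# Group the lagged causes by the variable they point to
grouped = defaultdict(list)
for cause, target in example_links:
    grouped[target].append(cause)

# Give every variable an entry, even if it has no parents
parents = {j: grouped.get(j, []) for j in range(len(var_names))}

for j, causes in parents.items():
    readable = [f"{var_names[i]}(t{lag})" for i, lag in causes]
    print(f"{var_names[j]}(t) <- {readable}")
```

Pressure ends up with an empty list: nothing in the example links points to it.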

Graph Output Types in Tigramite

When you run causal discovery, Tigramite returns a graph with edge symbols:

| Symbol | Meaning | When it occurs |
|--------|---------|----------------|
| --> | Definite causal direction | Lagged links (time order settles direction) |
| <-- | Reverse direction | Same as above, other direction |
| o-o | Undetermined direction | Contemporaneous (same-time) links |
| x-x | Conflicting evidence | Rare, indicates issues |
| <-> | Bidirected (hidden confounder) | Only with LPCMCI method |
| (empty) | No link | No causal link was detected between the variables |

Reading a Tigramite Graph Output

# Example: Understanding a Tigramite graph output
import numpy as np

var_names = ['Temperature', 'Pressure', 'Quality']

# Simulated graph array (3 variables, max lag 2)
# Shape: (N, N, tau_max + 1), where graph[i, j, tau] describes the link
# FROM variable i at time t - tau TO variable j at time t

example_graph = np.array([
    # Links FROM variable 0 (Temperature)
    [['', '-->', ''],      # TO var 0 at lags [0, 1, 2]: autocorrelation at lag 1
     ['', '', ''],         # TO var 1
     ['', '-->', '']],     # TO var 2: Temperature at lag 1 → Quality

    # Links FROM variable 1 (Pressure)
    [['', '', ''],
     ['', '-->', ''],      # Pressure autocorrelation at lag 1
     ['', '', '-->']],     # Pressure at lag 2 → Quality

    # Links FROM variable 2 (Quality)
    [['', '', ''],
     ['', '', ''],
     ['', '', '']]
])

print("Reading the graph array:")
print("="*50)
for i in range(3):              # source variable
    for j in range(3):          # target variable
        for tau in range(3):    # lag
            link = example_graph[i, j, tau]
            if link != '':
                print(f"{var_names[i]}(t-{tau}) {link} {var_names[j]}(t)")

Conditional Independence: The Key Insight

Causal discovery works by testing conditional independence:

"Are X and Y independent GIVEN Z?" Written as: X ⊥ Y | Z

Why This Matters

  • If X → Y (direct cause), then X and Y will be dependent even after conditioning on other variables.
  • If X ← Z → Y (confounder), then X and Y become independent once we condition on Z.

Tigramite tests many conditional independencies to figure out the causal structure!

Demonstration: Conditional Independence

import numpy as np

np.random.seed(42)
n = 1000

# Confounder structure: Z → X and Z → Y
Z = np.random.randn(n)  # Common cause
X = 0.8 * Z + np.random.randn(n) * 0.5  # Caused by Z
Y = 0.8 * Z + np.random.randn(n) * 0.5  # Also caused by Z

# Unconditional correlation (X and Y)
corr_xy = np.corrcoef(X, Y)[0, 1]
print(f"Correlation X-Y (unconditional): {corr_xy:.3f}")
print("They look correlated!\n")

# Partial correlation (X and Y given Z)
# Regress X on Z, get residuals
slope_xz = np.polyfit(Z, X, 1)[0]
X_resid = X - slope_xz * Z

# Regress Y on Z, get residuals
slope_yz = np.polyfit(Z, Y, 1)[0]
Y_resid = Y - slope_yz * Z

# Correlation of residuals = partial correlation
partial_corr = np.corrcoef(X_resid, Y_resid)[0, 1]
print(f"Partial correlation X-Y | Z: {partial_corr:.3f}")
print("Almost zero! X and Y are conditionally independent given Z.")
print("\nThis tells us: X doesn't directly cause Y (and vice versa).")
print("The correlation was spurious - caused by the confounder Z.")
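
For contrast, here is a sketch of the other case: Z still confounds X and Y, but X also has a direct effect on Y. The coefficients and the helper `residual` are illustrative assumptions. This time the partial correlation should stay well away from zero, because conditioning on Z removes only the confounded part of the relationship.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Z confounds X and Y, AND X has a direct effect on Y
Z = rng.standard_normal(n)
X = 0.8 * Z + 0.5 * rng.standard_normal(n)
Y = 0.8 * Z + 0.6 * X + 0.5 * rng.standard_normal(n)

def residual(values, given):
    """Part of `values` left over after a linear fit on `given`."""
    slope, intercept = np.polyfit(given, values, 1)
    return values - (slope * given + intercept)

corr_xy = np.corrcoef(X, Y)[0, 1]
partial_corr = np.corrcoef(residual(X, Z), residual(Y, Z))[0, 1]

print(f"Correlation X-Y (unconditional): {corr_xy:.3f}")
print(f"Partial correlation X-Y | Z:     {partial_corr:.3f}")
```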

Summary: Your Causal Vocabulary Cheat Sheet

| Term | Simple Definition |
|------|-------------------|
| DAG | A causal diagram with arrows, no loops |
| Node | A variable in the graph (a circle) |
| Edge | A causal link (an arrow) |
| Parent | Direct cause of a variable |
| Confounder | Common cause of two variables, creating spurious correlation (may be hidden) |
| Mediator | Variable in the middle of a causal chain |
| Collider | Variable caused by two others |
| Lag (τ) | Time steps between cause and effect |
| Autocorrelation | Variable depending on its own past |
| (i, -τ) | Tigramite notation: variable i at lag τ |
| --> | Definite causal direction in output |
| Conditional independence | X and Y unrelated given Z |

Quick Quiz

Q1: In notation (2, -3), what does the -3 mean?

Answer: The variable is 3 time steps in the PAST (lag of 3).

Q2: You see --> between Temperature and Quality. What does this mean?

Answer: Temperature CAUSES Quality, and we're confident about the direction.

Q3: X and Y are correlated. After conditioning on Z, they're not. What pattern is this?

Answer: Confounder pattern! Z is a common cause of both X and Y.