You've built a Bayesian network with 50 nodes representing a medical diagnosis system. A user asks: "Given that we know the patient's blood pressure and family history, are their cholesterol level and exercise habits independent?"
How do you answer this? You could derive the full joint distribution, condition on blood pressure and family history, and check if the joint factors. But for a 50-node network, that's computationally prohibitive and error-prone.
What you need is a graph-based criterion—a way to read the answer directly from the network structure without any numerical computation. Enter d-separation (directed separation), one of the most elegant and powerful concepts in probabilistic graphical models.
D-separation lets you:
- Read conditional independencies directly off the graph structure, without any numerical computation
- Prune variables that are irrelevant to a query before running inference
- Derive testable predictions from the model's structure alone
By the end of this page, you will master d-separation: its formal definition, the algorithm for determining it, proofs of why it works, and practical techniques for applying it. You'll be able to look at any Bayesian network and immediately read off which variables are independent given any evidence.
Before the formal definition, let's build intuition by thinking about information flow along paths.
Imagine the Bayesian network as a system of pipes, where "probabilistic influence" can flow between variables. Two variables are dependent if influence can flow between them, and independent if all paths of influence are blocked.
Key insight: We need to understand when a path is active (allows influence to flow) versus blocked (stops influence).
Recall the three fundamental patterns from the previous page:
1. Chain: A → B → C
2. Fork: A ← B → C
3. Collider: A → B ← C
| Pattern | Node Type | Without Conditioning | Conditioning on Middle Node |
|---|---|---|---|
| A → B → C | B is in a chain | ✓ ACTIVE | ✗ BLOCKED |
| A ← B → C | B is a fork | ✓ ACTIVE | ✗ BLOCKED |
| A → B ← C | B is a collider | ✗ BLOCKED | ✓ ACTIVE! |
The key asymmetry:
For chains and forks, conditioning BLOCKS the path. For colliders, conditioning OPENS the path.
This asymmetry is what makes d-separation non-trivial. You can't just check if conditioning nodes are "on the path"—you need to know what role each node plays.
Descendant activation (important detail):
For colliders, conditioning on a descendant of the collider also activates the path. If B is a collider and D is a descendant of B:
```
A → B ← C
    ↓
    D
```
Conditioning on D also opens the path between A and C. Intuitively: learning about D gives information about B, which enables explaining away.
Many people initially forget that conditioning on a collider's DESCENDANT also activates the path. This is a common source of errors. Always check not just the conditioning set, but also whether any conditioned variable is a descendant of a collider on the path.
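To see both behaviors with concrete numbers, here is a small enumeration sketch. All CPT values are invented for illustration; the point is only that A and C start independent, and that conditioning on the descendant D makes them dependent.

```python
"""Numeric check of collider blocking and descendant activation.

Collider network A -> B <- C with descendant B -> D.
All variables are binary; every CPT value below is invented for illustration.
"""
from itertools import product

p_a = {0: 0.6, 1: 0.4}                    # P(A)
p_c = {0: 0.7, 1: 0.3}                    # P(C)
p_b1 = {(a, c): 0.1 + 0.4 * a + 0.3 * c   # P(B=1 | A=a, C=c)
        for a, c in product((0, 1), repeat=2)}
p_d1 = {0: 0.2, 1: 0.7}                   # P(D=1 | B=b)

def joint(a, b, c, d):
    pb = p_b1[(a, c)] if b == 1 else 1 - p_b1[(a, c)]
    pd = p_d1[b] if d == 1 else 1 - p_d1[b]
    return p_a[a] * p_c[c] * pb * pd

def cond(query, given):
    """P(query | given) by brute-force enumeration; both are dicts like {'A': 1}."""
    num = den = 0.0
    for a, b, c, d in product((0, 1), repeat=4):
        world = {'A': a, 'B': b, 'C': c, 'D': d}
        if all(world[k] == v for k, v in given.items()):
            p = joint(a, b, c, d)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Unconditioned, the collider blocks the path: A and C are independent
print(f"P(A=1)           = {cond({'A': 1}, {}):.4f}")
print(f"P(A=1 | C=1)     = {cond({'A': 1}, {'C': 1}):.4f}   (equal: path blocked)")

# Conditioning on the DESCENDANT D opens the path: A and C become dependent
print(f"P(A=1 | D=1)     = {cond({'A': 1}, {'D': 1}):.4f}")
print(f"P(A=1 | D=1,C=1) = {cond({'A': 1}, {'D': 1, 'C': 1}):.4f}   (differs: path open)")
```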
Now let's state the formal definition with mathematical precision.
Definition (D-Separation):
Let $G$ be a directed acyclic graph (DAG) with node sets $X$, $Y$, and $Z$ (where $X$, $Y$, $Z$ are disjoint).
We say $X$ and $Y$ are d-separated by $Z$ in $G$, written:
$$X \perp_G Y \mid Z$$
if and only if every undirected path from any node in $X$ to any node in $Y$ is blocked by $Z$.
Definition (Blocked Path):
An undirected path $p$ from $X$ to $Y$ is blocked by $Z$ if there exists some node $w$ on the path such that either:
- $w$ is a chain or fork node on the path, and $w \in Z$, or
- $w$ is a collider on the path, and neither $w$ nor any descendant of $w$ is in $Z$.
Definition (Active Path):
A path is active (or d-connected) given $Z$ if it is not blocked. This means for every node $w$ on the path:
- if $w$ is a chain or fork node, then $w \notin Z$, and
- if $w$ is a collider, then $w \in Z$ or some descendant of $w$ is in $Z$.
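To ground these definitions, here is a minimal sketch of our own that applies the blocked-path rules to one explicit path at a time, using the classic Sprinkler network that also appears in the full listing below:

```python
"""Minimal sketch: check whether ONE explicit path is blocked by Z,
applying the definition above directly. Edges are (parent, child) pairs;
`path` is a list of nodes assumed to form a valid undirected path."""

def descendants(edges, node):
    """All strict descendants of `node` in the DAG."""
    out, frontier = set(), {node}
    while frontier:
        nxt = {v for (u, v) in edges if u in frontier} - out
        out |= nxt
        frontier = nxt
    return out

def path_is_blocked(edges, path, Z):
    for i in range(1, len(path) - 1):
        prev, w, nxt = path[i - 1], path[i], path[i + 1]
        arrow_in_from_prev = (prev, w) in edges
        arrow_in_from_next = (nxt, w) in edges
        if arrow_in_from_prev and arrow_in_from_next:
            # w is a collider: blocks unless w or one of its descendants is in Z
            if w not in Z and not (descendants(edges, w) & Z):
                return True
        elif w in Z:
            # w is a chain or fork node: blocks exactly when conditioned on
            return True
    return False

edges = {('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
         ('Sprinkler', 'WetGrass'), ('Rain', 'WetGrass')}
fork = ['Sprinkler', 'Cloudy', 'Rain']
collider = ['Sprinkler', 'WetGrass', 'Rain']
print(path_is_blocked(edges, fork, set()))             # False: fork starts open
print(path_is_blocked(edges, fork, {'Cloudy'}))        # True:  conditioning blocks it
print(path_is_blocked(edges, collider, set()))         # True:  collider starts blocked
print(path_is_blocked(edges, collider, {'WetGrass'}))  # False: conditioning opens it
```

Checking a single path is the definition verbatim. The full test below avoids enumerating paths (there can be exponentially many) by running a direction-aware reachability search instead.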
"""D-Separation Algorithm Implementation Given a DAG G and sets X, Y, Z, determine if X ⟂_G Y | Z.""" from collections import defaultdict, deque class BayesianNetwork: def __init__(self): self.parents = defaultdict(set) # child -> set of parents self.children = defaultdict(set) # parent -> set of children self.nodes = set() def add_edge(self, parent, child): """Add directed edge parent -> child""" self.parents[child].add(parent) self.children[parent].add(child) self.nodes.add(parent) self.nodes.add(child) def get_ancestors(self, nodes): """Get all ancestors of a set of nodes (including the nodes themselves)""" ancestors = set(nodes) queue = deque(nodes) while queue: node = queue.popleft() for parent in self.parents[node]: if parent not in ancestors: ancestors.add(parent) queue.append(parent) return ancestors def is_d_separated(self, X: set, Y: set, Z: set) -> bool: """ Check if X and Y are d-separated by Z. Uses the 'reachability' algorithm: 1. Find all ancestors of Z (needed for collider activation) 2. Do a modified BFS from X to Y, tracking path validity """ # Get ancestors of Z (including Z itself) for collider checks Z_ancestors = self.get_ancestors(Z) # BFS with state tracking # State: (node, direction) where direction is 'up' or 'down' # 'up' means we arrived via a child (going against arrows) # 'down' means we arrived via a parent (going with arrows) visited = set() queue = deque() # Start from X, can go either direction for x in X: queue.append((x, 'up')) queue.append((x, 'down')) while queue: node, direction = queue.popleft() # Check if we reached Y if node in Y: return False # Found active path, not d-separated if (node, direction) in visited: continue visited.add((node, direction)) # Determine what moves are valid based on current direction if direction == 'up': # We came from a child (going up/backward in the graph) # Can continue to parents if node is not in Z if node not in Z: for parent in self.parents[node]: queue.append((parent, 'up')) # Can go to other children (node is fork/chain) if not in Z if node not in Z: for child in self.children[node]: queue.append((child, 'down')) else: # direction == 'down' # We came from a parent (going down/forward in the graph) # Can continue to children if node is not in Z if node not in Z: for child in self.children[node]: queue.append((child, 'down')) # Can go to parents only if this node is a collider that's activated # (node or descendant in Z) if node in Z_ancestors: # This means node or a descendant is in Z for parent in self.parents[node]: queue.append((parent, 'up')) # No path from X to Y found return True # Example: Classic "Sprinkler" network## Cloudy# / # v v# Sprinkler Rain# /# v v# WetGrass#bn = BayesianNetwork()bn.add_edge('Cloudy', 'Sprinkler')bn.add_edge('Cloudy', 'Rain')bn.add_edge('Sprinkler', 'WetGrass')bn.add_edge('Rain', 'WetGrass') # Test various d-separation queriesqueries = [ ({'Sprinkler'}, {'Rain'}, set()), # S ⟂ R | {} ? ({'Sprinkler'}, {'Rain'}, {'Cloudy'}), # S ⟂ R | C ? ({'Sprinkler'}, {'Rain'}, {'WetGrass'}), # S ⟂ R | W ? (collider!) ({'Cloudy'}, {'WetGrass'}, set()), # C ⟂ W | {} ? ({'Cloudy'}, {'WetGrass'}, {'Sprinkler', 'Rain'}), # C ⟂ W | {S,R} ?] 
print("D-Separation Queries in Sprinkler Network:")print("=" * 55)for X, Y, Z in queries: result = bn.is_d_separated(X, Y, Z) symbol = "⟂" if result else "↔" Z_str = Z if Z else "{}" print(f" {X} {symbol} {Y} | {Z_str}") print("\nExplanations:")print("- Sprinkler ↔ Rain | {}: Connected via Cloudy (common cause)")print("- Sprinkler ⟂ Rain | Cloudy: Cloudy blocks the fork") print("- Sprinkler ↔ Rain | WetGrass: WetGrass is a collider, conditioning opens path!")print("- Cloudy ↔ WetGrass | {}: Connected via Sprinkler and Rain")print("- Cloudy ⟂ WetGrass | {S,R}: Both paths are blocked")The d-separation algorithm can be implemented as a reachability search that tracks 'direction' (are we going with or against arrows?). This tracking is essential because whether a node blocks or activates depends on how we arrived at it. The algorithm runs in O(n + e) time where n is nodes and e is edges.
D-separation isn't just a heuristic—it has strong theoretical guarantees.
Theorem (Soundness):
If $X$ and $Y$ are d-separated by $Z$ in a DAG $G$, then $X \perp\!\!\!\perp Y \mid Z$ holds in every probability distribution $P$ that factorizes according to $G$.
$$X \perp_G Y \mid Z \;\Rightarrow\; X \perp_P Y \mid Z \text{ for all } P \text{ compatible with } G$$
Proof intuition: When a path is blocked, the factorization structure ensures the conditional distribution factors. Each blocking configuration (chain/fork blocked, or collider without activation) corresponds to a mathematical factorization that implies independence.
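To see one case concretely (a sketch, not a full proof): for the chain $A \to B \to C$ blocked by $Z = \{B\}$, the factorization $P(A,B,C) = P(A)\,P(B \mid A)\,P(C \mid B)$ gives

$$P(A, C \mid B) = \frac{P(A)\,P(B \mid A)\,P(C \mid B)}{P(B)} = P(A \mid B)\,P(C \mid B),$$

since $P(A)\,P(B \mid A)/P(B) = P(A \mid B)$ by Bayes' rule. This is exactly $A \perp\!\!\!\perp C \mid B$. The fork case is analogous, and the blocked-collider case follows by marginalizing out the unobserved collider and its descendants.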
Theorem (Completeness - Faithfulness):
If $P$ is faithful to $G$ (no conditional independencies hold beyond those implied by the graph), then:
$$X \perp_P Y \mid Z \Leftrightarrow X \perp_G Y \mid Z$$
Faithfulness means the distribution has no "accidental" independencies due to specific parameter choices.
What faithfulness means:
A distribution $P$ is faithful to $G$ if the ONLY conditional independencies that hold in $P$ are those implied by d-separation in $G$. This rules out cases where, by coincidence, parameters cancel out to create independence not implied by structure.
Example of unfaithfulness:
Consider a chain A → B → C whose parameters happen to satisfy $P(B \mid A) = P(B)$ for every value of $A$ (B's CPT numerically ignores its parent).
The factorization is $P(A,B,C) = P(A)P(B|A)P(C|B)$.
By d-separation, A and C are NOT d-separated (there's an active path A→B→C).
But because $P(B \mid A) = P(B)$ (by our parameter choice!), we have $A \perp\!\!\!\perp B$, and marginalizing out $B$ gives $P(A,C) = \sum_b P(A)\,P(b)\,P(C \mid b) = P(A)\,P(C)$, i.e., $A \perp\!\!\!\perp C$.
This independence holds due to specific parameters, not structure—the distribution is unfaithful to the graph.
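A quick numeric check of this example, with parameters chosen (hypothetically) to force the cancellation:

```python
"""Numeric check of the unfaithfulness example.

Chain A -> B -> C where B's CPT is deliberately chosen to ignore A."""
from itertools import product

p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.5, 1: 0.5},   # same row for both values of A:
               1: {0: 0.5, 1: 0.5}}   # P(B|A) = P(B)
p_c_given_b = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

def p_ac(a, c):
    """P(A=a, C=c) with B marginalized out."""
    return sum(p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c] for b in (0, 1))

p_c = {c: p_ac(0, c) + p_ac(1, c) for c in (0, 1)}
for a, c in product((0, 1), repeat=2):
    print(f"P(A={a},C={c}) = {p_ac(a, c):.3f}   "
          f"P(A={a})P(C={c}) = {p_a[a] * p_c[c]:.3f}")
# The two columns match: A ⟂ C numerically, even though the path A -> B -> C is active.
```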
Why faithfulness is usually assumed:
- Unfaithfulness requires exact numerical cancellations among parameters; the set of parameter values producing such cancellations has measure zero
- Perturbing the parameters even slightly destroys the "accidental" independence
- Constraint-based structure learning (covered below) relies on faithfulness to infer structure from observed independencies
| Property | Statement | Conditions |
|---|---|---|
| Soundness | d-separation ⇒ conditional independence | Always holds (any distribution factorizing per G) |
| Completeness | conditional independence ⇒ d-separation | Requires faithfulness assumption |
| Efficiency | D-sep query in O(n + e) time | Polynomial in graph size |
Even without assuming faithfulness, soundness is incredibly powerful. If d-separation says X ⟂ Y | Z, you KNOW it holds for any parameters. This guarantee enables you to verify model assumptions, simplify inference, and reason about model behavior without knowing specific probabilities.
Let's work through several examples to build mastery. For each, we'll identify all paths and check each one.
Example 1: Medical Diagnosis Network
```
Smoking
   ↓
 Cancer ←── Genetics
   / \
  ↓   ↓
Cough  Fatigue
```
Query: Is Smoking ⟂ Genetics? (no conditioning)
Paths from Smoking to Genetics:
1. Smoking → Cancer ← Genetics (the only path)

Analysis:
- Cancer is a collider on this path
- The conditioning set is empty, so neither Cancer nor its descendants (Cough, Fatigue) are observed
- The path is blocked
Result: Smoking ⟂ Genetics ✓
Query: Is Smoking ⟂ Genetics | Cancer?
Same path: Smoking → Cancer ← Genetics
Analysis:
- Cancer is a collider on this path, and Cancer is in the conditioning set
- Conditioning on a collider OPENS the path
- The path is active
Result: Smoking ↔ Genetics | Cancer (DEPENDENT!)
This is explaining away: if we know someone has cancer, and they're not a smoker, genetic factors become more likely.
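Here is the effect with invented numbers (every CPT value below is hypothetical, chosen only to make the pattern visible):

```python
"""Explaining away in the network above: Smoking -> Cancer <- Genetics.
All variables binary; all CPT values are invented for illustration."""
from itertools import product

p_s = {0: 0.7, 1: 0.3}                        # P(Smoking)
p_g = {0: 0.9, 1: 0.1}                        # P(Genetics)
p_c1 = {(s, g): 0.05 + 0.30 * s + 0.40 * g    # P(Cancer=1 | s, g)
        for s, g in product((0, 1), repeat=2)}

def joint(s, g, c):
    pc = p_c1[(s, g)] if c == 1 else 1 - p_c1[(s, g)]
    return p_s[s] * p_g[g] * pc

def p(**fixed):
    """Marginal probability with any subset of s, g, c fixed."""
    return sum(joint(s, g, c)
               for s, g, c in product((0, 1), repeat=3)
               if all({'s': s, 'g': g, 'c': c}[k] == v for k, v in fixed.items()))

print(f"P(G=1)            = {p(g=1):.3f}")
print(f"P(G=1 | C=1)      = {p(g=1, c=1) / p(c=1):.3f}   (raised by the diagnosis)")
print(f"P(G=1 | C=1, S=1) = {p(g=1, c=1, s=1) / p(c=1, s=1):.3f}   (smoking explains it away)")
print(f"P(G=1 | C=1, S=0) = {p(g=1, c=1, s=0) / p(c=1, s=0):.3f}   (genetics now more likely)")
```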
Example 2: Extended Network
```
A → B → C → D
    ↓
    E
```
Query: Is A ⟂ D | {B}?
Paths from A to D:
1. A → B → C → D (the only path)

Analysis:
- B is a chain node on the path and B ∈ Z, so the path is blocked
- (C is also a chain node here, but one blocked node is enough)
Result: A ⟂ D | B ✓
Query: Is A ⟂ E | {C}?
The only path from A to E is: A → B → E

Analysis:
- B is a chain node on the path and B ∉ Z, so nothing blocks it
- C does not lie on this path, so conditioning on C has no effect
Result: A ↔ E | C (DEPENDENT)
Query: Is A ⟂ E | {B, C}?
Path: A → B → E
Analysis:
- B is a chain node on the path and B ∈ Z, so the path is blocked
Result: A ⟂ E | {B, C} ✓
"""Verify our worked examples programmatically.""" from collections import defaultdict, deque class DAG: def __init__(self): self.parents = defaultdict(set) self.children = defaultdict(set) self.nodes = set() def add_edge(self, parent, child): self.parents[child].add(parent) self.children[parent].add(child) self.nodes.update([parent, child]) def get_ancestors(self, nodes): ancestors = set(nodes) queue = deque(nodes) while queue: node = queue.popleft() for parent in self.parents[node]: if parent not in ancestors: ancestors.add(parent) queue.append(parent) return ancestors def is_d_separated(self, X: set, Y: set, Z: set) -> bool: Z_ancestors = self.get_ancestors(Z) visited = set() queue = deque() for x in X: queue.append((x, 'up')) queue.append((x, 'down')) while queue: node, direction = queue.popleft() if node in Y: return False if (node, direction) in visited: continue visited.add((node, direction)) if direction == 'up': if node not in Z: for parent in self.parents[node]: queue.append((parent, 'up')) for child in self.children[node]: queue.append((child, 'down')) else: if node not in Z: for child in self.children[node]: queue.append((child, 'down')) if node in Z_ancestors: for parent in self.parents[node]: queue.append((parent, 'up')) return True # Example 1: Medical Diagnosisprint("Example 1: Medical Diagnosis Network")print("=" * 50)g1 = DAG()g1.add_edge('Smoking', 'Cancer')g1.add_edge('Genetics', 'Cancer')g1.add_edge('Cancer', 'Cough')g1.add_edge('Cancer', 'Fatigue') queries1 = [ ({'Smoking'}, {'Genetics'}, set(), "Explaining away setup"), ({'Smoking'}, {'Genetics'}, {'Cancer'}, "Conditioning on collider"), ({'Cough'}, {'Fatigue'}, set(), "Siblings (common parent)"), ({'Cough'}, {'Fatigue'}, {'Cancer'}, "Block common parent"), ({'Smoking'}, {'Cough'}, set(), "Ancestor to descendant"),] for X, Y, Z, desc in queries1: result = g1.is_d_separated(X, Y, Z) symbol = "⟂" if result else "↔" Z_str = Z if Z else "{}" print(f" {list(X)[0]} {symbol} {list(Y)[0]} | {Z_str}") print(f" ({desc})") # Example 2: Extended Chainprint("\nExample 2: Extended Chain with Branch")print("=" * 50)g2 = DAG()g2.add_edge('A', 'B')g2.add_edge('B', 'C')g2.add_edge('C', 'D')g2.add_edge('B', 'E') queries2 = [ ({'A'}, {'D'}, set(), "Full chain, no conditioning"), ({'A'}, {'D'}, {'B'}, "Block at B"), ({'A'}, {'D'}, {'C'}, "Block at C"), ({'A'}, {'E'}, {'C'}, "C doesn't block A-B-E path"), ({'A'}, {'E'}, {'B'}, "B blocks everything from A"), ({'D'}, {'E'}, set(), "Share ancestor B"), ({'D'}, {'E'}, {'B'}, "Block common ancestor"), ({'D'}, {'E'}, {'C'}, "Partial blocking"),] for X, Y, Z, desc in queries2: result = g2.is_d_separated(X, Y, Z) symbol = "⟂" if result else "↔" Z_str = Z if Z else "{}" print(f" {list(X)[0]} {symbol} {list(Y)[0]} | {Z_str}") print(f" ({desc})")To master d-separation: (1) List ALL paths between the query nodes, (2) For each path, identify every node's type (chain/fork/collider), (3) Check if each path is blocked by the conditioning set, (4) X ⟂ Y | Z iff ALL paths are blocked. One active path means dependence!
It's instructive to compare d-separation in directed graphs with the simpler separation criterion for undirected graphs.
Undirected Graph Separation:
In an undirected graph, $X$ and $Y$ are separated by $Z$ if removing the nodes in $Z$ disconnects $X$ from $Y$. There are no special cases: any node on a path blocks that path when conditioned on.
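As a contrast with the d-separation algorithm above, here is a minimal sketch of our own showing that plain undirected separation is ordinary reachability after deleting $Z$, with no direction bookkeeping:

```python
"""Minimal sketch: separation in an UNDIRECTED graph is plain reachability
after deleting Z — no direction bookkeeping needed."""
from collections import deque

def separated(adj, X, Y, Z):
    """adj maps node -> set of neighbours. True iff removing Z disconnects X from Y."""
    seen = set(X)
    queue = deque(X)
    while queue:
        node = queue.popleft()
        if node in Y:
            return False
        for nb in adj[node] - Z - seen:  # conditioned nodes simply vanish
            seen.add(nb)
            queue.append(nb)
    return True

# Chain A - B - C (undirected): B separates A from C
adj = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
print(separated(adj, {'A'}, {'C'}, {'B'}))  # True
print(separated(adj, {'A'}, {'C'}, set()))  # False
```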
D-Separation (Directed Graphs):
D-separation is more complex because edge directions create asymmetry:
- Chain and fork nodes behave like undirected separation: conditioning on them blocks the path
- Colliders invert the rule: they block by default, and conditioning on them (or any of their descendants) opens the path
Why the difference?
The asymmetry comes from the causal/generative interpretation of directed graphs. Consider A → C ← B:
- A and B are independent causes of the common effect C
- Observing C couples them: the causes now compete to explain the effect, so learning that one cause occurred makes the other less likely
This "explaining away" has no analog in undirected models.
Moralization:
To convert a Bayesian network to a Markov Random Field, we use moralization:
1. "Marry the parents": for every node, connect each pair of its parents with an undirected edge
2. Drop the directions of all original edges
The moralized graph may have MORE edges, representing that conditioning on common children creates dependencies. The moral graph is an I-map for the BN, but may lose some independence information.
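A minimal moralization sketch, assuming the `DAG`/`BayesianNetwork` classes from the listings above (the helper name `moralize` is ours):

```python
"""Minimal moralization sketch: marry co-parents, then drop directions.
Works with the DAG / BayesianNetwork classes from the listings above."""
from itertools import combinations

def moralize(dag):
    """Return the moral graph as an undirected adjacency dict."""
    adj = {n: set() for n in dag.nodes}
    for child in list(dag.parents):
        parents = dag.parents[child]
        # Keep every original edge, forgetting its direction
        for parent in parents:
            adj[parent].add(child)
            adj[child].add(parent)
        # "Marry" each pair of parents that share this child
        for p1, p2 in combinations(parents, 2):
            adj[p1].add(p2)
            adj[p2].add(p1)
    return adj

# In the Sprinkler network, moralization marries Sprinkler and Rain
# (both are parents of WetGrass):
#   moralize(bn)['Sprinkler']  ->  {'Cloudy', 'WetGrass', 'Rain'}
```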
A V-structure (also called an 'immorality') is a collider A → C ← B where A and B are NOT connected. This is called an immorality because when we moralize, we must 'marry' the parents—adding an edge that wasn't there. V-structures are what distinguish Bayesian networks from moralized MRFs and are key to structure learning.
D-separation isn't just a theoretical concept—it has practical applications throughout probabilistic AI.
Application 1: Inference Optimization
Before computing $P(Query \mid Evidence)$, check d-separation:
- Any variable d-separated from the query given the evidence cannot affect the answer and can be pruned
- Inference then runs on the (often much smaller) relevant subnetwork
Example: In a 1000-node medical network, if you want $P(Disease | TestResult)$, d-separation might show that 950 nodes are conditionally independent of both—you only need to reason about 50.
Application 2: Model Validation
D-separation implies testable predictions:
- Every d-separation statement asserts a conditional independence that can be checked against data
- If the data systematically violate an implied independence, the graph structure is wrong
Application 3: Feature Selection
For predicting target $Y$ from features $X_1, \ldots, X_n$:
- A feature that is d-separated from $Y$ given the other selected features contributes nothing to the prediction and can be dropped
- This line of reasoning culminates in the Markov blanket, the topic of the next page
Application 4: Causal Inference
D-separation is crucial for identifying causal effects:
Backdoor criterion: To estimate the causal effect of $X$ on $Y$, find a set $Z$ such that:
- no node in $Z$ is a descendant of $X$, and
- $Z$ blocks every "backdoor" path from $X$ to $Y$ (every path that starts with an arrow INTO $X$); a mechanical check appears below
This is the foundation of causal identification in observational studies.
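One way to operationalize the check (a sketch, reusing the `DAG` class from the earlier listings; `backdoor_admissible` is our own helper): $Z$ satisfies the backdoor criterion for $(X, Y)$ exactly when $Z$ contains no descendant of $X$ and d-separates $X$ from $Y$ in the graph with $X$'s outgoing edges removed.

```python
"""Sketch: testing backdoor admissibility via d-separation.
Assumes the DAG class defined in the listings above."""

def descendants_of(dag, x):
    """All strict descendants of x."""
    out, frontier = set(), {x}
    while frontier:
        nxt = {c for n in frontier for c in dag.children[n]} - out
        out |= nxt
        frontier = nxt
    return out

def backdoor_admissible(dag, x, y, Z):
    """Z blocks every backdoor path from x to y and contains no descendant of x."""
    if Z & descendants_of(dag, x):
        return False
    # Delete x's outgoing edges: the only remaining x-y paths are backdoor paths
    mutilated = DAG()
    mutilated.nodes = set(dag.nodes)
    for child, parents in dag.parents.items():
        for parent in parents:
            if parent != x:
                mutilated.add_edge(parent, child)
    return mutilated.is_d_separated({x}, {y}, Z)

# Hypothetical confounding example: Lifestyle -> Smoking, Lifestyle -> Cancer,
# Smoking -> Cancer
g = DAG()
g.add_edge('Lifestyle', 'Smoking')
g.add_edge('Lifestyle', 'Cancer')
g.add_edge('Smoking', 'Cancer')
print(backdoor_admissible(g, 'Smoking', 'Cancer', set()))          # False: open backdoor
print(backdoor_admissible(g, 'Smoking', 'Cancer', {'Lifestyle'}))  # True: backdoor blocked
```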
Application 5: Structure Learning
Constraint-based structure learning (PC algorithm):
1. Start from a fully connected undirected graph
2. Remove edges between variables that test as conditionally independent given some conditioning set
3. Orient v-structures, then propagate edge orientations consistently
D-separation connects the statistical tests to graph structure.
Application 6: Debugging Probabilistic Programs
When a probabilistic program gives unexpected results:
- Draw the model's graph and check d-separation for the variables involved
- A surprising correlation often traces to an implicitly conditioned collider; a missing dependence often traces to a path you did not realize was blocked
"""Application: Finding irrelevant variables for inference. Given a query P(Target | Evidence), find variables that ared-separated from Target given Evidence—they can be ignored.""" def find_relevant_nodes(dag, target, evidence): """ Find nodes that are NOT d-separated from target given evidence. These are the only nodes needed for inference. """ relevant = set() # Check each node for node in dag.nodes: if node == target or node in evidence: relevant.add(node) continue # Is this node d-separated from target given evidence? if not dag.is_d_separated({node}, {target}, evidence): relevant.add(node) return relevant # Example: Large medical network (simplified)# In practice, this could have hundreds of nodes from collections import defaultdict, deque class DAG: def __init__(self): self.parents = defaultdict(set) self.children = defaultdict(set) self.nodes = set() def add_edge(self, parent, child): self.parents[child].add(parent) self.children[parent].add(child) self.nodes.update([parent, child]) def get_ancestors(self, nodes): ancestors = set(nodes) queue = deque(nodes) while queue: node = queue.popleft() for parent in self.parents[node]: if parent not in ancestors: ancestors.add(parent) queue.append(parent) return ancestors def is_d_separated(self, X, Y, Z): Z_ancestors = self.get_ancestors(Z) visited = set() queue = deque([(x, d) for x in X for d in ['up', 'down']]) while queue: node, direction = queue.popleft() if node in Y: return False if (node, direction) in visited: continue visited.add((node, direction)) if direction == 'up': if node not in Z: for p in self.parents[node]: queue.append((p, 'up')) for c in self.children[node]: queue.append((c, 'down')) else: if node not in Z: for c in self.children[node]: queue.append((c, 'down')) if node in Z_ancestors: for p in self.parents[node]: queue.append((p, 'up')) return True # Build a medical diagnosis networkmed = DAG()# Lifestyle factorsmed.add_edge('Smoking', 'LungDisease')med.add_edge('Smoking', 'HeartDisease')med.add_edge('Exercise', 'HeartDisease')med.add_edge('Diet', 'HeartDisease')med.add_edge('Diet', 'Diabetes')# Genetic factorsmed.add_edge('GeneticRisk', 'HeartDisease')med.add_edge('GeneticRisk', 'Diabetes')# Diseases cause symptomsmed.add_edge('LungDisease', 'Cough')med.add_edge('LungDisease', 'Fatigue')med.add_edge('HeartDisease', 'ChestPain')med.add_edge('HeartDisease', 'Fatigue')med.add_edge('Diabetes', 'Fatigue')med.add_edge('Diabetes', 'Thirst') print("Medical Diagnosis Network")print(f"Total nodes: {len(med.nodes)}")print("-" * 50) # Query: P(HeartDisease | ChestPain, Fatigue)target = 'HeartDisease'evidence = {'ChestPain', 'Fatigue'} relevant = find_relevant_nodes(med, target, evidence)irrelevant = med.nodes - relevant print(f"\nQuery: P({target} | {evidence})")print(f"Relevant nodes ({len(relevant)}): {relevant}")print(f"Irrelevant nodes ({len(irrelevant)}): {irrelevant}")print(f"\nInference can ignore {len(irrelevant)} of {len(med.nodes)} nodes!")Even experienced practitioners make mistakes with d-separation. Here are the most common pitfalls:
Pitfall 1: Forgetting Descendant Activation
The collider rule includes descendants:
```
A → C ← B
    ↓
    D
```
❌ Wrong: "A ⟂ B | D because D is not on the path"
✓ Right: "A ↔ B | D because D is a descendant of collider C, which activates C"
Pitfall 2: Confusing Undirected Thinking
❌ Wrong: "Conditioning always blocks paths"
✓ Right: "Conditioning blocks chains and forks, but OPENS colliders"
Pitfall 3: Missing Paths
Remember to check ALL paths, not just the "obvious" ones:
```
A → B → C
↓       ↑
D ──→── E
```
Paths from A to C:
1. A → B → C
2. A → D → E → C
Both must be blocked for A ⟂ C.
Pitfall 4: Thinking All Colliders Must Be Conditioned
❌ Wrong: "Path A → C ← B is blocked unless we condition on C"
✓ Right: "Path A → C ← B is blocked BY DEFAULT. Conditioning on C (or descendants) OPENS it."
Colliders start blocked; conditioning activates them.
Pitfall 5: Overlooking Multiple Roles
A single node can play different roles on different paths:
```
A → B → C
    ↑   ↑
     \ /
      D
```

Here D → B and D → C. For the query A ⟂ C:
- On the path A → B → C, B is a chain node: conditioning on B blocks this path
- On the path A → B ← D → C, B is a collider (and D a fork): conditioning on B opens this path
Check each path independently!
Pitfall 6: Conditioning on Everything
Conditioning on too many variables can open colliders and CREATE dependencies:
- Every conditioned collider (or conditioned descendant of one) opens paths that were blocked
- Adding evidence can therefore make previously independent variables dependent
More evidence isn't always better for independence!
Repeat until internalized: 'Colliders START blocked. Conditioning OPENS them.' This is opposite to chains and forks, where paths START open and conditioning BLOCKS them. This asymmetry is the source of most d-separation errors.
Let's consolidate the essential knowledge about d-separation:
- Chains (A → B → C) and forks (A ← B → C) are open by default and blocked by conditioning on the middle node
- Colliders (A → B ← C) are blocked by default and opened by conditioning on the middle node or any of its descendants
- $X \perp_G Y \mid Z$ holds iff every path between $X$ and $Y$ is blocked by $Z$
- Soundness: d-separation implies conditional independence in every distribution that factorizes over the graph; completeness holds under the faithfulness assumption
- The direction-aware reachability algorithm answers any d-separation query in $O(n + e)$ time
What's next:
D-separation lets us query specific conditional independencies. But there's an even more powerful concept: the Markov blanket—the minimal set of variables that renders a node completely independent of all others. Understanding Markov blankets is essential for feature selection, local computation, and understanding what information each variable truly needs.
You now have a complete understanding of d-separation. You can formally define it, apply the algorithm, understand its theoretical guarantees, and use it for practical applications. Most importantly, you can avoid the common pitfalls that trip up even experienced practitioners. Next, we'll explore Markov blankets—the final piece of the probabilistic graphical models foundation.