At the heart of every Bayesian Network lies a Directed Acyclic Graph (DAG)—a deceptively simple mathematical structure that encodes profound statements about probabilistic relationships. Understanding DAGs is not merely a prerequisite for Bayesian Networks; it is the key to unlocking their full power.
A DAG provides the skeleton upon which probabilistic reasoning is built. The directed edges represent causal or influential relationships, while the absence of cycles ensures a coherent flow of information from causes to effects. This structure isn't arbitrary—it reflects deep truths about how variables in the real world depend upon one another.
In this page, we will dissect the anatomy of DAGs with surgical precision, exploring their formal definition, essential properties, graphical terminology, and the profound connection between graph structure and probabilistic independence. By the end, you will not only understand what a DAG is but why it is the perfect vehicle for capturing joint probability distributions.
By completing this page, you will: (1) Master the formal definition and properties of Directed Acyclic Graphs, (2) Understand the complete terminology of DAG components, (3) Grasp why acyclicity is essential for probabilistic modeling, (4) Learn to identify parent-child relationships and ancestral structures, and (5) Connect graph structure to the semantics of conditional independence.
Let us begin with mathematical precision. A Directed Acyclic Graph (DAG) is a fundamental structure in graph theory with specific properties that make it uniquely suited for representing probabilistic dependencies.
Definition: A DAG is an ordered pair $G = (V, E)$ where:
- $V$ is a finite set of vertices (nodes);
- $E \subseteq V \times V$ is a set of directed edges, each an ordered pair of distinct vertices;
- the graph contains no directed cycles.
A directed cycle would be a sequence of vertices $v_1 \rightarrow v_2 \rightarrow \cdots \rightarrow v_k \rightarrow v_1$ where each arrow represents a directed edge. The 'acyclic' property forbids such structures entirely.
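As a quick sanity check of the definition, the sketch below (using the networkx library, which also appears in later examples on this page; the vertex labels are arbitrary) builds one acyclic and one cyclic edge set and tests the acyclicity requirement:

```python
import networkx as nx

# G = (V, E): a vertex set and a set of directed edges
V = ['X', 'Y', 'Z']
E_acyclic = [('X', 'Y'), ('Y', 'Z')]             # X -> Y -> Z
E_cyclic = [('X', 'Y'), ('Y', 'Z'), ('Z', 'X')]  # adds Z -> X, closing a cycle

G1 = nx.DiGraph()
G1.add_nodes_from(V)
G1.add_edges_from(E_acyclic)
print(nx.is_directed_acyclic_graph(G1))  # True: this graph is a DAG

G2 = nx.DiGraph()
G2.add_nodes_from(V)
G2.add_edges_from(E_cyclic)
print(nx.is_directed_acyclic_graph(G2))  # False: X -> Y -> Z -> X is a directed cycle
print(nx.find_cycle(G2))                 # e.g. [('X', 'Y'), ('Y', 'Z'), ('Z', 'X')]
```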
Notation for edges: If there exists a directed edge from vertex $X$ to vertex $Y$, we write $X \rightarrow Y$. This is read as '$X$ is a parent of $Y$' or equivalently '$Y$ is a child of $X$'. The direction of the arrow signifies an asymmetric relationship—information or influence flows from $X$ toward $Y$.
In Bayesian Networks, each vertex represents a random variable. The set V might contain variables like {Weather, Traffic, Arrival_Time, Mood}. The edges encode how these variables influence one another probabilistically. This one-to-one correspondence between vertices and random variables is fundamental.
Why 'Directed'? The direction of edges carries semantic meaning. In the edge $X \rightarrow Y$:
- $X$ is the parent (the influencing variable) and $Y$ is the child (the influenced variable);
- the conditional distribution of $Y$ is specified given $X$, not the other way around.
Undirected edges (as in Markov Random Fields) cannot capture this asymmetric, directional relationship. When we know that smoking causes lung cancer (not vice versa), this causal direction is naturally represented as Smoking → LungCancer.
Why 'Acyclic'? The prohibition of cycles ensures:
Topological ordering exists: Vertices can be arranged in a sequence such that every edge points forward. This is essential for defining joint distributions and performing inference.
No self-causation: A variable cannot influence itself through a feedback loop, which would create paradoxes in probabilistic semantics.
Well-defined ancestral relationships: Every vertex has a clearly defined set of ancestors (predecessors) and descendants (successors), enabling powerful independence statements.
| Property | Undirected Graph | Directed Graph (with cycles) | DAG |
|---|---|---|---|
| Edge direction | None (symmetric) | Directed (asymmetric) | Directed (asymmetric) |
| Cycles allowed | Yes | Yes | No |
| Topological ordering | Not applicable | Not always possible | Always exists |
| Parent-child relationships | Not defined | Defined (may be circular) | Well-defined, hierarchical |
| Used in | Markov Random Fields | State machines, networks | Bayesian Networks |
To work fluently with Bayesian Networks, you must internalize the vocabulary of DAGs. This terminology is not merely academic—every term corresponds to a specific probabilistic concept that arises repeatedly in modeling and inference.
Consider a DAG with vertices $\{A, B, C, D, E\}$ and edges $A \rightarrow B$, $A \rightarrow C$, $B \rightarrow D$, $C \rightarrow D$, $D \rightarrow E$. We will use this example to illustrate each concept.
A particularly important concept is the Markov Blanket of a node X: its parents, children, and the other parents of its children (co-parents). The Markov Blanket shields X from the rest of the network—given the Markov Blanket, X is conditionally independent of all other variables. This will become critical for inference algorithms.
Paths and Trails:
A path from $X$ to $Y$ is a sequence of edges connecting them, regardless of edge direction. A directed path follows edge directions. A trail is a path that does not repeat any vertex.
The distinction between directed and undirected paths is essential for understanding d-separation (covered in detail in Module 1).
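The following sketch (again using networkx on the example DAG from above) illustrates the difference between directed paths and direction-ignoring trails:

```python
import networkx as nx

G = nx.DiGraph([('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')])

# Directed paths from A to E follow the arrows
directed_paths = list(nx.all_simple_paths(G, 'A', 'E'))
print(directed_paths)  # [['A', 'B', 'D', 'E'], ['A', 'C', 'D', 'E']]

# Trails from B to C ignore edge direction (use the undirected skeleton)
trails = list(nx.all_simple_paths(G.to_undirected(), 'B', 'C'))
print(trails)  # e.g. [['B', 'A', 'C'], ['B', 'D', 'C']]

# There is no *directed* path from B to C, even though trails connect them
print(nx.has_path(G, 'B', 'C'))  # False
```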
Topological Ordering:
Since a DAG has no cycles, its vertices can always be arranged in a topological order: a linear sequence where every edge points from earlier to later in the sequence. For our example: $$A < B < C < D < E$$
or equivalently $$A < C < B < D < E$$
Topological ordering enables efficient algorithms for computing joint probabilities and performing inference.
```python
import networkx as nx

# Create the example DAG
G = nx.DiGraph()
G.add_edges_from([('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')])

# Demonstrate DAG terminology
print("=== DAG Terminology Demonstration ===\n")

# Parents of D
parents_D = list(G.predecessors('D'))
print(f"Parents of D: {parents_D}")  # ['B', 'C']

# Children of A
children_A = list(G.successors('A'))
print(f"Children of A: {children_A}")  # ['B', 'C']

# Ancestors of E (all nodes from which E is reachable)
ancestors_E = nx.ancestors(G, 'E')
print(f"Ancestors of E: {ancestors_E}")  # {'A', 'B', 'C', 'D'}

# Descendants of A (all nodes reachable from A)
descendants_A = nx.descendants(G, 'A')
print(f"Descendants of A: {descendants_A}")  # {'B', 'C', 'D', 'E'}

# Root nodes (no incoming edges)
root_nodes = [n for n in G.nodes() if G.in_degree(n) == 0]
print(f"Root nodes: {root_nodes}")  # ['A']

# Leaf nodes (no outgoing edges)
leaf_nodes = [n for n in G.nodes() if G.out_degree(n) == 0]
print(f"Leaf nodes: {leaf_nodes}")  # ['E']

# Topological ordering
topo_order = list(nx.topological_sort(G))
print(f"Topological order: {topo_order}")  # ['A', 'B', 'C', 'D', 'E'] or similar

# Markov Blanket of D: parents + children + co-parents of children
def markov_blanket(G, node):
    parents = set(G.predecessors(node))
    children = set(G.successors(node))
    co_parents = set()
    for child in children:
        co_parents.update(G.predecessors(child))
    co_parents.discard(node)
    return parents | children | co_parents

mb_D = markov_blanket(G, 'D')
print(f"Markov Blanket of D: {mb_D}")  # {'B', 'C', 'E'}
```

The 'acyclic' constraint in DAGs is not merely a technical convenience—it is the property that enables the entire Bayesian Network framework to function. Let us examine why cycles are prohibited and how acyclicity guarantees the existence of topological orderings.
The Problem with Cycles:
Consider what would happen if we allowed cycles. Suppose we have $X \rightarrow Y \rightarrow Z \rightarrow X$. This creates a logical nightmare:
Causality becomes circular: $X$ influences $Y$, which influences $Z$, which influences $X$. But which variable determines the others?
Conditional distributions become undefined: In a Bayesian Network, $P(X | Pa(X))$ requires knowing the parents' values first. But if $X$'s ancestor is also its descendant, we have no starting point.
Joint distribution factorization fails: The elegant product $P(X_1, \ldots, X_n) = \prod_i P(X_i | Pa(X_i))$ requires a valid ordering. Cycles destroy this.
Theorem (Topological Ordering): A directed graph $G$ has a topological ordering if and only if $G$ is acyclic.
Proof sketch: If a topological ordering exists, every edge points from an earlier to a later position in the ordering, so no sequence of edges can return to its starting vertex; the graph is acyclic. Conversely, if the graph is acyclic, it must contain at least one vertex with no incoming edges (otherwise, repeatedly following incoming edges would eventually revisit a vertex, producing a cycle). Place such a vertex first, remove it, and repeat on the remaining graph; this constructs a topological ordering.
A DAG may have multiple valid topological orderings. For our example graph, both [A, B, C, D, E] and [A, C, B, D, E] are valid. What matters is that ALL orderings respect edge directions—every parent comes before its children. The specific ordering chosen affects computational efficiency but not correctness.
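For the running example, we can enumerate every valid ordering directly (a quick sketch using networkx's `all_topological_sorts`, available in recent networkx versions):

```python
import networkx as nx

G = nx.DiGraph([('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')])

# Enumerate every valid topological ordering of the example DAG
for order in nx.all_topological_sorts(G):
    print(order)
# ['A', 'B', 'C', 'D', 'E']
# ['A', 'C', 'B', 'D', 'E']
```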
Algorithm: Kahn's Topological Sort
The standard algorithm for computing a topological ordering is Kahn's algorithm (1962):
1. Compute the in-degree of every vertex and place all vertices with in-degree 0 in a queue.
2. Remove a vertex from the queue and append it to the ordering.
3. For each of its children, decrement the child's in-degree; if it reaches 0, add the child to the queue.
4. Repeat until the queue is empty. If the ordering contains all vertices, it is a valid topological order; otherwise the graph contains a cycle.
This algorithm runs in $O(|V| + |E|)$ time, making it highly efficient even for large graphs.
```python
from collections import deque

def kahn_topological_sort(vertices, edges):
    """
    Kahn's algorithm for topological sorting.

    Args:
        vertices: List of vertex labels
        edges: List of (parent, child) tuples

    Returns:
        A topological ordering of the vertices.

    Raises:
        ValueError: If the graph contains a cycle.
    """
    # Build adjacency list and compute in-degrees
    adjacency = {v: [] for v in vertices}
    in_degree = {v: 0 for v in vertices}
    for parent, child in edges:
        adjacency[parent].append(child)
        in_degree[child] += 1

    # Initialize queue with all roots (in-degree 0)
    queue = deque([v for v in vertices if in_degree[v] == 0])
    ordering = []

    while queue:
        # Remove vertex from queue
        current = queue.popleft()
        ordering.append(current)

        # Process all children
        for child in adjacency[current]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Check if we processed all vertices
    if len(ordering) == len(vertices):
        return ordering
    else:
        # Cycle detected - not all vertices processed
        remaining = set(vertices) - set(ordering)
        raise ValueError(f"Graph contains cycle involving: {remaining}")

# Example usage
vertices = ['A', 'B', 'C', 'D', 'E']
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]

order = kahn_topological_sort(vertices, edges)
print(f"Topological order: {order}")

# Test with a cycle
try:
    cyclic_edges = [('A', 'B'), ('B', 'C'), ('C', 'A')]
    kahn_topological_sort(['A', 'B', 'C'], cyclic_edges)
except ValueError as e:
    print(f"Detected: {e}")
```

Why Topological Order Matters for Bayesian Networks:
In a Bayesian Network, computing the joint probability $P(X_1, \ldots, X_n)$ requires evaluating conditional probabilities in an order where parents are always evaluated before their children. Topological ordering provides exactly this guarantee.
For any valid topological ordering $\sigma(1), \sigma(2), \ldots, \sigma(n)$: $$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_{\sigma(i)} | Pa(X_{\sigma(i)}))$$
Each factor is well-defined because by the time we evaluate $P(X_{\sigma(i)} | Pa(X_{\sigma(i)}))$, all parents have already been assigned values earlier in the ordering.
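To make this concrete, here is a small sketch that evaluates the factorized joint for the running example DAG in the topological order $A, B, C, D, E$; the conditional probability values are invented purely for illustration:

```python
from itertools import product

# Hypothetical CPTs for the running example (all variables binary; the numbers
# are made up). Each entry gives P(var = 1 | parent values).
p_A = 0.3
p_B_given_A = {0: 0.2, 1: 0.7}               # P(B=1 | A)
p_C_given_A = {0: 0.5, 1: 0.9}               # P(C=1 | A)
p_D_given_BC = {(0, 0): 0.1, (0, 1): 0.6,
                (1, 0): 0.5, (1, 1): 0.95}   # P(D=1 | B, C)
p_E_given_D = {0: 0.2, 1: 0.8}               # P(E=1 | D)

def bernoulli(p, value):
    """P(X = value) when P(X = 1) = p."""
    return p if value == 1 else 1 - p

def joint(a, b, c, d, e):
    """Evaluate P(A,B,C,D,E) as a product over the topological order A,B,C,D,E."""
    return (bernoulli(p_A, a) *
            bernoulli(p_B_given_A[a], b) *
            bernoulli(p_C_given_A[a], c) *
            bernoulli(p_D_given_BC[(b, c)], d) *
            bernoulli(p_E_given_D[d], e))

# One full assignment: 0.3 * 0.7 * (1 - 0.9) * 0.5 * 0.8 = 0.0084
print(joint(1, 1, 0, 1, 1))

# Sanity check: the factorization defines a proper distribution (sums to 1)
total = sum(joint(*assignment) for assignment in product([0, 1], repeat=5))
print(round(total, 10))  # 1.0
```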
When working with Bayesian Networks, consistent graphical notation is essential for clear communication. Several conventions have become standard in the literature and practice.
Node Representation: Each random variable is drawn as a labeled circle or ellipse. Unshaded nodes denote unobserved (latent or query) variables, while shaded nodes denote observed variables whose values are known.
Edge Representation: Each directed edge is drawn as an arrow from parent to child, indicating the direction of influence; the arrowhead always points at the child.
Plate Notation (for repeated structures):
When a subgraph is replicated many times (e.g., for multiple data points), we enclose it in a plate—a rectangle with a number indicating repetitions. This compactly represents models like: $$P(\theta) \prod_{i=1}^{N} P(x_i | \theta)$$
where the plate contains $x_i$ and indicates $N$ repetitions.
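As a tiny illustration of what the plate abbreviates, the sketch below unrolls the product for a single parameter $\theta$ with $N = 4$ observations; the discrete prior and the data values are invented:

```python
# One parameter theta with N conditionally independent observations x_1..x_N.
theta_values = [0.2, 0.5, 0.8]
prior = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}   # P(theta), a discrete prior
x = [1, 1, 0, 1]                            # N = 4 observed coin flips

def unrolled_plate(theta, data):
    """P(theta) * prod_i P(x_i | theta), i.e. the plate written out in full."""
    p = prior[theta]
    for xi in data:
        p *= theta if xi == 1 else 1 - theta
    return p

for theta in theta_values:
    print(theta, unrolled_plate(theta, x))
```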
When you encounter a Bayesian Network diagram: (1) Identify root nodes—these define marginal distributions, (2) Trace directed paths to understand the flow of influence, (3) Look for common parents (forks), common children (colliders), and chains to understand independence structure, (4) Shaded nodes indicate observed data that can be conditioned upon during inference.
Common Structural Patterns:
Three fundamental local structures appear repeatedly in DAGs and determine the independence properties:
1. Chain (Serial Connection): $A \rightarrow B \rightarrow C$, where influence flows from $A$ to $C$ through the intermediate node $B$.
2. Fork (Diverging Connection): $A \leftarrow B \rightarrow C$, where $B$ is a common cause of both $A$ and $C$.
3. Collider (Converging Connection): $A \rightarrow B \leftarrow C$, where $B$ is a common effect of $A$ and $C$.
| Structure | Pattern | Initial state | After conditioning on middle |
|---|---|---|---|
| Chain | A → B → C | A and C dependent | A ⊥ C \| B (independent) |
| Fork | A ← B → C | A and C dependent | A ⊥ C \| B (independent) |
| Collider | A → B ← C | A and C independent | A and C dependent |
The collider is particularly counterintuitive: observing a common effect creates dependence between its causes. This is the 'explaining away' phenomenon—if we know someone was admitted to a university, learning they have no talent makes effort more probable, and vice versa.
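This can be checked numerically. Below is a minimal sketch (the probabilities are invented, and admission is modeled as a deterministic OR of talent and effort purely for illustration) showing that Talent and Effort are marginally independent but become dependent once Admission is observed:

```python
from itertools import product

# Hypothetical numbers: Talent and Effort are independent causes, and
# Admission is (deterministically, for simplicity) Talent OR Effort.
p_talent = 0.3
p_effort = 0.4

def prob(talent, effort, admitted):
    """Joint probability P(T, E, A) under the collider T -> A <- E."""
    p = (p_talent if talent else 1 - p_talent) * \
        (p_effort if effort else 1 - p_effort)
    return p if admitted == (talent or effort) else 0.0

def p_effort_given(evidence):
    """P(Effort = 1 | evidence), computed by brute-force enumeration."""
    num = den = 0.0
    for t, e, a in product([0, 1], repeat=3):
        assignment = {'T': t, 'E': e, 'A': a}
        if all(assignment[k] == v for k, v in evidence.items()):
            p = prob(t, e, a)
            den += p
            if e == 1:
                num += p
    return num / den

# Without observing the collider, Effort is independent of Talent:
print(p_effort_given({}))          # 0.4  (= prior)
print(p_effort_given({'T': 0}))    # 0.4  (Talent tells us nothing)

# Observing the collider (Admission) makes the causes dependent:
print(p_effort_given({'A': 1}))            # ~0.69
print(p_effort_given({'A': 1, 'T': 0}))    # 1.0  -- 'explaining away'
```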
These three patterns are the building blocks for understanding d-separation, which generalizes independence reading to arbitrary DAG structures.
The true power of DAGs in Bayesian Networks is their ability to encode conditional independence relationships graphically. The structure of the graph is not merely a visual aid—it is a formal statement about which variables are independent given which others.
The Local Markov Property:
The fundamental assumption connecting DAG structure to probability is:
Local Markov Property: Each variable is conditionally independent of its non-descendants given its parents.
Formally, for any variable $X$: $$X \perp NonDesc(X) \mid Pa(X)$$
This single statement, applied to all variables, completely characterizes the independence structure of the distribution. From this property, we can derive all other conditional independencies through d-separation.
Example: In our DAG with $A \rightarrow B \rightarrow D$, $A \rightarrow C \rightarrow D$, and $D \rightarrow E$, the local Markov property gives: $B \perp C \mid A$ (the non-descendants of $B$ are $\{A, C\}$ and its only parent is $A$), $D \perp A \mid \{B, C\}$, and $E \perp \{A, B, C\} \mid D$.
There are three mathematically equivalent ways to state the Markov property for DAGs: (1) Local Markov: Each node is independent of non-descendants given parents, (2) Global Markov: D-separation implies conditional independence, (3) Markov Factorization: The joint factors according to the DAG structure. These are all equivalent for DAGs with positive distributions.
Why Structure Implies Independence:
The connection between graph structure and probability is not obvious. Why should the absence of an edge between $X$ and $Y$ imply anything probabilistic?
The answer lies in how we define Bayesian Networks. We require that the joint distribution factorizes according to the graph: $$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i | Pa(X_i))$$
This product form mathematically enforces the Markov property. If $X$ only appears in the factor $P(X | Pa(X))$ and as a conditioning variable in its children's factors, then conditioning on $Pa(X)$ 'screens off' $X$ from everything else.
The I-Map Concept:
A DAG $G$ is an Independence Map (I-map) for a distribution $P$ if every independence encoded in $G$ holds in $P$. A distribution may have more independencies than the graph indicates, but never fewer.
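As a minimal illustration of the I-map idea, consider the empty graph over two binary variables $X$ and $Y$: it asserts $X \perp Y$, so it is an I-map only for distributions where that independence actually holds, whereas the graph $X \rightarrow Y$ asserts no independencies and is an I-map for both. The sketch below checks the independence for two invented joint tables:

```python
import numpy as np

# Joint distributions over two binary variables X, Y (rows = X, cols = Y).
# Both tables are invented for illustration.
independent = np.array([[0.42, 0.18],    # here P(X, Y) = P(X) P(Y)
                        [0.28, 0.12]])
dependent = np.array([[0.50, 0.10],
                      [0.10, 0.30]])

def x_indep_y(joint, tol=1e-9):
    """Check whether X and Y are independent in the given joint table."""
    px = joint.sum(axis=1, keepdims=True)   # P(X)
    py = joint.sum(axis=0, keepdims=True)   # P(Y)
    return np.allclose(joint, px * py, atol=tol)

# The empty graph is an I-map only for the first distribution.
print(x_indep_y(independent))  # True
print(x_indep_y(dependent))    # False
```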
```python
def verify_local_markov_property(graph, joint_prob, var_names):
    """
    Verify that a joint distribution satisfies the local Markov property
    with respect to a given DAG structure.

    This checks: X ⊥ NonDesc(X) | Pa(X) for all X
    """
    def get_parents(graph, node):
        return {n for n, children in graph.items() if node in children}

    def get_descendants(graph, node, visited=None):
        if visited is None:
            visited = set()
        visited.add(node)
        for child in graph.get(node, []):
            if child not in visited:
                get_descendants(graph, child, visited)
        return visited - {node}

    def get_non_descendants(graph, node, all_nodes):
        descendants = get_descendants(graph, node)
        return all_nodes - descendants - {node}

    all_nodes = set(var_names)
    violations = []

    for node in var_names:
        parents = get_parents(graph, node)
        non_desc = get_non_descendants(graph, node, all_nodes)

        if not non_desc:
            continue  # No non-descendants to check

        # Here we would verify conditional independence
        # P(X | Pa(X), NonDesc(X)) = P(X | Pa(X))
        # This requires summing over the joint distribution
        print(f"Node {node}:")
        print(f"  Parents: {parents if parents else '∅'}")
        print(f"  Non-descendants: {non_desc if non_desc else '∅'}")
        print(f"  Markov property: {node} ⊥ {non_desc} | {parents}")

    return violations

# Example DAG structure
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D'],
    'D': ['E'],
    'E': []
}
var_names = ['A', 'B', 'C', 'D', 'E']
print("=== Verifying Local Markov Property ===\n")
verify_local_markov_property(graph, None, var_names)
```

In practice, DAG structures are often constructed from domain knowledge rather than learned from data. This knowledge-driven approach leverages expert understanding of causal mechanisms, temporal relationships, and scientific principles.
Principles for DAG Construction:
1. Causality: If $X$ causes $Y$, draw $X \rightarrow Y$. This is the most common and strongest justification for an edge. Causal relationships are often known from controlled experiments, physical or biological mechanisms, established scientific theory, or expert consensus.
2. Temporal Precedence: If $X$ necessarily occurs before $Y$, consider $X \rightarrow Y$. Time provides a natural ordering: earlier events can influence later ones, but later events cannot influence earlier ones.
3. Parsimony: Include an edge only if there is a direct dependence not mediated by other variables. If $X$ affects $Y$ only through $Z$, use $X \rightarrow Z \rightarrow Y$, not $X \rightarrow Y$.
4. Acyclicity Check: After constructing edges, verify no cycles exist. If domain knowledge suggests $A$ affects $B$ and $B$ affects $A$, this indicates a feedback process: either unroll it over time (separate variables per time step) or refine the variable definitions so the mutual influence disappears.
A critical error is adding edges based on observed correlation without causal justification. If ice cream sales and drowning deaths are correlated (both high in summer), we should NOT add IceCream → Drowning. Both are caused by a common factor (warm weather). Confusing correlation with causation leads to incorrect DAG structures and invalid inferences.
Systematic Approach to DAG Elicitation:
When working with domain experts to construct DAGs:
1. Enumerate variables: List all relevant random variables. Be precise about definitions (e.g., 'blood pressure' could be systolic, diastolic, mean, etc.).
2. Establish temporal layers: Group variables by when they are determined or observed. Earlier layers can only have edges to later layers.
3. For each variable, identify direct causes: Ask "What directly affects this variable?" Focus on direct effects, not chains.
4. Review conditional independencies: Ask "If I knew the values of these parents, would any other variable still provide information about the target?" If yes, a parent is missing.
5. Verify edge absence: For every pair without an edge, confirm they are conditionally independent given appropriate variables.
6. Check acyclicity: Use topological sort to confirm the graph is a valid DAG.
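Steps 2 and 6 lend themselves to a simple automated check. The sketch below assumes a hypothetical temporal layering of the variables mentioned earlier on this page, flags edges that do not point from an earlier layer to a later one, and then confirms acyclicity with a topological sort:

```python
import networkx as nx

# Hypothetical elicitation output: temporal layers and candidate edges.
layers = {'Weather': 0, 'Traffic': 1, 'Mood': 2, 'Arrival_Time': 2}
edges = [('Weather', 'Traffic'), ('Traffic', 'Arrival_Time'), ('Weather', 'Mood')]

# Step 2 check: flag edges that do not go from an earlier layer to a later one
for parent, child in edges:
    if layers[parent] >= layers[child]:
        print(f"Suspicious edge {parent} -> {child}: check direction or layering")

# Step 6 check: confirm the result is a valid DAG via topological sort
G = nx.DiGraph(edges)
print(list(nx.topological_sort(G)))  # raises NetworkXUnfeasible if a cycle slipped in
```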
Common Pitfalls in DAG Construction: adding edges for every observed correlation (see the warning above), reversing the direction of a causal edge, omitting a common cause of two modeled variables, and giving a node so many parents that its conditional probability table becomes impossible to estimate.
The structure of a DAG directly impacts the computational tractability of inference and learning. Certain structural properties make Bayesian Networks efficient, while others lead to exponential complexity.
Key Structural Properties:
Tree-width: The tree-width of a DAG (more precisely, of its moralized graph) determines the complexity of exact inference algorithms. Low tree-width graphs admit polynomial-time inference via junction tree algorithms.
Maximum in-degree: The number of parents any variable has directly affects the size of conditional probability tables (exponential in the number of parents) and the difficulty of parameter learning.
Graph density: Sparse graphs (few edges relative to possible edges) are generally easier to work with than dense graphs.
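The in-degree point is easy to quantify: a conditional probability table needs one distribution over the child's states for every combination of parent states, so its size grows exponentially in the number of parents. A small sketch (assuming discrete variables; the `cpt_entries` helper is illustrative):

```python
# Size of a conditional probability table: one distribution over the child's
# states for every combination of parent states (exponential in #parents).
def cpt_entries(child_states, parent_state_counts):
    table_rows = 1
    for k in parent_state_counts:
        table_rows *= k
    return table_rows * child_states

print(cpt_entries(2, []))           # 2     -- root node, just P(X)
print(cpt_entries(2, [2, 2, 2]))    # 16    -- binary child, 3 binary parents
print(cpt_entries(2, [2] * 10))     # 2048  -- 10 binary parents: data-hungry
```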
| Property | When Low | When High | Practical Limit |
|---|---|---|---|
| Tree-width | Exact inference tractable | Exact inference intractable | ~20-30 for exact methods |
| Max in-degree | Small CPTs, easy learning | Large CPTs, data-hungry | 5-10 parents typically |
| Number of nodes | Fast all operations | Memory and time issues | Thousands feasible |
| Edge density | Strong independence, efficient | Weak independence, costly | Prefer sparse graphs |
When you have flexibility in DAG design, prefer structures with low tree-width and limited in-degree. Polytrees (DAGs where the underlying undirected graph is a tree) have tree-width 1 and allow linear-time exact inference. Adding edges increases expressiveness but pays a computational cost.
Special DAG Structures:
Polytrees: DAGs whose underlying undirected graph is a tree (connected, no undirected cycles). Polytrees enable single-pass inference via belief propagation, linear-time complexity, and intuitive message passing. These are ideal when the domain admits such structure.
Naive Bayes Structure: A star graph with one parent node (the class) connected to all children (features). This extreme structure assumes all features are conditionally independent given the class—strong but often reasonable approximation.
Two-Layer Networks: Root nodes represent 'causes' or 'latent factors,' leaf nodes represent 'effects' or 'observations.' This structure (similar to factor analysis) is common in diagnostic and generative models.
Hidden Markov Model Structure: A chain of latent states with emissions at each step. The DAG structure is a sequence of time slices with specific within-slice and cross-slice edges.
General DAGs: Allow arbitrary dependencies but may require approximate inference. Junction tree methods, variational inference, or MCMC become necessary for complex structures.
```python
import networkx as nx

def analyze_dag_structure(edges, nodes):
    """
    Analyze structural properties of a DAG relevant to
    computational tractability in Bayesian Networks.
    """
    G = nx.DiGraph()
    G.add_nodes_from(nodes)
    G.add_edges_from(edges)

    if not nx.is_directed_acyclic_graph(G):
        raise ValueError("Graph contains cycles!")

    print("=== DAG Structural Analysis ===\n")

    # Basic statistics
    n_nodes = G.number_of_nodes()
    n_edges = G.number_of_edges()
    max_possible_edges = n_nodes * (n_nodes - 1) // 2
    density = n_edges / max_possible_edges if max_possible_edges > 0 else 0

    print(f"Nodes: {n_nodes}")
    print(f"Edges: {n_edges}")
    print(f"Density: {density:.3f}")

    # In-degree analysis (affects CPT size)
    in_degrees = dict(G.in_degree())
    max_in_degree = max(in_degrees.values()) if in_degrees else 0
    avg_in_degree = sum(in_degrees.values()) / len(in_degrees) if in_degrees else 0

    print(f"\nMax in-degree: {max_in_degree}")
    print(f"Avg in-degree: {avg_in_degree:.2f}")

    # Identify structure type
    undirected = G.to_undirected()
    is_tree = nx.is_tree(undirected)
    is_forest = nx.is_forest(undirected)

    print(f"\nIs polytree: {is_tree}")
    print(f"Is polyforest: {is_forest}")

    # Check for Naive Bayes structure: a single root whose children are all
    # the remaining nodes, each with the root as its only parent
    roots = [n for n in G.nodes() if G.in_degree(n) == 0]
    leaves = [n for n in G.nodes() if G.out_degree(n) == 0]
    is_naive_bayes = (
        len(roots) == 1 and
        len(leaves) == n_nodes - 1 and
        all(set(G.predecessors(n)) == set(roots) for n in leaves)
    )
    print(f"Is Naive Bayes structure: {is_naive_bayes}")

    # Estimate tree-width via the moralized graph
    # (exact tree-width requires more complex algorithms)
    moral_graph = undirected.copy()
    for node in G.nodes():
        parents = list(G.predecessors(node))
        for i, p1 in enumerate(parents):
            for p2 in parents[i+1:]:
                moral_graph.add_edge(p1, p2)

    print(f"\nMoralized graph edges: {moral_graph.number_of_edges()}")
    print(f"Additional moral edges: {moral_graph.number_of_edges() - n_edges}")

    return {
        'n_nodes': n_nodes,
        'n_edges': n_edges,
        'density': density,
        'max_in_degree': max_in_degree,
        'is_polytree': is_tree,
    }

# Example analysis
nodes = ['A', 'B', 'C', 'D', 'E']
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]

analysis = analyze_dag_structure(edges, nodes)
```

We have built a comprehensive understanding of Directed Acyclic Graphs as the structural foundation of Bayesian Networks. Let us consolidate the key insights:
- A DAG is a pair $G = (V, E)$ of vertices and directed edges containing no directed cycles; acyclicity guarantees a topological ordering.
- Parents, children, ancestors, descendants, and Markov blankets provide a precise vocabulary for graph structure.
- The local Markov property ties structure to probability: each variable is independent of its non-descendants given its parents.
- Chains, forks, and colliders are the three local patterns that determine how conditioning changes independence.
- Structural properties such as tree-width, maximum in-degree, and density govern how tractable inference and learning will be.
What's Next:
With the structural foundation in place, the next page explores Factorization—how the DAG structure enables the joint distribution to be decomposed into a product of local conditional distributions. This factorization is the mathematical bridge between graph structure and probabilistic reasoning, and it is what makes Bayesian Networks computationally tractable.
You now understand Directed Acyclic Graphs as the structural foundation of Bayesian Networks. You can identify DAG components, verify acyclicity, trace ancestral relationships, and recognize independence-determining patterns. Next, we'll see how this structure enables efficient representation through factorization.