If the DAG structure is the skeleton of a Bayesian Network, then Conditional Probability Tables (CPTs) are its flesh. Each CPT encodes the quantitative relationship between a variable and its parents—specifying exactly how parent configurations influence the probability distribution over the child variable.
Understanding CPTs is essential for both theoretical analysis and practical implementation. These tables are where domain knowledge becomes numerical, where expert estimates meet maximum likelihood estimates, and where the computational rubber meets the probabilistic road.
In this page, we dissect CPT representation with engineering precision: the mathematical requirements, the storage formats, the parameterization choices, and the practical considerations that arise in real-world Bayesian Network implementations. By the end, you will be able to design, implement, and optimize CPT representations for any Bayesian Network application.
By completing this page, you will: (1) Understand the mathematical structure and constraints of CPTs, (2) Master multiple representation formats (table, tree, rule-based), (3) Implement CPTs in code with proper normalization, (4) Analyze storage requirements and optimization strategies, and (5) Handle edge cases like deterministic nodes and noisy-OR.
A Conditional Probability Table specifies the conditional distribution of a variable given its parents. Let us establish the formal requirements.
Definition: Let $X$ be a random variable with possible values $\{x_1, x_2, \ldots, x_r\}$ (where $r$ is the cardinality of $X$), and let $Pa(X) = \{U_1, U_2, \ldots, U_k\}$ be the parents of $X$ with cardinalities $r_1, r_2, \ldots, r_k$ respectively.
The CPT for $X$ specifies: $$P(X = x_i | Pa(X) = \mathbf{u})$$
for all values $x_i$ of $X$ and all configurations $\mathbf{u} = (u_1, u_2, \ldots, u_k)$ of the parents.
Constraints:
Non-negativity: $P(X = x_i | Pa(X) = \mathbf{u}) \geq 0$ for all $i, \mathbf{u}$
Normalization: For each parent configuration $\mathbf{u}$: $$\sum_{i=1}^{r} P(X = x_i | Pa(X) = \mathbf{u}) = 1$$
These constraints ensure each row of the CPT is a valid probability distribution.
There are two common conventions for CPT layout. In the 'row normalization' convention (used here), each row corresponds to a parent configuration and sums to 1. In the 'column normalization' convention, each column corresponds to a parent configuration and it is the columns that sum to 1. Always verify which convention a particular software package uses!
Degrees of Freedom:
The total number of free parameters in a CPT for variable $X$ with parents $Pa(X)$:
$$\text{Parameters} = (r - 1) \times \prod_{j=1}^{k} r_j$$
The factor $(r - 1)$ accounts for normalization: given $r - 1$ probabilities, the $r$-th is determined.
Example Calculation:
Suppose $X$ is ternary (3 values) with two binary parents $U_1, U_2$:
There are $2 \times 2 = 4$ parent configurations, so the full CPT has $4 \times 3 = 12$ entries, of which only $(3 - 1) \times 4 = 8$ are free.
Special Case: Root Nodes
Variables with no parents ($Pa(X) = \emptyset$) have CPTs that are simply marginal distributions: $$P(X = x_i)$$
Such a table has $r - 1$ free parameters. These are sometimes called Prior Probability Tables.
| Child States | Parents | Entries | Free Parameters |
|---|---|---|---|
| 2 (binary) | None (root) | 2 | 1 |
| 2 (binary) | 1 binary parent | 4 | 2 |
| 2 (binary) | 2 binary parents | 8 | 4 |
| 2 (binary) | 3 binary parents | 16 | 8 |
| 3 (ternary) | 2 binary parents | 12 | 8 |
| 5 (5-ary) | 3 ternary parents | 135 | 108 |
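As a quick check, here is a minimal Python sketch (the helper name `cpt_sizes` is just for illustration) that applies the formula above and reproduces two rows of this table:

```python
from math import prod

def cpt_sizes(child_card, parent_cards):
    """Total entries and free parameters of a CPT, per the formula above."""
    entries = child_card * prod(parent_cards)      # prod([]) == 1 handles root nodes
    free = (child_card - 1) * prod(parent_cards)
    return entries, free

print(cpt_sizes(3, [2, 2]))     # (12, 8)   -- ternary child, two binary parents
print(cpt_sizes(5, [3, 3, 3]))  # (135, 108) -- 5-ary child, three ternary parents
```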
The most straightforward CPT representation is a multi-dimensional table. Each dimension corresponds to a parent variable, and the entries store the conditional probabilities for each value of the child.
Standard Table Format:
For a binary variable $W$ (Wet Grass) with binary parents $R$ (Rain) and $S$ (Sprinkler), the CPT can be written as:
| R | S | P(W=wet) | P(W=dry) |
|---|---|---|---|
| rain | on | 0.99 | 0.01 |
| rain | off | 0.90 | 0.10 |
| no_rain | on | 0.90 | 0.10 |
| no_rain | off | 0.00 | 1.00 |
Each row sums to 1.0, ensuring normalization. Note that we could store only P(W=wet) since P(W=dry) = 1 - P(W=wet).
Multi-Dimensional Array Representation:
Computationally, CPTs are typically stored as multi-dimensional arrays. For $W$ with parents $R$ and $S$:
CPT[R][S][W] = probability
Or equivalently, with the child dimension last:
CPT[R, S, :] = [P(W=wet|R,S), P(W=dry|R,S)]
```python
import numpy as np


class CPT:
    """
    Conditional Probability Table representation.

    Supports:
    - Multi-dimensional array storage
    - Automatic normalization verification
    - Probability queries
    - Pretty printing
    """

    def __init__(self, variable_name, variable_states, parent_names=None, parent_states=None):
        """
        Initialize a CPT.

        Args:
            variable_name: Name of the child variable
            variable_states: List of possible states for the child
            parent_names: List of parent variable names (None for root)
            parent_states: Dict mapping parent name -> list of states
        """
        self.variable_name = variable_name
        self.variable_states = variable_states
        self.parent_names = parent_names or []
        self.parent_states = parent_states or {}

        # Compute shape of the probability table
        # Dimensions: [parent_1, parent_2, ..., parent_k, child]
        self.shape = []
        for parent in self.parent_names:
            self.shape.append(len(self.parent_states[parent]))
        self.shape.append(len(variable_states))

        # Initialize with uniform distribution
        self.table = np.ones(self.shape) / len(variable_states)

        # Create index mappings
        self._build_index_maps()

    def _build_index_maps(self):
        """Build mappings from state names to indices."""
        self.child_to_idx = {s: i for i, s in enumerate(self.variable_states)}
        self.parent_to_idx = {}
        for parent in self.parent_names:
            self.parent_to_idx[parent] = {
                s: i for i, s in enumerate(self.parent_states[parent])
            }

    def set_probabilities(self, parent_config, probabilities):
        """
        Set the probability distribution for a given parent configuration.

        Args:
            parent_config: Dict mapping parent name -> state, or empty for root
            probabilities: Dict mapping child state -> probability
        """
        # Build index tuple for parents
        idx = []
        for parent in self.parent_names:
            state = parent_config[parent]
            idx.append(self.parent_to_idx[parent][state])
        idx = tuple(idx) if idx else ()

        # Set probabilities
        probs = [probabilities.get(s, 0.0) for s in self.variable_states]

        # Verify normalization
        total = sum(probs)
        if not np.isclose(total, 1.0):
            raise ValueError(f"Probabilities must sum to 1.0, got {total}")

        if idx:
            self.table[idx] = probs
        else:
            self.table[:] = probs

    def get_probability(self, child_state, parent_config=None):
        """
        Get P(child=child_state | parents=parent_config).

        Args:
            child_state: The state of the child variable
            parent_config: Dict mapping parent name -> state

        Returns:
            The conditional probability
        """
        parent_config = parent_config or {}
        idx = []
        for parent in self.parent_names:
            state = parent_config[parent]
            idx.append(self.parent_to_idx[parent][state])
        idx.append(self.child_to_idx[child_state])
        return self.table[tuple(idx)]

    def verify_normalization(self):
        """Verify all rows sum to 1."""
        if len(self.parent_names) == 0:
            # Root node - just check the single distribution
            return np.isclose(self.table.sum(), 1.0)
        # Sum over child dimension (last axis)
        row_sums = self.table.sum(axis=-1)
        return np.allclose(row_sums, 1.0)

    def __repr__(self):
        lines = [f"CPT for P({self.variable_name} | {', '.join(self.parent_names) or '∅'})"]
        lines.append(f"  States: {self.variable_states}")
        lines.append(f"  Shape: {self.shape}")
        lines.append(f"  Parameters: {np.prod(self.shape[:-1]) * (self.shape[-1] - 1)}")
        lines.append(f"  Normalized: {self.verify_normalization()}")
        return "\n".join(lines)


# Example: Create CPT for WetGrass
print("=== CPT Representation Example ===\n")

cpt_w = CPT(
    variable_name="WetGrass",
    variable_states=["wet", "dry"],
    parent_names=["Rain", "Sprinkler"],
    parent_states={
        "Rain": ["rain", "no_rain"],
        "Sprinkler": ["on", "off"]
    }
)

# Set probabilities for each parent configuration
cpt_w.set_probabilities(
    {"Rain": "rain", "Sprinkler": "on"},
    {"wet": 0.99, "dry": 0.01}
)
cpt_w.set_probabilities(
    {"Rain": "rain", "Sprinkler": "off"},
    {"wet": 0.90, "dry": 0.10}
)
cpt_w.set_probabilities(
    {"Rain": "no_rain", "Sprinkler": "on"},
    {"wet": 0.90, "dry": 0.10}
)
cpt_w.set_probabilities(
    {"Rain": "no_rain", "Sprinkler": "off"},
    {"wet": 0.00, "dry": 1.00}
)

print(cpt_w)
print("\nInternal table shape:", cpt_w.table.shape)
print("Internal table:")
print(cpt_w.table)

# Query probabilities
print("\n=== Probability Queries ===")
print(f"P(WetGrass=wet | Rain=rain, Sprinkler=on) = "
      f"{cpt_w.get_probability('wet', {'Rain': 'rain', 'Sprinkler': 'on'})}")
print(f"P(WetGrass=dry | Rain=no_rain, Sprinkler=off) = "
      f"{cpt_w.get_probability('dry', {'Rain': 'no_rain', 'Sprinkler': 'off'})}")
```

While tabular representation is the most general, it becomes unwieldy when variables have many parents or many states. Several alternative representations provide more compact encoding for special cases.
1. Decision Tree Representation:
When context-specific independence exists—where the distribution over the child depends on only some parents in certain contexts—a decision tree can be more compact.
Example: Consider $P(X | A, B, C)$ where, when $A = 0$, the distribution of $X$ depends only on $B$, and when $A = 1$, it depends only on $C$.
A table would need $2^3 = 8$ rows. A decision tree needs only 4 leaves:
```
            A
          /   \
         0     1
         |     |
         B     C
        / \   / \
     P(X) P(X) P(X) P(X)
```
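The tree above can be encoded directly, for example as nested dictionaries. The sketch below is illustrative; the leaf probabilities are placeholder values, not part of the example above:

```python
# Tree-structured CPD for P(X | A, B, C) with the context-specific
# independence described above: when A=0, X depends only on B;
# when A=1, X depends only on C. Four leaves instead of eight rows.
tree_cpd = {
    "split_on": "A",
    0: {"split_on": "B", 0: {"x0": 0.9, "x1": 0.1}, 1: {"x0": 0.4, "x1": 0.6}},
    1: {"split_on": "C", 0: {"x0": 0.7, "x1": 0.3}, 1: {"x0": 0.2, "x1": 0.8}},
}

def tree_lookup(node, assignment):
    """Walk the tree until a leaf distribution over X is reached."""
    while "split_on" in node:
        node = node[assignment[node["split_on"]]]
    return node

print(tree_lookup(tree_cpd, {"A": 0, "B": 1, "C": 0}))  # C is irrelevant when A=0
```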
2. Rule-Based Representation:
For variables with many zero-probability configurations, rules can be more efficient:
IF Rain=yes AND Sprinkler=on THEN P(Wet)=0.99
IF Rain=yes AND Sprinkler=off THEN P(Wet)=0.90
...
DEFAULT P(Wet)=0.0 # Implicit for unspecified cases
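A minimal sketch of how such a rule list might be evaluated: the first rule whose conditions match supplies P(Wet), and unspecified cases fall through to the default (the conditions and numbers mirror the fragment above):

```python
# Rule-based CPD lookup: ordered (conditions, probability) pairs plus a default.
rules = [
    ({"Rain": "yes", "Sprinkler": "on"},  0.99),
    ({"Rain": "yes", "Sprinkler": "off"}, 0.90),
]
DEFAULT_P_WET = 0.0

def p_wet(assignment):
    for conditions, prob in rules:
        if all(assignment.get(var) == val for var, val in conditions.items()):
            return prob
    return DEFAULT_P_WET          # implicit for unspecified cases

print(p_wet({"Rain": "yes", "Sprinkler": "on"}))   # 0.99
print(p_wet({"Rain": "no",  "Sprinkler": "off"}))  # falls through to 0.0
```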
Context-specific independence (CSI) occurs when conditional independence holds for specific values of conditioning variables but not globally. For example, X ⊥ Y | Z=1 might hold, but X ⊥̸ Y | Z=0. Tree representations naturally capture CSI, while tables cannot without redundancy.
3. Noisy-OR and Noisy-AND Models:
For binary variables with many causes, the Noisy-OR model provides a compact parameterization. Instead of specifying $2^k$ parameters for $k$ binary parents, Noisy-OR uses only $k + 1$ parameters.
Noisy-OR Model:
Assume each parent $U_i$ independently has a probability $q_i$ of being 'inhibited' (failing to activate the effect even when present). Additionally, there's a 'leak' probability $q_0$ of the effect being absent even with no causes.
$$P(X = 0 | U_1, \ldots, U_k) = q_0 \prod_{i: U_i = 1} q_i$$ $$P(X = 1 | U_1, \ldots, U_k) = 1 - q_0 \prod_{i: U_i = 1} q_i$$
Intuition: The effect $X$ is absent only if ALL active causes are inhibited AND the leak doesn't trigger.
Example: Lung disease ($X$) with causes: Smoking ($U_1$), Pollution ($U_2$), Genetics ($U_3$)
With 3 binary parents: Noisy-OR uses 4 parameters vs. 8 for full table.
```python
import numpy as np
from itertools import product


class NoisyOR:
    """
    Noisy-OR model for binary variables with multiple binary causes.

    Parameters:
    - leak_prob: P(X=0 | no causes active) = probability of no effect with no causes
    - inhibition_probs: Dict mapping parent name -> P(inhibited | cause active)
    """

    def __init__(self, variable_name, parent_names, leak_prob, inhibition_probs):
        self.variable_name = variable_name
        self.parent_names = parent_names
        self.leak_prob = leak_prob                # q_0
        self.inhibition_probs = inhibition_probs  # {parent: q_i}

    def get_probability(self, child_value, parent_config):
        """
        Compute P(X | parents) using the Noisy-OR formula.

        Args:
            child_value: 0 or 1
            parent_config: Dict mapping parent name -> 0 or 1

        Returns:
            P(X=child_value | parents)
        """
        # Start with leak probability
        prob_no_effect = self.leak_prob

        # Multiply by inhibition probability for each active cause
        for parent, value in parent_config.items():
            if value == 1:  # Cause is active
                prob_no_effect *= self.inhibition_probs[parent]

        # P(X=0) = product of inhibitions; P(X=1) = 1 - P(X=0)
        if child_value == 0:
            return prob_no_effect
        else:
            return 1 - prob_no_effect

    def generate_full_cpt(self):
        """
        Generate the equivalent full CPT for comparison.

        Returns:
            Dict of parent_config_tuple -> [P(X=0), P(X=1)]
        """
        cpt = {}
        parent_values = list(product([0, 1], repeat=len(self.parent_names)))
        for values in parent_values:
            config = dict(zip(self.parent_names, values))
            p0 = self.get_probability(0, config)
            p1 = self.get_probability(1, config)
            cpt[values] = [p0, p1]
        return cpt

    def parameter_savings(self):
        """Calculate parameter savings vs full table."""
        k = len(self.parent_names)
        full_table_params = 2 ** k   # One free param per row (binary child)
        noisy_or_params = k + 1      # k inhibition probs + 1 leak
        return full_table_params, noisy_or_params


# Example: Lung Disease model
print("=== Noisy-OR Model Example ===\n")

noisy_or = NoisyOR(
    variable_name="LungDisease",
    parent_names=["Smoking", "Pollution", "Genetics"],
    leak_prob=0.95,  # 95% no disease without causes
    inhibition_probs={
        "Smoking": 0.10,    # Smoking very likely to cause
        "Pollution": 0.70,  # Pollution moderate cause
        "Genetics": 0.40    # Genetics significant cause
    }
)

# Generate full CPT
print("Generated CPT from Noisy-OR model:")
print("-" * 60)
print(f"{'Smoking':>8} {'Pollution':>10} {'Genetics':>10} | P(Disease=0)  P(Disease=1)")
print("-" * 60)

cpt = noisy_or.generate_full_cpt()
for config, probs in sorted(cpt.items()):
    print(f"{config[0]:>8} {config[1]:>10} {config[2]:>10} |   {probs[0]:.4f}        {probs[1]:.4f}")

# Parameter savings
full, compact = noisy_or.parameter_savings()
print(f"\nParameter comparison:")
print(f"  Full CPT:  {full} parameters")
print(f"  Noisy-OR:  {compact} parameters")
print(f"  Reduction: {(1 - compact/full)*100:.1f}%")

# For more parents, savings are dramatic
print("\nScaling with number of parents:")
for k in [3, 5, 10, 20]:
    full = 2 ** k
    compact = k + 1
    print(f"  {k} parents: Full={full:,} vs Noisy-OR={compact} ({(1-compact/full)*100:.2f}% reduction)")
```

Traditional CPTs assume discrete variables with finite states. For continuous variables or mixed discrete-continuous networks, different representations are needed.
Conditional Linear Gaussian (CLG) Models:
For continuous variables with Gaussian distributions, the CPT is replaced by parameters of a conditional Gaussian:
$$P(X | Pa(X)) = \mathcal{N}(\mu(Pa(X)), \sigma^2(Pa(X)))$$
where the mean $\mu$ is a linear function of continuous parents: $$\mu = \beta_0 + \sum_{i} \beta_i U_i$$
For each discrete parent configuration, different $\beta$ coefficients and variances may apply.
Example: Height $(H)$ conditioned on Age $(A)$ and Gender $(G)$: $$P(H | A, G=\text{male}) = \mathcal{N}(100 + 3 \cdot A, 5^2)$$ $$P(H | A, G=\text{female}) = \mathcal{N}(95 + 2.8 \cdot A, 4.5^2)$$
Here, each gender has its own linear model.
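A small sketch of this CLG example, using SciPy's normal density and the coefficients from the equations above (the helper name `height_density` and the dictionary layout are just for illustration):

```python
from scipy.stats import norm

# One linear-Gaussian model per discrete parent configuration (Gender).
clg_params = {
    "male":   {"beta0": 100.0, "beta_age": 3.0, "sigma": 5.0},
    "female": {"beta0":  95.0, "beta_age": 2.8, "sigma": 4.5},
}

def height_density(h, age, gender):
    """Evaluate p(H = h | Age = age, Gender = gender)."""
    p = clg_params[gender]
    mu = p["beta0"] + p["beta_age"] * age          # linear mean in the continuous parent
    return norm.pdf(h, loc=mu, scale=p["sigma"])

print(height_density(130, age=10, gender="male"))    # mean is 130 for age 10
print(height_density(130, age=10, gender="female"))  # mean is 123 for age 10
```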
In CLG networks, a continuous node cannot be a parent of a discrete node (the integral over continuous space to compute discrete probabilities doesn't factor nicely). This constraint limits network topologies when mixing discrete and continuous variables. Some extensions use discretization or special cases to relax this.
Discretization:
A common practical approach is to discretize continuous variables into bins, for example equal-width intervals, equal-frequency (quantile) bins, or domain-motivated cut points. This enables standard CPT representation but introduces information loss, sensitivity to the choice of bin boundaries, and a growth in parameters as the number of bins increases.
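As an illustration, NumPy's `digitize` maps continuous readings to bin indices that can then serve as discrete CPT states; the bin edges and labels below are assumptions made for the example:

```python
import numpy as np

# Discretize temperature (°C) into five named bins for use as CPT states.
edges = np.array([0.0, 15.0, 25.0, 35.0])   # 4 edges -> 5 bins (two open-ended)
labels = ["freezing", "cold", "mild", "warm", "hot"]

temps = np.array([-3.2, 12.0, 22.5, 31.0, 40.1])
bin_idx = np.digitize(temps, edges)          # bin index for each reading
print([labels[i] for i in bin_idx])          # ['freezing', 'cold', 'mild', 'warm', 'hot']
```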
Kernel Density Estimation:
For fully continuous parents, non-parametric approaches model the conditional density: $$P(X | Pa(X)) \approx \sum_{j=1}^{N} w_j \cdot K_h(X - X^{(j)}, Pa(X) - Pa(X)^{(j)})$$
where $K_h$ is a kernel function (e.g., Gaussian) and the sum is over training points with similar parent values.
Neural Network CPDs:
Modern approaches parameterize conditional distributions using neural networks: $$P(X | Pa(X)) = f_\theta(Pa(X))$$
where $f_\theta$ outputs distribution parameters (e.g., mixture weights, means, variances). This allows flexible, learnable conditional distributions but sacrifices interpretability and closed-form inference.
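A toy sketch of the idea: a one-hidden-layer network whose softmax output is a distribution over three child states. The weights here are random placeholders; in practice $\theta$ would be learned from data, and the encoding of the parents is an assumption for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

W1 = rng.normal(size=(4, 8))        # 4 encoded parent features -> 8 hidden units
W2 = rng.normal(size=(8, 3))        # 8 hidden units -> 3 child states

def neural_cpd(parent_features):
    """f_theta(Pa(X)): map encoded parents to a distribution over X."""
    h = np.tanh(parent_features @ W1)
    return softmax(h @ W2)

parents = np.array([1.0, 0.0, 0.0, 1.0])   # example one-hot-style encoding
print(neural_cpd(parents))                  # sums to 1 by construction
```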
| Variable Types | Representation | Pros | Cons |
|---|---|---|---|
| All discrete | Standard CPT | Exact, interpretable | Exponential in parents |
| Discrete + Continuous (CLG) | Conditional Gaussian | Closed-form inference | Topology constraints |
| Discretized continuous | Standard CPT on bins | Simple implementation | Information loss |
| General continuous | Kernel density / Neural | Flexible | Computational cost |
Efficient storage and retrieval of CPT values is crucial for large-scale Bayesian Networks. Several optimization strategies reduce memory footprint and improve computational performance.
Sparse Representation:
When many CPT entries are zero or near-zero (common in practice), sparse storage is efficient: store only the nonzero entries, for example as a dictionary keyed by index tuples or in a compressed sparse format, and treat missing entries as zero (or as a shared default value).
Log-Probability Storage:
To avoid numerical underflow when multiplying many probabilities:
$$\log P(X_1, \ldots, X_n) = \sum_{i=1}^{n} \log P(X_i | Pa(X_i))$$
When summing probabilities in log-space (for marginalization), use the log-sum-exp identity: $$\log \sum_i \exp(\ell_i) = \max_i \ell_i + \log \sum_i \exp\left(\ell_i - \max_i \ell_i\right)$$ Subtracting the maximum before exponentiating prevents underflow and overflow, and is essential for numerically stable implementations.
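A minimal implementation of this identity, contrasted with the naive computation that underflows (the example log-probabilities are chosen to force the failure):

```python
import numpy as np

def log_sum_exp(log_ps):
    """Numerically stable log(sum(exp(log_ps))), as described above."""
    m = np.max(log_ps)
    return m + np.log(np.sum(np.exp(log_ps - m)))

log_ps = np.array([-1000.0, -1001.0, -1002.0])   # logs of extremely small probabilities
print(log_sum_exp(log_ps))                       # ≈ -999.59, computed stably
print(np.log(np.sum(np.exp(log_ps))))            # naive version underflows to -inf
```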
Index Computation:
For multi-dimensional arrays, converting parent configuration to array index efficiently:
```python
index = 0
stride = 1
for parent in reversed(parents):           # last parent varies fastest (row-major order)
    index += value_index[parent] * stride
    stride *= cardinality[parent]
```
This computes the linear index from multi-dimensional coordinates in O(k) for k parents.
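A quick sanity check of this stride scheme against NumPy's own row-major index computation (`np.ravel_multi_index`), using assumed cardinalities:

```python
import numpy as np

parent_cards = [2, 3, 4]     # cardinalities of three parents
parent_values = [1, 2, 3]    # one particular parent configuration

index, stride = 0, 1
for value, card in zip(reversed(parent_values), reversed(parent_cards)):
    index += value * stride
    stride *= card

print(index)                                                   # 23
print(np.ravel_multi_index(parent_values, dims=parent_cards))  # 23 as well
```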
Caching Computed Factors:
During inference (especially variable elimination), intermediate factors are computed repeatedly. Common caching strategies: memoize factors keyed by their variable scope and the conditioning evidence, reuse them across related queries, and invalidate cached entries whenever evidence or CPT parameters change.
Lazy Evaluation:
For very large CPTs, compute entries on demand rather than pre-computing the full table, caching each computed value for later reuse.
```python
import numpy as np
from scipy.special import logsumexp


class OptimizedCPT:
    """
    CPT with numerical optimizations for stable inference.
    """

    def __init__(self, variable_states, parent_cardinalities):
        """
        Initialize optimized CPT.

        Args:
            variable_states: Number of states for child variable
            parent_cardinalities: List of cardinalities for each parent
        """
        self.variable_states = variable_states
        self.parent_cardinalities = parent_cardinalities

        # Compute shape and total size
        self.shape = tuple(parent_cardinalities) + (variable_states,)
        self.size = np.prod(self.shape)

        # Store in log-space for numerical stability
        # Initialize with uniform (-log(variable_states))
        self.log_table = np.full(self.shape, -np.log(variable_states))

        # Precompute strides for fast indexing
        self._compute_strides()

    def _compute_strides(self):
        """Precompute strides for multi-dimensional indexing."""
        self.strides = []
        stride = 1
        for card in reversed(self.parent_cardinalities):
            self.strides.append(stride)
            stride *= card
        self.strides = list(reversed(self.strides))

    def parent_config_to_index(self, parent_indices):
        """Convert parent configuration to linear index."""
        return sum(idx * stride for idx, stride in zip(parent_indices, self.strides))

    def set_distribution(self, parent_indices, probabilities):
        """
        Set probability distribution for a parent configuration.
        Stored in log-space.

        Args:
            parent_indices: Tuple of parent state indices
            probabilities: Array of probabilities (must sum to 1)
        """
        probs = np.array(probabilities)
        if not np.isclose(probs.sum(), 1.0):
            raise ValueError("Probabilities must sum to 1")

        # Store as log probabilities
        # Handle zeros by using a very small value
        log_probs = np.where(probs > 0, np.log(probs), -1e10)
        self.log_table[parent_indices] = log_probs

    def get_log_probability(self, parent_indices, child_index):
        """Get log P(child | parents)."""
        return self.log_table[parent_indices + (child_index,)]

    def get_probability(self, parent_indices, child_index):
        """Get P(child | parents) - converts from log-space."""
        return np.exp(self.get_log_probability(parent_indices, child_index))

    def marginalize_child(self, parent_indices):
        """
        Compute log Σ_x P(x | parents) for one parent configuration.
        For a valid CPT this equals 0 (log 1), since each row sums to 1.
        """
        log_probs = self.log_table[parent_indices]
        return logsumexp(log_probs)

    def compute_joint_log_prob(self, assignments, cpts_and_parents):
        """
        Compute log P(x1, ..., xn) using the factorization.

        Args:
            assignments: Dict mapping variable -> state index
            cpts_and_parents: List of (cpt, variable, parents) tuples

        Returns:
            Log probability of the joint assignment
        """
        log_prob = 0.0
        for cpt, var, parents in cpts_and_parents:
            parent_indices = tuple(assignments[p] for p in parents)
            child_index = assignments[var]
            log_prob += cpt.get_log_probability(parent_indices, child_index)
        return log_prob


# Demonstrate numerical stability benefits
print("=== Numerical Stability Demonstration ===\n")

# Create a CPT with some very small probabilities
cpt = OptimizedCPT(variable_states=3, parent_cardinalities=[2, 2])

# Set distributions including some very small probabilities
cpt.set_distribution((0, 0), [0.99, 0.009, 0.001])
cpt.set_distribution((0, 1), [0.001, 0.001, 0.998])
cpt.set_distribution((1, 0), [0.333, 0.334, 0.333])
cpt.set_distribution((1, 1), [0.1, 0.1, 0.8])

print("CPT with small probabilities:")
print("P(X=2 | parents=(0,1)):", cpt.get_probability((0, 1), 2))
print("Log P(X=2 | parents=(0,1)):", cpt.get_log_probability((0, 1), 2))

# Simulate multiplying many probabilities
print("\n=== Stability for Long Chains ===")

# Multiplying 100 probabilities of 0.9 each
n_factors = 100
single_prob = 0.9

# Direct multiplication - still representable here, but shrinking fast
direct_product = single_prob ** n_factors
print(f"Direct: 0.9^{n_factors} = {direct_product}")

# Log-space computation - stable
log_product = n_factors * np.log(single_prob)
recovered = np.exp(log_product)
print(f"Log-space: exp({n_factors} * log(0.9)) = {recovered}")

# For much longer chains the direct product underflows to 0.0 in float64
n_factors = 10000
direct_product = single_prob ** n_factors  # Underflows to 0.0
log_product = n_factors * np.log(single_prob)
print(f"\nDirect: 0.9^{n_factors} = {direct_product} (underflow!)")
print(f"Log-space: {n_factors} * log(0.9) = {log_product:.2f} (stable)")
```

Not all conditional relationships are probabilistic—some are deterministic. A variable $X$ is deterministic given its parents if $P(X = f(Pa(X)) | Pa(X)) = 1$ for some function $f$.
Examples of Deterministic Nodes: logical gates (AND, OR, XOR of binary parents), threshold indicators (e.g., HighTemperature = 1 if Temperature > 30°C), selectors/multiplexers that output whichever parent another parent chooses, and arithmetic functions such as sums or maxima of parent values.
CPT for Deterministic Nodes:
The CPT has a special structure: each row has exactly one entry of 1.0 and all others 0.0.
| A | B | P(AND=0) | P(AND=1) |
|---|---|---|---|
| 0 | 0 | 1.0 | 0.0 |
| 0 | 1 | 1.0 | 0.0 |
| 1 | 0 | 1.0 | 0.0 |
| 1 | 1 | 0.0 | 1.0 |
Storage is wasteful—we need only store the function $f$, not the full table.
Deterministic relationships introduce zeros in the joint distribution, which can cause issues: (1) the log of zero is undefined, requiring special handling, (2) conditioning on evidence rendered 'impossible' by those zeros leads to division by zero during inference, and (3) some convergence proofs require strictly positive distributions. A common workaround is to substitute a tiny ε (and log ε) for true zeros, at the cost of a slight approximation.
Implementing Deterministic Nodes:
Rather than storing a full CPT, store the deterministic function:
```python
class DeterministicNode:
    def __init__(self, function, parent_names):
        self.function = function
        self.parent_names = parent_names

    def get_value(self, parent_config):
        return self.function(**parent_config)

    def get_probability(self, child_value, parent_config):
        computed = self.get_value(parent_config)
        return 1.0 if child_value == computed else 0.0
```
Constraint Satisfaction:
Deterministic nodes can encode constraints: clamping (observing) a deterministic node to a particular value restricts its parents to exactly those configurations consistent with the encoded function, which is how hard logical constraints are expressed inside an otherwise probabilistic model.
Inference with Deterministic Nodes:
Deterministic relationships prune the search space during inference: once a node's parents are instantiated its value is fixed, so configurations with probability zero can be discarded outright, much as in constraint propagation.
```python
class DeterministicNode:
    """
    A node whose value is a deterministic function of its parents.
    More efficient than storing a full CPT of 0s and 1s.
    """

    def __init__(self, name, parent_names, function):
        """
        Args:
            name: Node name
            parent_names: List of parent variable names
            function: Function that takes parent values and returns child value
        """
        self.name = name
        self.parent_names = parent_names
        self.function = function

    def compute_value(self, parent_values):
        """Compute the deterministic output given parent values."""
        if isinstance(parent_values, dict):
            args = [parent_values[p] for p in self.parent_names]
        else:
            args = parent_values
        return self.function(*args)

    def get_probability(self, child_value, parent_values):
        """P(child | parents) - 1 if child equals computed value, else 0."""
        computed = self.compute_value(parent_values)
        return 1.0 if child_value == computed else 0.0

    def get_log_probability(self, child_value, parent_values):
        """Log probability - 0 if match, -inf otherwise."""
        computed = self.compute_value(parent_values)
        return 0.0 if child_value == computed else float('-inf')


# Example: Logical AND gate
and_gate = DeterministicNode(
    name="AND",
    parent_names=["A", "B"],
    function=lambda a, b: 1 if (a == 1 and b == 1) else 0
)

print("=== Deterministic AND Gate ===\n")
for a in [0, 1]:
    for b in [0, 1]:
        result = and_gate.compute_value({'A': a, 'B': b})
        print(f"AND({a}, {b}) = {result}")

# Example: Threshold node
threshold_node = DeterministicNode(
    name="HighTemperature",
    parent_names=["Temperature"],
    function=lambda t: 1 if t > 30 else 0
)

print("\n=== Threshold Node (T > 30°C) ===\n")
for temp in [20, 25, 30, 35, 40]:
    is_high = threshold_node.compute_value({'Temperature': temp})
    prob = threshold_node.get_probability(1, {'Temperature': temp})
    print(f"Temperature={temp}°C: HighTemperature={is_high}, P(High=1)={prob}")

# Example: Selector/Multiplexer
selector = DeterministicNode(
    name="Output",
    parent_names=["Switch", "InputA", "InputB"],
    function=lambda switch, a, b: a if switch == 0 else b
)

print("\n=== Selector/Multiplexer ===\n")
print("Switch=0: selects InputA")
print("Switch=1: selects InputB")
for switch in [0, 1]:
    for a in [10, 20]:
        for b in [100, 200]:
            out = selector.compute_value({'Switch': switch, 'InputA': a, 'InputB': b})
            print(f"  Switch={switch}, A={a}, B={b} → Output={out}")

# Comparison: Memory usage
print("\n=== Memory Comparison ===\n")

class MemoryAnalysis:
    @staticmethod
    def cpt_memory(child_states, parent_cardinalities):
        """Bytes for full CPT (float64)."""
        entries = child_states
        for card in parent_cardinalities:
            entries *= card
        return entries * 8  # 8 bytes per float64

    @staticmethod
    def deterministic_memory():
        """Memory for deterministic node (just a function reference)."""
        return 64  # Rough estimate for function object

# AND gate: 2 binary parents, 1 binary output
print("AND gate (2 binary parents):")
print(f"  Full CPT: {MemoryAnalysis.cpt_memory(2, [2, 2])} bytes")
print(f"  Deterministic: ~{MemoryAnalysis.deterministic_memory()} bytes")

# Larger example: 10 binary inputs
print("\n10-input AND gate:")
print(f"  Full CPT: {MemoryAnalysis.cpt_memory(2, [2]*10):,} bytes ({MemoryAnalysis.cpt_memory(2, [2]*10)/1024:.1f} KB)")
print(f"  Deterministic: ~{MemoryAnalysis.deterministic_memory()} bytes")
```

When data is scarce or unavailable, CPT parameters must be elicited from domain experts. This is both an art and a science, requiring careful methodology to extract accurate probability estimates.
Elicitation Challenges:
Probability estimation is hard: Humans are notoriously poor at estimating probabilities, subject to various cognitive biases
Scale effects: Experts may think in terms of 'likely', 'unlikely', 'rare' rather than 0.7, 0.2, 0.01
Conditioning confusion: Experts may conflate P(A|B) with P(B|A) or fail to hold conditioning variables fixed
Overconfidence: Experts tend to give distributions that are too narrow, underestimating uncertainty
Effort scaling: A CPT with 64 rows requires eliciting 64 (or more) distributions
Elicitation Techniques:
Anchoring: Start with a baseline and ask for adjustments: "Compared to the average case, how much more likely is X given Y?"
Scenarios: Present concrete scenarios rather than abstract probabilities: "Out of 100 patients with symptom A and test result B, how many would have disease X?"
Betting: Frame the question as a bet: "Would you bet on X at 3-to-1 odds given Y?" This elicits probabilities through revealed preferences.
Interval estimation: Ask for ranges rather than points: "What's the lowest/highest probability you'd consider reasonable?"
Calibration training: Train experts using questions with known answers to improve subsequent estimates.
For causes that act roughly independently, elicit just the strength of each individual cause (how likely is the effect with ONLY this cause present?) plus a background/leak probability, then compute the full CPT using Noisy-OR, as the sketch below illustrates. This reduces the elicitation burden from $2^k$ entries to $k + 1$ single-cause questions.
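A small sketch of this workflow under assumed elicited numbers (chosen to roughly match the earlier Noisy-OR example): single-cause answers are converted to inhibition probabilities, then the full CPT is expanded with the Noisy-OR formula from earlier.

```python
from itertools import product

# Elicited quantities (illustrative): P(effect | only this cause) and
# P(effect | no causes present).
p_background = 0.05
p_only = {"Smoking": 0.90, "Pollution": 0.33, "Genetics": 0.62}

q0 = 1.0 - p_background                              # leak: P(no effect | no causes)
q = {c: (1.0 - p) / q0 for c, p in p_only.items()}   # inhibition probability per cause

# Expand the full CPT from just k + 1 elicited numbers.
causes = list(p_only)
for config in product([0, 1], repeat=len(causes)):
    prob_no_effect = q0
    for cause, active in zip(causes, config):
        if active:
            prob_no_effect *= q[cause]
    print(config, round(1.0 - prob_no_effect, 4))    # P(effect | this configuration)
```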
Validation Techniques:
After elicitation, CPTs should be validated before deployment:
Consistency checks: verify that every row sums to 1 and that probabilities move in the expected direction as parent values change (for example, adding risk factors should not decrease the probability of disease).
Sensitivity analysis: perturb elicited values and observe whether the network's conclusions change materially; parameters with high influence deserve the most careful elicitation.
Cross-validation with experts: have a second expert (or the same expert in a later session) independently re-estimate a sample of entries and reconcile discrepancies.
Historical data validation: wherever even partial data exist, compare elicited probabilities against empirical frequencies.
Logical validation: confirm that impossible configurations receive probability zero and that deterministic relationships are encoded exactly.
| Bias | Description | Mitigation |
|---|---|---|
| Anchoring | Over-reliance on initial value | Use multiple anchors, randomize order |
| Overconfidence | Distributions too narrow | Explicitly ask for extreme scenarios |
| Availability | Weight recent/memorable cases | Use statistical framing (X out of 100) |
| Base rate neglect | Ignore prior probabilities | Provide base rates explicitly |
| Conjunction fallacy | Judging P(A and B) > P(A) | Verify probabilistic coherence |
Conditional Probability Tables are the quantitative heart of Bayesian Networks. This page covered their formal structure and normalization constraints, how to count parameters, tabular and compact representations (decision trees, rules, Noisy-OR), extensions to continuous variables, storage and numerical optimizations, deterministic nodes, and the elicitation of parameters from domain experts.
What's Next:
With structure (DAGs) and parameters (CPTs) established, the next page addresses Parameter Learning—how to estimate CPT entries from observed data. We'll explore maximum likelihood estimation, Bayesian parameter estimation, handling missing data with EM, and practical considerations for learning from real datasets.
You now understand Conditional Probability Tables as the quantitative foundation of Bayesian Networks. You can implement CPTs in multiple formats, optimize storage and computation, handle special cases like determinism and continuous variables, and elicit parameters from domain experts. Next, we'll learn how to estimate these parameters from data.