Every machine learning model operates in a world riddled with uncertainty. When a spam classifier examines an email, it cannot definitively know the sender's intent—it can only assess the likelihood that the message is spam. When a recommendation system suggests a movie, it's making an educated guess about what you might enjoy. When a medical diagnosis system evaluates symptoms, it's reasoning about probable conditions.
Probability theory provides the mathematical framework for reasoning precisely about uncertain outcomes. Without it, machine learning would be reduced to rigid, brittle rules incapable of handling the inherent ambiguity of real-world data.
Before we can understand sophisticated probabilistic concepts like Bayesian inference, maximum likelihood estimation, or probabilistic graphical models, we must first master the foundational building blocks: sample spaces and events.
By the end of this page, you will understand how to formally define a random experiment, construct its sample space, identify and manipulate events using set operations, and recognize how these abstract concepts map directly to practical machine learning scenarios.
Probability theory begins with the concept of a random experiment (also called a stochastic experiment or chance experiment). A random experiment is any procedure that satisfies three essential criteria:

1. All possible outcomes are known in advance, even though we cannot say which one will occur.
2. The outcome of any single trial cannot be predicted with certainty.
3. The procedure can, at least in principle, be repeated under identical conditions.
These criteria may seem abstract, but they precisely capture the nature of uncertainty we encounter throughout machine learning.
A deterministic process always produces the same output for the same input (e.g., f(x) = 2x + 3). A stochastic process involves randomness, so identical conditions can yield different results. Most real-world phenomena—and virtually all interesting ML problems—are stochastic because they involve unmeasured variables, noise, or inherently random mechanisms.
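To make the contrast concrete, here is a minimal sketch (the function names are ours, invented for illustration): a deterministic function returns the same output on every call with the same input, while a stochastic one can return different values under identical conditions.

```python
import random

def deterministic(x: float) -> float:
    """Deterministic: the same input always produces the same output."""
    return 2 * x + 3

def stochastic(x: float) -> float:
    """Stochastic: the same input can yield different outputs,
    here because of additive Gaussian noise."""
    return 2 * x + 3 + random.gauss(0, 1)

print(deterministic(5.0), deterministic(5.0))  # identical values
print(stochastic(5.0), stochastic(5.0))        # (almost surely) different values
```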
The formalization of random experiments is not merely academic rigor. It provides a precise, shared vocabulary for describing uncertainty, unambiguous rules for calculating with it, and a foundation on which guarantees about system behavior can be stated and proven.
Without this formalization, discussions of probability devolve into hand-waving and intuition—fine for casual conversation, dangerous for building systems that must work correctly.
The sample space, denoted by Ω (the capital Greek letter omega; some textbooks write S), is the set of all possible outcomes of a random experiment. Every outcome appears exactly once in the sample space, and outcomes are mutually exclusive—exactly one of them occurs when the experiment is performed.
Formal Definition:
The sample space Ω of a random experiment is the set containing all possible outcomes ω of that experiment. When the experiment is performed, exactly one outcome ω ∈ Ω occurs.
The sample space is the starting point for all probabilistic analysis. Once we define Ω, we can then ask questions about events (subsets of Ω) and their probabilities.
| Experiment | Sample Space Ω | Type | Cardinality |
|---|---|---|---|
| Single coin flip | Ω = {H, T} | Finite, Discrete | \|Ω\| = 2 |
| Single die roll | Ω = {1, 2, 3, 4, 5, 6} | Finite, Discrete | \|Ω\| = 6 |
| Two coin flips | Ω = {HH, HT, TH, TT} | Finite, Discrete | \|Ω\| = 4 |
| Number of emails received in an hour | Ω = {0, 1, 2, 3, ...} = ℕ₀ | Countably Infinite, Discrete | \|Ω\| = ℵ₀ |
| Time until server failure | Ω = [0, ∞) | Uncountably Infinite, Continuous | \|Ω\| = 𝔠 |
| Location of a random point in a unit square | Ω = [0,1] × [0,1] | Uncountably Infinite, Continuous | \|Ω\| = 𝔠 |
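The discrete rows of this table translate directly into code. A small sketch (our own illustration) represents finite sample spaces as Python sets and infinite ones by membership tests:

```python
# Finite, discrete sample spaces can be written down explicitly as sets.
coin_flip = {'H', 'T'}
die_roll = {1, 2, 3, 4, 5, 6}
two_flips = {('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')}

print(len(coin_flip), len(die_roll), len(two_flips))  # 2 6 4

# Infinite spaces such as ℕ₀ or [0, ∞) cannot be stored as explicit sets;
# we represent them by membership tests instead.
def in_email_count_space(n: int) -> bool:
    """ω ∈ Ω for the 'emails per hour' experiment: any non-negative integer."""
    return isinstance(n, int) and n >= 0

print(in_email_count_space(5), in_email_count_space(-1))  # True False
```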
Sample spaces fall into two fundamental categories:

Discrete Sample Spaces: contain a finite or countably infinite number of outcomes (coin flips, die rolls, email counts). The outcomes can be listed, even if the list never ends.

Continuous Sample Spaces: contain uncountably many outcomes, typically intervals of real numbers (waiting times, temperatures, spatial coordinates). The outcomes cannot be enumerated.
This distinction profoundly affects how we compute probabilities. For discrete spaces, we can assign probabilities to individual outcomes. For continuous spaces, individual outcomes typically have zero probability—we must instead work with intervals or regions.
In a continuous sample space, asking 'What is the probability that the temperature is exactly 23.4571...°C?' makes little sense—the answer is zero. Instead, we ask 'What is the probability that the temperature lies between 23°C and 24°C?' This is why probability density functions (PDFs) measure probability density per unit length, not probability of specific values.
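A quick simulation makes this concrete. The sketch below (ours; the temperature range and sample size are arbitrary assumptions) draws uniform "temperatures" and compares hitting an exact value against landing in an interval:

```python
import numpy as np

rng = np.random.default_rng(0)
temps = rng.uniform(20.0, 30.0, size=1_000_000)  # continuous outcomes in [20, 30)

exact_hits = np.sum(temps == 23.4571)                       # exact value: probability 0
interval_freq = np.mean((temps >= 23.0) & (temps < 24.0))   # interval: probability 1/10

print(f"Exact-value hits: {exact_hits}")           # 0 (almost surely)
print(f"Interval frequency: {interval_freq:.4f}")  # ≈ 0.1
```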
Constructing the correct sample space is a critical skill. An improperly defined sample space leads to flawed probability calculations and incorrect conclusions. The process requires careful thought about what exactly constitutes a distinct outcome.
Mutual Exclusivity: Outcomes must be distinct—if one outcome occurs, no other outcome can simultaneously occur
Exhaustiveness: The sample space must include all possible outcomes—nothing that can happen is missing
Appropriate Granularity: The level of detail must match the questions we want to answer
When in doubt, use the most detailed sample space possible—the one that distinguishes outcomes whenever they could reasonably be distinguished. You can always aggregate later, but you cannot disaggregate a coarse sample space. Starting with ordered pairs for dice lets us answer questions about sums, but starting with sums alone prevents us from answering questions about specific die values.
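A brief sketch of this point (our own illustration): from the ordered-pair space we can always derive the coarser sum counts, but the coarse space cannot recover the information it discarded.

```python
from collections import Counter

# Fine-grained sample space: ordered pairs of dice values
pairs = {(i, j) for i in range(1, 7) for j in range(1, 7)}

# Aggregating: each sum's count follows directly from the ordered pairs
sum_counts = Counter(i + j for (i, j) in pairs)
print(sum_counts[7])  # 6 ordered pairs sum to 7

# The coarse space {2, ..., 12} alone cannot tell us, for example, how
# often the first die shows 6 — that detail was discarded by aggregation.
first_die_six = {(i, j) for (i, j) in pairs if i == 6}
print(len(first_die_six))  # 6 — recoverable only from the fine-grained space
```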
When an experiment consists of multiple sub-experiments performed together or in sequence, we construct the overall sample space as the Cartesian product of the individual sample spaces.
If experiment A has sample space Ω_A and experiment B has sample space Ω_B, then the compound experiment has sample space:

Ω = Ω_A × Ω_B = {(a, b) : a ∈ Ω_A, b ∈ Ω_B}
This generalizes to any number of sub-experiments:
Ω = Ω₁ × Ω₂ × ... × Ωₙ
with |Ω| = |Ω₁| × |Ω₂| × ... × |Ωₙ| for finite sample spaces.
```python
import itertools
from typing import Set, Tuple, Any


def construct_cartesian_product(*sample_spaces: Set[Any]) -> Set[Tuple]:
    """
    Construct the Cartesian product of multiple sample spaces.

    This is the standard way to build sample spaces for compound experiments.

    Parameters:
        *sample_spaces: Variable number of sample space sets

    Returns:
        Set of tuples representing all combinations of outcomes

    Example:
        >>> coin = {'H', 'T'}
        >>> die = {1, 2, 3, 4, 5, 6}
        >>> compound_space = construct_cartesian_product(coin, die)
        >>> len(compound_space)  # 2 × 6 = 12
        12
    """
    return set(itertools.product(*sample_spaces))


def construct_dice_sample_space(num_dice: int, faces: int = 6) -> Set[Tuple[int, ...]]:
    """
    Construct the sample space for rolling multiple dice.

    Parameters:
        num_dice: Number of dice to roll
        faces: Number of faces per die (default: 6)

    Returns:
        Set of tuples, each representing one possible outcome

    Example:
        >>> space = construct_dice_sample_space(2)
        >>> len(space)  # 6^2 = 36
        36
        >>> (1, 6) in space
        True
    """
    single_die = set(range(1, faces + 1))
    return set(itertools.product(*[single_die] * num_dice))


def construct_binary_sequence_space(length: int) -> Set[Tuple[int, ...]]:
    """
    Construct sample space for binary sequences of given length.

    Useful for: coin flip sequences, binary classification outcomes,
    bit strings, presence/absence of features.

    Parameters:
        length: Length of the binary sequence

    Returns:
        Set of all possible binary tuples

    Example:
        >>> space = construct_binary_sequence_space(3)
        >>> len(space)  # 2^3 = 8
        8
        >>> (0, 1, 0) in space
        True
    """
    return set(itertools.product([0, 1], repeat=length))


# Demonstration
if __name__ == "__main__":
    # Example 1: Coin and Die
    coin = {'H', 'T'}
    die = {1, 2, 3, 4, 5, 6}
    space = construct_cartesian_product(coin, die)
    print(f"Coin + Die sample space size: {len(space)}")
    print(f"Sample outcomes: {list(space)[:5]}...")

    # Example 2: Three coin flips
    triple_coin = construct_binary_sequence_space(3)
    print(f"\nThree coin flips sample space size: {len(triple_coin)}")
    print(f"All outcomes: {sorted(triple_coin)}")

    # Example 3: Two dice
    two_dice = construct_dice_sample_space(2)
    print(f"\nTwo dice sample space size: {len(two_dice)}")

    # Count outcomes that sum to 7
    sum_seven = [outcome for outcome in two_dice if sum(outcome) == 7]
    print(f"Outcomes summing to 7: {sum_seven}")
    print(f"Count: {len(sum_seven)} out of {len(two_dice)}")
```

While the sample space contains all possible outcomes, we are rarely interested in a single specific outcome. Instead, we ask questions like:
- 'Did the roll show an even number?'
- 'Did the two dice sum to 7?'
- 'Did the model predict the correct class?'

These questions correspond to events—collections of outcomes that share some property of interest.
Formal Definition:
An event E is a subset of the sample space: E ⊆ Ω. An event E is said to occur if the outcome ω of the experiment is an element of E (i.e., ω ∈ E).
Every event is a set, and we use set notation and operations to work with events.
| Event Description | Event Set E | Event Occurs When... |
|---|---|---|
| Roll an even number | E = {2, 4, 6} | ω ∈ {2, 4, 6} |
| Roll less than 4 | E = {1, 2, 3} | ω ∈ {1, 2, 3} |
| Roll a prime number | E = {2, 3, 5} | ω ∈ {2, 3, 5} |
| Roll a 6 | E = {6} | ω = 6 |
| Roll anything | E = {1, 2, 3, 4, 5, 6} = Ω | Always (certain event) |
| Roll a 7 | E = ∅ (empty set) | Never (impossible event) |
Certain Event (Ω): The sample space itself is an event that always occurs. If Ω is the set of all possible outcomes, and exactly one outcome must occur, then the event Ω occurs with absolute certainty.
Impossible Event (∅): The empty set ∅ is an event that never occurs. Since ∅ contains no outcomes, and some outcome must occur, ∅ cannot occur.
Elementary Events (Singletons): An event containing exactly one outcome is called an elementary event or simple event. For a die roll, {3} is an elementary event. These are the atomic units from which all other events are built.
In ML, events often represent model behaviors or data properties:
- E = 'Model predicts class 1' — the set of all inputs x where f(x) = 1
- E = 'Training converges' — the set of all random initializations leading to convergence
- E = 'User clicks' — the set of all (user, context, item) tuples resulting in clicks
Thinking of ML behaviors as events allows us to apply the full power of probability theory.
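As a toy illustration (the model f and the input set are invented for this sketch, not a standard API), the event 'model predicts class 1' is literally a subset of the input space:

```python
# Toy input space: integer feature values 0..9 (a stand-in for real inputs)
inputs = set(range(10))

def f(x: int) -> int:
    """A toy 'model': predicts class 1 when the feature exceeds a threshold."""
    return 1 if x >= 7 else 0

# The event 'model predicts class 1' is the subset of inputs where f(x) == 1
E_predict_1 = {x for x in inputs if f(x) == 1}
print(E_predict_1)  # {8, 9, 7}

# Its complement is the event 'model predicts class 0'
E_predict_0 = inputs - E_predict_1
print(E_predict_0)  # {0, 1, 2, 3, 4, 5, 6}
```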
Since events are sets, we can combine them using standard set operations. These operations allow us to express complex questions about outcomes in terms of simpler events.
| Operation | Notation | Meaning | Event Occurs When... |
|---|---|---|---|
| Complement | E^c or Ē or E' | NOT E | E does not occur (ω ∉ E) |
| Union | E₁ ∪ E₂ | E₁ OR E₂ (or both) | At least one of E₁, E₂ occurs |
| Intersection | E₁ ∩ E₂ | E₁ AND E₂ | Both E₁ and E₂ occur |
| Difference | E₁ \ E₂ or E₁ - E₂ | E₁ but NOT E₂ | E₁ occurs but E₂ does not |
| Symmetric Difference | E₁ △ E₂ | Exactly one of E₁, E₂ | E₁ or E₂ but not both |
Two events A and B are mutually exclusive (or disjoint) if they cannot both occur simultaneously:
A ∩ B = ∅
If A occurs, B definitely does not, and vice versa.
Examples:

- A single die roll showing 3 and showing 4 (one roll yields exactly one face)
- An email being spam and not being spam
- A hard classifier assigning the same input to class 1 and to class 2
A collection of events {E₁, E₂, ..., Eₙ} is exhaustive if their union covers the entire sample space:
E₁ ∪ E₂ ∪ ... ∪ Eₙ = Ω
At least one of the events must occur.
If events are both mutually exclusive AND exhaustive, they form a partition of Ω. Every outcome belongs to exactly one event. This is a crucial concept for decomposing probabilities.
```python
from typing import Set, Tuple

# Define sample space for two dice
DICE_SAMPLE_SPACE: Set[Tuple[int, int]] = {
    (i, j) for i in range(1, 7) for j in range(1, 7)
}


def complement(event: Set, sample_space: Set) -> Set:
    """Return the complement of an event: E^c = Ω \\ E"""
    return sample_space - event


def union(event_a: Set, event_b: Set) -> Set:
    """Return union of two events: A ∪ B"""
    return event_a | event_b


def intersection(event_a: Set, event_b: Set) -> Set:
    """Return intersection of two events: A ∩ B"""
    return event_a & event_b


def difference(event_a: Set, event_b: Set) -> Set:
    """Return set difference: A \\ B (A but not B)"""
    return event_a - event_b


def symmetric_difference(event_a: Set, event_b: Set) -> Set:
    """Return symmetric difference: A △ B (exactly one of A or B)"""
    return event_a ^ event_b


def are_mutually_exclusive(event_a: Set, event_b: Set) -> bool:
    """Check if two events are mutually exclusive (disjoint): A ∩ B = ∅"""
    return len(intersection(event_a, event_b)) == 0


def are_exhaustive(events: list, sample_space: Set) -> bool:
    """Check if events cover the entire sample space: ∪ Eᵢ = Ω"""
    union_of_events = set()
    for event in events:
        union_of_events = union_of_events | event
    return union_of_events == sample_space


def is_partition(events: list, sample_space: Set) -> bool:
    """Check if events form a partition (mutually exclusive and exhaustive)"""
    # Check exhaustive
    if not are_exhaustive(events, sample_space):
        return False
    # Check pairwise mutually exclusive
    for i, event_i in enumerate(events):
        for j, event_j in enumerate(events):
            if i < j and not are_mutually_exclusive(event_i, event_j):
                return False
    return True


# Define some interesting events for two dice
def first_die_even(sample_space: Set[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Event: First die shows an even number"""
    return {(i, j) for (i, j) in sample_space if i % 2 == 0}


def sum_equals(sample_space: Set[Tuple[int, int]], target: int) -> Set[Tuple[int, int]]:
    """Event: Sum of dice equals target"""
    return {(i, j) for (i, j) in sample_space if i + j == target}


def doubles(sample_space: Set[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Event: Both dice show the same value"""
    return {(i, j) for (i, j) in sample_space if i == j}


if __name__ == "__main__":
    Ω = DICE_SAMPLE_SPACE

    # Define events
    A = first_die_even(Ω)
    B = sum_equals(Ω, 7)
    C = doubles(Ω)

    print(f"Sample space size: |Ω| = {len(Ω)}")
    print(f"|A| (first die even): {len(A)}")
    print(f"|B| (sum equals 7): {len(B)}")
    print(f"|C| (doubles): {len(C)}")

    print(f"\n|A ∩ B|: {len(intersection(A, B))}")
    print(f"A ∩ B = {intersection(A, B)}")

    print(f"\n|A ∪ C|: {len(union(A, C))}")

    print(f"\nAre B and C mutually exclusive? {are_mutually_exclusive(B, C)}")
    print(f"B ∩ C = {intersection(B, C)}")

    # Check partition properties
    E1 = first_die_even(Ω)
    E2 = complement(E1, Ω)  # First die odd
    print(f"\nIs {{E1, E2}} a partition? {is_partition([E1, E2], Ω)}")
```

De Morgan's Laws are among the most important identities in set theory and probability. They describe the relationship between unions, intersections, and complements.
De Morgan's Laws:
(A ∪ B)^c = A^c ∩ B^c
The complement of a union is the intersection of the complements.
(A ∩ B)^c = A^c ∪ B^c
The complement of an intersection is the union of the complements.
These laws generalize to any finite or countable collection of events:
(∪ᵢ Aᵢ)^c = ∩ᵢ Aᵢ^c
(∩ᵢ Aᵢ)^c = ∪ᵢ Aᵢ^c
First Law: 'NOT (A or B)' means 'NOT A and NOT B'—if neither A nor B happened, then both failed to happen.
Second Law: 'NOT (A and B)' means 'NOT A or NOT B'—if it's false that both happened, then at least one didn't happen.
Think of it as: when you take a complement, unions and intersections swap roles.
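Because events on a finite sample space are just sets, both laws can be checked mechanically. A minimal sketch (our own, reusing the die-roll events from earlier):

```python
omega = set(range(1, 7))  # die-roll sample space
A = {2, 4, 6}             # roll an even number
B = {1, 2, 3}             # roll less than 4

def comp(E, sample_space=omega):
    """Complement relative to the sample space: E^c = Ω \\ E."""
    return sample_space - E

# First law: (A ∪ B)^c == A^c ∩ B^c
assert comp(A | B) == comp(A) & comp(B)

# Second law: (A ∩ B)^c == A^c ∪ B^c
assert comp(A & B) == comp(A) | comp(B)

print("De Morgan's laws hold on this sample space.")
```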
De Morgan's Laws allow us to transform complex probability calculations:
P(A ∪ B ∪ C) — The probability that at least one event occurs
Can be computed as:
P(A ∪ B ∪ C) = 1 - P((A ∪ B ∪ C)^c) = 1 - P(A^c ∩ B^c ∩ C^c)
If A, B, C are independent, this becomes much easier to calculate:
= 1 - P(A^c)P(B^c)P(C^c)
This technique—computing the probability of 'at least one' by subtracting the probability of 'none'—is ubiquitous in probability and statistics.
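A small numeric sketch of the technique (our own example; the three probabilities are made up for illustration): for independent events, 'at least one' falls out of the complements, and exhaustive enumeration confirms it.

```python
from itertools import product

# Independent events with assumed (illustrative) probabilities
p_a, p_b, p_c = 0.5, 0.3, 0.2

# P(at least one) = 1 − P(none) = 1 − P(A^c)P(B^c)P(C^c)
p_at_least_one = 1 - (1 - p_a) * (1 - p_b) * (1 - p_c)
print(f"{p_at_least_one:.3f}")  # 0.720

# Sanity check: sum the probabilities of all joint outcomes with ≥ 1 occurrence
total = 0.0
for a, b, c in product([0, 1], repeat=3):
    prob = ((p_a if a else 1 - p_a)
            * (p_b if b else 1 - p_b)
            * (p_c if c else 1 - p_c))
    if a or b or c:
        total += prob
assert abs(total - p_at_least_one) < 1e-12
```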
For finite sample spaces, we typically consider all possible subsets as valid events. But for infinite sample spaces—especially continuous ones—this approach creates mathematical paradoxes. The solution is to carefully specify which subsets qualify as events.
Definition: σ-algebra (Sigma-algebra)
A σ-algebra (or σ-field) on Ω is a collection ℱ of subsets of Ω satisfying:

1. Ω ∈ ℱ (the sample space itself is an event)
2. If E ∈ ℱ, then E^c ∈ ℱ (closed under complementation)
3. If E₁, E₂, ... ∈ ℱ, then ∪ᵢ Eᵢ ∈ ℱ (closed under countable unions)
The pair (Ω, ℱ) is called a measurable space, and elements of ℱ are called measurable sets or events.
For continuous sample spaces like [0,1], the power set (all subsets) is 'too big'—it contains pathological sets for which probability cannot be consistently defined. The Vitali set, for example, leads to paradoxes if we try to assign it a probability. σ-algebras exclude these problematic sets while keeping all 'reasonable' events.
Trivial σ-algebra: ℱ = {∅, Ω}
Power set: ℱ = 2^Ω (all subsets)
Borel σ-algebra on ℝ: ℬ(ℝ), the σ-algebra generated by the open intervals; it is large enough to contain every subset of ℝ that arises in practice while excluding pathological sets.
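For a finite Ω, σ-algebras can even be enumerated explicitly. The sketch below (our own illustration, not from the text) builds the σ-algebra generated by a partition; for a partition, the generated σ-algebra is exactly the collection of all unions of its blocks.

```python
from itertools import combinations

def sigma_algebra_from_partition(blocks):
    """For a finite partition, the generated σ-algebra consists of
    all unions of blocks (including ∅ and Ω)."""
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            # union() with no arguments yields ∅; otherwise the union of blocks
            events.add(frozenset().union(*combo))
    return events

# Partition of a die-roll sample space into 'low', 'middle', 'high'
blocks = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
F = sigma_algebra_from_partition(blocks)
print(len(F))  # 2^3 = 8 events: every union of blocks
```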
A complete probability model is specified by the probability triple (Ω, ℱ, P):

- Ω: the sample space of all possible outcomes
- ℱ: a σ-algebra of subsets of Ω, the events to which probabilities are assigned
- P: a probability measure assigning each event E ∈ ℱ a number P(E) ∈ [0, 1], with P(Ω) = 1
This rigorous foundation, developed by Andrey Kolmogorov in 1933, resolved centuries of ambiguity about probability and enables the advanced probabilistic machinery used in modern machine learning.
While σ-algebras are essential for measure-theoretic probability, most ML practitioners can work effectively without deep measure theory knowledge. The key takeaway is that not every subset of a continuous sample space is a valid event, and mathematicians have developed rigorous frameworks (σ-algebras, Lebesgue measure) to handle this subtlety. These frameworks justify the probability rules we use daily.
We've established the foundational concepts of probability theory. Let's consolidate the key ideas and connect them explicitly to machine learning.
| ML Application | Sample Space | Key Events | Why It Matters |
|---|---|---|---|
| Binary Classification | Ω = {0, 1} per sample | E = 'Positive class' | Defines precision, recall, accuracy |
| Multiclass Classification | Ω = {1, 2, ..., K} | Eₖ = 'Predict class k' | Partition structure enables multi-class metrics |
| Regression | Ω = ℝ (continuous) | E = 'Prediction within ε of true value' | Continuous Ω requires density functions |
| Sequence Modeling | Ω = Σⁿ (all sequences) | E = 'Sequence starts with pattern' | Exponentially large sample spaces |
| Reinforcement Learning | Ω = All possible trajectories | E = 'Reach goal state' | Stochastic policies sample from Ω |
| Generative Models (VAE, GAN) | Ω = All possible outputs | E = 'Generated sample is realistic' | Learning to sample from complex Ω |
What's Next:
With sample spaces and events established, we're ready to assign probabilities to events. The next page covers the Axioms of Probability—the minimal set of rules that any valid probability measure must satisfy. These axioms, formulated by Kolmogorov, provide the logical foundation for all probabilistic reasoning.
You now understand how to define and work with sample spaces and events—the vocabulary of probability theory. You can construct sample spaces for various experiments, define events as subsets, combine events using set operations, and recognize the mathematical structures (σ-algebras) that make continuous probability rigorous. Next, we'll see how to assign numbers—probabilities—to these events.