Hidden Markov Models aren't just elegant mathematical constructs—they have been the backbone of major technological advances across multiple domains. For decades, HMMs powered the world's best speech recognition systems. They remain fundamental in bioinformatics for gene finding and protein analysis. They enable natural language processing systems for part-of-speech tagging and named entity recognition.
Understanding how HMMs are applied in practice reveals how modeling choices such as the state space, observation model, and topology shape real systems.
This page surveys the most important application domains, providing concrete examples of how the algorithms we've learned translate into working systems.
By the end of this page, you will:
• Understand how HMMs revolutionized speech recognition
• See how sequence labeling tasks (POS tagging, NER) use HMMs
• Explore gene finding and biological sequence analysis
• Understand gesture recognition and handwriting systems
• Know when to use HMMs versus modern deep learning alternatives
Given an acoustic signal (audio waveform), transcribe the spoken words. This involves mapping a continuous, variable-length audio stream onto a discrete sequence of words.
HMMs dominated speech recognition from the 1980s until the deep learning revolution around 2012.
States: Phonemes or sub-phoneme units (typically 3 states per phoneme: beginning, middle, end)
Observations: Acoustic feature vectors, typically MFCCs (Mel-frequency cepstral coefficients) computed over short overlapping frames (on the order of every 10 ms)
Emissions: Gaussian Mixture Models (GMM-HMMs)
Transitions: Left-to-right (Bakis) topology
Training: Baum-Welch (EM) on large corpora of transcribed speech, alternating forced alignment of frames to phoneme states with parameter re-estimation
Decoding: Viterbi search over a composite graph that combines the acoustic HMMs with a pronunciation lexicon and an n-gram language model
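As a concrete illustration of the left-to-right topology above (the state count and probability values here are illustrative, not from a real system), a 3-state phoneme model's transition matrix is upper triangular: each state can stay put or move forward, never backward.

```python
import numpy as np

# Left-to-right (Bakis) transitions for a 3-state phoneme model:
# begin -> middle -> end, with self-loops and no backward transitions.
A = np.array([
    [0.6, 0.4, 0.0],   # begin:  stay or advance to middle
    [0.0, 0.7, 0.3],   # middle: stay or advance to end
    [0.0, 0.0, 1.0],   # end:    stay (or exit to the next phoneme's model)
])
pi = np.array([1.0, 0.0, 0.0])  # always enter at the "begin" state

assert np.allclose(A.sum(axis=1), 1.0)  # rows are valid distributions
```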
Around 2010-2012, Deep Neural Networks replaced GMMs for emission modeling: the network predicts per-frame state posteriors, which are converted into scaled likelihoods and fed into the same HMM decoder (hybrid DNN-HMM systems).
Eventually, end-to-end neural models (CTC, attention-based) replaced HMMs entirely in most modern systems. But HMM principles—sequential modeling, forward-backward inference—still influence these architectures.
Even though pure HMM systems are now rare in production speech recognition, their influence is profound:
• Connectionist Temporal Classification (CTC) is essentially a special HMM
• Alignment concepts from HMMs inform attention mechanisms
• Hybrid systems still use HMM decoder structures
• Understanding HMMs is essential for understanding modern speech systems
Given a sentence, assign a grammatical tag (noun, verb, adjective, ...) to each word.
Example:
Sentence: The dog runs quickly
Tags: DET NOUN VERB ADV
POS tagging is a fundamental NLP task that feeds into parsing, named entity recognition, and other downstream applications.
States: Part-of-speech tags (tagset size ~45 for Penn Treebank)
Observations: Words
Transition Matrix: $P(\text{tag}_t | \text{tag}_{t-1})$
Emission Distribution: $P(\text{word} | \text{tag})$
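For the example sentence above, the joint probability that the tagging HMM assigns factors into initial, transition, and emission terms:

$$P(\text{the dog runs quickly}, \text{DET NOUN VERB ADV}) = \pi(\text{DET})\,P(\text{the}|\text{DET}) \times P(\text{NOUN}|\text{DET})\,P(\text{dog}|\text{NOUN}) \times P(\text{VERB}|\text{NOUN})\,P(\text{runs}|\text{VERB}) \times P(\text{ADV}|\text{VERB})\,P(\text{quickly}|\text{ADV})$$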
```python
import numpy as np
from collections import defaultdict
from typing import List, Tuple


class POSTaggerHMM:
    """
    HMM-based Part-of-Speech tagger.

    Demonstrates supervised training from tagged corpus
    and Viterbi decoding for inference.
    """

    def __init__(self, smoothing: float = 0.1):
        self.smoothing = smoothing
        self.tag2idx = {}
        self.word2idx = {}
        self.idx2tag = {}

        # Parameters (populated during training)
        self.A = None   # Transition probabilities
        self.B = None   # Emission probabilities
        self.pi = None  # Initial distribution

    def train(self, tagged_sentences: List[List[Tuple[str, str]]]):
        """
        Train from labeled data (supervised learning).

        Parameters
        ----------
        tagged_sentences : List of sentences, where each sentence
            is a list of (word, tag) tuples
        """
        # Build vocabularies
        for sentence in tagged_sentences:
            for word, tag in sentence:
                if tag not in self.tag2idx:
                    idx = len(self.tag2idx)
                    self.tag2idx[tag] = idx
                    self.idx2tag[idx] = tag
                if word not in self.word2idx:
                    self.word2idx[word] = len(self.word2idx)

        # Add UNK token
        self.word2idx['<UNK>'] = len(self.word2idx)

        N = len(self.tag2idx)   # Number of tags
        V = len(self.word2idx)  # Vocabulary size

        # Count statistics
        initial_counts = np.zeros(N) + self.smoothing
        transition_counts = np.zeros((N, N)) + self.smoothing
        emission_counts = np.zeros((N, V)) + self.smoothing

        for sentence in tagged_sentences:
            prev_tag_idx = None
            for i, (word, tag) in enumerate(sentence):
                tag_idx = self.tag2idx[tag]
                word_idx = self.word2idx[word]

                if i == 0:
                    initial_counts[tag_idx] += 1
                else:
                    transition_counts[prev_tag_idx, tag_idx] += 1

                emission_counts[tag_idx, word_idx] += 1
                prev_tag_idx = tag_idx

        # Normalize to get probabilities
        self.pi = initial_counts / initial_counts.sum()
        self.A = transition_counts / transition_counts.sum(axis=1, keepdims=True)
        self.B = emission_counts / emission_counts.sum(axis=1, keepdims=True)

    def tag(self, words: List[str]) -> List[str]:
        """
        Tag a sentence using Viterbi decoding.
        """
        T = len(words)
        N = len(self.tag2idx)

        # Map words to indices (use UNK for unknown words)
        obs = [self.word2idx.get(w, self.word2idx['<UNK>']) for w in words]

        # Viterbi in log space
        log_A = np.log(self.A)
        log_B = np.log(self.B)
        log_pi = np.log(self.pi)

        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)

        # Initialization
        delta[0] = log_pi + log_B[:, obs[0]]

        # Recursion
        for t in range(1, T):
            for j in range(N):
                candidates = delta[t-1] + log_A[:, j]
                psi[t, j] = np.argmax(candidates)
                delta[t, j] = candidates[psi[t, j]] + log_B[j, obs[t]]

        # Backtracking
        path = np.zeros(T, dtype=int)
        path[-1] = np.argmax(delta[-1])
        for t in range(T-2, -1, -1):
            path[t] = psi[t+1, path[t+1]]

        return [self.idx2tag[i] for i in path]


# Example usage
def demo_pos_tagger():
    # Simple training data
    training_data = [
        [('the', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB')],
        [('a', 'DET'), ('cat', 'NOUN'), ('sleeps', 'VERB')],
        [('the', 'DET'), ('big', 'ADJ'), ('dog', 'NOUN'), ('barks', 'VERB')],
        [('dogs', 'NOUN'), ('run', 'VERB'), ('fast', 'ADV')],
        [('the', 'DET'), ('quick', 'ADJ'), ('fox', 'NOUN'), ('jumps', 'VERB')],
    ]

    tagger = POSTaggerHMM()
    tagger.train(training_data)

    # Test sentences
    test_sentences = [
        ['the', 'cat', 'runs'],
        ['a', 'big', 'dog', 'barks'],
        ['cats', 'sleep'],  # 'cats' is OOV
    ]

    print("POS Tagging Results:")
    print("-" * 50)
    for sentence in test_sentences:
        tags = tagger.tag(sentence)
        print(f"Words: {' '.join(sentence)}")
        print(f"Tags:  {' '.join(tags)}\n")


demo_pos_tagger()
```

A major challenge in HMM POS tagging is out-of-vocabulary (OOV) words—words not seen during training. Solutions include:
• Unknown word token: Map all OOV to <UNK>, trained on rare words
• Suffix features: Words ending in "-ly" likely adverbs, "-tion" likely nouns
• Word shape: Capitalized words may be proper nouns
• Character-level models: Use spelling patterns for emission probability
Modern systems use these features in discriminative models (CRFs) or neural taggers.
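As an illustration of the suffix idea above, here is a minimal sketch (the suffix table is hand-written and purely hypothetical) of backing off to suffix-based emission scores for unseen words instead of a single shared `<UNK>` probability:

```python
import numpy as np

# Hypothetical suffix -> tag preferences; in practice these would be
# estimated from rare training words rather than written by hand.
SUFFIX_TAG_PRIOR = {
    'ly':   {'ADV': 0.8, 'ADJ': 0.2},
    'tion': {'NOUN': 0.9, 'VERB': 0.1},
    's':    {'NOUN': 0.5, 'VERB': 0.5},
}

def oov_emission_logprob(word, tag, fallback_logprob):
    """Log emission score for an unseen word: use a suffix preference
    if a known suffix matches, otherwise fall back to the <UNK> score."""
    for suffix, tag_dist in SUFFIX_TAG_PRIOR.items():
        if word.endswith(suffix):
            return np.log(tag_dist.get(tag, 1e-3))
    return fallback_logprob

print(oov_emission_logprob('quickly', 'ADV', fallback_logprob=np.log(1e-4)))
```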
Identify and classify named entities in text—persons, organizations, locations, dates, etc.
Example:
Text: Apple Inc. announced that Tim Cook will visit London
Labels: B-ORG I-ORG O O B-PER I-PER O O B-LOC
NER uses the BIO label scheme to mark entity boundaries: B-X labels the first token of an entity of type X, I-X labels subsequent tokens inside the same entity, and O labels tokens outside any entity.
This converts the entity recognition problem into a sequence labeling problem that HMMs can handle.
States: BIO tags (e.g., B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, O)
Observations: Words and features (capitalization, word shape)
Transition Constraints: an I-X tag may only follow B-X or I-X of the same entity type; an entity cannot begin mid-way with an I- tag.
These constraints can be enforced by setting the corresponding transition probabilities to zero, as shown in the table below (a short masking sketch follows the table).
| From \ To | B-PER | I-PER | B-ORG | I-ORG | O |
|---|---|---|---|---|---|
| B-PER | ✓ | ✓ | ✓ | ✗ | ✓ |
| I-PER | ✓ | ✓ | ✓ | ✗ | ✓ |
| B-ORG | ✓ | ✗ | ✓ | ✓ | ✓ |
| I-ORG | ✓ | ✗ | ✓ | ✓ | ✓ |
| O | ✓ | ✗ | ✓ | ✗ | ✓ |
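A minimal sketch (the tag set and probability values are illustrative placeholders) of enforcing these constraints: build a validity mask, zero out forbidden transitions, and renormalize the rows.

```python
import numpy as np

tags = ['B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'O']

def is_valid(prev_tag, next_tag):
    """I-X may only follow B-X or I-X of the same entity type."""
    if next_tag.startswith('I-'):
        entity = next_tag[2:]
        return prev_tag in (f'B-{entity}', f'I-{entity}')
    return True  # B-X and O may follow anything

# Start from some estimated transition probabilities (placeholder values here)
A_hat = np.full((len(tags), len(tags)), 0.2)

mask = np.array([[is_valid(p, n) for n in tags] for p in tags], dtype=float)
A = A_hat * mask
A = A / A.sum(axis=1, keepdims=True)  # renormalize each row

print(np.round(A, 2))
```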
Raw words are insufficient for NER—we need rich features:
Word-level features: capitalization, word shape (e.g., Xxxx, XXXX, dddd), prefixes and suffixes, presence of digits or hyphens
Context features: the previous and next words and their capitalization patterns
Gazetteers (external knowledge): lists of known person, organization, and location names that tokens can be matched against
For HMMs, these features must be incorporated into the observation model—either by discretizing features or using Gaussian emissions for continuous features.
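One simple discretization strategy, sketched below with hypothetical helper names, is to map each token to a composite feature symbol and estimate emission probabilities over those symbols rather than raw words.

```python
def token_to_symbol(word: str, in_gazetteer: bool = False) -> str:
    """Map a token to a coarse (shape, gazetteer) observation symbol."""
    if word[0].isupper():
        shape = 'Cap'
    elif word.isdigit():
        shape = 'Digit'
    else:
        shape = 'Lower'
    return f"{shape}|{'GAZ' if in_gazetteer else 'NOGAZ'}"

print(token_to_symbol('London', in_gazetteer=True))  # Cap|GAZ
print(token_to_symbol('announced'))                  # Lower|NOGAZ
```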
HMMs are generative models: they model $P(x, z) = P(x|z)P(z)$. This requires specifying how observations are generated from states.
In NER, we want to use arbitrarily complex features that describe the relationship between words and labels. Conditional Random Fields (CRFs) are the discriminative counterpart: they model $P(z|x)$ directly, allowing any features without generative assumptions.
Modern NER systems typically use neural CRFs or transformer-based models (BERT + token classification), but understanding HMMs provides the foundation for these advances.
Given a DNA sequence (a string over {A, C, G, T}), identify the protein-coding genes it contains.
HMMs excel here because gene structure has clear sequential dependencies: genes have start codons, splice sites, stop codons, and characteristic statistics.
```
─────────────[===EXON===]────INTRON────[===EXON===]──────────
              ATG              ...              TGA
         (start codon)                     (stop codon)
```
States: Gene-structure elements. The simplest models use exon, intron, and intergenic states; realistic gene finders add states for start and stop codons, splice donor/acceptor sites, and reading-frame-specific exons.
Observations: Nucleotides (A, C, G, T)
Key insight: Different regions have different nucleotide statistics. Coding exons show codon-usage bias, introns tend to be AT-rich, and intergenic DNA is closer to uniform, so the emission distributions can tell the regions apart.
```python
import numpy as np

def simple_gene_finder_demo():
    """
    Simplified gene finder HMM demonstration.

    Real gene finders (GENSCAN, Augustus) use much more
    complex state spaces with duration modeling.
    """
    # Simplified states
    states = ['INTERGENIC', 'EXON', 'INTRON']
    state_letter = {'INTERGENIC': 'I', 'EXON': 'E', 'INTRON': 'N'}  # display letters
    n_states = len(states)

    # Observations: nucleotides
    nucleotides = ['A', 'C', 'G', 'T']
    nuc2idx = {n: i for i, n in enumerate(nucleotides)}

    # Transition probabilities (simplified)
    # Exons tend to be followed by exons (self-loop) or introns
    # Intergenic regions are stable or transition to exons (gene start)
    A = np.array([
        [0.95, 0.05, 0.00],  # From INTERGENIC
        [0.01, 0.90, 0.09],  # From EXON
        [0.01, 0.10, 0.89],  # From INTRON
    ])

    # Emission probabilities
    # Coding regions (EXON) have biased nucleotide frequencies
    # due to codon usage preferences
    B = np.array([
        #  A     C     G     T
        # INTERGENIC (roughly uniform)
        [0.30, 0.20, 0.20, 0.30],
        # EXON - biased towards certain nucleotides in codons
        [0.20, 0.30, 0.35, 0.15],
        # INTRON - AT-rich
        [0.35, 0.15, 0.15, 0.35],
    ])

    pi = np.array([0.8, 0.1, 0.1])  # Usually start intergenic

    # Example DNA sequence
    dna_sequence = "ATGCGCGCGATCGATCGATCGATCGATCGATATATATATAAAA"
    obs = [nuc2idx[n] for n in dna_sequence]

    # Viterbi decoding (log space; log(0) = -inf encodes forbidden transitions)
    T = len(obs)
    delta = np.zeros((T, n_states))
    psi = np.zeros((T, n_states), dtype=int)

    with np.errstate(divide='ignore'):
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)

    delta[0] = log_pi + log_B[:, obs[0]]

    for t in range(1, T):
        for j in range(n_states):
            candidates = delta[t-1] + log_A[:, j]
            psi[t, j] = np.argmax(candidates)
            delta[t, j] = candidates[psi[t, j]] + log_B[j, obs[t]]

    # Backtrack
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T-2, -1, -1):
        path[t] = psi[t+1, path[t+1]]

    # Display results
    print("Gene Finding Demo")
    print("=" * 60)
    print(f"DNA:    {dna_sequence}")
    print(f"States: {''.join([state_letter[states[s]] for s in path])}")
    print("\nLegend: I=INTERGENIC, E=EXON, N=INTRON")

    # Show segment summary
    print("\nSegment analysis:")
    current_state = path[0]
    start = 0
    for i in range(1, T+1):
        if i == T or path[i] != current_state:
            print(f"  Positions {start:3d}-{i-1:3d}: {states[current_state]:12s} "
                  f"({dna_sequence[start:i][:20]}{'...' if i-start > 20 else ''})")
            if i < T:
                current_state = path[i]
                start = i

simple_gene_finder_demo()
```

Beyond gene finding, HMMs power several other classic bioinformatics tasks:

Protein Secondary Structure Prediction: hidden states correspond to helix, sheet, and coil, with amino-acid (or profile) emissions
CpG Island Detection: a small HMM separates CG-rich island regions from background sequence, a classic textbook application
Sequence Alignment (Profile HMMs): match, insert, and delete states model a protein family; tools like HMMER use them for alignment and database search
Chromatin State Annotation: tools such as ChromHMM segment the genome into chromatin states based on histone-modification signals
Online handwriting (pen trajectory over time): observations are pen coordinates, velocities, and pen-up/pen-down events; states correspond to letters or stroke segments
Offline handwriting (image-based): sliding-window features extracted across the word image are decoded into character sequences
Sign language recognition: hand position, shape, and motion features extracted from video, typically with one HMM per sign
Activity recognition: accelerometer or other wearable-sensor streams segmented into activities such as walking, sitting, or running (see the sketch after this list)
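As referenced in the activity-recognition item above, here is a minimal sketch (synthetic data, illustrative parameters) of the classify-by-likelihood pattern: train one Gaussian HMM per activity with hmmlearn, then assign a new sequence to the model with the highest log-likelihood.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def synthetic_sequence(mean, n=200):
    """Fake 3-axis accelerometer readings fluctuating around a mean level."""
    return mean + rng.normal(scale=0.5, size=(n, 3))

# Pretend training data: 'walking' is high-energy, 'sitting' is low-energy
train = {
    'walking': synthetic_sequence(mean=2.0),
    'sitting': synthetic_sequence(mean=0.0),
}

# One small Gaussian HMM per activity
models = {}
for activity, X in train.items():
    m = hmm.GaussianHMM(n_components=3, covariance_type='diag',
                        n_iter=50, random_state=0)
    m.fit(X)
    models[activity] = m

# Classify a new sequence by comparing per-model log-likelihoods
test_seq = synthetic_sequence(mean=1.9, n=100)
scores = {a: m.score(test_seq) for a, m in models.items()}
print(max(scores, key=scores.get), scores)
```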
Across all these applications, the pattern is the same: define hidden states that correspond to meaningful units of the domain, pick an observation model for the measured data, train from labeled counts or with Baum-Welch, and decode with Viterbi (or compute posteriors with forward-backward).
The art is in choosing the right state space and observation model for your domain.
With the rise of deep learning, when should you still consider HMMs?
Interpretability: States, transitions, and emissions are explicit parameters that a domain expert can inspect and sanity-check.
Data efficiency: With relatively few parameters, HMMs can be trained on modest datasets where deep models would overfit.
Computational efficiency: Training and inference run comfortably on a CPU, with exact inference in $O(TN^2)$ per sequence.
Theoretical guarantees: Forward-backward yields exact posteriors, and each EM iteration is guaranteed not to decrease the training likelihood.
| Aspect | HMMs | Deep Learning (RNNs, Transformers) |
|---|---|---|
| Data requirements | Works with small data | Needs large datasets |
| Interpretability | High (inspectable parameters) | Low (black box) |
| Flexibility | Limited by model structure | Can learn complex patterns |
| Long-range dependencies | Limited (Markov assumption) | Better with attention |
| Training complexity | EM, local optima | SGD, hyperparameter tuning |
| Inference | Exact, efficient | Forward pass (no exact posteriors) |
| Uncertainty | Principled probabilistic | Requires additional techniques |
| Compute requirements | CPU sufficient | Often needs GPU |
Often the best solution combines both:
• Neural emissions: Use a neural network for $P(x|z)$, HMM structure for the temporal model (a minimal decoding sketch follows below)
• Neural CRFs: CRF layer on top of BiLSTM/BERT for sequence labeling
• Variational Sequential Models: VAEs with latent HMM structure
Understanding HMMs provides the foundation for these advanced architectures.
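To make the neural-emissions bullet concrete, here is a minimal sketch (hypothetical function name, random stand-in scores instead of a real network) showing that any model producing per-frame log emission scores can be decoded with standard HMM Viterbi machinery.

```python
import numpy as np

def viterbi_with_external_scores(log_emissions, log_A, log_pi):
    """Viterbi decoding where log_emissions[t, j] comes from an external
    model (e.g., a neural network) instead of an HMM emission table."""
    T, N = log_emissions.shape
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)

    delta[0] = log_pi + log_emissions[0]
    for t in range(1, T):
        scores = delta[t-1][:, None] + log_A          # (N, N): from i to j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(N)] + log_emissions[t]

    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T-2, -1, -1):
        path[t] = psi[t+1, path[t+1]]
    return path

# Toy usage: random "network" scores for 2 states over 5 frames
rng = np.random.default_rng(0)
log_em = np.log(rng.dirichlet([1, 1], size=5))        # stand-in for NN outputs
log_A = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))
log_pi = np.log(np.array([0.5, 0.5]))
print(viterbi_with_external_scores(log_em, log_A, log_pi))
```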
Python:
• hmmlearn: Sklearn-compatible, Gaussian and GMM emissions
• pomegranate: Flexible distributions, GPU support
• seqlearn: Sequence learning with HMMs and CRFs

Java:
• Jahmm: Pure Java HMM library

Specialty:
• HMMER: Profile HMMs for protein sequences
• HTK: HMM Toolkit for speech recognition
• Kaldi: Speech recognition toolkit (HMM + neural)

Challenge: Standard algorithms are $O(TN^2)$ per sequence
Solutions:
• Beam pruning: at each time step, drop states whose score falls far below the current best (see the sketch below)
• Exploiting sparse or structured transition matrices (e.g., left-to-right topologies)
• Vectorized or batched implementations across states and sequences (NumPy, GPU)
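As a sketch of the beam-pruning idea (the beam width and setup are illustrative assumptions), here is a Viterbi variant that only expands states within a fixed margin of the per-frame best score:

```python
import numpy as np

def viterbi_beam(obs, log_pi, log_A, log_B, beam=10.0):
    """Viterbi with beam pruning: states scoring more than `beam` below
    the frame's best are not expanded at the next step."""
    T, N = len(obs), log_A.shape[0]
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)

    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # Keep only states within `beam` of the current best
        active = np.flatnonzero(delta[t-1] >= delta[t-1].max() - beam)
        scores = delta[t-1][active, None] + log_A[active, :]   # (|active|, N)
        best = np.argmax(scores, axis=0)
        psi[t] = active[best]
        delta[t] = scores[best, np.arange(N)] + log_B[:, obs[t]]

    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T-2, -1, -1):
        path[t] = psi[t+1, path[t+1]]
    return path

# usage: path = viterbi_beam(obs, np.log(pi), np.log(A), np.log(B))
```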
For practical work, libraries such as hmmlearn package these algorithms behind a scikit-learn-style API:

```python
from hmmlearn import hmm
import numpy as np

def hmmlearn_example():
    """
    Example using the hmmlearn library for Gaussian HMMs.

    hmmlearn provides efficient implementations of HMM algorithms.
    """
    # Generate synthetic data from a 2-state Gaussian HMM
    np.random.seed(42)

    # True parameters
    true_startprob = np.array([0.6, 0.4])
    true_transmat = np.array([[0.7, 0.3],
                              [0.4, 0.6]])
    true_means = np.array([[0.0], [3.0]])
    true_covars = np.array([[[0.5]], [[1.0]]])

    # Create true model and sample
    true_model = hmm.GaussianHMM(n_components=2, covariance_type="full")
    true_model.startprob_ = true_startprob
    true_model.transmat_ = true_transmat
    true_model.means_ = true_means
    true_model.covars_ = true_covars

    # Generate training data
    X_train, _ = true_model.sample(1000)

    # Create and train a new model
    learned_model = hmm.GaussianHMM(
        n_components=2,
        covariance_type="full",
        n_iter=100,
        random_state=42
    )
    learned_model.fit(X_train)

    print("Learned HMM Parameters:")
    print("=" * 50)
    print("\nTransition matrix:")
    print(learned_model.transmat_.round(3))
    print("\nMeans:")
    print(learned_model.means_.round(3))

    # Decode a test sequence
    # Note: the learned model's state indices may be permuted
    # relative to the true model.
    X_test, true_states = true_model.sample(20)
    log_prob, predicted_states = learned_model.decode(X_test)

    print("\nDecoding Example:")
    print(f"True states:      {true_states.flatten()}")
    print(f"Predicted states: {predicted_states}")
    print(f"Accuracy: {(true_states.flatten() == predicted_states).mean():.2%}")

    # Model selection using BIC
    print("\nModel Selection (BIC):")
    for n in range(1, 5):
        model = hmm.GaussianHMM(n_components=n, n_iter=100, random_state=42)
        model.fit(X_train)
        # Free parameters: (n-1) initial + n(n-1) transition + n*d means
        # + n*d diagonal variances (default covariance_type='diag')
        d = X_train.shape[1]
        k = (n - 1) + n * (n - 1) + n * d + n * d
        bic = -2 * model.score(X_train) + k * np.log(len(X_train))
        print(f"  {n} states: BIC = {bic:.2f}")

hmmlearn_example()
```

Hidden Markov Models have been one of the most successful and widely-applied machine learning techniques, providing the foundation for numerous technologies we use daily.
Congratulations! You have completed the comprehensive study of Hidden Markov Models. You now understand:
• The probabilistic foundations: hidden states, transitions, emissions, and the Markov assumption
• The core algorithms: forward-backward for inference, Viterbi for decoding, and EM (Baum-Welch) for learning
• How these ideas power real applications in speech, language, biology, and activity recognition
• When to reach for HMMs, deep learning, or hybrids of the two
This knowledge provides a solid foundation for understanding more advanced sequential models, including Conditional Random Fields, Recurrent Neural Networks, and modern sequence-to-sequence architectures.
You have mastered Hidden Markov Models—from the mathematical foundations through core algorithms to real-world applications. HMMs remain relevant today as building blocks for understanding sequential data, and their principles permeate modern machine learning. Whether you apply HMMs directly or use them as a conceptual framework for more complex models, this knowledge will serve you well throughout your machine learning journey.