Perplexity is one of the most fundamental and widely used metrics for evaluating the quality of language models. It provides an intuitive measure of how "surprised" or "uncertain" a model is when predicting the next token in a sequence—a lower perplexity indicates that the model is more confident and accurate in its predictions.
At its core, perplexity quantifies how well a probability distribution predicts a sample. For language models, it answers the question: "On average, how many equally likely words would the model be choosing from at each step?"
Formally, given a sequence of N tokens where the model assigns probability P(tᵢ | context) to each token, perplexity is computed as:
$$\text{Perplexity} = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \ln P(t_i | \text{context})\right)$$
This is equivalent to the geometric mean of the inverse probabilities:
$$\text{Perplexity} = \left(\prod_{i=1}^{N} \frac{1}{P(t_i | \text{context})}\right)^{1/N}$$
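The equivalence of the two formulations is easy to check numerically. A minimal sketch (function names are illustrative, not part of the problem statement):

```python
import math

def perplexity_log(probs):
    # exp of the negative mean log-probability
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def perplexity_geo(probs):
    # geometric mean of the inverse probabilities
    product = 1.0
    for p in probs:
        product *= 1.0 / p
    return product ** (1.0 / len(probs))

probs = [0.8, 0.6, 0.9, 0.3, 0.5]
print(perplexity_log(probs))  # both print the same value, ~1.7286
print(perplexity_geo(probs))
```

In practice the log-sum form is preferred: multiplying many small probabilities underflows to 0.0 long before the sum of their logs loses precision.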
Write a function that computes the perplexity of a language model given a list of token probabilities. Each probability in the list represents how likely the model predicted the actual token that appeared in the sequence.
Input:
probabilities: A list of floats where each value represents P(token_i | context) for the i-th token in the sequence. All probabilities are in the range (0, 1].
Output:
A single float: the perplexity of the sequence.
Example 1:
Input: probabilities = [0.5, 0.5, 0.5, 0.5]
Output: 2.0
With 4 tokens, each predicted with probability 0.5:
$$\text{Perplexity} = \left(\frac{1}{0.5} \cdot \frac{1}{0.5} \cdot \frac{1}{0.5} \cdot \frac{1}{0.5}\right)^{1/4} = 16^{1/4} = 2.0$$
This result makes intuitive sense: when the model assigns equal probability to 2 outcomes for each token, the perplexity equals 2, meaning the model is effectively choosing between 2 equally likely options at each step.
Example 2:
Input: probabilities = [1.0, 1.0, 1.0]
Output: 1.0
When every token is predicted with 100% confidence (probability = 1.0):
$$\text{Perplexity} = \exp\left(-\tfrac{1}{3}(\ln 1 + \ln 1 + \ln 1)\right) = \exp(0) = 1.0$$
A perplexity of 1.0 represents perfect prediction—the model had zero uncertainty and correctly assigned probability 1 to each actual token. This is the theoretical minimum perplexity.
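One way to see that 1.0 is the floor: for a sequence where every token gets the same probability p, the perplexity collapses to 1/p, which is 1 exactly when p = 1 and grows as confidence drops. A quick illustration (the `perplexity` helper is an assumed implementation of the formula above):

```python
import math

def perplexity(probs):
    # exp of the negative mean log-probability
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# uniform probability p over 3 tokens -> perplexity is exactly 1/p
for p in [1.0, 0.9, 0.5, 0.1]:
    print(p, perplexity([p] * 3))
```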
Example 3:
Input: probabilities = [0.8, 0.6, 0.9, 0.3, 0.5]
Output: 1.728562
For a sequence with varying token probabilities:
Calculate log probabilities:
ln(0.8) ≈ -0.2231, ln(0.6) ≈ -0.5108, ln(0.9) ≈ -0.1054, ln(0.3) ≈ -1.2040, ln(0.5) ≈ -0.6931
Sum of log probabilities: -0.2231 + (-0.5108) + (-0.1054) + (-1.2040) + (-0.6931) ≈ -2.7364
Average log probability: -2.7364 / 5 ≈ -0.5473
Perplexity: exp(-(-0.5473)) = exp(0.5473) ≈ 1.728562
The lower confidence predictions (especially 0.3) increase the overall perplexity, while high-confidence predictions (0.8, 0.9) help keep it relatively low.
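The walkthrough above maps directly to code. A minimal reference sketch (one possible solution, not the only valid one):

```python
import math

def compute_perplexity(probabilities):
    """Perplexity = exp(-(1/N) * sum of ln P(t_i | context)).

    Assumes every probability is in (0, 1], as the constraints guarantee.
    """
    if not probabilities:
        raise ValueError("probabilities must be non-empty")
    n = len(probabilities)
    log_sum = sum(math.log(p) for p in probabilities)  # sum of ln p_i
    return math.exp(-log_sum / n)                      # exp of negative mean

print(compute_perplexity([0.8, 0.6, 0.9, 0.3, 0.5]))  # ~1.728562
```

Summing logs instead of multiplying raw probabilities keeps the computation stable even for long sequences, where the product of probabilities would underflow to zero.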
Constraints