Perplexity is one of the most fundamental and widely used metrics for evaluating the quality of language models. It provides an intuitive measure of how "surprised" or "uncertain" a model is when predicting the next token in a sequence—a lower perplexity indicates that the model is more confident and accurate in its predictions.
At its core, perplexity quantifies how well a probability distribution predicts a sample. For language models, it answers the question: "On average, how many equally likely words would the model be choosing from at each step?"
Formally, given a sequence of N tokens where the model assigns probability P(tᵢ | context) to each token, perplexity is computed as:
$$\text{Perplexity} = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \ln P(t_i | \text{context})\right)$$
This is equivalent to the geometric mean of the inverse probabilities:
$$\text{Perplexity} = \left(\prod_{i=1}^{N} \frac{1}{P(t_i | \text{context})}\right)^{1/N}$$
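The equivalence of the two formulations is easy to check numerically. A minimal sketch (function names are illustrative, not part of the problem statement):

```python
import math

def perplexity_log(probs):
    # exp of the negative mean log-probability
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def perplexity_geo(probs):
    # geometric mean of the inverse probabilities
    product = 1.0
    for p in probs:
        product *= 1.0 / p
    return product ** (1.0 / len(probs))

probs = [0.8, 0.6, 0.9, 0.3, 0.5]
print(perplexity_log(probs))  # both print the same value, ~1.7286
print(perplexity_geo(probs))
```

In practice the log-sum form is preferred: multiplying many small probabilities underflows to 0.0 long before the sum of their logs loses precision.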
Write a function that computes the perplexity of a language model given a list of token probabilities. Each probability in the list represents how likely the model predicted the actual token that appeared in the sequence.
Input:
probabilities: A list of floats where each value represents P(token_i | context) for the i-th token in the sequence. All probabilities are in the range (0, 1].
Output:
A single float: the perplexity of the sequence.
Example 1:
Input: probabilities = [0.5, 0.5, 0.5, 0.5]
Output: 2.0
With 4 tokens, each predicted with probability 0.5:
$$\text{Perplexity} = \left(\frac{1}{0.5} \cdot \frac{1}{0.5} \cdot \frac{1}{0.5} \cdot \frac{1}{0.5}\right)^{1/4} = 16^{1/4} = 2.0$$
This result makes intuitive sense: when the model assigns equal probability to 2 outcomes for each token, the perplexity equals 2, meaning the model is effectively choosing between 2 equally likely options at each step.
Example 2:
Input: probabilities = [1.0, 1.0, 1.0]
Output: 1.0
When every token is predicted with 100% confidence (probability = 1.0):
$$\text{Perplexity} = \exp\left(-\tfrac{1}{3}(\ln 1 + \ln 1 + \ln 1)\right) = \exp(0) = 1.0$$
A perplexity of 1.0 represents perfect prediction—the model had zero uncertainty and correctly assigned probability 1 to each actual token. This is the theoretical minimum perplexity.
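One way to see that 1.0 is the floor: for a sequence where every token gets the same probability p, the perplexity collapses to 1/p, which is 1 exactly when p = 1 and grows as confidence drops. A quick illustration (the `perplexity` helper is an assumed implementation of the formula above):

```python
import math

def perplexity(probs):
    # exp of the negative mean log-probability
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# uniform probability p over 3 tokens -> perplexity is exactly 1/p
for p in [1.0, 0.9, 0.5, 0.1]:
    print(p, perplexity([p] * 3))
```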
Example 3:
Input: probabilities = [0.8, 0.6, 0.9, 0.3, 0.5]
Output: 1.728562
For a sequence with varying token probabilities:
Calculate log probabilities:
ln(0.8) ≈ -0.2231, ln(0.6) ≈ -0.5108, ln(0.9) ≈ -0.1054, ln(0.3) ≈ -1.2040, ln(0.5) ≈ -0.6931
Sum of log probabilities: -0.2231 + (-0.5108) + (-0.1054) + (-1.2040) + (-0.6931) ≈ -2.7364
Average log probability: -2.7364 / 5 ≈ -0.5473
Perplexity: exp(-(-0.5473)) = exp(0.5473) ≈ 1.728562
The lower confidence predictions (especially 0.3) increase the overall perplexity, while high-confidence predictions (0.8, 0.9) help keep it relatively low.
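The walkthrough above maps directly to code. A minimal reference sketch (one possible solution, not the only valid one):

```python
import math

def compute_perplexity(probabilities):
    """Perplexity = exp(-(1/N) * sum of ln P(t_i | context)).

    Assumes every probability is in (0, 1], as the constraints guarantee.
    """
    if not probabilities:
        raise ValueError("probabilities must be non-empty")
    n = len(probabilities)
    log_sum = sum(math.log(p) for p in probabilities)  # sum of ln p_i
    return math.exp(-log_sum / n)                      # exp of negative mean

print(compute_perplexity([0.8, 0.6, 0.9, 0.3, 0.5]))  # ~1.728562
```

Summing logs instead of multiplying raw probabilities keeps the computation stable even for long sequences, where the product of probabilities would underflow to zero.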
Constraints