In natural language processing and machine translation, evaluating the quality of machine-generated text is a critical challenge. One of the most widely adopted approaches is to measure how well the generated text overlaps with one or more human-written reference texts using n-gram precision scoring.
An n-gram is a contiguous sequence of n tokens from a text. For example, in the sentence "the cat sat", the 1-grams (unigrams) are ["the", "cat", "sat"], and the 2-grams (bigrams) are [("the", "cat"), ("cat", "sat")].
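As a minimal sketch, n-gram extraction can be written as a one-line helper (the name `ngrams` is illustrative, not part of the task's required interface):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams (as tuples) in tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

print(ngrams(["the", "cat", "sat"], 2))
# Counter({('the', 'cat'): 1, ('cat', 'sat'): 1})
```

Returning a `Counter` rather than a list makes the clipped-count comparison later a simple per-key `min`.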
The N-Gram Precision Score evaluates the quality of generated text through the following components:
For each n from 1 to max_n, calculate the precision of n-grams in the candidate text:
$$p_n = \frac{\sum_{\text{n-gram} \in \text{candidate}} \text{Count}_{\text{clipped}}(\text{n-gram})}{\sum_{\text{n-gram} \in \text{candidate}} \text{Count}(\text{n-gram})}$$
The clipped count for each n-gram is the minimum of:
• its count in the candidate, and
• the maximum number of times it appears in any single reference.
This clipping mechanism prevents gaming by repetition—a candidate of "the the the the" would not score highly just because "the" appears in the reference.
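The clipped precision for one order n can be sketched as follows (helper names `ngrams` and `clipped_precision` are illustrative; the zero-denominator guard is an assumption for empty inputs):

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, references, n):
    """p_n: clipped n-gram matches divided by total candidate n-grams."""
    cand_counts = ngrams(candidate, n)
    # Clipping limit: the n-gram's maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# Repetition does not pay: "the" is clipped to its single reference occurrence.
print(clipped_precision(["the"] * 4, [["the", "cat"]], 1))  # 0.25
```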
To discourage overly short translations that might achieve high precision by only producing "safe" words, a brevity penalty is applied:
$$BP = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \leq r \end{cases}$$
Where:
• $c$ is the length of the candidate sentence
• $r$ is the effective reference length
When multiple references exist, the effective reference length is chosen as the reference length closest to the candidate length. If two reference lengths are equidistant, choose the shorter one.
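A sketch of the brevity penalty, including the tie-breaking rule above (the function name is illustrative; behavior for an empty candidate, where $r/c$ is undefined, is left out as the problem does not specify it):

```python
import math

def brevity_penalty(candidate_len, reference_lens):
    # Effective reference length: closest to the candidate length;
    # the (distance, length) key breaks ties in favor of the shorter one.
    r = min(reference_lens, key=lambda rl: (abs(rl - candidate_len), rl))
    c = candidate_len
    return 1.0 if c > r else math.exp(1 - r / c)

print(brevity_penalty(3, [2, 4]))  # tie at distance 1 -> r = 2, c > r -> 1.0
```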
The final score combines the brevity penalty with a geometric mean of the n-gram precisions:
$$\text{Score} = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \cdot \log(p_n)\right)$$
Where $w_n = \frac{1}{N}$ (uniform weighting).
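The combination step can be sketched like this (the `combine` name is illustrative; treating any $p_n = 0$ as an overall score of 0 is an assumption, since $\log 0$ is undefined and the problem statement does not prescribe a smoothing scheme):

```python
import math

def combine(bp, precisions):
    """Brevity penalty times the uniformly weighted geometric mean of p_n."""
    if any(p == 0 for p in precisions):
        return 0.0  # assumption: log(0) case collapses the score to 0
    w = 1 / len(precisions)
    return bp * math.exp(sum(w * math.log(p) for p in precisions))

print(combine(1.0, [0.75, 1/3]))  # matches Example 1 below: ~0.5
```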
Your Task: Implement a function that computes this n-gram precision score given a candidate sentence (as a list of tokens), one or more reference sentences (each as a list of tokens), and the maximum n-gram order to consider.
candidate = ['a', 'b', 'c', 'd']
references = [['a', 'b', 'x', 'd']]
max_n = 2

Expected score: 0.5

Step 1: Calculate 1-gram (Unigram) Precision
Candidate 1-grams: {a: 1, b: 1, c: 1, d: 1} → 4 total
Reference 1-grams: {a: 1, b: 1, x: 1, d: 1}
Clipped counts:
• 'a': min(1, 1) = 1 ✓
• 'b': min(1, 1) = 1 ✓
• 'c': min(1, 0) = 0 ✗ (not in reference)
• 'd': min(1, 1) = 1 ✓
Sum of clipped counts = 3
p₁ = 3/4 = 0.75
Step 2: Calculate 2-gram (Bigram) Precision
Candidate 2-grams: {(a,b): 1, (b,c): 1, (c,d): 1} → 3 total
Reference 2-grams: {(a,b): 1, (b,x): 1, (x,d): 1}
Clipped counts:
• (a,b): min(1, 1) = 1 ✓
• (b,c): min(1, 0) = 0 ✗
• (c,d): min(1, 0) = 0 ✗
Sum of clipped counts = 1
p₂ = 1/3 ≈ 0.333
Step 3: Calculate Brevity Penalty
Candidate length (c) = 4
Reference length (r) = 4
Since c = r, BP = e^(1 − 4/4) = e⁰ = 1.0
Step 4: Compute Final Score
Geometric mean = exp((log(0.75) + log(0.333)) / 2)
= exp((−0.288 − 1.099) / 2)
= exp(−0.693) ≈ 0.5
Final Score = BP × Geometric Mean = 1.0 × 0.5 = 0.5
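The arithmetic of this example can be checked directly (note that the geometric mean of two values is just the square root of their product):

```python
import math

p1, p2 = 3/4, 1/3
bp = 1.0
score = bp * math.exp((math.log(p1) + math.log(p2)) / 2)
print(round(score, 4))  # 0.5, i.e. sqrt(0.75 * 1/3) = sqrt(0.25)
```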
candidate = ['the', 'cat', 'sat', 'on', 'the', 'mat']
references = [['the', 'cat', 'sat', 'on', 'the', 'mat']]
max_n = 4

Expected score: 1.0

This is a perfect match scenario where the candidate exactly equals the reference.
N-gram Precisions:
• p₁ (unigrams): all 6 tokens match → 6/6 = 1.0
• p₂ (bigrams): all 5 bigrams match → 5/5 = 1.0
• p₃ (trigrams): all 4 trigrams match → 4/4 = 1.0
• p₄ (4-grams): all 3 4-grams match → 3/3 = 1.0
Brevity Penalty:
Candidate length = Reference length = 6
BP = 1.0 (no penalty)
Final Score:
Geometric mean of all 1.0 values = 1.0
Score = 1.0 × 1.0 = 1.0
This demonstrates that perfect reproduction of the reference achieves the maximum score.
candidate = ['the', 'quick', 'brown', 'fox']
references = [['a', 'fast', 'brown', 'fox'], ['the', 'slow', 'brown', 'dog']]
max_n = 2

Expected score: 0.5

With multiple references, the clipping limit for each n-gram is its maximum count in any single reference.
Step 1: Calculate 1-gram Precision
Candidate tokens: {the: 1, quick: 1, brown: 1, fox: 1}
Reference 1 tokens: {a: 1, fast: 1, brown: 1, fox: 1}
Reference 2 tokens: {the: 1, slow: 1, brown: 1, dog: 1}
Clipped counts, min(candidate count, max count in any single reference):
• 'the': min(1, max(0, 1)) = 1 ✓ (from ref 2)
• 'quick': min(1, max(0, 0)) = 0 ✗
• 'brown': min(1, max(1, 1)) = 1 ✓ (both refs)
• 'fox': min(1, max(1, 0)) = 1 ✓ (from ref 1)
p₁ = 3/4 = 0.75
Step 2: Calculate 2-gram Precision
Candidate bigrams: (the,quick), (quick,brown), (brown,fox)
Ref 1 bigrams: (a,fast), (fast,brown), (brown,fox)
Ref 2 bigrams: (the,slow), (slow,brown), (brown,dog)
Clipped counts:
• (the,quick): 0 ✗
• (quick,brown): 0 ✗
• (brown,fox): 1 ✓ (from ref 1)
p₂ = 1/3 ≈ 0.333
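Multi-reference clipping for this example can be sketched as below (the `clipped_stats` helper name is illustrative):

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_stats(candidate, references, n):
    """Return (clipped matches, total candidate n-grams) for one order n."""
    cand = ngrams(candidate, n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    return sum(min(c, max_ref[g]) for g, c in cand.items()), sum(cand.values())

candidate = ['the', 'quick', 'brown', 'fox']
references = [['a', 'fast', 'brown', 'fox'], ['the', 'slow', 'brown', 'dog']]
print(clipped_stats(candidate, references, 1))  # (3, 4) -> p1 = 0.75
print(clipped_stats(candidate, references, 2))  # (1, 3) -> p2 = 1/3
```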
Step 3: Brevity Penalty
Candidate length = 4
Reference 1 length = 4 → distance = |4 − 4| = 0
Reference 2 length = 4 → distance = |4 − 4| = 0
Both references are equally close; effective reference length r = 4
BP = 1.0
Final Score = 1.0 × exp((log(0.75) + log(1/3)) / 2) = 0.5
Constraints