In natural language processing (NLP), evaluating the quality of machine-generated text is a fundamental challenge. One widely-used approach is to measure the lexical overlap between a generated text and a human-written reference. By quantifying how many words (unigrams) the candidate text shares with the reference, we can assess how well the generated output captures the essential content.
The Unigram Overlap Evaluation Metric computes three complementary scores that together provide a comprehensive view of text quality:
Precision measures the proportion of words in the candidate text that also appear in the reference text. It answers the question: "Of everything the model generated, how much was actually relevant?"
$$\text{Precision} = \frac{\text{Number of overlapping unigrams}}{\text{Total unigrams in candidate}}$$
High precision indicates that the generated text is concise and doesn't contain irrelevant words.
Recall measures the proportion of words in the reference text that are captured by the candidate text. It answers the question: "Of everything that should have been included, how much did the model capture?"
$$\text{Recall} = \frac{\text{Number of overlapping unigrams}}{\text{Total unigrams in reference}}$$
High recall indicates that the generated text comprehensively covers the reference content.
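Once the overlap count and the token totals are known, both scores are simple ratios. A minimal sketch (the function names are illustrative, not part of the problem statement):

```python
def precision(overlap: int, candidate_total: int) -> float:
    # Fraction of candidate unigrams that also appear in the reference
    return overlap / candidate_total if candidate_total else 0.0

def recall(overlap: int, reference_total: int) -> float:
    # Fraction of reference unigrams captured by the candidate
    return overlap / reference_total if reference_total else 0.0
```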
The F1 Score is the harmonic mean of precision and recall, providing a single balanced metric that penalizes extreme imbalances between the two:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
If either precision or recall is zero, the F1 score is defined as 0.
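This zero case matters in code, since the harmonic mean would otherwise divide by zero. A sketch of the guard:

```python
def f1_score(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall; defined as 0 when either is 0
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```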
When counting overlapping unigrams, we must account for word frequency. If a word appears multiple times in both texts, the overlap count for that word is the minimum of its counts in the reference and candidate:
$$\text{Overlap for word } w = \min(\text{count}_{\text{reference}}(w), \text{count}_{\text{candidate}}(w))$$
The total overlap is the sum of these minimum counts across all unique words.
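This clipped-count rule maps directly onto Python's `collections.Counter`, whose intersection operator `&` keeps the minimum count per key. One way to sketch it:

```python
from collections import Counter

def unigram_overlap(reference_tokens, candidate_tokens):
    # Counter intersection keeps min(count_reference(w), count_candidate(w))
    clipped = Counter(reference_tokens) & Counter(candidate_tokens)
    return sum(clipped.values())
```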
Write a Python function that computes the unigram overlap evaluation scores between a reference text and a candidate text. The function should return a dictionary with precision, recall, and f1 scores as keys.

Example 1

reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

Expected output:

{"precision": 0.8333333333333334, "recall": 0.8333333333333334, "f1": 0.8333333333333334}

Step 1: Tokenize both texts
Reference tokens: ["the", "cat", "sat", "on", "the", "mat"] (6 unigrams)
Candidate tokens: ["the", "cat", "is", "on", "the", "mat"] (6 unigrams)
Step 2: Count unigram frequencies
Reference counts: {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}
Candidate counts: {"the": 2, "cat": 1, "is": 1, "on": 1, "mat": 1}
Step 3: Calculate overlapping unigrams. For each word present in both texts, take the minimum count:
"the": min(2, 2) = 2
"cat": min(1, 1) = 1
"on": min(1, 1) = 1
"mat": min(1, 1) = 1
Total overlap = 2 + 1 + 1 + 1 = 5
Step 4: Compute metrics
Precision = 5 / 6 ≈ 0.8333
Recall = 5 / 6 ≈ 0.8333
F1 = 2 × (0.8333 × 0.8333) / (0.8333 + 0.8333) ≈ 0.8333
Example 2

reference = "machine learning is amazing"
candidate = "deep learning is powerful"

Expected output:

{"precision": 0.5, "recall": 0.5, "f1": 0.5}

Step 1: Tokenize both texts
Reference tokens: ["machine", "learning", "is", "amazing"] (4 unigrams)
Candidate tokens: ["deep", "learning", "is", "powerful"] (4 unigrams)
Step 2: Count unigram frequencies
Reference counts: {"machine": 1, "learning": 1, "is": 1, "amazing": 1}
Candidate counts: {"deep": 1, "learning": 1, "is": 1, "powerful": 1}
Step 3: Calculate overlapping unigrams. Common words: "learning" and "is", each with min(1, 1) = 1.
Total overlap = 1 + 1 = 2
Step 4: Compute metrics
Precision = 2 / 4 = 0.5
Recall = 2 / 4 = 0.5
F1 = 2 × (0.5 × 0.5) / (0.5 + 0.5) = 0.5
Example 3

reference = "hello world"
candidate = "hello world"

Expected output:

{"precision": 1.0, "recall": 1.0, "f1": 1.0}

Perfect Match Case
When the candidate exactly matches the reference:
Both words overlap completely:
"hello": min(1, 1) = 1
"world": min(1, 1) = 1
Total overlap = 2
A perfect score of 1.0 for all metrics indicates the candidate is identical to the reference at the unigram level.
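Putting the pieces together, one possible solution sketch, assuming lowercase whitespace tokenization (the problem statement does not pin down a tokenizer):

```python
from collections import Counter

def unigram_overlap_scores(reference: str, candidate: str) -> dict:
    # Tokenize: lowercase, split on whitespace (an assumption)
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Clipped overlap: Counter intersection keeps the min count per word
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision > 0 and recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, `unigram_overlap_scores("hello world", "hello world")` returns `{"precision": 1.0, "recall": 1.0, "f1": 1.0}`, matching the perfect-match case above.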
Constraints