When evaluating large language models (LLMs) on mathematical reasoning benchmarks, a critical challenge is extracting structured final answers from free-form text responses. Models are typically prompted to enclose their definitive answer within a specific delimiter pattern—commonly a LaTeX-style command like \boxed{...}—to distinguish the final result from intermediate reasoning steps.
Implementing a robust answer extraction mechanism is essential for automated evaluation pipelines. The extractor must handle various complexities including nested delimiters (when the answer itself contains mathematical expressions with braces), multiple answer candidates (when a model revises its answer during reasoning), and edge cases where no valid answer delimiter exists.
Problem Statement:
Given a string representing a model's complete response to a mathematical problem, implement a function that extracts the content within the \boxed{} delimiter. Your implementation must:
- Locate the pattern \boxed{ followed by content and a matching closing brace.
- Handle nested braces: expressions such as \frac{1}{2} or \sqrt{x^{2}+y^{2}} contain nested braces that must be parsed properly.
- When multiple \boxed{} expressions exist (indicating answer revisions), extract from the final one.
- If no \boxed{} pattern is found, return an empty string.
Technical Details:
The brace-matching algorithm maintains a depth counter initialized to 1 after encountering the opening brace of \boxed{. For each subsequent character:
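- If the character is {, increment the depth.
- If the character is }, decrement the depth.
- When the depth reaches 0, the matching closing brace has been found, and the content ends just before it.
Note: The input contains literal backslash characters followed by "boxed{", not rendered LaTeX. You are processing raw text strings.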
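The depth-counter rule above can be sketched as a small helper. This is a minimal illustration, not a reference solution: the name `scan_balanced` and the choice to return `None` for an unclosed group are assumptions made for the example.

```python
from typing import Optional


def scan_balanced(text: str, start: int) -> Optional[str]:
    """Return the text from `start` up to the brace that closes the group.

    `start` is the index just after an opening brace, so depth begins at 1.
    Returns None (an assumption of this sketch) if the string ends before
    depth returns to 0.
    """
    depth = 1
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # text[i] is the matching closing brace; exclude it.
                return text[start:i]
    return None


# The double backslash in the Python literal is one literal backslash
# in the string being scanned.
print(scan_balanced("\\frac{1}{2}} and more", 0))  # \frac{1}{2}
```

Because only braces change the counter, every other character (including backslashes and LaTeX command names) is copied through untouched.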
Example 1:
Input: response = "Let me solve step by step. First 2+2=4. Therefore \\boxed{4}"
Output: "4"
Explanation: The function locates \boxed{ and then tracks brace depth: starting at depth 1 after the opening brace, it reads characters until the closing } returns the depth to 0. The content between the braces, "4", is returned as the final answer.
Example 2:
Input: response = "The answer is \\boxed{\\frac{1}{2}}"
Output: "\\frac{1}{2}"
Explanation: This example demonstrates nested brace handling. After \boxed{, the depth starts at 1. The { that follows \frac raises the depth to 2, and the } after 1 returns it to 1. The group {2} raises and lowers the depth in the same way. The final } reduces the depth to 0, so the function extracts the complete fraction expression \frac{1}{2}.
Example 3:
Input: response = "First attempt \\boxed{wrong} but correct answer is \\boxed{42}"
Output: "42"
Explanation: This response contains two \boxed{} expressions, representing an initial incorrect attempt and a revised final answer. As specified, the function returns the content from the last \boxed{} occurrence, which is "42". This behavior is crucial for handling self-correcting model responses.
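Putting the requirements together, one possible end-to-end sketch uses `str.rfind` to select the last \boxed{ occurrence and then applies the depth counter. The function name `extract_boxed_answer` and the constant `MARKER` are hypothetical. Returning an empty string when the final \boxed{ is never closed is an assumption, since the problem statement does not define that case.

```python
MARKER = "\\boxed{"  # one literal backslash followed by "boxed{"


def extract_boxed_answer(response: str) -> str:
    """Return the content of the last \\boxed{...} in `response`, or ""."""
    start = response.rfind(MARKER)  # last occurrence handles answer revisions
    if start == -1:
        return ""  # no \boxed{ pattern found

    content_start = start + len(MARKER)
    depth = 1  # we are just past the opening brace of \boxed{
    for i in range(content_start, len(response)):
        ch = response[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return response[content_start:i]

    # Assumption: an unclosed final \boxed{ is treated as "no valid answer".
    return ""


examples = [
    "Let me solve step by step. First 2+2=4. Therefore \\boxed{4}",
    "The answer is \\boxed{\\frac{1}{2}}",
    "First attempt \\boxed{wrong} but correct answer is \\boxed{42}",
    "No delimiter here",
]
for text in examples:
    print(repr(extract_boxed_answer(text)))
```

Searching from the right avoids parsing earlier attempts at all, which keeps the scan linear in the length of the text after the final marker.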
Constraints