A national insurance operations team runs a risk model that scores every submitted claim. Senior adjusters can only review a limited slice, so each state sends only its highest-risk tier for manual investigation.
Table: Fraud
Task: For every state independently, return the top 5% highest-risk policies.
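Formal selection rule per state: with n rows in the state, top_k = ceil(n * 0.05); the state's rows are ranked by fraud_score descending, ties are broken by ascending policy_id, and the first top_k rows are kept.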
Output requirements:
Important notes:
Supported submission environments:
Fraud:
| policy_id | state | fraud_score |
|-----------|------------|-------------|
| 1 | California | 0.92 |
| 2 | California | 0.68 |
| 3 | California | 0.17 |
| 4 | New York | 0.94 |
| 5 | New York | 0.81 |
| 6 | New York | 0.77 |
| 7 | Texas | 0.98 |
| 8 | Texas | 0.97 |
| 9 | Texas | 0.96 |
| 10 | Florida | 0.97 |
| 11 | Florida | 0.98 |
| 12 | Florida | 0.78 |
| 13 | Florida | 0.88 |
| 14 | Florida | 0.66 |

Output:
[
{"policy_id":1,"state":"California","fraud_score":0.92},
{"policy_id":11,"state":"Florida","fraud_score":0.98},
{"policy_id":4,"state":"New York","fraud_score":0.94},
{"policy_id":7,"state":"Texas","fraud_score":0.98}
]

Explanation: Each state has fewer than 20 rows, so top_k = 1 everywhere. We keep one highest-risk policy per state.
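The selection in this example can be sketched in plain Python. This is a minimal illustration rather than a reference solution: the function name `top_risk_per_state` and the `fraction` parameter are hypothetical, and ordering the output by state name is an assumption read off the expected output above.

```python
import math
from collections import defaultdict

def top_risk_per_state(rows, fraction=0.05):
    """Keep the top ceil(n * fraction) rows of each state (hypothetical helper)."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["state"]].append(row)

    selected = []
    # Assumption: states are emitted in ascending name order, as in the example output.
    for state in sorted(by_state):
        group = by_state[state]
        top_k = math.ceil(len(group) * fraction)
        # Highest fraud_score first; ties broken by ascending policy_id.
        group.sort(key=lambda r: (-r["fraud_score"], r["policy_id"]))
        selected.extend(group[:top_k])
    return selected

fraud = [
    {"policy_id": 1, "state": "California", "fraud_score": 0.92},
    {"policy_id": 2, "state": "California", "fraud_score": 0.68},
    {"policy_id": 3, "state": "California", "fraud_score": 0.17},
    {"policy_id": 4, "state": "New York", "fraud_score": 0.94},
    {"policy_id": 5, "state": "New York", "fraud_score": 0.81},
    {"policy_id": 6, "state": "New York", "fraud_score": 0.77},
    {"policy_id": 7, "state": "Texas", "fraud_score": 0.98},
    {"policy_id": 8, "state": "Texas", "fraud_score": 0.97},
    {"policy_id": 9, "state": "Texas", "fraud_score": 0.96},
    {"policy_id": 10, "state": "Florida", "fraud_score": 0.97},
    {"policy_id": 11, "state": "Florida", "fraud_score": 0.98},
    {"policy_id": 12, "state": "Florida", "fraud_score": 0.78},
    {"policy_id": 13, "state": "Florida", "fraud_score": 0.88},
    {"policy_id": 14, "state": "Florida", "fraud_score": 0.66},
]

result = top_risk_per_state(fraud)
for row in result:
    print(row)
```

Grouping first and sorting each group keeps the per-state logic independent, which mirrors the "for every state independently" wording of the task.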
Fraud:
| policy_id | state | fraud_score |
|-----------|-------|-------------|
| 101 | Ohio | 0.95 |
| 102 | Ohio | 0.95 |
| 103 | Ohio | 0.90 |
| ... | ... | ... |
| 124 | Ohio | 0.21 |

Output:
[
{"policy_id":101,"state":"Ohio","fraud_score":0.95},
{"policy_id":102,"state":"Ohio","fraud_score":0.95}
]

Explanation: Ohio has n = 24, so top_k = ceil(24 * 0.05) = 2, and two rows are selected. Both rows scoring 0.95 make the cut; the tie between them is ordered by ascending policy_id.
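The top_k arithmetic can be checked for a few group sizes. The sketch below compares the `ceil(n * 0.05)` form used in the explanation with an equivalent integer-only form, `(n + 19) // 20`; both helper names are hypothetical, and the integer variant is simply one way to avoid a floating-point step, not something the problem statement prescribes.

```python
import math

def top_k_float(n):
    # Direct translation of top_k = ceil(n * 0.05).
    return math.ceil(n * 0.05)

def top_k_int(n):
    # Integer ceiling of n / 20, with no floating-point multiplication.
    return (n + 19) // 20

for n in (1, 4, 19, 20, 21, 24):
    print(n, top_k_float(n), top_k_int(n))
```

For n = 24 both forms give 2, matching the Ohio example, and any state with 1 to 20 rows gets top_k = 1.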
Fraud:
| policy_id | state | fraud_score |
|-----------|---------|-------------|
| 2001 | Arizona | 88.7 |
| 2002 | Arizona | 92.5 |
| 2003 | Arizona | 92.5 |
| 2004 | Arizona | 91.1 |
| 3001 | Nevada | 70.0 |

Output:
[
{"policy_id":2002,"state":"Arizona","fraud_score":92.5},
{"policy_id":3001,"state":"Nevada","fraud_score":70.0}
]

Explanation: Arizona has 4 rows, so top_k = 1; among the rows tied at 92.5, the lower policy_id (2002) wins. Nevada has 1 row, so that row is selected.
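The tie-break in this example can be expressed as a composite sort key. The sketch below uses `heapq.nsmallest` from the standard library to pick the top row without sorting the whole group; the helper name `risk_order` is hypothetical, and the choice of `heapq` is illustrative rather than required by the problem.

```python
import heapq

arizona = [
    {"policy_id": 2001, "state": "Arizona", "fraud_score": 88.7},
    {"policy_id": 2002, "state": "Arizona", "fraud_score": 92.5},
    {"policy_id": 2003, "state": "Arizona", "fraud_score": 92.5},
    {"policy_id": 2004, "state": "Arizona", "fraud_score": 91.1},
]

def risk_order(row):
    # Negating the score makes the highest score the "smallest" key;
    # policy_id then breaks ties in ascending order.
    return (-row["fraud_score"], row["policy_id"])

top_k = 1  # ceil(4 * 0.05) for Arizona's four rows
top_arizona = heapq.nsmallest(top_k, arizona, key=risk_order)
print(top_arizona)
```

Because the key is a tuple, the comparison falls through to policy_id only when two scores are equal, which is exactly the case for policies 2002 and 2003.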
Constraints