In real-world classification problems, datasets often exhibit severe class imbalance, where some categories appear far more frequently than others. Standard cross-entropy loss treats all examples equally, causing models to become biased toward majority classes and perform poorly on rare but critical minority classes.
The Focused Loss function addresses this challenge by introducing a modulating factor that dynamically adjusts the loss contribution based on prediction confidence. Easy examples (those classified with high confidence) are automatically down-weighted, while hard examples (those misclassified or classified with low confidence) receive greater emphasis during training.
Mathematical Formulation:
For a sample with true class label t and predicted probability pₜ for that class, the focused loss is computed as:
$$FL(p_t) = -\alpha_t \cdot (1 - p_t)^\gamma \cdot \log(p_t)$$
Where:
• pₜ is the model's predicted probability for the true class of the sample,
• αₜ is an optional weighting factor for the true class, often used to counteract class imbalance,
• γ ≥ 0 is the focusing parameter that controls how strongly easy examples are down-weighted.
Understanding the Focusing Mechanism:
When γ = 0, the focused loss reduces to standard weighted cross-entropy. As γ increases:
• the modulating factor (1 − pₜ)^γ shrinks rapidly for confidently classified examples, so their loss contribution approaches zero;
• misclassified and low-confidence examples, for which pₜ is small, keep nearly their full loss, so they dominate the gradient.
This creates an automatic hard example mining effect, focusing the model's learning capacity on the samples that matter most.
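A quick numeric sketch makes the modulating factor's behavior concrete (the probability and γ values below are chosen for illustration and are not part of the problem):

```python
# Illustration of the modulating factor (1 - p_t)^gamma for easy vs. hard
# examples: as gamma grows, the weight on confident (easy) predictions
# collapses toward zero while hard examples keep most of their loss.
for p_t in (0.9, 0.5, 0.1):               # easy, medium, hard example
    for gamma in (0.0, 1.0, 2.0, 5.0):
        weight = (1 - p_t) ** gamma        # the focusing weight
        print(f"p_t={p_t:.1f}  gamma={gamma:.1f}  weight={weight:.5f}")
```

With p_t = 0.9 the weight falls from 1.0 at γ = 0 to 0.01 at γ = 2, while with p_t = 0.1 it only falls from 1.0 to 0.81.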
Your Task:
Implement a function that computes the average focused loss across all samples given:
• y_true — a list of integer class indices (the true class for each sample),
• y_pred — a list of per-sample probability distributions over the classes,
• gamma — the focusing parameter γ,
• alpha — an optional list of per-class weights αₜ.
Implementation Requirements:
Example 1:
y_true = [0, 1, 2]
y_pred = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
gamma = 2.0
Expected output: 0.014

Let's trace through the computation step by step:
Step 1: Extract predicted probabilities for true classes.
For each sample, we extract the predicted probability corresponding to the true class index:
• Sample 0 (true class 0): pₜ = 0.9
• Sample 1 (true class 1): pₜ = 0.8
• Sample 2 (true class 2): pₜ = 0.7

Step 2: Compute the focusing weight (1 − pₜ)^γ.
With γ = 2.0:
• Sample 0: (1 − 0.9)² = (0.1)² = 0.01
• Sample 1: (1 − 0.8)² = (0.2)² = 0.04
• Sample 2: (1 − 0.7)² = (0.3)² = 0.09

Step 3: Compute the cross-entropy component −log(pₜ).
• Sample 0: −log(0.9) ≈ 0.1054
• Sample 1: −log(0.8) ≈ 0.2231
• Sample 2: −log(0.7) ≈ 0.3567

Step 4: Compute the focused loss per sample, FL = focusing_weight × cross_entropy:
• Sample 0: 0.01 × 0.1054 ≈ 0.00105
• Sample 1: 0.04 × 0.2231 ≈ 0.00893
• Sample 2: 0.09 × 0.3567 ≈ 0.0321

Step 5: Average the losses.
Average = (0.00105 + 0.00893 + 0.0321) / 3 ≈ 0.014
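The trace above can be condensed into a short Python sketch. The function name `focused_loss` and its exact signature are illustrative assumptions, since the task does not prescribe them:

```python
import math

def focused_loss(y_true, y_pred, gamma=2.0, alpha=None):
    """Average focused loss over all samples.

    alpha is an optional list of per-class weights; when omitted,
    every class is weighted 1.0 (illustrative interface, not mandated).
    """
    total = 0.0
    for t, probs in zip(y_true, y_pred):
        p_t = probs[t]                             # probability of the true class
        a_t = alpha[t] if alpha is not None else 1.0
        total += a_t * (1 - p_t) ** gamma * (-math.log(p_t))
    return total / len(y_true)

y_true = [0, 1, 2]
y_pred = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
print(round(focused_loss(y_true, y_pred, gamma=2.0), 3))  # -> 0.014
```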
Example 2:
y_true = [0, 1]
y_pred = [[0.7, 0.3], [0.4, 0.6]]
gamma = 0.0
Expected output: 0.434

When γ = 0, the focused loss reduces to standard cross-entropy loss:
Step 1: Extract predicted probabilities for true classes.
• Sample 0 (true class 0): pₜ = 0.7
• Sample 1 (true class 1): pₜ = 0.6

Step 2: With γ = 0, the focusing weight (1 − pₜ)⁰ = 1 for all samples.
The modulating factor has no effect, so every sample receives equal importance.

Step 3: Compute the cross-entropy loss.
• Sample 0: −log(0.7) ≈ 0.3567
• Sample 1: −log(0.6) ≈ 0.5108

Step 4: Average the losses.
Average = (0.3567 + 0.5108) / 2 ≈ 0.434
This demonstrates that γ = 0 recovers the behavior of standard cross-entropy.
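A minimal Python check (variable names are our own) confirms that γ = 0 yields exactly the plain cross-entropy average:

```python
import math

y_true = [0, 1]
y_pred = [[0.7, 0.3], [0.4, 0.6]]

# Plain cross-entropy: average of -log(p_t) over all samples.
ce = sum(-math.log(p[t]) for t, p in zip(y_true, y_pred)) / len(y_true)

# Focused loss with gamma = 0: the modulating factor (1 - p_t)^0 is 1,
# so each term is identical to the cross-entropy term.
fl = sum((1 - p[t]) ** 0 * -math.log(p[t])
         for t, p in zip(y_true, y_pred)) / len(y_true)

print(round(ce, 3), round(fl, 3))  # both -> 0.434
```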
Example 3:
y_true = [0, 1, 2]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
gamma = 1.5
alpha = [1.0, 1.5, 2.0]
Expected output: 0.157

This example demonstrates class-weighted focused loss:
Step 1: Extract predicted probabilities and class weights.
• Sample 0: pₜ = 0.8, class 0 → α₀ = 1.0
• Sample 1: pₜ = 0.6, class 1 → α₁ = 1.5
• Sample 2: pₜ = 0.6, class 2 → α₂ = 2.0

Step 2: Compute the focusing weights with γ = 1.5.
• Sample 0: (1 − 0.8)^1.5 = (0.2)^1.5 ≈ 0.0894
• Sample 1: (1 − 0.6)^1.5 = (0.4)^1.5 ≈ 0.2530
• Sample 2: (1 − 0.6)^1.5 = (0.4)^1.5 ≈ 0.2530

Step 3: Compute the cross-entropy components.
• Sample 0: −log(0.8) ≈ 0.2231
• Sample 1: −log(0.6) ≈ 0.5108
• Sample 2: −log(0.6) ≈ 0.5108

Step 4: Compute the weighted focused loss per sample, FL = α × focusing_weight × cross_entropy:
• Sample 0: 1.0 × 0.0894 × 0.2231 ≈ 0.0199
• Sample 1: 1.5 × 0.2530 × 0.5108 ≈ 0.1938
• Sample 2: 2.0 × 0.2530 × 0.5108 ≈ 0.2585

Step 5: Average the losses.
Average = (0.0199 + 0.1938 + 0.2585) / 3 ≈ 0.157
The alpha weights increase the contribution of minority classes (class 2 has the highest weight).
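The class-weighted trace can likewise be reproduced in a few lines of Python (variable names are illustrative):

```python
import math

y_true = [0, 1, 2]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
gamma = 1.5
alpha = [1.0, 1.5, 2.0]          # per-class weights, higher for minority classes

# Per-sample weighted focused loss: alpha_t * (1 - p_t)^gamma * -log(p_t).
losses = [alpha[t] * (1 - p[t]) ** gamma * -math.log(p[t])
          for t, p in zip(y_true, y_pred)]
avg = sum(losses) / len(losses)
print(round(avg, 3))  # -> 0.157
```

Note how Sample 2 contributes the most to the average: it combines the lowest confidence of a minority class with the highest α weight.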
Constraints