In sequential decision-making systems and reinforcement learning, agents must continually update their estimates of expected value based on observed outcomes. A powerful technique for this is the recency-weighted value estimate, which applies exponential discounting to give greater influence to recent observations while allowing older data to gradually fade in significance.
Given an initial value estimate Q₁ (representing our prior belief before any observations), a sequence of k observed rewards [R₁, R₂, ..., Rₖ], and a decay rate α (alpha), the recency-weighted estimate is computed using the following closed-form formula:
$$\text{Estimate} = (1 - \alpha)^k \cdot Q_1 + \sum_{i=1}^{k} \alpha \cdot (1 - \alpha)^{k-i} \cdot R_i$$
Understanding the Components:
• Q₁ — the initial value estimate, representing the prior belief before any observations.
• Rᵢ — the i-th observed reward in the sequence.
• α — the decay rate (0 < α ≤ 1), controlling how quickly older information fades.
• k — the total number of observations.
Key Properties:
• The weight on the initial estimate, (1 − α)ᵏ, shrinks geometrically as more rewards arrive.
• The weight on reward Rᵢ, α(1 − α)^(k−i), is largest for the most recent reward (i = k).
• The weights sum to 1, so the estimate is a weighted average of Q₁ and the observed rewards.
Your Task: Implement a function that computes this recency-weighted estimate directly from the formula (not using incremental/recursive updates). The result should be rounded to 4 decimal places.
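A minimal sketch of the required function in Python (the problem statement does not fix a language, and the name `recency_weighted_estimate` is an assumption); it evaluates the closed-form formula term by term rather than updating incrementally:

```python
def recency_weighted_estimate(initial_value, observed_rewards, decay_rate):
    """Closed-form recency-weighted estimate, rounded to 4 decimal places."""
    k = len(observed_rewards)
    # Prior contribution: (1 - alpha)^k * Q1
    estimate = (1 - decay_rate) ** k * initial_value
    # Each reward R_i carries weight alpha * (1 - alpha)^(k - i)
    for i, reward in enumerate(observed_rewards, start=1):
        estimate += decay_rate * (1 - decay_rate) ** (k - i) * reward
    return round(estimate, 4)
```

With no observations (k = 0), the loop is skipped and the function returns the initial estimate unchanged, which matches the formula.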
Example 1:
initial_value = 2.0
observed_rewards = [5.0, 9.0]
decay_rate = 0.3
Expected output: 4.73
With k = 2 observations, we apply the formula:
Initial estimate contribution: $(1 - 0.3)^2 \times 2.0 = 0.49 \times 2.0 = 0.98$
Reward contributions:
• R₁ = 5.0: $0.3 \times (1 - 0.3)^{2-1} \times 5.0 = 0.3 \times 0.7 \times 5.0 = 1.05$
• R₂ = 9.0: $0.3 \times (1 - 0.3)^{2-2} \times 9.0 = 0.3 \times 1.0 \times 9.0 = 2.70$
Total estimate: $0.98 + 1.05 + 2.70 = 4.73$
Notice how the most recent reward (R₂ = 9.0) contributes the most (2.70), demonstrating the recency bias of this method.
Example 2:
initial_value = 1.0
observed_rewards = [10.0]
decay_rate = 0.5
Expected output: 5.5
With k = 1 observation and α = 0.5 (equal weighting):
Initial estimate contribution: $(1 - 0.5)^1 \times 1.0 = 0.5 \times 1.0 = 0.5$
Reward contribution:
• R₁ = 10.0: $0.5 \times (1 - 0.5)^{1-1} \times 10.0 = 0.5 \times 1.0 \times 10.0 = 5.0$
Total estimate: $0.5 + 5.0 = 5.5$
With α = 0.5, the initial estimate and the single observation are given equal weight, resulting in their simple average.
Example 3:
initial_value = 0.0
observed_rewards = [1.0, 2.0, 3.0]
decay_rate = 0.2
Expected output: 1.048
With k = 3 observations and a conservative α = 0.2:
Initial estimate contribution: $(1 - 0.2)^3 \times 0.0 = 0.512 \times 0.0 = 0.0$
Reward contributions:
• R₁ = 1.0: $0.2 \times (0.8)^{2} \times 1.0 = 0.2 \times 0.64 \times 1.0 = 0.128$
• R₂ = 2.0: $0.2 \times (0.8)^{1} \times 2.0 = 0.2 \times 0.8 \times 2.0 = 0.320$
• R₃ = 3.0: $0.2 \times (0.8)^{0} \times 3.0 = 0.2 \times 1.0 \times 3.0 = 0.600$
Total estimate: $0.0 + 0.128 + 0.320 + 0.600 = 1.048$
The low α value means slower forgetting, so even the oldest observation (R₁) still contributes meaningfully to the final estimate.
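For intuition (not as a solution, since the task explicitly requires the closed-form computation): the formula above is what you get by unrolling the standard incremental update Qₖ₊₁ = Qₖ + α(Rₖ − Qₖ), so both routes produce the same value. A sketch, with the hypothetical name `incremental_estimate`:

```python
def incremental_estimate(initial_value, observed_rewards, decay_rate):
    """Equivalent incremental form: repeatedly nudge Q toward each reward."""
    q = initial_value
    for reward in observed_rewards:
        # Q <- Q + alpha * (R - Q): move a fraction alpha of the way to R
        q += decay_rate * (reward - q)
    return round(q, 4)
```

Running this on the three worked examples reproduces 4.73, 5.5, and 1.048, confirming the equivalence.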
Constraints