In sequential decision-making systems and reinforcement learning, agents must continually update their estimates of expected value based on observed outcomes. A powerful technique for this is the recency-weighted value estimate, which applies exponential discounting to give greater influence to recent observations while allowing older data to gradually fade in significance.
Given an initial value estimate Q₁ (representing our prior belief before any observations), a sequence of k observed rewards [R₁, R₂, ..., Rₖ], and a decay rate α (alpha), the recency-weighted estimate is computed using the following closed-form formula:
$$\text{Estimate} = (1 - \alpha)^k \cdot Q_1 + \sum_{i=1}^{k} \alpha \cdot (1 - \alpha)^{k-i} \cdot R_i$$
Understanding the Components:
• Q₁ — the initial value estimate, representing the prior belief before any observations.
• Rᵢ — the i-th observed reward in the sequence.
• α — the decay rate (0 < α ≤ 1), controlling how quickly older information fades.
• k — the total number of observations.
Key Properties:
• The weight on the initial estimate, (1 − α)ᵏ, shrinks geometrically as more rewards arrive.
• The weight on reward Rᵢ, α(1 − α)^(k−i), is largest for the most recent reward (i = k).
• The weights sum to 1, so the estimate is a weighted average of Q₁ and the observed rewards.
Your Task: Implement a function that computes this recency-weighted estimate directly from the formula (not using incremental/recursive updates). The result should be rounded to 4 decimal places.
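A minimal sketch of the required function in Python (the problem statement does not fix a language, and the name `recency_weighted_estimate` is an assumption); it evaluates the closed-form formula term by term rather than updating incrementally:

```python
def recency_weighted_estimate(initial_value, observed_rewards, decay_rate):
    """Closed-form recency-weighted estimate, rounded to 4 decimal places."""
    k = len(observed_rewards)
    # Prior contribution: (1 - alpha)^k * Q1
    estimate = (1 - decay_rate) ** k * initial_value
    # Each reward R_i carries weight alpha * (1 - alpha)^(k - i)
    for i, reward in enumerate(observed_rewards, start=1):
        estimate += decay_rate * (1 - decay_rate) ** (k - i) * reward
    return round(estimate, 4)
```

With no observations (k = 0), the loop is skipped and the function returns the initial estimate unchanged, which matches the formula.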
Example 1:
initial_value = 2.0
observed_rewards = [5.0, 9.0]
decay_rate = 0.3
Expected output: 4.73
With k = 2 observations, we apply the formula:
Initial estimate contribution: $(1 - 0.3)^2 \times 2.0 = 0.49 \times 2.0 = 0.98$
Reward contributions:
• R₁ = 5.0: $0.3 \times (1 - 0.3)^{2-1} \times 5.0 = 0.3 \times 0.7 \times 5.0 = 1.05$
• R₂ = 9.0: $0.3 \times (1 - 0.3)^{2-2} \times 9.0 = 0.3 \times 1.0 \times 9.0 = 2.70$
Total estimate: $0.98 + 1.05 + 2.70 = 4.73$
Notice how the most recent reward (R₂ = 9.0) contributes the most (2.70), demonstrating the recency bias of this method.
Example 2:
initial_value = 1.0
observed_rewards = [10.0]
decay_rate = 0.5
Expected output: 5.5
With k = 1 observation and α = 0.5 (equal weighting):
Initial estimate contribution: $(1 - 0.5)^1 \times 1.0 = 0.5 \times 1.0 = 0.5$
Reward contribution:
• R₁ = 10.0: $0.5 \times (1 - 0.5)^{1-1} \times 10.0 = 0.5 \times 1.0 \times 10.0 = 5.0$
Total estimate: $0.5 + 5.0 = 5.5$
With α = 0.5, the initial estimate and the single observation are given equal weight, resulting in their simple average.
Example 3:
initial_value = 0.0
observed_rewards = [1.0, 2.0, 3.0]
decay_rate = 0.2
Expected output: 1.048
With k = 3 observations and a conservative α = 0.2:
Initial estimate contribution: $(1 - 0.2)^3 \times 0.0 = 0.512 \times 0.0 = 0.0$
Reward contributions:
• R₁ = 1.0: $0.2 \times (0.8)^{2} \times 1.0 = 0.2 \times 0.64 \times 1.0 = 0.128$
• R₂ = 2.0: $0.2 \times (0.8)^{1} \times 2.0 = 0.2 \times 0.8 \times 2.0 = 0.320$
• R₃ = 3.0: $0.2 \times (0.8)^{0} \times 3.0 = 0.2 \times 1.0 \times 3.0 = 0.600$
Total estimate: $0.0 + 0.128 + 0.320 + 0.600 = 1.048$
The low α value means slower forgetting, so even the oldest observation (R₁) still contributes meaningfully to the final estimate.
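For intuition (not as a solution, since the task explicitly requires the closed-form computation): the formula above is what you get by unrolling the standard incremental update Qₖ₊₁ = Qₖ + α(Rₖ − Qₖ), so both routes produce the same value. A sketch, with the hypothetical name `incremental_estimate`:

```python
def incremental_estimate(initial_value, observed_rewards, decay_rate):
    """Equivalent incremental form: repeatedly nudge Q toward each reward."""
    q = initial_value
    for reward in observed_rewards:
        # Q <- Q + alpha * (R - Q): move a fraction alpha of the way to R
        q += decay_rate * (reward - q)
    return round(q, 4)
```

Running this on the three worked examples reproduces 4.73, 5.5, and 1.048, confirming the equivalence.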
Constraints