In reinforcement learning and sequential decision-making, agents collect rewards over time as they interact with an environment. However, immediate rewards are typically valued more than future rewards—a fundamental concept known as temporal discounting. This principle reflects the uncertainty of future outcomes and the preference for immediate gratification.
The cumulative reward with temporal decay (also known as the discounted return or discounted cumulative reward) quantifies the total value of a sequence of rewards, where each future reward is progressively diminished by a decay factor.
Mathematical Formulation:
Given a sequence of rewards ( R = [r_0, r_1, r_2, ..., r_{T-1}] ) and a decay factor ( \gamma ) (gamma), where ( 0 < \gamma \leq 1 ), the cumulative decayed reward ( G ) is computed as:
$$G = \sum_{t=0}^{T-1} \gamma^t \cdot r_t = r_0 + \gamma \cdot r_1 + \gamma^2 \cdot r_2 + ... + \gamma^{T-1} \cdot r_{T-1}$$
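The formula can be checked term by term with a plain loop before worrying about vectorization (the function name here is illustrative, not part of the task):

```python
def discounted_return_loop(rewards, gamma):
    """Compute G = sum over t of gamma**t * r_t with an explicit loop."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r  # each reward weighted by gamma^t
    return total

print(discounted_return_loop([1, 1, 1], 0.5))  # 1.75
```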
Understanding the Decay Factor (γ):
The decay factor controls how strongly the agent values the future. A gamma close to 0 makes the agent myopic—only immediate rewards matter—while a gamma close to 1 values future rewards nearly as much as immediate ones. At exactly gamma = 1, no decay is applied and all rewards count equally.
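To build intuition, here is a small sketch that evaluates the same reward sequence under several decay factors (the values chosen are illustrative):

```python
rewards = [1, 1, 1]

# Sweep gamma from fully myopic (0.0) to no decay (1.0)
for gamma in (0.0, 0.5, 0.9, 1.0):
    g = sum(gamma**t * r for t, r in enumerate(rewards))
    print(f"gamma={gamma}: G={g:.2f}")
```

As gamma grows from 0 to 1, the return for `[1, 1, 1]` rises from 1.00 (only the first reward counts) to 3.00 (all rewards count fully).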
Your Task: Write a Python function that computes the cumulative reward with temporal decay given a list of rewards and a decay factor gamma. The function should use NumPy for efficient computation and return the scalar value representing the total decayed cumulative reward.
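One possible NumPy-based sketch of such a function (the name and signature are assumptions, not prescribed by the task):

```python
import numpy as np

def cumulative_decayed_reward(rewards, gamma):
    """Vectorized discounted return: G = sum_t gamma**t * r_t.

    rewards: sequence of numeric rewards [r_0, ..., r_{T-1}]
    gamma:   decay factor, 0 < gamma <= 1
    """
    rewards = np.asarray(rewards, dtype=float)
    # Discount weights [gamma^0, gamma^1, ..., gamma^(T-1)]
    discounts = gamma ** np.arange(rewards.size)
    # Dot product gives the weighted sum as a scalar
    return float(np.dot(discounts, rewards))

print(cumulative_decayed_reward([1, 1, 1], 0.5))  # 1.75
```

Building the discount vector with `np.arange` and taking a dot product avoids an explicit Python loop, which matters for long reward sequences.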
rewards = [1, 1, 1]
gamma = 0.5
Expected output: 1.75
With gamma = 0.5, each subsequent reward is multiplied by an increasing power of 0.5:
• Time step 0: 1 × (0.5)⁰ = 1 × 1 = 1.0
• Time step 1: 1 × (0.5)¹ = 1 × 0.5 = 0.5
• Time step 2: 1 × (0.5)² = 1 × 0.25 = 0.25
Total cumulative decayed reward: 1.0 + 0.5 + 0.25 = 1.75
rewards = [1, 2, 3, 4, 5]
gamma = 1.0
Expected output: 15.0
When gamma = 1.0, no temporal decay is applied. Each reward contributes its full value:
• 1 × (1.0)⁰ + 2 × (1.0)¹ + 3 × (1.0)² + 4 × (1.0)³ + 5 × (1.0)⁴
• = 1 + 2 + 3 + 4 + 5
• = 15.0
This is equivalent to a simple sum of all rewards, representing an agent that values all future rewards equally.
rewards = [10, 5, 2]
gamma = 0.9
Expected output: 16.12
With gamma = 0.9 (a typical value in RL), future rewards are discounted but still significant:
• Time step 0: 10 × (0.9)⁰ = 10 × 1 = 10.0
• Time step 1: 5 × (0.9)¹ = 5 × 0.9 = 4.5
• Time step 2: 2 × (0.9)² = 2 × 0.81 = 1.62
Total: 10.0 + 4.5 + 1.62 = 16.12
Note how the immediate large reward (10) contributes most significantly, while the smaller future rewards (5 and 2) are progressively diminished.
Constraints