In the realm of reinforcement learning and dynamic programming, solving sequential decision problems under uncertainty is a fundamental challenge. This problem explores a classic scenario where strategic decision-making can maximize the probability of achieving a target goal.
Consider a player who participates in a series of biased coin flips with the goal of accumulating wealth. The game mechanics are as follows:
- The player starts each round with capital s, where 0 < s < 100, and wagers part of it.
- With probability p_win, the player gains the wagered amount.
- With probability 1 - p_win, the player loses the wagered amount.

Your task is to implement the value iteration algorithm to solve this Markov Decision Process (MDP):

- States: s = 0, 1, 2, ..., 100 represent the player's current capital.
- Actions: from state s, the valid wagers are a ∈ {1, 2, ..., min(s, 100 - s)}.
- Values: V(s) represents the optimal expected value (probability of reaching 100) from state s, given by the Bellman optimality equation:

$$V(s) = \max_{a} \left[ p_{win} \cdot V(s + a) + (1 - p_{win}) \cdot V(s - a) \right]$$

- Boundary conditions: V(0) = 0 (bankruptcy) and V(100) = 1 (goal reached).
- Termination: iterate until the maximum change in the value function across all states falls below the convergence threshold (theta).
Implement a function that computes both the optimal state-value function and the optimal betting strategy for all states from 0 to 100 using value iteration.
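A minimal NumPy sketch of such a solver (the function name `solve_gambler`, the synchronous sweep, and rounding before the argmax are illustrative choices, not part of the specification):

```python
import numpy as np

def solve_gambler(win_probability, convergence_threshold=1e-9, goal=100):
    """Value iteration for the coin-flip gambling MDP (a sketch, not a
    reference solution). Returns (values, strategy) indexed by state 0..goal."""
    values = np.zeros(goal + 1)
    values[goal] = 1.0  # V(100) = 1 (goal reached); V(0) stays 0 (bankruptcy)

    def action_returns(s, v):
        # Legal bets from state s and their one-step expected values.
        bets = np.arange(1, min(s, goal - s) + 1)
        return bets, (win_probability * v[s + bets]
                      + (1 - win_probability) * v[s - bets])

    while True:
        new_values = values.copy()
        for s in range(1, goal):
            new_values[s] = action_returns(s, values)[1].max()
        delta = np.max(np.abs(new_values - values))
        values = new_values
        if delta < convergence_threshold:
            break

    # Greedy policy. Rounding before argmax keeps float-level residue from
    # breaking exact ties, so the smallest optimal bet is reported.
    strategy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        bets, returns = action_returns(s, values)
        strategy[s] = bets[np.argmax(np.round(returns, 5))]
    return values, strategy

values, strategy = solve_gambler(0.4)
print(values[50], strategy[50])  # → 0.4 50
```

Breaking ties toward the smallest bet matters for the fair-coin case below, where every legal wager has the same expected value.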
win_probability = 0.4
convergence_threshold = 1e-9

Expected output: values[50] = 0.4, strategy[50] = 50

With an unfavorable coin (40% win probability), the optimal strategy from state 50 is to stake everything on a single flip, giving exactly the per-flip win probability of reaching the goal.
Key Insight: When the odds are against you, betting conservatively only gives the house edge more flips to grind your capital away, while bold bets compress the game into a few flips where a short lucky streak can still reach the goal.
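To make the insight concrete, here is a back-of-the-envelope comparison of two fixed strategies from state 50 (the strategy labels are mine; the ruin probability is the standard closed form for a biased random walk):

```python
p = 0.4
r = (1 - p) / p  # loss/win odds ratio, here 1.5

# Timid play: always bet 1. Classic gambler's-ruin formula:
# P(reach N before 0 | start s) = (1 - r**s) / (1 - r**100)
timid = (1 - r**50) / (1 - r**100)

# Bold play: stake all 50 on one flip; you succeed iff that flip wins.
bold = p

print(f"timid (bet 1 forever): {timid:.2e}")  # ~1.6e-09
print(f"bold  (all-in once):   {bold:.2f}")   # 0.40
```

Grinding out unit bets against a 60% house edge is essentially hopeless, while a single bold bet preserves the full 0.4 chance.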
win_probability = 0.5
convergence_threshold = 1e-9

Expected output: values[50] = 0.5, strategy[50] = 1 (every legal bet is optimal; ties are broken toward the smallest)

With a fair coin (50% win probability), the game becomes a study in random walks: the value function converges to V(s) = s/100, so no betting policy outperforms any other.
This illustrates the principle of strategy indifference in zero-sum fair games.
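A quick Monte Carlo check of this indifference (the helper below is illustrative; the episode count and seed are arbitrary choices):

```python
import random

random.seed(0)

def success_rate(bet, start=50, goal=100, p=0.5, episodes=20000):
    """Estimate P(reach goal before 0) when always wagering `bet`
    (capped so we never bet more than we have or need)."""
    wins = 0
    for _ in range(episodes):
        s = start
        while 0 < s < goal:
            a = min(bet, s, goal - s)
            s += a if random.random() < p else -a
        wins += s == goal
    return wins / episodes

print(success_rate(bet=50))  # one all-in flip: ~0.5
print(success_rate(bet=10))  # slow random walk: also ~0.5
```

Both estimates land near 0.5, matching the martingale argument: in a fair game the expected capital never changes, so the success probability from 50 is 50/100 regardless of bet sizing.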
win_probability = 0.6
convergence_threshold = 1e-9

Expected output: values[1] ≈ 0.3333, strategy[1] = 1

With a favorable coin (60% win probability), the optimal strategy shifts dramatically toward caution: small bets keep the risk of ruin low while the edge compounds.
Expected Value Calculation: From state 1 the only legal bet is 1 (since min(1, 99) = 1), so V(1) = 0.6 × V(2) + 0.4 × V(0) = 0.6 × V(2) ≈ 0.6 × 0.5556 ≈ 0.3333.
Strategic Insight: When you have an edge, patience and consistency triumph. The "grind" strategy of small, repeated bets lets the law of large numbers work in your favor.
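As a cross-check on the example numbers: for a favorable coin, always betting 1 attains the optimum (the resulting value function satisfies the Bellman equation above), and the classic gambler's-ruin closed form then gives V(s) directly. The helper name `v` is mine:

```python
p = 0.6
r = (1 - p) / p  # loss/win odds ratio, here 2/3

def v(s, goal=100):
    """P(reach `goal` before 0 from capital s, betting 1 each flip)."""
    return (1 - r**s) / (1 - r**goal)

print(round(v(1), 4))        # → 0.3333
print(round(v(2), 4))        # → 0.5556
print(round(0.6 * v(2), 4))  # → 0.3333, matching V(1) = 0.6·V(2) + 0.4·0
```

The closed form agrees with value iteration and with the document's hand calculation for state 1.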
Constraints