You play a game flipping a fair coin. You may stop after any trial, at which point you are paid, in dollars,
the percentage of heads flipped so far. So if on the first trial you flip a head, you should stop and earn \$100
because you have 100% heads. If you flip a tail then a head, you could either stop and earn \$50,
or continue on, hoping the ratio will exceed 1/2. This second strategy is superior.
A paper by Medina and Zeilberger (arXiv:0907.0032v2 [math.PR]) says that it is an unsolved
problem to determine whether it is better to continue or stop after you have flipped 5 heads in 8 trials: accept \$62.50 or hope for more. It is easy to simulate this problem, and it is clear
from even limited experimental data that it is better to continue (perhaps more than a 70% chance you’ll improve over \$62.50).
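To make the simulation claim concrete, here is a minimal Monte Carlo sketch in Python (my own construction, not from the paper); the cap on further flips is an artificial truncation, so the true probability of eventually improving is, if anything, slightly underestimated:

```python
import random

def prob_improve(h=5, t=3, max_extra_flips=1500, trials=4000, seed=1):
    """Estimate P(heads ratio ever exceeds h/(h+t)) within a capped number of flips."""
    random.seed(seed)
    target = h / (h + t)          # 5/8 = 0.625
    improved = 0
    for _ in range(trials):
        heads, total = h, h + t
        for _ in range(max_extra_flips):
            heads += random.random() < 0.5   # True counts as 1
            total += 1
            if heads / total > target:
                improved += 1
                break
    return improved / trials

print(prob_improve())
```

On runs like this the estimate comes out well above one half, which is what motivated my claim above.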
My question is basically: Why is this difficult to prove? Presumably it is not that difficult
to write out an expression for the expectation of exceeding 5/8 in terms of the cumulative binomial distribution.
Olle Häggström and Johan Wästlund, “Rigorous computer analysis of the Chow-Robbins game,”
The American Mathematical Monthly, Vol. 120, No. 10, December 2013 (a pre-journal version is on the arXiv).
From the Abstract:
“In particular, we confirm that with 5 heads and 3 tails, stopping is optimal.”
I accept Qiaochu’s answer “Have you tried actually doing that?” I did try now, and now I can appreciate the challenge. 🙂 The paper I cited refers to another by Chow and Robbins from 1965
that has a beautiful formulation, much clearer than the cumulative binomials with which I struggled. Let me explain it, because it is really cool.
For the natural strategy I mentioned in the comments (and echoed by Raynos),
let $f(n,h,t)$ be the expected payoff
if you start with $h$ heads and $t$ tails, and the game is allowed to run for at most $n$ total flips. Then there is an easy recursive formulation for $f$:
$$f(n,h,t) = \max \left( \frac{1}{2} f(n,h+1,t) + \frac{1}{2} f(n,h,t+1) , h/(h+t) \right) $$
because you have an equal chance of increasing to $h+1$ heads or to $t+1$ tails on the next
flip if you continue, and you get the current ratio if you stop.
Now, when $h+t=n$, you need to make a “boundary” assumption. Assuming the law of
large numbers for large $n$ leads to the reasonable value $\max ( 1/2, h/n )$ in this case.
So now all you need to do is fill out the table for all $h$ and $t$ values up to $n=h+t$.
The real answer is the limit when
$n \rightarrow \infty$, but using large $n$ approximates this limit.
After the Medina and Zeilberger paper was released, in fact just about three weeks ago,
a very careful calculation using the above recursive formulation was made by Julian Wiseman and reported on this web page. The conclusion is to me remarkable: “Choosing to continue lowers the expected payoff [from 0.625] to 0.62361957757.”
This is still not a proof, but the “answer” is now known.
So my “it is clear from even limited experimental data that” was completely wrong!
I am delighted to learn from my mistake.
This seems to be related to Gittins indices. Gittins indices are a way of solving this kind of optimal stopping problem for certain classes of problems: roughly, they balance how much you expect to gain given your current knowledge against how much more you could gain by risking further play to obtain more information about the process (the probability of flipping heads, etc.).
Bruno
I’m probably doing something wrong, but…
If you play for n more turns and get k heads, your percentage will be
(5+k)/(n+8). The chance of this occurring is Binomial[n,k]/2^n. Summing
over all k for a given n, we have:
ex2[n_] = Sum[(5+k)*Binomial[n,k]/2^n, {k,0,n}]/(n+8)
Mathematica gives this as: 1/2 + 1/(8+n)
So, if we play for n more turns, our expected value is 1/2 + 1/(8+n).
Subtracting off the 5/8 we already have and simplifying yields
-1/8 + 1/(8+n), which is negative for every positive integer n.
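Bruno’s closed form is easy to double-check with exact arithmetic; a quick Python translation of the Mathematica sum (mine, not Bruno’s) confirms it for small n:

```python
from fractions import Fraction
from math import comb

def ex2(n):
    """Expected ratio after committing to exactly n more flips from 5 heads in 8 trials."""
    total = sum((5 + k) * comb(n, k) for k in range(n + 1))
    return Fraction(total, 2**n * (n + 8))

for n in range(1, 30):
    assert ex2(n) == Fraction(1, 2) + Fraction(1, n + 8)
```

The algebra checks out; the catch, as other answers in the thread point out, is that a good player chooses when to stop adaptively rather than fixing n in advance.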
Please critique.
I admit my first answer was rubbish, and I’m still not sure I fully understand this problem.
As the OP notes, it’s trivial to prove that there’s a better than 50% chance you can improve on your current score, unless your current score is already 100%:
There’s a 50% chance you’ll get a head on your next flip, improving your score. Solely based on this, you have at least a 50% chance of improving your score.
If you get a tail on your next flip, there’s a non-zero chance that you will eventually beat your score.
Thus, the total chance you will beat your current score is always greater than 50%.
Suppose, instead, you are concerned with the mean value of your winnings if you continue to flip. There’s a better than 50% chance you’ll increase your winnings, but also a chance you’ll decrease your winnings, and the amount of the decrease might be enough to wipe out possible gains, even though the probability is lower.
For example, if you have 5/8 and get a head, your winnings increase to 6/9, an increase of 1/24 (or 3/72). However, if you get a tail, your winnings decrease to 5/9, a decrease of 5/72. Of course, that’s just a one-flip example. If you commit in advance to a fixed number of flips, your expected winnings approach 1/2 as that number grows. However, suppose you follow this strategy:
When my winnings exceed 5/8 (or whatever my initial winnings were), stop.
Otherwise, continue flipping, potentially forever.
The question: in this scenario, what would the mean winnings be?
I believe that’s what the question is actually asking.
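That version of the question is easy to probe numerically. Below is a hedged Monte Carlo sketch (my own; the cap on flips is an artificial truncation, since the strategy as stated may flip forever, and capped runs are simply paid their final ratio) estimating the mean winnings of “stop as soon as the ratio exceeds 5/8”:

```python
import random

def mean_winnings(h=5, t=3, max_extra_flips=1500, trials=4000, seed=2):
    """Estimate E[payoff] for: flip until the ratio first exceeds h/(h+t), capped."""
    random.seed(seed)
    target = h / (h + t)
    payoff = 0.0
    for _ in range(trials):
        heads, total = h, h + t
        for _ in range(max_extra_flips):
            heads += random.random() < 0.5
            total += 1
            if heads / total > target:
                break
        payoff += heads / total   # capped runs are paid their final ratio (near 1/2)
    return payoff / trials

print(mean_winnings())
```

Under this truncation the mean appears to land below the 0.625 you could lock in immediately, in line with Wiseman’s conclusion quoted earlier in the thread.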