Average number of guesses to guess number between 1 and 1000?

There is a game where you get to pick a number between 1 and 1000 and then the system will tell if you guessed to high or too low. Using a binary search you can find the answer in a maximum of ten guesses (2^10 = 1024) , but how would I go about calculating the average number of guesses required to guess the given method, assuming the user uses this binary chop method?

Solutions Collecting From Web of "Average number of guesses to guess number between 1 and 1000?"

Let us take the more ideal case of guessing integers between $1$ and $2^{10}-1=1111111111_2$ writing in base $2$ from now on without explicitly writing subscript $2$ each time. Then the algorithm for the guesses will be like this:

  1. The initial guess is $1\overline 0$ where $\overline 0$ means fill up with zeros until we have a ten digit number
  2. Write the $n$-th guess as $g_n=x_n1\overline 0$ where $x_n$ is a binary string of length $(n-1)$
  3. If the game responds by “higher”, the next guess is $g_{n+1}=x_n11\overline 0$
  4. If the game responds by “lower”, the next guess is $g_{n+1}=x_n01\overline 0$
  5. If the game responds by “correct” we are done after $n$ guesses

So we see that the game terminates whenever there are only zeros remaining right of $x_n1$. Then note that there are $2^{n-1}$ binary strings $x_n$ of length $(n-1)$. So $2^{n-1}$ numbers are guessed after exactly $n$ guesses. So for this setup the expected number of guesses will be, returning to writing in base $10$:
$$
\frac{1}{2^{10}-1}\sum_{n=1}^{10}n\cdot 2^{n-1}=\frac{9217}{1023}\approx9.00978
$$
So the average case here is very close to the worst case of $10$ guesses. This could seem odd at first, but odd is also the clue here … Half of the time it takes $10$ guesses to reveal that the randomly picked number $x$ is in fact odd. The first $9$ guesses fills up with zeros which in base $2$ means “divisible by $2$”. One quarter of the time $x$ is divisible by $2^2=4$ forcing the algorithm to make $9$ guesses before succes. And so on.


I wrote the following code in Python:

L=[1,1000]
G={}
for n in range(1,11):
    i=0
    while i<len(L)-1:
        guess = int(0.5*(L[i]+L[i+1]+1))
        if guess not in L:
            L = L[:i+1]+[guess]+L[i+1:]
            i+=1
            G[guess] = n
        i+=1
S=0
for g in G:
    S+=G[g]
print (10+S+10)

which responded with $8987$, so the average for the $1$ to $1000$ case, assuming that we always make the guess of the rounded average of the limits of the interval we know that $x$ belongs to, must be $8.987$ guesses.

Let $c$ be the target number somewhere between lower bound $1$ and upper bound $n$, inclusive. We will search at the value $M = \lceil\frac{n}{2}\rceil$. If we’re told that the guess is too high, we will set $n=M-1$ and try the search again. Otherwise, we will set the lower bound to $M+1$ and search again. However, what really matters is $c$’s position relative to the bounds, so this is also the same as starting a new search with lower bound $1$ and upper bound $n-M$ with a new target number $c-M$.

This recursive process ends in two possible ways: We either stop when we find the target number, or we stop if the lower and upper bounds are the same (in which case only one search is performed).

Let $F(n)$ represent the expected number of searches needed to find the unknown target number, where $F(0) = 0$ and $F(1) = 1$.

Overall, there is a $\frac{M-1}{n}$ probability that $c < M$, a $\frac{1}{n}$ probability that $c=M$, and a $\frac{n-M}{n}$ probability that $c>M$.

$$F(n) = \left(\frac{M-1}{n}\right)\left(1+F(M-1)\right) + \left(\frac{1}{n}\right)\left(1\right) + \left(\frac{n-M}{n}\right)\left(1+F(n-M)\right)$$

Multiply by $n$, simplify things a bit, and substitute in the full value for $M$:

$$nF(n) = \left(\left\lceil\frac{n}{2}\right\rceil-1\right)F\left(\left\lceil\frac{n}{2}\right\rceil-1\right) + n + \left(n-\left\lceil\frac{n}{2}\right\rceil\right)F\left(n-\left\lceil\frac{n}{2}\right\rceil\right)$$

Let $S(n) = nF(n)$ and $T(n) = S(n) – S(n-1)$, yielding:

$$T(n) = \left(n-\left\lceil\frac{n}{2}\right\rceil\right)F\left(n-\left\lceil\frac{n}{2}\right\rceil\right) – \left(n-\left\lceil\frac{n}{2}\right\rceil-1\right)F\left(n-\left\lceil\frac{n}{2}\right\rceil-1\right) + 1$$

Based on the definitions $S(n)$ and $T(n)$, this is equivalent to the following:

$$T(n) = S\left(n-\left\lceil\frac{n}{2}\right\rceil\right) – S\left(n-\left\lceil\frac{n}{2}\right\rceil-1\right) +1 = T\left(n-\left\lceil\frac{n}{2}\right\rceil\right)+ 1 = T\left(\left\lfloor\frac{n}{2}\right\rfloor\right)+1$$

In other words, $T(n) = \lfloor \log_2(n)\rfloor + 1$. But we want $F(n)$, not just $T(n)$. Thankfully, since $T(n) = S(n) – S(n-1) = nF(n) – (n-1)F(n-1)$, the sum over many $T(k)$ values cancels out intermediate terms, giving us $nF(n)$. After dividing by $n$, we finally isolate $F(n)$:

$$F(n) = \frac{1}{n}\sum_{k=1}^{n} T(k) = 1 + \frac{1}{n}\sum_{k=1}^{n}\lfloor \log_2(k)\rfloor$$

This summation actually has a closed form as well:

$$F(n) = 1 + \frac{1}{n}\left((n+1)\lfloor \log_2(n)\rfloor – 2(2^{\lfloor \log_2(n)\rfloor} – 1)\right)$$

To offer a Python program in the same spirit as String’s answer:

from fractions import Fraction
from math import log

def F(n):
    if n==0: return 0
    lg = int(log(n,2))
    return 1 + Fraction(1,n)*((n+1)*lg - 2*(2**lg-1))

For $n=1000$ the result is $\frac{8987}{1000} = 8.987$.