Picking Multiples of 4

I recently came up with and tried to solve the following problem: If you are randomly picking integers in the range $[1,30]$ out of a hat without replacement, on average, how many integers will you have to pick until you have picked all of the multiples of $4$?

There are $7$ multiples of $4$ that can be chosen. I know that the expected value of picks until you pick a multiple of 4 is the smallest value of $n$ such that $1-\displaystyle\prod_{i=0}^{n-1}\dfrac{23-i}{30-i}>0.5$, which is $3$. However, I don’t know how to figure out how many picks are needed until all multiples of $4$ have been chosen. Can I please have some assistance?

Solutions Collecting From Web of "Picking Multiples of 4"

To begin, you correctly note that there are seven multiples of four to be chosen.

Let us choose a convenient sample space to work with. In this case, I find it convenient to let the sample space be all permutations of $[30]$. We know the samplespace size is $30!$. This is done because we could choose to simply pick more numbers out of the hat until the hat is empty and the probabilities of success in $k$ rounds will remain the same. In addition, each outcome in the sample space is equally likely to occur.

Let us count how many permutations satisfy the following properties for each value of $k$:

  • The $k^{th}$ number is a multiple of four
  • The remaining six multiples of four all appear before the $k^{th}$ number

To do so,

  • Pick the spaces occupied by the multiples of $4$: $\binom{k-1}{6}$ choices.
  • Pick the order in which the multiples of four appear: $7!$ choices
  • Pick the order in which the rest of the numbers appear: $23!$ choices

Letting $X=\text{number of rounds until all multiples of four are drawn}$, we have $Pr(X=k)=\dfrac{\binom{k-1}{6}7!23!}{30!}$. Note that this formula makes sense even in the situation that $k<7$ since it is impossible to have drawn all multiples of four in that time and the formula gives the probability of occurrence as zero.

Now, we can apply the definition of Expected Value.

$\mathbb{E}[X] = \sum\limits_{k\in \Delta} kPr(X=k)$

which in this case yields

$$\mathbb{E}[X]=\sum\limits_{k=7}^{30} \dfrac{k\binom{k-1}{6}7!23!}{30!}=27.125$$

wolfram calculation

Through similar methods, one could also calculate the estimated time before the first occurrence of a multiple of four to be $3.875$ as in @lulu’s comment above.

That answer yields the same result as ours above by recognizing that via interpreting this as selecting occurrences of numbers in our permutation in reverse order and an appropriate shifting of indices, that the first occurrence of a multiple of four being $3.875$ from the end corresponds exactly with the last occurrence of $27.125$ from the start.

Another approach would be to think of a permutation of $\{1, 2, …, 30\}$. We can think of the seven multiples of $4$ as separating runs (some possibly empty) of the $23$ nonmultiples of $4$. Since there are seven separators, there are $8$ runs, with the average run length being $\frac{23}{8}$. Thus the average number of nonmultiples of $4$ after the last (seventh) multiple of $4$ is $\frac{23}{8}=2.875$. Thus the average position of the last multiple of $4$ is position $30-2.875=27.125$.

The probability of getting all multiples of $4$ precisely on draw $k$ is
Thus, the expected number of draws would be

Colour the $23$ numbers not divisible by $4$ blue, and call them $b_1$ to $b_{23}$. Colour the $7$ multiples of $4$ red. For $i=1$ to $23$, let indicator random variable $X_i$ be defined by $X_i=1$ if there is a red number which is chosen later than $b_i$, and by $X_i=0$ otherwise.

Then the total number $W$ of trials until we get all the red numbers is given by $W=7+\sum_1^{23}X_i$.

By the linearity of expectation,

The probability that $X_i=1$ is $\frac{7}{8}$, for the blue integer $b_i$ is equally likely to be first, second, and so on up to eighth among the $8$ numbers consisting of the red numbers and $b_i$. It follows that
This simplifies to $\frac{217}{8}$.

Remark: The indicator random variable technique bypasses finding the distribution of $W$. That distribution is in this case not difficult to find, but the technique can be valuable when the distribution is less accessible.