Expected value of size of subset

Given a set $S$ such that $|S|=n$,

A random item is chosen randomly from $S$, and being appended to a new set $T$.

This process is being repeated $n$ times (with repetition), what is the expected value of $|T|$ ?

Solutions Collecting From Web of "Expected value of size of subset"

Let $T_k$ be the number of elements in $T$ after $k$ rounds. We also set $T_0 = 0$.

Assuming that $T_k = l$, you have a probability of $\frac{l}{n}$ that the $k + 1$-th element is already in $T$ and a probability of $\frac{n – l}{n}$ that it isn’t. This means:

$$E[T_{k + 1} \mid T_k = l] = \frac{l}{n} l + \frac{n – l}{n}(l+1) = l\left(1 – \frac{1}{n}\right) + 1$$

Now we can apply the law of total expectation:
$$E[T_{k + 1}] = E[E[T_{k + 1} \mid T_k]] = E\left[T_k \left(1 – \frac{1}{n}\right) + 1\right] = E[T_k] \left(1 – \frac{1}{n}\right) + 1$$

This is a geometric progession and we can explicitly calculate:
$$E[T_k] = \sum \limits_{i = 0}^{k – 1} \left(1 – \frac{1}{n}\right)^i = n\left(1 – \left(1 – \frac{1}{n}\right)^k\right)$$

Specifically for $k = n$ this yields
$$E[T_n] = n\left(1 – \left(1 – \frac{1}{n}\right)^n\right) \approx n(1 – e^{-1}) \approx \frac{2}{3} n.$$

For $x\in S$ define $A_x$ to be the event that $x\in T$.
Then $|T|=\sum_{x\in S} 1_{A_x}$ which has expected value
$$\mathbb{E}(X)=\sum_{x\in S} \mathbb{P}(A_x)=n\left[1-\left(1-{1\over n}\right)^n\right].$$

This question can also be answered by total enumeration using Stirling
numbers of the second kind. Observe that the expectation is given by

$$\mathrm{E}[X] = n^{-n} \sum_{k=0}^n k \times {n\choose k}
k! {n\brace k}.$$

Here we multiply the value of the random variable by the probability,
which is obtained from the fact that a sample with $k$ different
entries needs a choice of these entries from the $n$ possibles and a
partition of the $n$ slots into $k$ sets whose elements receive the
same value. All $k!$ permutations of the $k$ chosen values result in
a valid unique assignment.

We have
$${n\brace k} = n! [z^n] \frac{(\exp(z)-1)^k}{k!}$$

by the species equation for set partititions
$$\mathfrak{P}(\mathcal{U}\mathfrak{P}_{\ge 1}(\mathcal{Z}))$$

which yields for the expectation
$$n^{-n} n! [z^n]
\sum_{k=0}^n k \times {n\choose k}
(\exp(z)-1)^k
\\ = n^{-n} n! [z^n] n
\sum_{k=1}^n {n-1\choose k-1}
(\exp(z)-1)^k
\\ = n^{-n} n! [z^n] n (\exp(z)-1)
\sum_{k=0}^{n-1} {n-1\choose k}
(\exp(z)-1)^k
\\ = n^{-n} n! [z^n] n (\exp(z)-1) \exp((n-1)z)
= n n^{-n} n! [z^n] (\exp(nz)-\exp((n-1)z))
\\ = n n^{-n} (n^n – (n-1)^n)
\\ = n \left(1 – \left(1-\frac{1}{n}\right)^n\right)
\sim n \left(1 – \frac{1}{e}\right).$$

Now we can also compute higher moments with this,
the variance for example.

We first compute the expectation of $X(X-1)$, getting
$$\mathrm{E}[X(X-1)] =
n^{-n} \sum_{k=0}^n k (k-1) \times {n\choose k}
k! {n\brace k}
\\ = n^{-n} n! [z^n] n (n-1)
\sum_{k=2}^n {n-2\choose k-2}
(\exp(z)-1)^k
\\ = n^{-n} n! [z^n] n (n-1) (\exp(z)-1)^2
\sum_{k=0}^{n-2} {n-2\choose k}
(\exp(z)-1)^k
\\ = n^{-n} n! [z^n] n (n-1) (\exp(z)-1)^2 \exp((n-2)z).$$

Extracting coefficients we obtain
$$n (n-1) n^{-n} (n^n – 2(n-1)^n + (n-2)^n)
= n(n-1) \left(1-2\left(1-\frac{1}{n}\right)^n
+\left(1-\frac{2}{n}\right)^n\right)
\\ \sim
n(n-1) \left(1-\frac{2}{e}+\frac{1}{e^2}\right).$$

We thus get for the variance
$$\mathrm{Var}[X] = \mathrm{E}[X^2] – \mathrm{E}[X]^2
\\ = n(n-1) \left(1-2\left(1-\frac{1}{n}\right)^n
+\left(1-\frac{2}{n}\right)^n\right)
\\ + n\left(1-\left(1-\frac{1}{n}\right)^n\right)
– n^2\left(1-\left(1-\frac{1}{n}\right)^n\right)^2
\\ = n^2 \left(\left(1-\frac{2}{n}\right)^n
– \left(1-\frac{1}{n}\right)^{2n}\right)
\\ + n \left(\left(1-\frac{1}{n}\right)^n
– \left(1-\frac{2}{n}\right)^n \right).$$

The asymptotic expansion then yields
$$\mathrm{Var}[X] \sim n \left(\frac{1}{e}-\frac{2}{e^2}\right).$$

Addendum. We need the second term rather than the constant term in
the asymptotic expansion of the factor on $n^2.$

We get for the first term in the difference
$$\exp\left(n\log\left(1-\frac{2}{n}\right)\right)
= \exp\left(-\sum_{q\ge 1} \frac{2^q}{n^{q-1} q}\right)
\\ = \sum_{k\ge 0} \frac{(-1)^k}{k!}
\left(\sum_{q\ge 1} \frac{2^q}{n^{q-1} q}\right)^k.$$

This yields for the constant term
$$\sum_{k\ge 0} \frac{(-1)^k}{k!} 2^k = \frac{1}{e^2}.$$

We get for the next term
$$\frac{1}{n}
\sum_{k\ge 0} \frac{(-1)^k}{k!}
{k\choose 1} 2\times 2^{k-1}
= -\frac{2}{n}
\sum_{k\ge 1} \frac{(-1)^{k-1}}{(k-1)!} 2^{k-1}
= -\frac{2}{n} \frac{1}{e^2}.$$

For the second term in the difference we have
$$\exp\left(2n\log\left(1-\frac{1}{n}\right)\right)
= \exp\left(-2\sum_{q\ge 1} \frac{1}{n^{q-1} q}\right)
\\ = \sum_{k\ge 0} \frac{(-2)^k}{k!}
\left(\sum_{q\ge 1} \frac{1}{n^{q-1} q}\right)^k.$$

This time we get for the constant term
$$\sum_{k\ge 0} \frac{(-2)^k}{k!} = \frac{1}{e^2}$$

and for the next term
$$\frac{1}{n}
\sum_{k\ge 0} \frac{(-2)^k}{k!}
{k\choose 1} \frac{1}{2} \times 1^{k-1}
= -\frac{1}{n}
\sum_{k\ge 1} \frac{(-2)^{k-1}}{(k-1)!}
= -\frac{1}{n} \frac{1}{e^2}.$$

The difference then yields for the variance
$$n^2 \left(-\frac{1}{n} \frac{1}{e^2}\right)
+ n\left(\frac{1}{e}-\frac{1}{e^2}\right)$$
which is the desired result.

We can count the number of arrangements of $n$ items picked from a pool of $n$ items that have $d$ distinct items.

Let $S(i)$ be the collection of arrangements missing item $i$. Let $N(j)$ be the sum of the sizes of all intersections of $j$ of the $S(i)$.

Since we can choose $\binom{n}{j}$ items to leave out and for each of those choices there are $(n-j)^n$ arrangements of those items, we have
$$
N(j)=\binom{n}{j}(n-j)^n\tag{1}
$$
Since the number of arrangements with $d$ distinct items is the number of arrangements missing exactly $n-d$ items, according to the Generalized Inclusion-Exclusion Principle, the number of arrangements with exactly $d$ distinct items is
$$
\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}N(j)
=\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^n\tag{2}
$$
To get the total number of arrangements, sum $(2)$ in $d$:
$$
\begin{align}
\sum_{d=0}^n\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^n
&=\sum_{j=0}^n\sum_{d=n-j}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^n\\
&=\sum_{j=0}^n[j=0]\binom{n}{j}(n-j)^n\\[9pt]
&=n^n\tag{3}
\end{align}
$$
as expected.

To get the total number of arrangements times their size, multiply by $d$ and sum in $d$:
$$
\begin{align}
\sum_{d=0}^n\sum_{j=n-d}^n(-1)^{j-n+d}\binom{j}{n-d}\binom{n}{j}(n-j)^nd
&=\sum_{j=0}^n\sum_{d=n-j}^n(-1)^{j-n+d}\color{#00A000}{\binom{j}{n-d}}\binom{n}{j}(n-j)^n(\color{#C00000}{n}-\color{#00A000}{(n-d)})\\
&=\color{#C00000}{n\,n^n}-\sum_{j=0}^n\sum_{d=n-j}^n(-1)^{j-n+d}\color{#00A000}{\binom{j-1}{n-d-1}}\binom{n}{j}(n-j)^n\color{#00A000}{j}\\
&=n\,n^n-\sum_{j=0}^n[j=1]\binom{n}{j}(n-j)^nj\\[9pt]
&=n\,n^n-n(n-1)^n\tag{4}
\end{align}
$$
Therefore, the expected number of distinct items picked would be
$$
n\left[1-\left(1-\frac1n\right)^n\right]\sim n\left(1-\frac1e\right)\tag{5}
$$