I’m randomly generating strings of length N from that alphabet.
Examples: Examples: N=5, AACBC, AAAAA, BBCAA
What is the likelihood that exactly k characters of that string are the same? (k <= N)
(k corresponds to the maximum number of similar characters…
Example: With string AABCAAA: N=7, k=5 because there are 5 A’s.
String AABBCC: N=6, k=2 because there are equally-sized groups of A’s, B’s, and C’s.)
Initially, my solution looked like this:
P(k characters are the same) = $(\frac{1}{3})^k * (\frac{2}{3})^{n-k}$
Until I realized that this solution wasn’t robust enough– it doesn’t matter WHICH characters are the same, only that k characters are the same.
Thanks so much in advance for your help.
Let k, l, m be the # of occurrences for different letters in decreasing order.
Using the multinomial distribution formula, to illustrate,
for string AAABBC, Pr = $\frac{6!}{3!2!1!}\cdot\left(\frac{1}{3}\right)^6$
and for string AABBCC, Pr = $\frac{6!}{2!2!2!}\cdot\left(\frac{1}{3}\right)^6$
In general, Pr = $\frac{N!}{k!l!m!}\cdot\left(\frac{1}{3}\right)^N$, k+l+m = N
continued…
I have taken that you want probabilities for particular values of k. For k = 3 and N = 6, for example, you will need to sum up probabilities for permutations of 3-3-0 (3#s) & 3-2-1 (6#s)
edited by the questioner…
The final solution comes down to this.
Imagine that we have three bins, each representing the number of times each character appears in a string. For AAABC, the bins would be {A:3, B:1, C:1}
For:
Pr = $\frac{N!}{k!l!m!}\cdot\left(\frac{1}{3}\right)^N \cdot\frac{C!}{(C – d)!}$, k+l+m = N