# If I randomly generate a string of length N from an alphabet {A, B, C}, what's the likelihood that exactly k characters will be the same?

• I have an alphabet: {A, B, C}.
• I’m randomly generating strings of length N from that alphabet.

Examples: Examples: N=5, AACBC, AAAAA, BBCAA

• What is the likelihood that exactly k characters of that string are the same? (k <= N)
(k corresponds to the maximum number of similar characters…
Example: With string AABCAAA: N=7, k=5 because there are 5 A’s.
String AABBCC: N=6, k=2 because there are equally-sized groups of A’s, B’s, and C’s.)

Initially, my solution looked like this:
P(k characters are the same) = $(\frac{1}{3})^k * (\frac{2}{3})^{n-k}$
Until I realized that this solution wasn’t robust enough– it doesn’t matter WHICH characters are the same, only that k characters are the same.

#### Solutions Collecting From Web of "If I randomly generate a string of length N from an alphabet {A, B, C}, what's the likelihood that exactly k characters will be the same?"

Let k, l, m be the # of occurrences for different letters in decreasing order.

Using the multinomial distribution formula, to illustrate,

for string AAABBC, Pr = $\frac{6!}{3!2!1!}\cdot\left(\frac{1}{3}\right)^6$

and for string AABBCC, Pr = $\frac{6!}{2!2!2!}\cdot\left(\frac{1}{3}\right)^6$

In general, Pr = $\frac{N!}{k!l!m!}\cdot\left(\frac{1}{3}\right)^N$, k+l+m = N

continued…

I have taken that you want probabilities for particular values of k. For k = 3 and N = 6, for example, you will need to sum up probabilities for permutations of 3-3-0 (3#s) & 3-2-1 (6#s)

edited by the questioner…
The final solution comes down to this.
Imagine that we have three bins, each representing the number of times each character appears in a string. For AAABC, the bins would be {A:3, B:1, C:1}
For:

• N = the length of the string,
• k = the maximum bin value (there can be ties),
• l = the next bin value,
• m = the last bin value,
• d = the number of bin values that are different from k’s bin value (max 2)
• Examples:
1. ABC: bins = {A: 1, B: 1, C: 1}. d = 0
2. AAA: bins = {A: 3, B: 0, C: 0}. d = 1
3. AAC: bins = {A: 2, B: 0, C: 1}. d = 2
• C = the number of letters in our alphabet (always 3),

Pr = $\frac{N!}{k!l!m!}\cdot\left(\frac{1}{3}\right)^N \cdot\frac{C!}{(C – d)!}$, k+l+m = N