Suppose that a certain collected works of plays is $N$ symbols long in the following sense: a “symbol” is one of the 26 letters of the alphabet, a line break, a period, a space, or a colon; in other words, there are 30 possible symbols.

If “a monkey” randomly “types” one of these 30 symbols at a rate of one per second, how long will it take, on average, for one of $M$ monkeys working at this rate to randomly write this specific $N$-symbol collected works?

For clarity let me state that I am assuming each monkey ceaselessly types random symbols at this rate, and unless a monkey immediately types the right things, the collected works will be preceded by gibberish.

We will estimate the expected time for a “generic” string. The occurrences of the string in any given monkey’s output form, approximately, a Poisson process with rate $\lambda = 30^{-N}$ per second. The time until the first occurrence is thus roughly exponentially distributed with rate $30^{-N}$. The minimum of $M$ independent such times is also exponentially distributed, with rate $M/30^N$. Thus the expected time is roughly $30^N/M$ seconds.

The same estimate can be obtained if we calculate the expected number of appearances. The first $t+N-1$ characters of any given monkey’s stream contain $t$ windows of length $N$, each of which matches the string with probability $30^{-N}$, so the expected number of appearances is $t/30^N$. For $M$ monkeys, it is $tM/30^N$. This is $1$ for $t = 30^N/M$, and this gives a rough estimate for the actual expectation.
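To get a feel for the size of $30^N/M$, here is a minimal Python sketch that works with $\log_{10}$ of the estimate so that very large $N$ does not overflow. The function name and the sample inputs (a 10-symbol text and 1000 monkeys) are illustrative assumptions, not part of the question.

```python
import math

SYMBOLS = 30  # alphabet size from the question


def log10_expected_seconds(n_symbols, n_monkeys):
    """Return log10 of the rough estimate 30^N / M, in seconds."""
    return n_symbols * math.log10(SYMBOLS) - math.log10(n_monkeys)


# Hypothetical inputs: a 10-symbol text typed by 1000 monkeys.
log_secs = log10_expected_seconds(10, 1000)
print(f"roughly 10^{log_secs:.2f} seconds")
```

Even for this tiny text the estimate is already on the order of thousands of years.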

In fact, assuming that the string “doesn’t overlap with itself”, we can get an exact expression for the expectation (depending only on $N$ and $M$) using Theorem 1.1 in “String overlaps, pattern matching, and nontransitive games” (Guibas and Odlyzko ’81), which gives a generating function for the probability that a given monkey is not done after $t$ steps.

The paper also gives an expression for “non-generic” strings and for multiple strings, but the collected works are not going to overlap themselves; even if they do, it will probably have only a slight effect on the probabilities.
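For a string that does not overlap itself, the exact expectation can also be approximated numerically without the generating function. A minimal sketch, assuming $q$ equally likely symbols: the probability $s(t)$ that one monkey is not done after $t$ symbols satisfies $s(t) = s(t-1) - q^{-N}s(t-N)$ with $s(t) = 1$ for $t < N$, and the expected time for $M$ independent monkeys is $\sum_{t \ge 0} s(t)^M$. The function name and the truncation tolerance are my own choices, not taken from the paper.

```python
from collections import deque


def expected_first_completion(q, n, m, tol=1e-12):
    """Expected time until the first of m monkeys completes a
    non-self-overlapping pattern of length n over q uniform symbols.

    Sums P(not done after t symbols)^m over t, truncated once a term
    drops below `tol` (an illustrative cutoff).
    """
    p = float(q) ** (-n)
    # The window holds s(t-n), ..., s(t-1); initially s(0..n-1) = 1.
    window = deque([1.0] * n, maxlen=n)
    total = float(n)  # contribution of t = 0 .. n-1
    while True:
        s = window[-1] - p * window[0]  # s(t) = s(t-1) - q^-n * s(t-n)
        term = s ** m
        if term < tol:
            return total
        total += term
        window.append(s)


print(expected_first_completion(2, 2, 1))  # one monkey, e.g. the string BA
print(expected_first_completion(2, 2, 2))  # two monkeys
```

For $q = 2$, $N = 2$ this gives 4 for one monkey and $80/27 \approx 2.96$ for two, against the rough estimate of $2$; the gap is large here because $q^N$ is tiny and no monkey can finish in fewer than $N$ symbols.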

Unfortunately, an exact answer will depend on the specific sequence of $N$ symbols. To see why, take a simple example in which you only have two possible symbols, A and B, with $N = 2$, and only one monkey. Compare the expected times until the monkey types the sequence AA vs. the sequence BA. Unless the monkey types AA at the very beginning (with probability $\frac{1}{4}$), the first time AA appears the BA sequence *must* have already occurred. So the latter sequence will have the shorter expected time.
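The AA versus BA comparison can be checked with the standard overlap formula for uniformly random symbols: the expected waiting time for a pattern over a $q$-symbol alphabet is the sum of $q^k$ over every length $k$ at which the pattern's prefix of length $k$ equals its suffix of length $k$. A short sketch (the function name is illustrative):

```python
def expected_wait(pattern, q):
    """Expected number of symbols typed until `pattern` first appears,
    assuming q equally likely symbols."""
    n = len(pattern)
    # Add q^k for every k where the length-k prefix equals the length-k suffix.
    return sum(q ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])


print(expected_wait("AA", 2))  # 6
print(expected_wait("BA", 2))  # 4
```

AA overlaps itself at length 1, which adds the extra $2^1$ to its waiting time; BA has no such overlap.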

There are lots of these sorts of counterintuitive results about the likelihood of seeing a certain sequence before another certain sequence and about the expected time until a sequence is first seen. For more information, see MathWorld’s entry on Coin Tossing, or, for even more information, the article “Penney Ante” that appeared in the *UMAP Journal*.

Now, if $N$ is large and the symbols are assumed to be evenly distributed in the target sequence, maybe you can avoid the problems with the two-symbol and unevenly distributed example I gave here. Or, if you force each monkey to stop after typing exactly $N$ symbols, check whether it has typed the target sequence, and have it start over from scratch if it has not, then you can definitely avoid these problems. (For more on the latter, see the article “Infinite Monkey Business”.)
