
I have a random (uniformly distributed) stream of bytes (integers from $0$ up to $255$). What is the probability $p(n)$ of finding the sequence “Rar!” (ASCII) at a position $\le n$ from the start of the stream?

There’s a simple approximate answer and a more complicated exact answer.

For the approximate answer, we can neglect the fact that the probability of finding the string at some point isn’t exactly independent of the probability of having found it before that point. In this approximation, at every byte beginning with the fourth you have a $(1/256)^4$ chance of encountering your string, so the probability of having found it by the $n$-th byte is approximately

$$1-\left(1-\left(\frac1{256}\right)^4\right)^{n-3}\;.$$

for $n\ge4$. (Of course it’s exactly $0$ for $n\lt4$.)
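For concreteness, this approximation can be sketched in a few lines of Python (the function name `p_approx` is mine, not part of the original answer):

```python
# Approximate probability of having seen "Rar!" by the n-th byte,
# treating the (n - 3) possible starting positions as independent trials.
def p_approx(n: int) -> float:
    if n < 4:
        return 0.0
    q = (1 / 256) ** 4  # chance that 4 given consecutive bytes spell "Rar!"
    return 1 - (1 - q) ** (n - 3)
```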

To get the exact answer, you can keep track of your state according to the number of consecutive matches you’ve just encountered. You start in state $0$. The probability of moving from state $k$ to state $k+1$ is $1/256$, and the probability of returning to state $0$ is $255/256$. Once you get to state $4$ you stay there. Thus you have a Markov process with transition matrix

$$\pmatrix{\frac{255}{256}&\frac1{256}&0&0&0\\\frac{255}{256}&0&\frac{1}{256}&0&0\\\frac{255}{256}&0&0&\frac{1}{256}&0\\\frac{255}{256}&0&0&0&\frac{1}{256}\\0&0&0&0&1}\;.$$

The initial probability distribution is $1$ in state $0$ and $0$ everywhere else, i.e. $(1,0,0,0,0)$. In each step, the transition matrix is applied to this vector from the right, so after $n$ steps the distribution is the initial distribution times the $n$-th power of the transition matrix. The powers of a matrix can be expressed succinctly using its eigenvectors and eigenvalues.

Wolfram|Alpha will calculate the eigensystem of the matrix for you. (I don’t know how to get it to calculate the left eigenvectors, so I transposed the matrix.) Note that three eigenvalues are very small, so the corresponding components of the distribution will decay very quickly and will be negligible in practice. The initial distribution is approximately given by $\nu_1-\nu_2$, so the distribution after the $n$-th step is approximately $\nu_1-\lambda_2^n\nu_2$ (since $\lambda_1$ is exactly $1$), and the probability you want is about $1-\lambda_2^n$. If you click on “More Digits”, you’ll find that $\lambda_2\approx0.9999999997681$ is very close to but not exactly equal to $1-(1/256)^4\approx0.9999999997672$, so the first approximation was quite a good one.
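Rather than diagonalizing by hand, one can also just raise the matrix to the $n$-th power numerically; a minimal NumPy sketch (variable and function names are mine):

```python
import numpy as np

q = 1 / 256
# Transition matrix from the answer: state k means "k consecutive
# characters of 'Rar!' just matched"; state 4 is absorbing, and any
# mismatch is taken to send you back to state 0.
T = np.array([
    [1 - q, q, 0, 0, 0],
    [1 - q, 0, q, 0, 0],
    [1 - q, 0, 0, q, 0],
    [1 - q, 0, 0, 0, q],
    [0,     0, 0, 0, 1],
])

def p_markov(n: int) -> float:
    """Probability of having been absorbed (string found) within n bytes."""
    start = np.array([1.0, 0, 0, 0, 0])  # all mass in state 0
    return (start @ np.linalg.matrix_power(T, n))[4]
```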

Consider the Markov chain with states corresponding to $0$, “R”, “Ra”, “Rar”, and “Rar!”, and transition matrix

$$ P = \pmatrix{\frac{255}{256} & \frac{1}{256} & 0 & 0 & 0\cr
\frac{254}{256} & \frac{1}{256} & \frac{1}{256} & 0 & 0\cr
\frac{254}{256} & \frac{1}{256} & 0 & \frac{1}{256} & 0 \cr
\frac{254}{256} & \frac{1}{256} & 0 & 0 & \frac{1}{256}\cr
0 & 0 & 0 & 0 & 1\cr} $$

Then what you’re looking for is $(P^n)_{1,5}$, the entry in the first row and last column of $P^n$.
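This entry can be computed numerically; a short NumPy sketch (the name `p_exact` is mine):

```python
import numpy as np

q = 1 / 256
# States: 0, "R", "Ra", "Rar", "Rar!".  A failed match whose failing
# byte happens to be 'R' restarts in state "R" rather than state 0,
# which is what distinguishes this chain from the approximate one.
P = np.array([
    [255 * q, q, 0, 0, 0],
    [254 * q, q, q, 0, 0],
    [254 * q, q, 0, q, 0],
    [254 * q, q, 0, 0, q],
    [0,       0, 0, 0, 1],
])

def p_exact(n: int) -> float:
    """Entry (P^n)[0, 4]: start in state 0, absorbed in "Rar!" within n bytes."""
    return np.linalg.matrix_power(P, n)[0, 4]
```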

The solution will be of the form $p(n) = 1 + \sum_r c_r r^n$ where the sum is over the roots of the polynomial $t^4 - t^3 + 1/256^4$, and the $c_r$ are chosen so that $p(0) = p(1) = p(2) = p(3) = 0$.

EDIT: using Maple’s **rsolve** to solve the recurrence $-s^4\,p(n) + s^4\,p(n+1) + p(n+3) - 2\,p(n+4) + p(n+5) = 0$ with $p(0)=p(1)=p(2)=p(3)=0$, $p(4) = s^4$ (where $s = 1/256$), I get

$$ p(n) = 1 + \sum_r \frac{1}{(4 r^3 s^4 - 1)\, r^{n+1}}$$

where the sum is over the roots $r$ of $s^4 z^4 - z + 1$. Three of the roots are quite large (giving very rapidly decaying terms), and one is very near $1$: in fact it is

$$ r_4 = 1+s^4+4 s^8+22 s^{12}+140\,s^{16}+\ldots = {}_3F_2\left(\tfrac14,\tfrac12,\tfrac34;\;\tfrac23,\tfrac43;\;\frac{256}{27}\,s^4\right),$$

which for $s = 1/256$ is approximately $1.00000000023283064387071006368$. Thus for all but very small $n$, $p(n)$ is very close to

$$1- 1.00000000093132257613336155988\,(1.00000000023283064387071006368)^{-n-1}\;.$$

Set $p=1/256^4$. Then the inclusion-exclusion principle gives the exact formula

$$p(n)=\sum_{m \geq 1}(-1)^{m+1}{n-3m\choose m} p^m.$$

Only finitely many terms in the sum are non-zero, but there are about $n/4$ of them, so the full formula may not be too practical. On the other hand, the partial sums alternate between upper and lower bounds on the true value $p(n)$.

In practice, the Poisson approximation may be your best bet. It is easy to understand and work with:

$$p(n)\approx 1-\exp(-np).$$
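Here is a short Python sketch evaluating the inclusion-exclusion sum (the function name and the early-stopping cutoff are my additions; the cutoff is valid in the regime $np \ll 1$, where the terms decrease from the start):

```python
from math import comb

def p_incl_excl(n: int, p: float = 1 / 256 ** 4) -> float:
    """Exact p(n) from the alternating inclusion-exclusion sum."""
    total = 0.0
    m = 1
    while n - 3 * m >= m:          # past m = n/4 the binomial coefficient is 0
        term = comb(n - 3 * m, m) * p ** m
        if term < 1e-20:           # when n*p << 1 the terms only shrink; stop early
            break
        total += (-1) ** (m + 1) * term
        m += 1
    return total
```

For moderate $n$ this agrees with the simple approximation $1-(1-p)^{n-3}$ to many decimal places, as the answers above suggest it should.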

EDIT: As pointed out in the comments below, this answer is incorrect. However, it is a good approximation, to 16 or 17 decimal places (before exponentiation by n).

The probability of finding a given character at any position is $1/256 = 2^{-8}$, so the probability of finding “Rar!” at any position is $(2^{-8})^4 = 2^{-32}.$

The (unconditional) probability of finding “Rar!” at position 0 is $2^{-32}$.

The (unconditional) probability of finding “Rar!” at position 1 is $2^{-32}$.

$\vdots$

The (unconditional) probability of finding “Rar!” at position n is $2^{-32}$.

These are all unconditional probabilities, but the events are not independent: if “Rar!” occurs at position $i$, it can’t also occur at position $i+1$, $i+2$, or $i+3$. So we can’t simply combine these unconditional probabilities, and it’s actually kind of tricky to calculate directly the probability that “Rar!” will occur at least once in the first $n$ positions.

It’s much easier to calculate the probability that “Rar!” *won’t* occur in the first $n$ positions, because the act of “Rar!” not occurring does not affect the future probability of “Rar!” occurring. The probability that it won’t occur at a particular position is $(1-2^{-32})$. The probability that it won’t occur in the first $n$ positions is $(1-2^{-32})^n$.

So, to answer your question, the probability that “Rar!” *will* occur in the first $n$ positions is

$$ p(n) = 1 - (1 - 2^{-32})^n.$$

Finally, I would just like to remark that although bytes range from 0 to 255, the ASCII encoding only ranges from 0 to 127, and a good chunk of those are control characters like “delete” or “start of header” that don’t display an actual symbol/letter/digit. If it’s up to you, perhaps you could choose a narrower range if you’re looking for strings in randomly generated text.

Happy number crunching!
