Number of occurrences of k consecutive 1's in a binary string of length n (containing only 1's and 0's)

Say a sequence $\{X_1, X_2,\ldots ,X_n\}$ is given, where $X_p$ is either one or zero ($0 < p < n$). How can I determine the number of strings, which do contain at least one occurrence of consequent $1$’s of length $k$ ($0 < k < n$).

For example, a string $\{1, 0, 1, 1, 1, 0\}$ is such a string for $n = 6$ and $k = 3$.

Here I have found an answer for arbitrary $n$ and $k = 2$, ($k = 1$ is trivial), but I need a more general answer for any natural number $k$ smaller than $n$.

Solutions Collecting From Web of "Number of occurrences of k consecutive 1's in a binary string of length n (containing only 1's and 0's)"

The answer you linked to for the case $k=2$ generalizes fairly easily:

Let $a_n$ count the number of strings of length $n$ with at least one occurrence of $k$ consecutive $1$’s, and let $b_n$ count the number of “bad” strings that have no substring of $k$ consecutive $1$’s. Clearly $a_n+b_n=2^n$, so it suffices to get a formula of some sort for $b_n$.

For $1\le m\le k-1$, we have $b_m=2^m$, since these strings are all too short to have any substring of length $k$, much less one consisting of all $1$’s. And $b_k=2^k-1$, since there’s just one string of $k$ $1$’s. For $n\gt k$, the first $0$ in a bad string must occur among the first $k$ bits. Consequently we get a $k$-term recurrence equation

$$b_n=b_{n-1}+b_{n-2}+\cdots+b_{n-k}$$

For example, if $k=4$, then the sequence $b_1,b_2,b_3,\ldots$ is

$$2,4,8,15,29,56,108,208,401,\ldots$$

and the sequence for $a_n=2^n-b^n$ is

$$0,0,0,1,3,8,20,48,111\ldots$$

In the $k=2$ case, the $2$-term recurrence gives the familiar Fibonacci numbers, for which there is a nice Binet-type formula involving $\phi=(1+\sqrt5)/2$. One can try for something similar for other values of $k$, but it involves finding the largest root of a polynomial of degree $k$, namely $x^k-x^{k-1}-x^{k-2}-\cdots-x-1$, for which there are no nice radical expressions. (Yes, yes, there do exist radical expressions for $k=3$ and $4$, but you tell me if you think they’re nice….)

This problem also goes under the name of ” run of $k$ consecutive successes in $n$ Bernoulli trials” or shortly Bernoulli runs.
It applies to many technical fields, among others in Digital Transmission(“error bursts”), System Reliability
( “consecutive k-out-of-n:F systems”) and of course, in Queue Systems.
Because of those technical applications, I’ve been studying this subject for a while.
I will briefly summarize herewith the result directly concerning your question. If you are interested in studying the subject further you may start
from this paper by M. Muselli and this by S. Aki.

Consider a binary string with $s$ “$1$” and $m$ “$0$” in total. Let’s put an additional (dummy) fixed $0$ at the start and at the end of the string.
We individuate as a run the consecutive $1$’s between two zeros, thereby including runs of null length: with this scheme we have a fixed number of $m+1$ runs.

Bernoulli_Run

If we sequentially enumerate the length of each run so individuated, we construct a bijection with the number of ways of putting
$s$ (undistinguishable) balls into $m+1$ (distinguishable) bins.
Now consider the case in which runs have a max length of $r$ ones, or that the bins have a limited capacity of $r$ balls, or otherwise the
$$N_{\,b} (s,r,m+1) = \text{No}\text{. of solutions to}\;\left\{ \begin{gathered}
0 \leqslant \text{integer }x_{\,j} \leqslant r \hfill \\
x_{\,1} + x_{\,2} + \cdots + x_{\,m+1} = s \hfill \\
\end{gathered} \right.$$
which as explained in this other post is expressed as
$$
N_b (s,r,m + 1)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s}
{r}\, \leqslant \,m + 1} \right)} {\left( { – 1} \right)^k \left( \begin{gathered}
m + 1 \hfill \\
k \hfill \\
\end{gathered} \right)\left( \begin{gathered}
s + m – k\left( {r + 1} \right) \\
s – k\left( {r + 1} \right) \\
\end{gathered} \right)}
$$
whose generating function in $s$ is
$$
F_b (x,r,m + 1) = \sum\limits_{0\,\, \leqslant \,\,s\,\,\left( { \leqslant \,\,r\,\left( {m + 1} \right)} \right)} {N_b (s,r,m + 1)\;x^{\,s} } = \left( {\frac{{1 – x^{\,r + 1} }}
{{1 – x}}} \right)^{m + 1}
$$

Therefore the number of strings with $s$ “$1$” and $m$ “$0$”,
having at least one run of length $r$, and not longer, will be:
$$N_b(s,r,m+1)-N_b(s,r-1,m+1)$$
and those with exactly $q$ runs of length $r$, and none longer, will be:
$$
\begin{gathered}
N_s (s,r,m + 1,q) = \quad \left| {\;\text{integer }s,r,m,q \geqslant 0} \right. \hfill \\
= \left[ {0 = r} \right]\left[ {0 = s} \right]\left[ {m + 1 = q} \right] + \left( \begin{gathered}
m + 1 \\
q \\
\end{gathered} \right)N(s – q\,r_\, ,r – 1,m + 1 – q) = \hfill \\
= \sum\limits_{\left( {0\, \leqslant } \right)\,\,j\,\,\left( {\, \leqslant \,m + 1} \right)} {\left( { – 1} \right)^j \left( \begin{gathered}
m + 1 \\
q \\
\end{gathered} \right)\left( \begin{gathered}
m + 1 – q \\
j \\
\end{gathered} \right)\left( \begin{gathered}
s – q\,r + m – q – jr \\
s – q\,r – jr \\
\end{gathered} \right)} \hfill \\
\end{gathered}
$$

For example, given
$s=5,\;r=2,\;m=2,\;q=2$
we have $N_s=3$, corresponding to the three strings
$1\;1\;0\;1\;1\;0\;1$
$1\;1\;0\;1\;0\;1\;1$
$1\;0\;1\;1\;0\;1\;1$

Finally, to connect to true blue anil‘s aswer, note that $N_b$ obeys to the recurrence
$$
N_{\,b} (s,r,m + 1) = \sum\limits_{i\, = \,0}^r {N_{\,b} (s – i,r,m)}
$$

One systematic way(not very cool to do it by hand tho) to do it is using automatons and the Chomsky-Schûtzenberger theorem in the following way.
Case $k = 2$:
The automata that accepts your language(namely $F = \{x\in \{0,1\}^*:\underbrace{11\cdots 11}_{\text{$k$ times}}\in Sub(x)\}$) is described by the image below($S_0$ is the initial state, $S_2$ is the final state and you can just reach that state if you have read $11$ as substring.). From there, by the C-S theorem you have the following set of equations( the equations relate the transition of the automaton, for example, if you see in $S_1$ there is one arrow going out to $S_0$ and the other one to $S_2$ and $x$ measures the number of letters of the transition) $$S_0 = xS_0+xS_1$$$$S_1 = xS_2+xS_0$$$$S_2 = 1+2xS_2,$$
and you want to recover $S_0$ as a power series. Solving for $S_2,$ we get $S_2=\frac{1}{1-2x},$ so $S_1 = xS_0+\frac{x}{1-2x}$ and finally $S_0 = xS_0+x(xS_0+\frac{x}{1-2x})=xS_0+x^2S_0+\frac{x^2}{1-2x}$ which implies $S_0(1-x-x^2)=\frac{x^2}{1-2x},$ so $S_0 = \frac{x^2}{(1-2x)(1-x-x^2)}.$ To recover the numbers you have can do partial fractions and you will end up with $S_0=\frac{1}{1-2x}-\frac{x+1}{1-x-x^2}$ which agrees with the answer in the post you have linked.

enter image description here
General Case:
In the general case you have $k+1$ states, the initial one, say $S_0 = xS_0+xS_1,$ the intermediate states i.e., $1\leq j<k$ $S_j = xS_{j+1}+xS_0$ and the final state $S_k = 1+2xS_k,$ from this $k+1$ equations, you can deduce, first that $S_k = \frac{1}{1-2x}$ and that $$S_{k-1} =xS_k+xS_0=\frac{x}{1-2x}+xS_0 $$
$$S_{k-2} =xS_{k-1}+xS_0=\frac{x^2}{1-2x}+x^2S_0+xS_0,$$
$$\vdots$$
$$S_{k-j}=\frac{x^j}{1-2x}+S_0\sum _{i=1}^{j}x^i=\frac{x^j}{1-2x}+S_0(\frac{1-x^{j+1}}{1-x}-1),$$
and so $$S_1 = S_{k-(k-1)}=\frac{x^{k-1}}{1-2x}+S_0(\frac{1-x^{k}}{1-x}-1),$$ therefore $$S_0 = xS_0+x(\frac{x^{k-1}}{1-2x}+S_0(\frac{1-x^{k}}{1-x}-1))=xS_0+\frac{x^{k}}{1-2x}+S_0(\frac{x^2-x^{k+1}}{1-x}),$$
concluding $$S_0(1-x-\frac{x^2-x^{k+1}}{1-x})=\frac{x^k}{1-2x},$$
So $$S_0=\frac{x^k}{1-2x}(\frac{(1-2x+x^2-x^2+x^{k+1})}{1-x})^{-1}=\frac{x^k(1-x)}{(1-2x)(1-2x+x^{k+1})}.$$

You can extract the numbers from there by saying that $$\frac{x^k(1-x)}{(1-2x)(1-2x+x^{k+1})}=\sum _{i=0}^{\infty}A_ix^i,$$
where $A_i = |\{x\in \{0,1\}^i:\underbrace{11\cdots 11}_{\text{$k$ times}}\in Sub(x)\}|$

Here is an approach based upon generating functions. We start considering words with no consecutive equal characters at all.

These words are called Smirnov words or Carlitz words. (See example III.24 Smirnov words from Analytic Combinatorics by Philippe Flajolet and Robert Sedgewick for more information.)

A generating function for the number of Smirnov words over a binary alphabet is given by
\begin{align*}
\left(1-\frac{2z}{1+z}\right)^{-1}\tag{1}
\end{align*}

Generating function: $G_k(z)$

In order to set up for the binary strings we are looking for, we replace occurrences of $1$ in a Smirnov word by runs of $1$ with length up to $k-1$ assuming $k\geq 2$. This corresponds to a substitution
\begin{align*}
z&\longrightarrow z+z^2+\cdots+z^{k-1}=\frac{z\left(1-z^{k-1}\right)}{1-z}\tag{2}\\
\end{align*}
Since there are no restrictions to $0$ at all, we replace occurrences of $0$ in a Smirnov word by any runs of $0$’s with length $\geq 1$.
\begin{align*}
z&\longrightarrow z+z^2+z^3+\cdots=\frac{z}{1-z}\tag{3}\\
\end{align*}

We obtain by substituting (2) and (3) in (1) a generating function $A_k(z)$
\begin{align*}
A_k(z)&=\left(1-\frac{\frac{z}{1-z}}{1+\frac{z}{1-zt}}-\frac{\frac{z\left(1-z^{k-1}\right)}{1-z}}{1+\frac{z\left(1-z^{k-1}\right)}{1-z}}\right)^{-1}\\
&=\frac{1-z^k}{1-(t+1)z+tz^{k+1}}\tag{4}\\
\end{align*}
counting all binary words having no 1-runs of length $k$. To obtain a generating function which counts all binary words having at least one 1-run of length $k$, we take the generating function
\begin{align*}
\frac{1}{1-2z}=1+2z+4z^2+8z^3+\cdots
\end{align*}
which counts all binary words and subtract $A_k(z)$ from it.

We conclude from (4) a generating function counting all binary words having at least one 1-run of length $k$ is $G_k(z)$ with
\begin{align*}
\color{blue}{G_k(z)}&\color{blue}{=\frac{1}{1-2z}-\frac{1-z^k}{1-2z+z^{k+1}}}\\
&\color{blue}{=\frac{(1-z)z^k}{(1-2z)(1-2z+z^{k+1]})}}
\end{align*}

Explicit formula:

We derive from $G_k(z)$ an explicit formula of the wanted numbers. Denoting with $[z^n]$ the coefficient of $z^n$ in a series we obtain using the geometric series expansion

\begin{align*}
[z^n]\frac{1}{1-2z+z^{k+1}}&=[z^n]\sum_{j=0}^\infty z^j(2-z^k)^j\tag{5}\\
&=\sum_{j=0}^n [z^{n-j}](2-z^k)^j\tag{6}\\
&=\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor}[z^{kj}](2-z^k)^{n-kj}\tag{7}\\
&=\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor}[z^{kj}]
\sum_{l=0}^{n-kj}\binom{n-kj}{l}(-z^k)^l2^{n-kj-l}\\
&=\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor}\binom{n-kj}{j}(-1)^j2^{n-(k+1)j}\tag{8}
\end{align*}

Comment:

  • In (5) we use the binomial series expansion.

  • In (6) we use the linearity of the coefficient of operator and apply the rule $[z^{p-q}]F(z)=[z^p]z^qF(z)$. We also set the upper limit of the sum to $n$ since the exponent of $z^{n-j}$ is non-negative.

  • In (7) we change the order of summation $j\rightarrow n-j$ and respect that only multiples of $k$ give a contribution to the sum.

  • In (8) we select the coefficient of $z^{kj}$.

The sum in (8) gives the first part of the formula we need. We can use it to derive the second part

\begin{align*}
[z^n]\frac{z^k}{1-2z+z^{k+1}}&=[z^{n-k}]\frac{1}{1-2z+z^{k+1}}\\
&=\sum_{j=0}^{\left\lfloor\frac{n-k}{k}\right\rfloor}\binom{n-k(j+1)}{j}(-1)^j2^{n-k-(k+1)j}
\end{align*}

Finally putting all together we conclude the number of binary words of length $n$ containing 1-runs of length $k\geq 2$ is
\begin{align*}
\color{blue}{[z^n]G_k(z)=2^n-\sum_{j=0}^{\left\lfloor\frac{n}{k}\right\rfloor}
\left(\binom{n-kj}{j}-\frac{1}{2^k}\binom{n-k(j+1)}{j}\right)(-1)^j2^{n-(k+1)j}}
\end{align*}

We can also use $G_k(z)$ to derive a recurrence relation for the coefficients $[z^n]G_k(z)$. Due to the specific structure of
\begin{align*}
G_k(z)&=\frac{1}{1-2z}-A_k(z)\\
\end{align*}
it seems to be more convenient to derive a recurrence relation for the coefficients $a_n=[z^n]A_k(z)$ and subtract them from $2^n$.

Recurrence relation:

We obtain by coefficient comparison for $k\geq 2$
\begin{align*}
A_k(z)&=\frac{1-z^k}{1-2z+z^{k+1}}\\
\left(1-2z+z^{k+1}\right)A_k(z)&=1-z^k\\
\color{blue}{a_n-2a_{n-1}+a_{n-k-1}}&\color{blue}{=}
\color{blue}{\begin{cases}
1&\qquad n=0\\
-1&\qquad n=k\\
0&\qquad n \neq 0,k
\end{cases}}
\end{align*}
where we set $a_n=0$ if $n<0$.

Example: $k=2$

In case of $k=2$ we obtain

\begin{align*}
A_2(z)&=\frac{1-z^2}{1-2z+z^3}\\
&=1+2z+3z^2+5z^3+8z^4+13z^5+21z^6+\cdots
\end{align*}
and finally
\begin{align*}
G_2(z)&=\frac{1}{1-2z}-\frac{1-z^2}{1-2z+z^3}\\
&=z^2+3z^3+8z^4+19z^5+43z^6+94z^7+\cdots
\end{align*}

Let me take the case of $n= 6,k =3$

Let $a_n$ be a string of length w/o $111$, i.e. a bad string, then it can be
$a_{n-3}$ with $011$ added,
$a_{n-2}$ with $01$ added,
or $a_{n-1}$ with $0$ added,

so $a_n = a_{n-1} + a_{n-2} + a_{n-3},\;\; a_0 = 1, a_1, = 2, a_3=4$

You should now be able to easily find the number of good strings

Yet another recurrence-relation based explanation: let $S_n$ be the number of strings of length $n$ which have some run of $k$ consecutive 1s in them, and let $s$ be a string in $S_n$; suppose $s’$ is the string $s$ truncated by one (i.e., with its last character removed). Then either $s’\in S_{n-1}$, or we have that $s’$ ends with a run of exactly $(k-1)$
1s, with a 0 (or an empty string) before them; the remainder of the string (of length $n-(k+1)$) is then unconstrained except that it can’t have any such run in it. This gives a recurrence of the form $S_n = S_{n-1}+(2^{n-(k+1)}-S_{n-(k+1)})$; the starting conditions are then $S_i=0$ for $i\lt k$, $S_k=1$. From here you can build a generating function
$\mathcal{S}(x)=\sum_iS_ix^i$, find an equation for the generating function, and proceed from there.