# The myth of no prime formula?

Terence Tao claims:

For instance, we have an exact formula for the $n^\text{th}$ square
number – it is $n^2$ – but we do not have a (useful) exact
formula for the $n^\text{th}$ prime number $p_n$!
“God may not play dice with the universe, but something
strange is going on with the prime numbers.” (Paul Erdős,
1913–1996)

However there exist an exact formula for the nth prime

A double sum for the nth prime pn is $$p_n=1+\sum_{k=1}^{2(\lfloor n\ln n\rfloor+1)}\Biggl[1-\Biggl\lfloor\frac{\sum_{j=2}^k 1+\lfloor s(j)\rfloor}n\Biggr\rfloor\Biggr],\tag{13}$$ where $$s(j)\equiv-\frac{\sum_{s=1}^j \bigl(\bigl\lfloor\frac js\bigr\rfloor-\bigl\lfloor\frac{j-1}s\bigr\rfloor\bigr)-2}j\tag{14}$$

(See Prime formulas at MathWorld.) There exist even an exact formula for prime counting function $\pi (x)$. So why are mathematicians trying to prove the Riemann hypothesis to find a better estimate for $\pi(x)$ and $p_n$ when they have those exact formulas?

#### Solutions Collecting From Web of "The myth of no prime formula?"

If there is a mathematical body of knowledge that allows you to manipulate formulas (13) and (14) into a better form, then great! The Riemann zeta function has already proved its worth in this field. It was instrumental in the proofs by Hadamard and de la Vallée-Poussin of the Prime Number Theorem. What allowed that to happen was that the zeta function is a meromorphic function, and so complex variable theory could be applied to its study. One never knows with certainty whether further study of the zeta function will pay off, but people are betting that it will, based on past successes.

I don’t know whether anyone is promoting (13) and (14) as useful tools in the study of prime numbers, but if so, the burden is on those people to exhibit mathematical tools that allow one to manipulate (13) and (14) effectively.

Because of all the floor functions, (13) and (14) don’t have very nice analytical properties. As other posters have pointed out, they encode an elementary algorithm for identifying primes. In particular, the expression
$$\left\lfloor\frac{j}{s}\right\rfloor-\left\lfloor\frac{j-1}{s}\right\rfloor$$
that appears in the numerator of (14) equals $1$ if $s$ evenly divides $j$ and equals $0$ if it does not. Summing this quantity for $s$ from $1$ to $j$ gives you the number of divisors of $j.$ Prime numbers have exactly two divisors; composite numbers all have at least three. The numerator in (14) is therefore $0$ when $j$ is prime, and positive when it is composite. Dividing by $j$ and multiplying by $-1$ results in the quantity $s(j),$ which is $0$ when $j$ is prime and lies in the interval $(-1,0)$ when $j$ is composite. The floor of $s(j$) therefore equals $0$ when $j$ is prime and $-1$ when $j$ is composite. Adding $1$ to $\lfloor s(j)\rfloor$ gives the characteristic function of the primes.

Notice that using the formula (14) requires one to do more divisions than even a naive brute-force algorithm to compute the number of divisors would require. That price might be worth paying if the formula had nice mathematical properties that allowed one to extract additional information from it, but I don’t see that this has been demonstrated. To me, it looks like a cumbersome arithmetic encoding of an inefficient algorithm. I would be happy to be proved wrong about this!

Formula (13) likewise looks like a cumbersome arithmetic encoding of a brute-force method for computing the $n^\text{th}$ prime, and seems likely to require more calculation than state-of-the-art algorithms. Again, it would be nice if someone could show how to extract useful information from the formula, but it doesn’t look very promising.

Added: I think it worth pointing out that some non-trivial analytic number theory enters into formula (13) in the choice of upper bound on the summation over $k.$ You need to ensure that the $p_n$ lies in the range of summation, but that $p_{2n}$ does not. The reason is that the summand equals $1$ when $k<p_n,$ equals $0$ for $p_n\le k<p_{2n},$ and becomes negative for $k\ge p_{2n}.$ So you need the summation over $k$ to include all $p_n-1$ terms where the summand equals $1$, but none of the terms where it is negative. This will ensure that the sum evaluates to $p_n-1.$

It appears to me that you need something like (3.12) and (3.13) in this paper of Rosser and Schoenfeld to justify the upper bound. Formula (3.12) is Rosser’s Theorem which Dietrich Burde’s answer to this post suggests requires deep knowledge of zeta function zeros. Interestingly, the author of your formulas (13) and (14) (Sebastián Martín Ruiz) does not rigorously justify the upper bound in his article, giving only a heuristic argument and and stating that it is “very probable” that the necessary inequalities are satisfied.

Perhaps one could play with the constant in front of $\lfloor n\log n\rfloor$ and try to use Chebyshev’s bounds (Theorem 3.1 here) instead of Rosser’s Theorem. (The derivation of Chebyshev’s bounds doesn’t require knowledge of zeta function zeros.) But this seems possibly rather delicate, and may not work.

Second addition: Some discussion of the formula, and Ruiz’s view of it, can be found here.

Mathematicians are not really interested in particular values of the counting function. (They already know tons of them, obtained by running efficient sieves.)

What they are after is insight on its asymptotic behavior (how precisely does it vary as $x$ grows). Better, they would like to understand the intriguing irregularity in the distribution and put some order into it.

The given formula, as well as many other similar ones, is of no use for computing the function values (it is far too inefficient) and does not bring any insight (it is just a rewrite of a trivial algorithm).

The relation to the zeroes of the Zeta function is much deeper.

Usually when someone says “we have a formula for $f(x)$”, they mean that we have a way to compute $f$ faster than it’s direct definition (or by the definition, if that is fast enough).

For example, the function $f(x) = x^2$ can be computed by if $x$ is given in binary (or any radix) by the way you probably learned in elementary school. On the other hand, if you were asked to factor a large number, you would quickly learn that there is no known “formula”, in other words, nothing is much faster than simply checking all numbers (the fastest known approaches are about as fast as checking the factors on a number with 1/3 the number of digits IIRC).

Formulas for primes is a lovely 1983 paper by Underwood Dudley in Mathematics Magazine. It explains how to cook up impressive-looking but trivial/useless formulas for the $n$th prime or the prime-counting function, and gives a smattering of references to unfortunate cases in the literature of such formulas being published.

Those formulae don’t help us prove anything of use. For example, can you use those formulae to derive Dirichlet’s theorem (that every arithmetic sequence $ax+b$ where $\gcd(a,b)=1$ has an infinite number of primes)*? No. Those formulae don’t make that easier at all.

*It’s a very hard theorem to prove.

There are also plenty of conjectures (such as the Bunyakovsky conjecture) that are currently unsolved, and those formulae don’t make it any easier for us to solve them.

If this formula could say something about primes, for example how to factor sums of primes, it would be interesting.

The emphasis in Tao’s statement should be on useful: the formula you quoted is neither computationally fast nor gives any theoretical insight, it is simply a fancy rearrangement of a well-known fact about primes (Wilson’s theorem, I believe in this case). The Riemann Hypothesis on the other hand allows a much deeper understanding of the primes, and the hope is that a proof would yield new tools and approaches beyond the mere knowledge of its truth.

Hardy and Wright in “An Introduction to the Theory of Numbers”, give a short and simple formula in Theorem 419 for the n-th prime however they explain that to use the formula it is necessary to know the first n primes, which defeats the purpose of the formula.

They say that if the value of the inputs to their formula could be expressed independently of the primes then its status would be different, however, “There seems to be no likelihood of this, but it cannot be ruled out as entirely impossible”.