# Conditions needed to approximate a Binomial distribution using a Normal distribution?

Okay. So, many sources state different conditions for approximating binomial using normal. Variations I’ve seen are as follows. Assuming that we have Bin(n,p):
1. np and n(1-p) are both large (say, 20 or bigger)
2. large n, p close to 0.5
3. large n, np>9(1-p)

My question: Is there a set of conditions which can summarise all 3 given above (if they are valid)?

Note: Before anyone says “First, what is your opinion about this?”: I personally don’t know much about these distributions, since I’m studying stats at a basic level. I know quite a few distributions and a bit about pdfs, cdfs and mgfs, but not really much more.

#### Answers

The usual conditions are that $n\geq 50$ and both $np$ and $n(1-p)>5$.

Note that if $p$ is close to $0.5$, your third condition is inconsistent with your first: at $p = 0.5$, condition 3 requires only $np > 4.5$, while condition 1 asks for $np \geq 20$.

The Berry–Esseen theorem says that the uniform CDF error in approximating a standardized sum of $n$ iid variables with a finite third absolute moment is bounded by $\frac{C \rho}{\sigma^3 \sqrt{n}}$, where $\sigma^2 = \operatorname{Var}(X_i)$, $\rho=E[|X_i-m|^3]$, and $C$ is a universal constant. We do not know the best possible value of $C$ exactly, but we do know that $0.4<C<0.5$. For the binomial distribution this gives a bound of

$$\frac{1}{2} \frac{p^3(1-p)+p(1-p)^3}{p^{3/2} (1-p)^{3/2} n^{1/2}}.$$

This bound is not tight, as some direct calculation shows, but it is not bad (it is more pessimistic than the true error by a factor of perhaps 4 or so). Cancelling where possible, you get a bound of

$$\frac{1}{2} \frac{p^2+q^2}{(npq)^{1/2}}$$

where $q=1-p$. The numerator $p^2+q^2$ lies between $1/2$ and $1$, so we can summarise this as a bound of the form

$$\frac{C}{(npq)^{1/2}}$$

where what I have said so far tells us only that $C$ can be chosen to be smaller than $1/2$, though in actuality for the binomial distribution it can be chosen to be a fair bit smaller than that.

So if, say, you want a uniform error of at most $0.01$ then it will be enough (by the above analysis) to have $n>\frac{10000}{16 pq}$. In actuality the cutoff is probably more like $n>\frac{100}{pq}$.
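This is easy to check numerically. Here is a minimal Python sketch (the worked examples later in this thread use R; the parameter choice $n = 100$, $p = 0.3$ is my own illustration) comparing the $\frac{1}{2}(p^2+q^2)/\sqrt{npq}$ bound with the actual worst-case CDF error:

```python
import math

def binom_cdf(k, n, p):
    """Exact binomial CDF P(X <= k), computed from the pmf via math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def norm_cdf(x, mu, sigma):
    """Normal CDF, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 100, 0.3                      # illustrative choice, not from the post
q = 1 - p
mu, sigma = n * p, math.sqrt(n * p * q)

# The bound derived above: (1/2)(p^2 + q^2) / sqrt(npq)
bound = 0.5 * (p**2 + q**2) / math.sqrt(n * p * q)

# Worst-case CDF error of the plain (uncorrected) normal approximation
actual = max(abs(binom_cdf(k, n, p) - norm_cdf(k, mu, sigma))
             for k in range(n + 1))

print(f"bound  = {bound:.4f}")   # about 0.063
print(f"actual = {actual:.4f}")  # comfortably below the bound
```

As the text says, the bound holds but is pessimistic: the actual worst-case error is noticeably smaller.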

First, versions of all three of your conditions can be useful. Practically speaking, the main goal is to keep the approximating normal distribution from putting much of its probability below $0$ or above $n.$
This idea gives rise to your #3 as follows: To keep most of the normal probability above $0$, you want normal parameters $\mu$ and
$\sigma$ to satisfy
$$0 < \mu - 3\sigma = np - 3\sqrt{np(1-p)},$$
which implies
$np > 3\sqrt{np(1-p)}$; squaring both sides and dividing by $np(1-p)$ gives $np/(1-p) > 9.$ Some authors are happy enough
with $np/(1-p) > 5.$ Even in the best case, you can expect only about two-place accuracy when approximating binomial probabilities by normal ones;
the stricter rule usually makes the approximation a little better. [A similar argument, keeping most of the normal probability below $n$ by requiring $n > \mu + 3\sigma,$ gives the same kind
of bound: $n(1-p)/p > 9$ (or 5).]
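To see the rule in action, here is a small Python check (the value $p = 0.1$ is my own illustration): when $n$ is chosen so that $np/(1-p) = 9$ exactly, $\mu - 3\sigma$ lands exactly at $0$, and the fitted normal puts only $\Phi(-3) \approx 0.1\%$ of its mass below $0$.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p = 0.1                              # illustrative value, not from the post
n = 81                               # chosen so that np/(1-p) = 81*0.1/0.9 = 9 exactly
mu = n * p                           # 8.1
sigma = math.sqrt(n * p * (1 - p))   # 2.7

# mu - 3*sigma = 0 (up to floating-point noise), so the fitted normal
# puts only Phi(-3) of its probability below 0.
print(abs(mu - 3 * sigma) < 1e-9)        # True
print(round(norm_cdf(-mu / sigma), 5))   # 0.00135
```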

The rule that the smaller of $np$ and $n(1-p)$ should exceed some number
is just a simplified version of the above, where the denominator in the previous argument is set to $1$. I have seen suggested bounds as small as 5, but 20 will
usually give better results.

The rule that $p \approx 1/2$ just recognizes that the approximating normal
distribution is symmetrical, and the binomial is also symmetrical when $p=1/2.$
Saying that $n$ should be ‘large’ is always a safe suggestion.
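One way to quantify both of these rules at once is through the skewness of the binomial:

$$\gamma_1 = \frac{1-2p}{\sqrt{np(1-p)}},$$

which is $0$ at $p = 1/2$ and shrinks like $1/\sqrt{n}$ for any fixed $p$, so "$p$ near $1/2$" and "$n$ large" both push the binomial toward the symmetry of the normal curve.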

Three further comments on normal approximation to the binomial are also relevant.

(a) It is important to use the continuity correction unless $n$ is several hundred.

(b) A Poisson approximation is often better than a normal approximation,
especially if $n$ is large and $p$ is small.

(c) Nowadays, modern software (R, SAS, Minitab, MatLab, etc.) provides the opportunity to get exact binomial probabilities. In practical situations
the normal approximation is usually avoided.

[But the approximation persists
in theoretical probability texts: It is a nice
illustration of the Central Limit Theorem. It makes doing
exercises possible by using the (still ubiquitous) printed table of the normal
CDF in the ‘back of the book’.]
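Returning to comment (a), the effect of the continuity correction is easy to demonstrate. A minimal Python sketch (the worked examples below use R; the choice $Binom(40, .4)$ and the cutoff $18$ are my own illustration):

```python
import math

def binom_cdf(k, n, p):
    """Exact binomial CDF P(X <= k)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def norm_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 40, 0.4, 18                # illustrative choice, not from the post
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact   = binom_cdf(k, n, p)
no_cc   = norm_cdf(k, mu, sigma)          # plug in k directly
with_cc = norm_cdf(k + 0.5, mu, sigma)    # continuity correction: use k + 1/2

print(f"exact   = {exact:.4f}")
print(f"no cc   = {no_cc:.4f}   error = {abs(no_cc - exact):.4f}")
print(f"with cc = {with_cc:.4f}   error = {abs(with_cc - exact):.4f}")
```

At this sample size, plugging in $k + 1/2$ rather than $k$ cuts the error substantially.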

Some illustrative examples using R statistical software.

Breaks all the rules except $p = .5$, yet the normal approximation
happens to work quite well.

Let $X \sim Binom(3, .5).$ Find $P(X = 2) = P(1.5 < X < 2.5).$

```r
dbinom(2, 3, .5)
## 0.375       # exact
mu = 3*.5;  sg = sqrt(3*.5*.5); diff(pnorm(c(1.5, 2.5), mu, sg))
## 0.3758935   # norm aprx
```

Large $n$ and small $p$. Satisfies all rules, but normal approximation is
relatively poor and Poisson approximation is better.

Let $X \sim Binom(1000, .03).$ Find $P(X \le 20)$.

```r
pbinom(20, 1000, .03)
## 0.03328078      # exact via binom CDF
sum(dbinom(0:20, 1000, .03))
## 0.03328078      # exact via binom PDF
mu = 1000*.03; sg = sqrt(1000*.03 *.97);  pnorm(20.5, mu, sg)
## 0.03911311      # relatively poor normal aprx
ppois(20, mu)
## 0.03528462      # better Poisson aprx
```

Rules satisfied and normal approximation gives 2-place accuracy.

Let $X \sim Binom(100, .3).$ Find $P(24 < X \le 35) = P(24.5 < X < 35.5).$

```r
diff(pbinom(c(24, 35), 100, .3))
## 0.7703512       # exact
sum(dbinom(25:35, 100, .3))
## 0.7703512       # exact
mu = 100*.3;  sg=sqrt(100*.3*.7);  diff(pnorm(c(24.5, 35.5), mu, sg))
## 0.7699377       # relatively good normal aprx
diff(ppois(c(24, 35), mu))
## 0.6853745       # inappropriate Poisson 'aprx'
```

The figures below show blue bars for binomial probabilities, smooth curves for
normal densities, small circles for Poisson probabilities, and vertical dotted lines to bound approximating areas. Red indicates a relatively poor fit.