In these notes by Terence Tao is a proof of Stirling’s formula. I really like most of it, but at a crucial step he uses the integral identity

$$n! = \int_{0}^{\infty} t^n e^{-t} dt$$

coming from the Gamma function. I have a mathematical confession to make: I have never “grokked” this identity. Why should I expect the integral on the right to give me the number of elements in the symmetric group on $n$ letters?

(It’s not that I don’t know how to prove it. It’s quite fun to prove; my favorite proof observes that it is equivalent to the integral identity $\int_{0}^{\infty} e^{(x-1)t} dt = \frac{1}{1 - x}$. But if someone were to ask me, “yes, but why, *really*?” I would have no idea what to say.)

So what are more intuitive ways of thinking about this identity? Is there a probabilistic interpretation? What kind of random variable has probability density function $\frac{t^n}{n!} e^{-t}$? (What does this all have to do with Tate’s thesis?)

As a rough measure of what I’m looking for, your answer should make it obvious that $t^n e^{-t}$ attains its maximum at $t = n$.

Edit: The kind of explanation I’m looking for, as I described in the comments, is similar to this explanation of the beta integral.

I haven’t quite got this straight yet, but I think one way to go is to think about choosing points at random from the positive reals. This answer is going to be rather longer than it really needs to be, because I’m thinking about this in a few (closely related) ways, which probably aren’t all necessary, and you can decide to reject the uninteresting parts and keep anything of value. Very roughly, the idea is that if you “randomly” choose points from the positive reals and arrange them in increasing order, then the probability that the $(n+1)^\text{th}$ point is in a small interval $(t,t+dt)$ is a product of probabilities of independent events, $n$ factors of $t$ for choosing $n$ points in the interval $[0,t]$, one factor of $e^{-t}$ as all the other points are in $[t,\infty)$, one factor of $dt$ for choosing the point in $(t,t+dt)$, and a denominator of $n!$ coming from the reordering. At least, as an exercise in making a simple problem much harder, here it goes…

I’ll start with a bit of theory before trying to describe intuitively why the probability density $\dfrac{t^n}{n!}e^{-t}$ pops out.

We can look at the homogeneous Poisson process (with rate parameter $1$). One way to think of this is to take a sequence of independent exponentially distributed random variables with rate parameter $1$, $S_1,S_2,\ldots$, and set $T_n=S_1+\cdots+S_n$. As has been commented on already, $T_{n+1}$ has the probability density function $\dfrac{t^n}{n!}e^{-t}$. I’m going to avoid proving this immediately though, as it would just reduce to manipulating some integrals. Then, the Poisson process $X(t)$ counts the number of times $T_i$ lying in the interval $[0,t]$.

We can also look at Poisson point processes (aka, Poisson random measures, but that Wikipedia page is very poor). This just makes rigorous the idea of randomly choosing unordered sets of points from a sigma-finite measure space $(E,\mathcal{E},\mu)$. Technically, it can be defined as a set of nonnegative integer-valued random variables $\{N(A)\colon A\in\mathcal{E}\}$ counting the number of points chosen from each subset $A$, such that $N(A)$ has the Poisson distribution of rate $\mu(A)$ and $N(A_1),N(A_2),\ldots$ are independent for pairwise disjoint sets $A_1,A_2,\ldots$. By definition, this satisfies

$$\mathbb{P}(N(A)=n)=\dfrac{\mu(A)^n}{n!}e^{-\mu(A)}.\tag{1}$$

The points $T_1,T_2,\ldots$ above defining the homogeneous Poisson process also define a Poisson random measure with respect to the Lebesgue measure $(\mathbb{R}_+,{\cal B},\lambda)$, once you forget about the order in which they were defined and just regard them as a random set. Forgetting the order is, I think, the source of the $n!$. If you think about the probability of $T_{n+1}$ being in a small interval $(t,t+\delta t)$ then this is just the same as having $N([0,t])=n$ and $N((t,t+\delta t))=1$, which has probability $\dfrac{t^n}{n!}e^{-t}\delta t$.
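The claim that $T_{n+1}$ lands in $(t,t+\delta t)$ with probability about $\dfrac{t^n}{n!}e^{-t}\delta t$ can be checked by simulation. The following is only a rough sketch: the function names, the seed, the interval width and the number of trials are my own illustrative choices, not part of the argument.

```python
import math
import random

def simulate_arrival_density(n, t, dt=0.1, trials=100_000, seed=0):
    """Estimate the density of T_{n+1} near t by simulation.

    T_{n+1} is built as a sum of n+1 independent Exp(1) variables; we count
    how often it lands in (t, t+dt] and divide by dt.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    hits = 0
    for _ in range(trials):
        total = sum(rng.expovariate(1.0) for _ in range(n + 1))
        if t < total <= t + dt:
            hits += 1
    return hits / (trials * dt)

def gamma_density(n, t):
    """The claimed density t^n e^{-t} / n!."""
    return t**n * math.exp(-t) / math.factorial(n)

# Compare the simulated density with the formula at the interval midpoint.
n, t, dt = 3, 2.5, 0.1
print(simulate_arrival_density(n, t, dt), gamma_density(n, t + dt / 2))
```

The two printed numbers should agree to within sampling noise, which shrinks as `trials` grows.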

So, how can we choose points at random so that each small set $\delta A$ has probability $\mu(\delta A)$ of containing a point, and why does $(1)$ pop out? I’m imagining a hopeless darts player randomly throwing darts about and, purely by luck, hitting the board with some of them. Consider throwing a very large number $N\gg1$ of darts, independently, so that each one only has probability $\mu(A)/N$ of hitting the set $A$, and is distributed according to the probability distribution $\mu/\mu(A)$. This is consistent, at least, if you think about the probability of hitting a subset $B\subseteq A$. The probability of missing with all of them is $(1-\mu(A)/N)^N$, which tends to $e^{-\mu(A)}$ as $N\to\infty$. This is a multiplicative function due to independence of the number hitting disjoint sets. To get the probability of one dart hitting the set, multiply by $\mu(A)$ (one factor of $\mu(A)/N$ for each individual dart, multiplied by $N$ because there are $N$ of them). For $n$ darts, we multiply by $\mu(A)$ $n$ times, for picking $n$ darts to hit, then divide by $n!$ because we have over-counted the subsets of size $n$ by this factor (due to counting all $n!$ ways of ordering them). This gives $(1)$. I think this argument can probably be cleaned up a bit.
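The darts picture says that the number of hits is binomial with $N$ trials and success probability $\mu(A)/N$, and that this tends to $(1)$ as $N$ grows. Here is a small numerical sketch of that convergence; the value $\mu(A)=2$ and the function names are assumptions made purely for illustration.

```python
import math

def binomial_pmf(N, p, n):
    """P(exactly n hits) for N independent darts, each hitting with probability p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)

def poisson_pmf(mu, n):
    """Right-hand side of (1): mu^n e^{-mu} / n!."""
    return mu**n * math.exp(-mu) / math.factorial(n)

mu = 2.0  # assumed value of mu(A), for illustration only
for N in (10, 100, 10_000):
    print(N, [round(binomial_pmf(N, mu / N, n), 5) for n in range(4)])
print("Poisson", [round(poisson_pmf(mu, n), 5) for n in range(4)])
```

As $N$ increases, each row of binomial probabilities moves closer to the Poisson row.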

Getting back to choosing points randomly on the positive reals, this gives a probability of $\dfrac{t^n}{n!}e^{-t}dt$ of picking $n$ in the interval $[0,t]$ and one in $(t,t+dt)$. If we sort them in order as $T_1\lt T_2\lt\cdots$ then $\mathbb{P}(T_1\gt t)=e^{-t}$, so it is exponentially distributed. Conditional on this, $T_2,T_3,\ldots$ are chosen randomly from $[T_1,\infty)$, so we see that the differences $T_{i+1}-T_{i}$ are independent and identically distributed.

Why is $\dfrac{t^n}{n!}e^{-t}$ maximized at $t=n$? I’m not sure why the mode should be a simple property of a distribution. It doesn’t even exist except for unimodal distributions. As $T_{n+1}$ is the sum of $n+1$ IID random variables of mean one, the law of large numbers suggests that it should be peaked approximately around $n$. The central limit theorem goes further, and gives $\dfrac{t^n}{n!}e^{-t}\approx\dfrac{1}{\sqrt{2\pi n}}e^{-(t-n)^2/{2n}}$. Stirling’s formula is just this evaluated at $t=n$.
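To see the last remark numerically, one can compare the density at $t=n$, namely $n^n e^{-n}/n!$, with the height $1/\sqrt{2\pi n}$ of the approximating normal density. This is a sketch with function names of my own choosing; logarithms are used only to avoid overflow for large $n$.

```python
import math

def gamma_density_at_mode(n):
    """n^n e^{-n} / n!, computed via logarithms to avoid overflow."""
    return math.exp(n * math.log(n) - n - math.lgamma(n + 1))

def clt_peak(n):
    """Peak height 1 / sqrt(2 pi n) of the approximating normal density."""
    return 1.0 / math.sqrt(2 * math.pi * n)

for n in (1, 10, 100, 1000):
    ratio = gamma_density_at_mode(n) / clt_peak(n)
    print(n, gamma_density_at_mode(n), clt_peak(n), ratio)
```

The ratio in the last column approaching $1$ is exactly the statement of Stirling’s formula.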

What’s this to do with Tate’s thesis? I don’t know, and I haven’t read it (but intend to), but have a vague idea of what it’s about. If there is anything to do with it, maybe it is something to do with the fact that we are relating the sums of independent random variables $S_1+\cdots+S_n$ distributed with respect to the Haar measure on the multiplicative group $\mathbb{R}_+$ (edit: oops, that’s not true, the multiplicative Haar measure has cumulative distribution given by $\log$, not $\exp$) with randomly chosen sets according to the Haar measure on the additive group $\mathbb{R}$.

The geometric approach works.

Let’s compute the volume of the $2n$ dimensional ball, $D^{2n}$, in two ways. One way is extremely clever but has been known for centuries and provides interesting insights: it’s based on Liouville’s trick. Specifically, we will compute two integrals in polar coordinates, one of which is the volume of the ball and the other of which reduces to a product of one-dimensional integrals. Both integrands will depend (at most) on the radial coordinate $r$, which lets us separate out the surface area of the boundary of the ball as a common factor. Write this surface area as $S_{2n-1}$.

There’s essentially just one way to do this trick: integrate $\exp(-r^2)$. Its integral over $\mathbb{R}^{2n}$ equals

$$S_{2n-1} \int_0^\infty {\exp\left(- r^2 \right) r^{2n-1} dr}.$$

However, because $r^2 = x_1^2 + x_2^2 + \ldots + x_{2n}^2$, the integrand (in *Cartesian* coordinates $\left( x_1, x_2, \ldots, x_{2n} \right)$) factors as $\exp\left(-r^2 \right) = \exp\left(-x_1^2 \right) \cdots \exp\left(-x_{2n}^2 \right)$, each of which must be integrated from $-\infty$ to $+\infty$. Whence

$$S_{2n-1} \int_0^\infty {\exp \left(- r^2 \right) r^{2n-1} dr} = \left( \int_{- \infty}^ \infty {\exp \left( -x^2 \right) dx} \right) ^{2n}.$$

I will call the left hand integral $\tfrac{1}{2} \Gamma \left(n \right)$, because that is what it is (as a simple change of variables shows). In the same notation, $\Gamma \left(1/2 \right) = \int_{-\infty}^\infty {\exp\left(-x^2 \right) dx}$. Algebraic re-arrangement of the foregoing yields the volume of $D^{2n}$ as

$$|D^{2n}| = S_{2n-1} \int_0^1 {r^{2n-1} dr} = \frac{S_{2n-1}}{2n} = \frac{\Gamma \left(1/2 \right)^{2n}}{n \Gamma \left(n \right)}.$$

That was the first method: the result is a familiar one, but has been left expressed in a way that better reveals its origins in polar and Cartesian integration.

The next way to compute the ball’s volume is, I believe, new. It is inspired by Cavalieri’s Principle: the idea that you can shift slices of a solid around without changing the volume of the solid. The generalization is to move two-dimensional slices around *and to change their shapes while you do so,* but in a way that does not change their areas. It follows that the new solid has the same (hyper)volume as the original, although it might have a completely different shape.

We will compute the volume of a region $Q_n$ in $\mathbb{R}^{2n}$. It is conveniently described by identifying $\mathbb{R}^{2n}$ with $\mathbb{C}^{n}$, using coordinates $z_i = \left( x_{2i - 1}, x_{2i} \right)$, in terms of which

$$Q_n = \{ \mathbf{z} \in \mathbb{C}^n :0 \leq \left| {z_1 } \right| \leq \left| {z_2 } \right| \leq \cdots \leq \left| {z_n } \right| \leq 1 \}.$$

If these were *real* variables, we could make the volume-preserving transformation $w_1 = z_1, w_2 = z_2 - z_1, \ldots , w_i = z_i - z_{i-1}, \ldots, w_n = z_n - z_{n-1}$, with the sole restriction that the sum of the $w_i$ (all of which are nonnegative) not exceed 1. Because they are *complex* variables, though, we have to consider the area of an annulus bounded by $z_{i-1}$ and $z_i$: it is proportional to $z_i^2 - z_{i - 1}^2$. The circle of the same area has radius $w_i$ for which $w_i^2 = z_i^2 - z_{i - 1}^2$. Therefore, *if we define new variables* $w_i$ *according to this formula*, we obtain a new region, one of substantially different shape, having the same volume. This region is defined by $\left| {w_1 }^2 \right| + \cdots + \left| {w_n }^2 \right| \le 1$: that is, it’s our old friend $D^{2n}$. Therefore, **the volume of** $Q_n$ **equals the volume of** $D^{2n}$ .

Now for the punch line: $Q_n$ is a fundamental domain for the action of $S[n]$, the symmetric group, on the product of $n$ disks $T^{2n} = \left( D^2 \right) ^n$; $S[n]$ acts by permuting the complex coordinates $z_1, \ldots, z_n$. The volume of $T^{2n}$ equals $|D^2|^n = \pi ^n$. Writing $|S[n]|$ for the number of permutations and equating our two completely different calculations of the volume of the $2n$ ball gives

$$\pi ^ n / |S[n]| = \frac {\Gamma \left(1/2 \right)^{2n}} { n \Gamma \left(n \right) },$$

whence

$$|S[n]| = \frac{{\pi ^n n\Gamma \left( n \right)}}{{\Gamma \left( {1/2} \right)^{2n} }}.$$

This simplifies: the volume formula for $n = 1$ (the ball $D^2$) must give the area of the unit disk, equal to $\pi$, whence $\Gamma \left( 1/2 \right)^2 = \pi$. Finally, then,

$$|S[n]| = n\Gamma \left( n \right).$$
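As a sanity check on the bookkeeping, the two expressions for the volume of $D^{2n}$ can be compared numerically. This is a sketch using Python’s standard `math.gamma`; the function names are hypothetical labels for the two computations above.

```python
import math

def ball_volume_polar(n):
    """|D^{2n}| from Liouville's trick: Gamma(1/2)^{2n} / (n Gamma(n))."""
    return math.gamma(0.5) ** (2 * n) / (n * math.gamma(n))

def ball_volume_symmetric(n):
    """|D^{2n}| as the polydisk volume pi^n divided by |S[n]| = n!."""
    return math.pi**n / math.factorial(n)

for n in range(1, 6):
    print(2 * n, ball_volume_polar(n), ball_volume_symmetric(n))
```

For $n=1$ both columns give $\pi$, the area of the unit disk, which is the normalization that fixes $\Gamma(1/2)^2=\pi$.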

I will finish by remarking that Liouville’s method is a perfectly natural thing to encounter when working with the multivariate Normal distribution, so it’s not really an isolated trick, but is rather a pretty basic result expressing a defining property of Normal (Gaussian) variates. There are, of course, many other ways to compute the volume of $D^{2n}$, but this one gives us the Gamma function directly.

Sorry to “revive,” but here’s something I noticed while writing 2012 Fall OMO Problem 25. IMO it gives a kind of neat perspective on the gamma function (as a particular case of a “continuous” generating function), so hopefully this is not too off-topic. It may be somewhat related to George Lowther’s answer above, but I don’t have the background to fully understand/appreciate his post. Also, there might be a bit of seemingly irrelevant setup here.

Anyways, first consider the following “discrete” problem:

Find the number of integer solutions to $a+b+c+d=18$ with $0\le a,b,c,d\le 6$.

This is fairly standard via PIE or generating functions: for a decent discussion, see this AoPS thread.
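For concreteness, here is a short sketch that counts the solutions both by brute force and by PIE (inclusion-exclusion over which variables exceed $6$). The function names and default arguments are illustrative assumptions.

```python
from itertools import product
from math import comb

def count_brute_force(total=18, parts=4, cap=6):
    """Enumerate every tuple with entries in 0..cap and count those with the right sum."""
    return sum(1 for values in product(range(cap + 1), repeat=parts)
               if sum(values) == total)

def count_inclusion_exclusion(total=18, parts=4, cap=6):
    """PIE: force k chosen variables above cap, then count by stars and bars."""
    count = 0
    for k in range(parts + 1):
        remaining = total - k * (cap + 1)
        if remaining < 0:
            break
        count += (-1) ** k * comb(parts, k) * comb(remaining + parts - 1, parts - 1)
    return count

print(count_brute_force(), count_inclusion_exclusion())
```

Both methods give $1330 - 1456 + 210 = 84$.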

Now consider a close “continuous” variant:

Let $N$ be a positive integer and $M$ be a positive real number. Find the probability that $y_1,y_2,\ldots,y_{N-1}\le M$ for a random solution in nonnegative reals to $y_0+y_1+\cdots+y_N=1$.

The direct generalization would have $y_0,y_N\le M$ as well, but I’ll keep it like this since the OMO problem uses this version (and both versions allow essentially the same solution).

It’s easy to generalize the PIE solution to the discrete version, but here’s what I got when I tried to generalize the generating functions solution (unfortunately it’s not really rigorous, but I feel like it should all be correct).

To extend the discrete generating functions solution, suppose we work with formal power “integrals” indexed by nonnegative reals instead of formal power series (indexed by nonnegative integers). As a basic example, the integral $\int_0^\infty x^t\; dt$ has a $1$ coefficient for each $x^t$ term, and thus corresponds to $y_0,y_N$; the discrete analog would be something like $\sum_{t=0}^{\infty}x^t$. For $y_1,\ldots,y_{N-1}$, which are bounded above by $M$, we instead have $\int_0^M x^t\; dt$. By “convolution” the desired probability is then $$\frac{[x^1](\int_0^M x^t\; dt)^{N-1}(\int_0^\infty x^t\; dt)^2}{[x^1](\int_0^\infty x^t\; dt)^{N+1}}.$$ But $\int_0^M x^t\; dt = \frac{x^M - 1}{\ln{x}}$ and for positive integers $L$, $(-\ln{x})^{-L} = \frac{1}{(L-1)!}\int_0^\infty x^t t^{L-1}\; dt$. Note that this is essentially the gamma function when $x=e^{-1}$, and for $L=1$ we have $\int_0^\infty x^t\; dt = (-\ln{x})^{-1}$. This is not hard to prove by differentiating (w.r.t. $x$) under the integral sign, but for integer $L$ there’s a simple combinatorial proof of $$\frac{1}{(L-1)!}\int_0^\infty x^t t^{L-1}\; dt = (\int_0^\infty x^t\; dt)^L$$ which may provide a bit of intuition for the gamma function. Indeed, suppose we choose $L-1$ points at random from the interval $[0,t]$; the resulting sequence is nondecreasing with probability $\frac{1}{(L-1)!}$. These $L-1$ “dividers” split the interval $[0,t]$ into $L$ nonnegative reals adding up to $t$, which corresponds to the convolution on the RHS. It’s not hard to check that this yields a bijection, which explains the $[x^t]$ coefficient of the LHS ($\frac{t^{L-1}}{(L-1)!}$). (Compare with the classical “balls and urns” or “stars and bars” argument for the discrete analog. In fact, it shouldn’t be hard to interpret the $[x^t]$ coefficient as a limiting case of the discrete version.)

Now back to the problem: by “convolution” again (but used in a substantive way this time) we find

$$\begin{align*}
[x^1]\left(\int_0^M x^t\; dt\right)^{N-1}\left(\int_0^\infty x^t\; dt\right)^2
&=[x^1](1-x^M)^{N-1}(-\ln{x})^{-(N-1)}(-\ln{x})^{-2} \\
&=[x^1](1-x^M)^{N-1}\frac{1}{N!}\int_0^\infty x^t t^N\; dt \\
&=\frac{1}{N!}\sum_{0\le k\le N-1,\frac{1}{M}}(-1)^k\binom{N-1}{k}(1-kM)^N
\end{align*}$$

and

$$[x^1]\left(\int_0^\infty x^t\; dt\right)^{N+1}
=[x^1](-\ln{x})^{-(N+1)}
=[x^1]\frac{1}{N!}\int_0^\infty x^t t^N\; dt
=\frac{1}{N!}.$$

The desired probability thus comes out to $\sum_{0\le k\le N-1,\frac{1}{M}}(-1)^k\binom{N-1}{k}(1-kM)^N$.
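Since the derivation is not rigorous, a numerical check seems worthwhile. The sketch below evaluates the final sum and compares it with a simulation; the function names are made up for illustration, and the simulation relies on the standard fact that independent exponential variables divided by their sum are uniformly distributed on the simplex.

```python
import random
from math import comb

def prob_middle_bounded(N, M):
    """The final sum: k runs over 0..N-1 with k*M <= 1."""
    total = 0.0
    for k in range(N):
        if k * M > 1:
            break
        total += (-1) ** k * comb(N - 1, k) * (1 - k * M) ** N
    return total

def prob_by_simulation(N, M, trials=50_000, seed=0):
    """Monte Carlo estimate of P(y_1, ..., y_{N-1} <= M) on y_0 + ... + y_N = 1."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    hits = 0
    for _ in range(trials):
        e = [rng.expovariate(1.0) for _ in range(N + 1)]
        s = sum(e)
        if all(e[i] / s <= M for i in range(1, N)):
            hits += 1
    return hits / trials

print(prob_middle_bounded(3, 0.5), prob_by_simulation(3, 0.5))
```

For $N=2$ the sum reduces to $1-(1-M)^2$ when $M\le 1$, which is the distribution function of a single coordinate of a uniform point on the triangle.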

You can see how this explicitly applies to the OMO problem in my post here. Everything I wrote above is based off of the “Official ‘Continuous’ Generating Function Solution” there.

You are really asking for a direct connection between some property of $n!$ and the integral. This can be done from the recursion

$S[n] = nS[n-1], S[0] = 1$

where $S[n]$ is the order of the symmetric group on $n$ elements. The exponential generating function for this sequence equals $1/(1-x)$. As in your favorite proof, replace that by $\int_{0}^{\infty} e^{(x-1)t} dt$, expand $e^{xt}$ as a series, reverse the order of summation and integration, and you recover a power series in $x$ whose coefficients are precisely $\int_{0}^{\infty} t^n e^{-t} dt/n!$. The result follows upon term-by-term comparison.

I realize that’s not a whole lot more insight, but it does show explicitly a connection between a defining property of $n!$ and the integral for the $\Gamma$ function.

As far as your subsequent musings go: the integrand is the probability density of a Gamma variate, of course! One useful relation is that the Gamma distribution with parameter $n$ is the sum (i.e., convolution) of $n$ independent exponential variates (i.e., Gamma variates with $n=1$). The expectation of an exponential variate is 1 (this is easy), whence the expectation of a Gamma variate must be $n$ (because expectations add), strongly suggesting the mode of its pdf should be near $n$ (as justified for large $n$ by the Central Limit Theorem).

This is another answer in terms of Poisson processes and the Gamma distribution, and it still uses a bit of calculus which you might call a trick, but I think at least it does add another bit of intuition:

Consider the homogeneous Poisson process with rate parameter 1; this means we are counting the number of occurrences of an event that happens with rate 1. Let’s calculate the probabilities $p_k(t)$ that we are in state $k$ at time $t$, i.e., that the event occurs $k$ times in the interval $[0,t]$.

Since the event happens with rate 1, probability mass flows from $p_k(t)$ to $p_{k+1}(t)$ with rate 1. This means that ${p'}_0(t) = -p_0(t)$ and ${p'}_{k+1}(t) = p_k(t) - p_{k+1}(t)$. Also, $p_0(0) = 1$ and $p_{k+1}(0) = 0$. Here comes the bit of calculus: these equations have the solution $p_k(t) = \frac{t^k}{k!} e^{-t}$. (A bit fuzzily, we can read this in two parts: $\frac{t^k}{k!}$ is 1 integrated $k$ times, and $e^{-t}$ represents probability mass being lost at rate 1 to states further down the line. See below for yet another fuzzy explanation.)
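If you would rather not take the solution on faith, the system can be integrated numerically and compared with $\frac{t^k}{k!}e^{-t}$. Below is a minimal sketch using the classical fourth-order Runge-Kutta method; the function name, the number of states kept and the step count are arbitrary choices of mine.

```python
import math

def poisson_state_probabilities(t_end, states=6, steps=2000):
    """Integrate p_0' = -p_0, p_{k+1}' = p_k - p_{k+1} with classical RK4.

    Keeping only the first `states` equations loses nothing, because p_k
    never depends on p_j for j > k.
    """
    def rhs(p):
        return [-p[0]] + [p[k - 1] - p[k] for k in range(1, len(p))]

    p = [1.0] + [0.0] * (states - 1)  # initial condition p_0(0) = 1
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs([x + 0.5 * h * d for x, d in zip(p, k1)])
        k3 = rhs([x + 0.5 * h * d for x, d in zip(p, k2)])
        k4 = rhs([x + h * d for x, d in zip(p, k3)])
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

t = 3.0
for k, value in enumerate(poisson_state_probabilities(t)):
    print(k, value, t**k * math.exp(-t) / math.factorial(k))
```

The two columns should agree to many decimal places.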

Now consider the waiting time $T_k$ until the $k$’th occurrence. Clearly, $T_k = t$ means that the transition from state $k-1$ to state $k$ happens at time $t$, so the probability of $T_k \le t$ is the probability that the transition happens before time $t$, and the density is the derivative of this, i.e., the rate at which probability mass flows from state $k-1$ to $k$. This equals the occurrence rate (i.e., 1) times $p_{k-1}(t)$.

So the probability density of the random variable $T_k$ is $1 \cdot p_{k-1}(t) = \frac{t^{k-1}}{(k-1)!} e^{-t}$ (for $t \ge 0$). Since the probability that there is no occurrence ever is obviously zero, $\int_{0}^{\infty} \frac{t^{k-1}}{(k-1)!} e^{-t} dt = 1$.

Incidentally, this is related to a way of thinking about why $\sum_{k=0}^\infty \frac{t^k}{k!} = \lim_{n\to\infty} (1 + \frac{t}{n})^n$. Suppose you start with one unit of money in an account and get 100% interest, continuously compounded. However, the interest from the original account (number 0) is paid not to the original account, but to account #1; interest from account #1 is paid to account #2, and so on.

Then the money $m_k(t)$ in account $k$ equals $1$ integrated $k$ times, and the total money $m(t)$ is $\sum_{k=0}^\infty m_k(t) = \sum_{k=0}^\infty \frac{t^k}{k!}$. But on the other hand, $m(t)$ is continuously compounded at 100% interest, so $m(t) = \lim_{n\to\infty} (1 + \frac{t}{n})^n$ by the usual reasoning.

This gives another fuzzy argument why we should have $p_k(t) = \frac{t^k}{k!} e^{-t}$. The change in the $p_k(t)$ over time consists of two parts: on the one hand, each $p_{k+1}(t)$ increases at rate $p_k(t)$; on the other hand, each $p_k(t)$ decreases at rate $p_k(t)$. If we view the $p_k(t)$ as accounts, since we are taking money out of every account at the constant rate 1, the effect is to decrease the total amount at the constant rate 1, i.e. by a factor of $e^{-t}$, which cancels out the increase of $e^t$ due to the accruing interest. It makes some intuitive sense that we can model this effect by simply rescaling the amount of money in each of the accounts by $e^{-t}$.

**Geometric approach**

Note that $\frac{t^n}{n!}$ is the volume of the set $S_t=\{(t_1,t_2,\dots,t_n)\in\mathbb R^{n}\mid t_i\geq 0\text{ and } t_1+t_2+\cdots+t_n\leq t\}$.

So we can perform a change of variables in the integral, replacing $t=t_1+\dots + t_{n+1}$, as:

$$\begin{align}\int_{0}^\infty \frac{t^n}{n!}e^{-t}\,dt &= \int_{0}^{\infty}\left(\int_{(t_i)\in S_t}\,dt_1\dots dt_n\right)e^{-t}\,dt \\&= \int_{t_1,t_2,\dots,t_n,t_{n+1}} e^{-(t_1+\dots +t_{n+1})}\,dt_1\dots dt_{n+1}
\end{align}$$

where all variables go from $0$ to $\infty$. But this can clearly be factored as $$\left(\int_{0}^\infty e^{-t}\,dt\right)^{n+1}=1.$$

So, what’s happening is really in $n+1$ dimensions – the linear map $\mathbb R^{n+1}\to\mathbb R^{n+1}$ defined as $$(t_1,\dots,t_n,t_{n+1})\mapsto\left(t_1,\dots,t_n,\sum_{i=1}^{n+1}t_i\right)$$

preserves volumes.

And then using the property $e^{x+y}=e^xe^y$.

**Moment generating function approach**

The exponential random variable $T$ with $P(T<t)=1-e^{-t}$ for $t\geq 0$ has:

$$E(T^n)=\int_{0}^{\infty} t^ne^{-t}\,dt$$

So we get the moment generating function: $$M_T(x) = \sum \frac{E(T^n)}{n!}x^n$$

But we also have:

$$M_T(x) = E(e^{Tx}) = \int_0^{\infty} e^{tx}e^{-t}\,dt = \int_0^{\infty} e^{-t(1-x)}\,dt = \frac{1}{1-x}$$

This means that $E(T^n)=n!$, which is the result you want.

I suppose this could be written without reference to probability as: if $I_n$ is your integral, then, since everything is positive and convergent, for $0<x<1$ you can show:

$$\sum \frac{I_n}{n!}x^n = \int_0^{\infty} \left(\sum \frac{(xt)^n}{n!}\right)e^{-t}\,dt = \int_0^{\infty} e^{-t(1-x)}\,dt = \frac{1}{1-x}$$
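A direct numerical check that $E(T^n)=I_n=n!$ is also easy to write down. This is a sketch using composite Simpson’s rule; truncating the integral at $80$ is an assumption that is harmless for small $n$ because the tail is negligible, and the function names are mine.

```python
import math

def simpson(f, a, b, intervals=20_000):
    """Composite Simpson's rule; `intervals` must be even."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def exponential_moment(n, upper=80.0):
    """Approximate E(T^n), the integral of t^n e^{-t}, truncated at `upper`."""
    return simpson(lambda t: t**n * math.exp(-t), 0.0, upper)

for n in range(7):
    print(n, exponential_moment(n), math.factorial(n))
```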

The Taylor remainder in integral form is a general framework that relates many important identities and ideas. The relation to $\mathrm{B}$ and $\Gamma$ will be clear as follows.

Taylor expand $f\colon\mathbf R\to \mathbf R$ around $0$ with explicit remainder. The expansion reads $$f(x)=\sum_{k=0}^n\frac{x^k}{k!}f^{(k)}(0)+\int_0^x f^{(n+1)}(x-t)\frac{t^n}{n!}\,dt.$$

Consider expanding $x^m$. It is not hard to see that $\dfrac{d^{n+1}}{dx^{n+1}}(x^m)=\dfrac{m!}{(m-n-1)!}x^{m-n-1}$. The crucial property of $f$ that makes it so special to Taylor expand is that $$f^{(k)}(0)=m!\delta_{k, m}=\begin{cases}m!,\, k=m \\ 0, \,\,\,\,\text{otherwise}\end{cases}$$

The Taylor sum is simply zero if $m\gt n$, so $$x^{m} = \frac{m!}{n! (m-n-1)!} \int_0^x t^{m-n-1}(x-t)^n\, dt. \tag{$m\gt n$}$$ Setting $x=1$ and mapping $m-n-1\mapsto p$ immediately gives the beta integral: $$\frac{n!\,p!}{(n+p+1)!}=\int_0^1 t^p(1-t)^n\, dt. \tag{$p, n \gt 0$}$$

To find the gamma function, we will need to recognize the Gauss factorial product for $p$ natural, a fairly intuitive one. Manipulating the expansion of $x^{p+n+1}$ yields $$x^{p+n+1}=\frac{(p+n+1)!}{p!\, n!}\int_0^x t^p(x-t)^n\, dt=\frac{(p+n+1)!}{p!\, n!}x^n \int_0^x t^p \left(1-\frac tx\right)^n\, dt,$$ so the finite version of the gamma function is $$\int_0^x t^p \left(1-\frac tx\right)^n\, dt = \left(\frac{x^{p+1}n!}{(p+n+1)!}\right) p!. \tag{$\star$}$$

Taking limits as $x, n \to \infty$, assuring $x/n \to 1$ and recognizing the Gauss product gives the well known gamma function: $$\int_0^\infty t^p e^{-t}\, dt=p!.$$
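To watch the limit happen, one can evaluate the right-hand side of $(\star)$ at $x=n$ exactly and let $n$ grow. The sketch below uses exact rational arithmetic; the function name is hypothetical, and the choice $x=n$ is just one way of keeping $x/n\to 1$.

```python
from fractions import Fraction
from math import factorial

def finite_gamma(p, n):
    """Right-hand side of (star) at x = n: n^{p+1} n! p! / (p+n+1)!, exactly."""
    denominator = 1
    for j in range(1, p + 2):
        denominator *= n + j  # (n+1)(n+2)...(n+p+1) equals (p+n+1)! / n!
    return Fraction(n ** (p + 1) * factorial(p), denominator)

p = 4
for n in (10, 1000, 100_000):
    print(n, float(finite_gamma(p, n)), factorial(p))
```

The values increase towards $p!$ from below, with error of order $1/n$.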

This is ready to be generalized: we didn’t prove it for real $p$, but we can use the standard investigation to extend it from now on.

A number of other concepts are also strongly related. To find the Cauchy formula for repeated integration, we map $n+1\mapsto 0$. We are left with $$I^{n+1}f(x)=\int_0^x f(t) \frac{(x-t)^n}{n!}\,dt,$$ the usual formula. The Taylor expansion is again zero, but by induction: $If(0)=0$ and $I^kf(0)=0\implies I^{k+1}f(0)=0.$ We can now describe the Laplace transform and its limit version.

From Cauchy, $$\int_0^x f(t) \left(1-\frac tx\right)^n\,dt= \frac{(I^{n+1}f)(x)\,n!}{x^n}.$$ We want to investigate the limiting behaviour as $n\to \infty$ and $x\approx n/s$. Using limits immediately yields the Laplace transform in a limit fashion: $$\mathcal Lf(s)=\int_0^\infty f(t)e^{-st}\, dt=\lim_{n\to\infty}\frac{s^n\, n!(I^{n+1}f)(n/s) }{n^n}.$$ Unfortunately, this still doesn’t make the inverse transform obvious, for example. It seems all these processes have an interpretation in terms of approximating a function by its Taylor expansion, but I don’t know exactly how yet.

Here is an elementary observation.

Fix an integer $ n \geq 1$ and a real number $\mu > 0$ (this $\mu$ will be the parameter of a Poisson random variable $X$). Then, by integration by parts, one obtains the following identity:

$$ \int_0^\mu e^{-x} \cdot x^n \, dx = n! - e^{-\mu} \cdot n! \left( 1 + \mu + \frac{\mu^2}{2!} + \ldots + \frac{\mu^n}{n!} \,\, \right) \quad(\star) $$

The left side is a Riemann integral while the right side can be written as the following

$$n! \cdot \sum_{k = n +1}^\infty e^{-\mu} \frac{\mu^k}{k!} \quad \text{which is equal to } \quad n! \cdot \mathbb{P}\big( X \geq n+1 \big) $$

When the parameter $\mu$ goes to infinity, $X$ becomes almost surely infinite and the improper integral $\int_0^\infty e^{-x} \cdot x^n \, dx $ converges, that is,

$$\lim_{\mu\to+\infty} \int_0^\mu e^{-x} \cdot x^n \, dx = \lim_{\mu\to+\infty} n! \cdot \mathbb{P}\big( X \geq n+1 \big) = n! \,\, .$$

(the last limit-equality is derived better from ($\star$), since $n$ is fixed.)
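The identity $(\star)$ is easy to test numerically. Here is a sketch that compares a Simpson’s-rule approximation of the left side with the closed form on the right; the function names and the sample values of $n$ and $\mu$ are illustrative assumptions.

```python
import math

def lower_integral(n, mu, intervals=20_000):
    """Composite Simpson approximation of the integral of e^{-x} x^n over [0, mu]."""
    h = mu / intervals
    f = lambda x: math.exp(-x) * x**n
    total = f(0.0) + f(mu)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def closed_form(n, mu):
    """Right-hand side of (star): n! (1 - e^{-mu} (1 + mu + ... + mu^n / n!))."""
    partial = sum(mu**k / math.factorial(k) for k in range(n + 1))
    return math.factorial(n) * (1 - math.exp(-mu) * partial)

n = 3
for mu in (1.0, 5.0, 20.0, 50.0):
    print(mu, lower_integral(n, mu), closed_form(n, mu))
```

As $\mu$ grows, both columns approach $n! = 6$.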

I learned the identity $(\star)$ from some book; if I remember correctly, it may have been Hardy’s *A Course of Pure Mathematics*.
