Jensen's Inequality (with probability one)

In the following theorem, I have trouble with the second part, namely showing that if $f$ is strictly convex then $X = EX$ with probability $1$. While I can see this must be true, I don't know how to show that it holds with probability $1$. I am especially interested in the case where $X$ is a continuous real-valued random variable.
[Image: statement of the theorem]

Thanks a lot in advance for explaining.

Answers

Let $f : \mathbb{R} \to \mathbb{R}$ be convex. This means that at every point $a \in \mathbb{R}$, there is an affine linear function $l_a : \mathbb{R} \to \mathbb{R}$ which is dominated by $f$, i.e.
$$
l_a(x) \leq f(x)
$$
and $l_a(a) = f(a)$. When $f$ is differentiable at $a$, for example, $l_a$ is the tangent line to $f$ at $a$.

When $f$ is strictly convex, we have the additional condition
$$
l_a(x) = f(x) ~\Rightarrow ~ x = a
$$
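For instance, with $f(x) = x^2$ one can take $l_a(x) = 2ax - a^2$ (the tangent at $a$), and then
$$
f(x) - l_a(x) = x^2 - 2ax + a^2 = (x-a)^2 \geq 0,
$$
with equality only at $x = a$, which illustrates both properties at once.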
Before we define $a$ in this particular problem (and sweeping integrability problems under the rug), notice that
$$
l_a(X) \leq f(X)
$$
holds, hence $E\, l_a(X) \leq E f(X)$. Moreover $E\, l_a(X) = l_a(E X)$ because $l_a$ is affine. Finally, setting $a = EX$, we obtain
$$
f(EX) \leq E f(X)
$$
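For instance, with $f(x) = x^2$ this is the familiar inequality $(EX)^2 \leq E[X^2]$, i.e. $\operatorname{Var}(X) \geq 0$.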
Suppose now that $f(EX) = E f(X)$, which can be written as $E\, l_a(X) = E f(X)$ with our choice $a = E X$.

With this setup, consider $E [f(X) - l_a(X)] = 0$. Inside the expectation we have a nonnegative random variable (because of convexity) and it has expectation zero. We conclude that $f(X) = l_a(X)$ almost surely (see the claim in the addendum below: the integral does not see sets of measure zero).

Now we use strict convexity: $f(X) = l_a(X)$ forces $X = a = EX$, so $X = EX$ almost surely, i.e. $X$ is almost surely constant.
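Continuing the example $f(x) = x^2$ with $a = EX$: there $f(X) - l_a(X) = (X - EX)^2$, so equality in Jensen's inequality says exactly that $\operatorname{Var}(X) = E[(X-EX)^2] = 0$, and zero variance forces $X = EX$ almost surely.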

Addendum:
Claim: If $Y$ is a nonnegative-valued random variable and $E Y = 0$, then $Y = 0$ almost surely.

To see this, let $A_n = \{Y \geq 1/n\}$, i.e. the set where $Y$ is at least $1/n$. Note that $\cup_n A_n = A := \{Y > 0\}$. Let's show that $P A_n = 0$ for every $n$, where $P$ is the probability measure.

$$
\frac{1}{n} P A_n \leq E (Y I_{A_n}) \leq E Y = 0
$$
The first inequality holds because $Y \geq 1/n$ on $A_n$, and the second because $Y \geq 0$ everywhere; hence $P A_n = 0$ for every $n$. Now recall that $P \cup_n A_n \leq \sum_n P A_n$, which is often called the 'countable subadditivity' property. This implies that $P A = 0$, and the claim follows.

Here is an alternative proof (given several years later) that is a bit more general, as it does not require the existence of an affine bounding function (subgradients do not always exist for convex functions defined over restricted domains).
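For example, $f(x) = -\sqrt{1-x^2}$ on $\mathcal{X} = [-1,1]$ is strictly convex but has no supporting affine function at the endpoints $\pm 1$, since the one-sided slope of $f$ there is infinite.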


Fix $n$ as a positive integer, let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set, and let $f:\mathcal{X}\rightarrow\mathbb{R}$ be a strictly convex function, meaning that
$$f(px + (1-p)y) < pf(x) + (1-p)f(y)$$
whenever $0<p<1$ and $x, y \in \mathcal{X}$, $x \neq y$.

Let $X$ be a random vector that takes values in $\mathcal{X}$ and has a finite expectation $E[X]$. We know that $E[X] \in \mathcal{X}$ (this is a precursor to Jensen's inequality). Suppose that $f(E[X]) = E[f(X)]$. We show that $X=E[X]$ with probability 1.

Proof:

Define $m=E[X]$. Suppose $P[X>m] > 0$ (we will reach a contradiction).

Case 1: Suppose $P[X>m]=1$. Then $X-m$ is positive with probability 1, so $E[X-m]>0$ (a nonnegative random variable with zero expectation would be zero almost surely), meaning $m-m>0$, a contradiction.

Case 2: Suppose $0 < P[X>m] < 1$. Define $m_1 = E[X|X\leq m]$ and $m_2 = E[X|X>m]$. Note that $m_1 \leq m < m_2$ (the second inequality is strict by the argument of Case 1 applied to the conditional distribution) and, by the law of total expectation,
$$m_1P[X\leq m] + m_2 P[X>m] = m$$
Also
\begin{align}
f(m) &\overset{(a)}{=} E[f(X)] \\
&= E[f(X)|X\leq m]P[X\leq m] + E[f(X)|X>m]P[X>m] \\
&\overset{(b)}{\geq} f(E[X|X\leq m])P[X\leq m] + f(E[X|X>m])P[X>m] \\
&= f(m_1)P[X\leq m] + f(m_2)P[X>m] \\
&\overset{(c)}{>} f(m_1 P[X\leq m] + m_2 P[X>m])\\
&= f(m)
\end{align}
where (a) holds by the assumption $f(E[X]) = E[f(X)]$; (b) holds by Jensen's inequality applied to the conditional distributions; (c) holds by strict convexity, applied with $p = P[X\leq m] \in (0,1)$ and the distinct points $m_1 \neq m_2$. Hence $f(m)>f(m)$, a contradiction.

Cases 1 and 2 together imply that $P[X>m]=0$. Similarly, $P[X<m]=0$, and so $X=m=E[X]$ with probability 1. $\Box$
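As a sanity check on the equality condition: if $X$ takes the values $0$ and $1$ with probability $1/2$ each and $f(x) = x^2$, then $E[f(X)] = 1/2$ while $f(E[X]) = 1/4$, so the inequality is strict, consistent with the fact that equality can only occur when $X$ is almost surely constant.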