# How to prove the inequality between mathematical expectations?

Let $X$ and $Y$ be independent random variables with the same distribution and finite mathematical expectation. How can one prove the inequality $$E(|X-Y|) \le E(|X+Y|)?$$

#### Solutions Collecting From Web of "How to prove the inequality between mathematical expectations?"

After a little inspection, we see that
$$E(|X+Y|-|X-Y|) = 2E[Z(1_{XY\geq 0}-1_{XY<0})]$$
where $Z = \min(|X|,|Y|)$.
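The identity behind this comes from a pointwise one: $|x+y|-|x-y| = 2\min(|x|,|y|)$ when $xy \ge 0$ and $-2\min(|x|,|y|)$ when $xy < 0$. Here is a minimal sanity check on an integer grid; the helper names `lhs` and `rhs` and the grid are illustrative choices, not part of the original answer.

```python
import itertools

def lhs(x, y):
    # Left-hand side of the pointwise identity.
    return abs(x + y) - abs(x - y)

def rhs(x, y):
    # 2 * min(|x|, |y|) with sign +1 if xy >= 0 and -1 if xy < 0.
    z = min(abs(x), abs(y))
    sign = 1 if x * y >= 0 else -1
    return 2 * z * sign

# Integers keep the comparison exact (no floating-point rounding).
grid = [-3, -2, -1, 0, 1, 2, 3]
mismatches = [(x, y) for x, y in itertools.product(grid, repeat=2)
              if lhs(x, y) != rhs(x, y)]
print(mismatches)
```

An empty list means the two sides agree at every grid point; this is only a spot check, not a proof.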

Remember that for any non-negative random variable $T$,
$$E(T) = \int_0^\infty P(T>t)\,dt.$$

We apply this with $T=Z\,1_{X \geq 0, Y\geq 0}$, $T=Z\,1_{X < 0, Y< 0}$ and $T=Z\,1_{X \geq 0, Y< 0}$. Since $\{Z > t\} = \{|X| > t\}\cap\{|Y| > t\}$, we obtain

$$E(Z \,1_{X \geq 0,Y\geq 0}) = \int_0^\infty P(X > t)P(Y > t)\,dt = \int_0^\infty P(X > t)^2\,dt$$

$$E(Z\, 1_{X < 0, Y < 0}) = \int_0^\infty P(X < -t)P(Y < -t)\,dt = \int_0^\infty P(X < -t)^2\,dt$$

$$E(Z\,1_{X \geq 0, Y< 0}) = E(Z\,1_{\{X < 0, Y \geq 0\}}) = \int_0^\infty P(X > t)P(X < -t)\,dt$$

So finally,
$$E(|X+Y|-|X-Y|) = 2\int_0^\infty (P(X>t)-P(X<-t))^2\,dt \geq 0$$
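For a distribution with finitely many values, both sides of this formula can be computed exactly, which gives a quick check. The sketch below assumes a small made-up distribution `pmf` (purely illustrative) and uses the fact that the integrand is constant between consecutive values of $|x|$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example distribution (an assumption): value -> probability.
pmf = {-2: Fraction(1, 6), 1: Fraction(1, 2), 3: Fraction(1, 3)}

def expected_gap(pmf):
    """E|X+Y| - E|X-Y| for independent X, Y, both distributed as pmf."""
    return sum(px * py * (abs(x + y) - abs(x - y))
               for (x, px), (y, py) in product(pmf.items(), repeat=2))

def tail_integral(pmf):
    """2 * integral over t >= 0 of (P(X > t) - P(X < -t))^2, computed exactly.

    For a finite pmf the integrand is piecewise constant, with jumps only
    at the values |x|, and it is zero beyond max |x|.
    """
    cuts = sorted({Fraction(0)} | {Fraction(abs(x)) for x in pmf})
    total = Fraction(0)
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2  # any point strictly inside the interval works
        upper = sum(p for x, p in pmf.items() if x > t)
        lower = sum(p for x, p in pmf.items() if x < -t)
        total += (hi - lo) * (upper - lower) ** 2
    return 2 * total

print(expected_gap(pmf), tail_integral(pmf))
```

The two printed values should coincide for any finite `pmf` whose probabilities sum to 1.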

Remark 1. The inequality is an equality if and only if the distribution of $X$ is symmetric, that is, $P(X > t) = P(X < -t)$ for any $t \geq 0$.

Remark 2. When $|X|=1$ a.s., the inequality is nothing but the semi-trivial fact that if $X$ and $Y$ are independent with the same distribution, then $P(XY \geq 0) \geq \dfrac{1}{2}$.
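
To spell this special case out (a short worked check; the symbol $p = P(X=1)$ is introduced here for illustration): when $|X|=|Y|=1$, the difference $|X+Y|-|X-Y|$ equals $2$ if $XY=1$ and $-2$ if $XY=-1$, so
$$E(|X+Y|-|X-Y|) = 2\bigl(2P(XY \geq 0)-1\bigr), \qquad P(XY \geq 0) = p^2+(1-p)^2 = \frac{1}{2}+\frac{1}{2}(2p-1)^2 \geq \frac{1}{2}.$$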

Remark 3. It is worthwhile to mention a nice corollary: $E(|X+Y|) \geq E(|X|)$. The function $x \mapsto |x|$ is convex, hence $|X| \leq \frac{1}{2}(|X+Y|+|X-Y|)$. Taking expectations we find
$$\Bbb E(|X+Y|-|X|) \geq \frac{1}{2}\Bbb E(|X+Y|-|X-Y|) \geq 0.$$
Furthermore, there is an equality if and only if $X=0$ a.s.
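
The chain of inequalities in Remark 3 can be checked exactly on a small example. The sketch below assumes a made-up finite distribution `pmf` (illustrative only) and evaluates the three expectations with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example distribution (an assumption): value -> probability.
pmf = {-2: Fraction(1, 6), 1: Fraction(1, 2), 3: Fraction(1, 3)}

def expect_pair(pmf, g):
    """E[g(X, Y)] for independent X, Y, both distributed as pmf."""
    return sum(px * py * g(x, y)
               for (x, px), (y, py) in product(pmf.items(), repeat=2))

e_abs = sum(p * abs(x) for x, p in pmf.items())      # E|X|
e_sum = expect_pair(pmf, lambda x, y: abs(x + y))     # E|X+Y|
e_diff = expect_pair(pmf, lambda x, y: abs(x - y))    # E|X-Y|

# Remark 3: E|X+Y| - E|X| >= (E|X+Y| - E|X-Y|) / 2 >= 0
print(e_sum - e_abs, (e_sum - e_diff) / 2)
```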

Edit: Question has changed. Will give answer when time permits.

By the linearity of expectation, the inequality $E(X-Y)\le E(X+Y)$ is equivalent to $-E(Y)\le E(Y)$, which in general is false. It is true precisely if $E(Y)\ge 0$.

Independence is not needed for the argument. Neither is the hypothesis that the random variables have the same distribution.

Below is a set of remarks that’s too long to be put in a comment.

Conjecture. The inequality becomes an equality iff $-X$ has the same
distribution as $X$.

Remark 1. The “if” part of the conjecture is easy: if $X$ and $-X$ have the same distribution, then by the independence hypothesis $(X,Y)$ and $(X,-Y)$ have the same joint distribution, therefore $|X+Y|$ and $|X-Y|$ share the same distribution, so
they will certainly share the same expectation.

Remark 2. Let $\phi_n(t)=t$ if $|t| \leq n$ and $0$ otherwise. If the inequality
holds for any $(\phi_n(X),\phi_n(Y))$ for any $n$, then it will hold for $(X,Y)$ also,
by a dominated convergence argument. So we may assume without loss of generality that
the support of $X$ is bounded.

Let’s consider the question of when $E[f(X,Y)] \geq 0$ in the generality of real-valued functions of arbitrary i.i.d. random variables on probability spaces. With no loss of generality take $f$ to be symmetric in $X$ and $Y$, because $E[f]$ is the same as $E$ of the symmetrization of $f$.

There is a simple, and greatly clarifying, reduction to the case of random variables with at most two values. The general case is a mixture of such distributions, by representing the selection of $(X,Y)$ as first choosing an unordered pair according to the induced distribution on those, and then the ordered pair conditional on the unordered one (the conditional distribution is the $1$ or $2$-valued distribution, and the weights in the mixture are the probability distribution on the de-ordered pair). One then sees, after some more or less mechanical analysis of the 2-valued case, that the key property is:

$f(x,y)=|x+y| - |x-y|$, the symmetric function for which we want to prove $E[f(X,Y)] \geq 0$, is diagonally dominant. That is, $f(x,x)$ and $f(y,y)$ both are larger than or equal to $|f(x,y)|$. By symmetry we really need only to check one of those conditions, $\forall x,y \hskip4pt f(x,x) \geq |f(x,y)|$.

A function satisfying these conditions, now on a general probability space, has non-negative expectation in the 2-valued case, because for $p+q=1$ (the probability distribution), $$E[f] = p^2 f(a,a) + q^2 f(b,b) + 2pq f(a,b) \geq (p-q)^2|f(a,b)| \geq 0$$
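
The two-valued bound can be illustrated numerically for the function $f(x,y)=|x+y|-|x-y|$ of the original problem. This is a minimal sketch: the grid and the values of `a`, `b`, `p` are arbitrary illustrative choices, and the grid test is a spot check of diagonal dominance rather than a proof.

```python
from fractions import Fraction
from itertools import product

def f(x, y):
    return abs(x + y) - abs(x - y)

# Spot check of diagonal dominance f(x, x) >= |f(x, y)| on an integer grid.
grid = range(-4, 5)
dominant = all(f(x, x) >= abs(f(x, y)) for x, y in product(grid, repeat=2))

def two_point_expectation(a, b, p):
    """E[f(X, Y)] when X, Y are i.i.d. with P(a) = p and P(b) = 1 - p."""
    q = 1 - p
    return p * p * f(a, a) + q * q * f(b, b) + 2 * p * q * f(a, b)

# Hypothetical two-point distribution (an assumption for illustration).
a, b, p = 3, -1, Fraction(1, 4)
value = two_point_expectation(a, b, p)
bound = (p - (1 - p)) ** 2 * abs(f(a, b))  # (p - q)^2 |f(a, b)|
print(dominant, value, bound)
```

The lower bound follows by applying $f(a,a), f(b,b) \geq |f(a,b)|$ to the first two terms and $f(a,b) \geq -|f(a,b)|$ to the cross term.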

The equality cases when expectation is zero are when $p=q$ and $f(a,b) = -f(a,a) = -f(b,b)$. For 1-valued random variables, equality holds at values where $f(p,p)=0$. Due to diagonal dominance these are null points, with $f(p,x)=0$ for all $x$.

This allows a generalization and proof of Ewan Delanoy’s observation, in the general situation: if the support of the random variable has an involution $\sigma$ such that $\sigma(p)=p$ for null points and for non-null points $b=\sigma(a)$ is the unique solution of $f(a,a)=f(b,b)=-f(a,b)$, then the expectation is zero (when finite) if and only if the distribution is $\sigma$-invariant. That is because the expectation zero case must be a mixture of the $1$ and $2$-atom distributions with zero expectation, and all of those assign probability in a $\sigma$-invariant way to the atoms.

Returning to the original problem, for $f(x,y)=|x+y| - |x-y|$ with the absolute value interpreted as any norm on any vector space, diagonal dominance follows from the triangle inequality, $0$ is a unique null point, and the involution pairing every non-null $x$ with the unique solution of $f(x,y)=-f(x,x)=-f(y,y)$ is $x \to -x$. This recovers the characterization that the distribution is symmetric in the critical case, for any $f$ derived from a norm.
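
For completeness, here is one way to see the diagonal dominance from the triangle inequality (a sketch; the shorthand $u=x+y$, $v=x-y$ is introduced here). Since $\bigl|\,|u|-|v|\,\bigr| \leq |u \pm v|$ for any norm,
$$|f(x,y)| = \bigl|\,|x+y|-|x-y|\,\bigr| \leq |(x+y)+(x-y)| = 2|x| = f(x,x),$$
and in the same way $|f(x,y)| \leq |(x+y)-(x-y)| = 2|y| = f(y,y)$.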

Note (*). In passing between ordered and unordered pairs, there might be some issue of “measurable choice” on general measure spaces, or not, and it is an interesting matter what exactly is true about that and whether any condition is needed on the measure space. In the original problem one has a selection function $(\min(X,Y),\max(X,Y))$, if needed to avoid any complications, and the same would be true in any concrete case by using order statistics on coordinates.

Let $F(x) = P(X < x)$. I assume that $F$ is differentiable, so there is no atom and $F'$ is the density of $X$ (and $Y$).

$$\begin{aligned}
E(|X+Y|) - E(|X-Y|) &= E(|X+Y|-|X-Y|) \\
&= 2E\bigl(X \, 1_{Y \ge |X|} + Y \, 1_{X \ge |Y|} - X \, 1_{-Y\ge |X|} - Y \, 1_{-X \ge |Y|}\bigr) \\
&= 4E\bigl(X (1_{Y \ge |X|} - 1_{-Y \ge |X|})\bigr) \\
&= 4E\bigl(X(1-F(-X)-F(X))\bigr) \\
&= 4 \int_{\Bbb R} x(1-F(-x)-F(x))F'(x)\,dx \\
&= 4 \int_{\Bbb R} (-x)(1-F(x)-F(-x))F'(-x)\,dx \\
&= 2 \int_{\Bbb R} x(1-F(x)-F(-x))(F'(x)-F'(-x))\,dx \\
&= \int_{\Bbb R} (1-F(x)-F(-x))^2\,dx - \bigl[x(1-F(x)-F(-x))^2\bigr]_{\Bbb R} \\
&= \int_{\Bbb R} (1-F(x)-F(-x))^2\,dx \ge 0
\end{aligned}$$

I am not entirely sure about the last step. $G(x) = 1-F(x)-F(-x)$ does converge to $0$ at both ends, and $G$ has finite variation. But still I am not convinced we can’t carefully pick $F$ such that the bracket doesn’t vanish.

However this is valid if $X$ has compact support or if $G(x)$ vanishes quickly enough (like the normal distribution for example). In this case it also proves Ewan’s conjecture : the difference is $0$ if and only if the distribution is symmetrical with respect to $0$.
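
As a sanity check of the final formula in a compact-support case, take $X$ uniform on $[0,1]$ (an example chosen here for illustration; its $F$ is differentiable except at the two endpoints). Then $E(|X+Y|) = 1$ and $E(|X-Y|) = \frac{1}{3}$, while $1-F(x)-F(-x)$ is an even function of $x$, equal to $1-x$ on $[0,1]$ and to $0$ for $x \ge 1$, so
$$\int_{\Bbb R} (1-F(x)-F(-x))^2\,dx = 2\int_0^1 (1-x)^2\,dx = \frac{2}{3} = E(|X+Y|) - E(|X-Y|).$$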

$E[\cdot]$ is a linear operator.

This means $E[X + Y] = E[X] + E[Y]$.

Also, $E[X - Y] = E[X] - E[Y]$.

The statement will be true when $E[Y] \ge 0$.