
Given a joint distribution $P(A,B,C)$, we can compute various marginal distributions. Now suppose:

\begin{align}
P_1(A,B,C) &= P(A) P(B) P(C) \\
P_2(A,B,C) &= P(A,B) P(C) \\
P_3(A,B,C) &= P(A,B,C)
\end{align}

Is it true that $d(P_1,P_3) \geq d(P_2,P_3)$, where $d$ is the total variation distance?

In other words, is it provable that $P(A,B) P(C)$ is a better approximation of $P(A,B,C)$ than $P(A) P(B) P(C)$ in terms of the total variation distance? Intuitively I think it is true, but I could not find a proof.
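Before attempting a proof, one can probe such a conjecture numerically. The sketch below is illustrative only (not from the question): it draws random joint distributions on a small support, forms $P_1$ and $P_2$ from the marginals, and counts how often $d(P_1,P_3) < d(P_2,P_3)$, i.e. how often the conjectured inequality would fail.

```python
import itertools
import random

def tvd(p, q):
    """Total variation distance between two discrete distributions
    given as dicts over the same support: (1/2) * sum of |p - q|."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def random_joint(rng, na=2, nb=2, nc=3):
    """A random joint distribution P(A,B,C) on a small finite support."""
    keys = list(itertools.product(range(na), range(nb), range(nc)))
    w = [rng.random() for _ in keys]
    s = sum(w)
    return {k: x / s for k, x in zip(keys, w)}

def marginals(p):
    """Marginals P(A), P(B), P(C) and P(A,B) of a joint dict."""
    pa, pb, pc, pab = {}, {}, {}, {}
    for (a, b, c), v in p.items():
        pa[a] = pa.get(a, 0.0) + v
        pb[b] = pb.get(b, 0.0) + v
        pc[c] = pc.get(c, 0.0) + v
        pab[a, b] = pab.get((a, b), 0.0) + v
    return pa, pb, pc, pab

def compare(p3):
    """Return (d(P1,P3), d(P2,P3)) for one joint distribution P3."""
    pa, pb, pc, pab = marginals(p3)
    p1 = {(a, b, c): pa[a] * pb[b] * pc[c] for (a, b, c) in p3}
    p2 = {(a, b, c): pab[a, b] * pc[c] for (a, b, c) in p3}
    return tvd(p1, p3), tvd(p2, p3)

rng = random.Random(0)
violations = 0
for _ in range(1000):
    d13, d23 = compare(random_joint(rng))
    violations += d13 < d23  # conjectured inequality d(P1,P3) >= d(P2,P3) fails
print("conjecture violated in", violations, "of 1000 random trials")
```

A single violation already refutes the conjecture; the explicit counterexample in the answers pins one down exactly.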


*(Note: This is a full reworking of my initial answer. Credit is due to user @Did, who criticized my original attempt and pushed for something better. So here it is.)*

The OP provided a specific counterexample and hence essentially proved that the initial conjecture does not hold in general. Still, the result appears counter-intuitive, because instinctively we think “two sources of divergence should make matters worse than one”. But there is another way to gain intuition here: sometimes a second divergence may offset the first one. This is a more general phenomenon. In welfare economics, for example, the theory of the second best shows that if just one of the conditions for welfare maximization cannot be satisfied, we may be better off violating a second optimality condition as well, rather than trying to satisfy as many as possible (put crudely, “if not perfect, then anything goes”).

Since the Question’s conjecture has been disproved, the purpose of my answer is to formalize why it does not hold, and to link the intuition to mathematical expressions.

For $P_3(A,B,C)$ to be non-zero we assume $P(C|{A,B})\gt0$ and $P(A|B)\gt0$. Note that all probabilities are taken with respect to the true probability measure $P_3=P$: the only source of divergence in $P_1$ and $P_2$ is the assumed dependence structure, not measurement error.

Decomposing the three distributions we have

\begin{align}
P_1(A,B,C) &= P(A) P(B) P(C) \\
P_2(A,B,C) &= P(A,B) P(C) \\
P_3(A,B,C) &= P(A,B,C) = P(A,B)\,P(C|{A,B})
\end{align}

Then the TVD between $P_1$ and $P_3$ can be bounded as (using $P(A,B)=P(A|B)P(B)$ in the second step)

$$d(P_1,P_3) = \sup|P_1 - P_3| = \sup|P(A) P(B) P(C) - P(A,B)P(C|{A,B})| \\= \sup P(A,B)\left|\frac {P(A)}{P(A|B)}P(C) - P(C|{A,B})\right| \\\le \sup P(A,B)\,\sup\left|\frac {P(A)}{P(A|B)}P(C) - P(C|{A,B})\right|
$$

and similarly

$$d(P_2,P_3) = \sup|P_2 - P_3| = \sup|P(A,B) P(C) - P(A,B)P(C|{A,B})| \\= \sup P(A,B)\left|P(C) - P(C|{A,B})\right| \\\le \sup P(A,B)\,\sup\left|P(C) - P(C|{A,B})\right| $$

Let’s write the two suprema side by side, scaled by their common factor $\sup P(A,B)$:

$$\bar d(P_1,P_3) = \sup\left|\frac {P(A)}{P(A|B)}P(C) - P(C|{A,B})\right|\;,\qquad \bar d(P_2,P_3) = \sup\left|P(C) - P(C|{A,B})\right| $$

The $P_2$ distance is affected only by the one wrong assumption ($C$ independent of $(A,B)$), while the $P_1$ distance is affected by both, as one would expect. But *how* is it affected? The second wrong assumption ($A$ and $B$ independent) is “represented” by the factor $\frac {P(A)}{P(A|B)}$. Crucially, this factor does *not* scale the *distance* $\sup|P(C) - P(C|{A,B})|$ multiplicatively; it scales only one of the two terms inside it, $P(C)$, and that is precisely the term reflecting the other wrong assumption. We will show this formally in a moment, but it is already evident that, since we are taking suprema of absolute values, the effect of this factor is not monotonic in the way it enters the expression: it can be greater or smaller than unity, and it can increase or decrease the supremum involved. The second “mistake” in $P_1$ interacts with the consequences of the first, and it may make the overall distance larger or smaller.

To show this more formally, and to declutter the notation, define $a\equiv \frac {P(A)}{P(A|B)} \in (0,M)$, where $M$ is some positive number, and $x\equiv P(C)\in(0,1)$, $y\equiv P(C|{A,B})\in (0,1)$.

Then the conjecture $d(P_1,P_3)\ge d(P_2,P_3)$ boils down to whether $\sup|ax-y|\ge \sup|x-y|$.

Now, given that $x \in(0,1)\;,y \in (0,1)$, the maximum possible range of the function $|x-y|$ defines the maximal set $H_2=(0,1)$, a bounded subset of $\Bbb R$. By the same reasoning, the maximum possible range of $ax$ is $(0, M),\;$ and then the maximum possible range of $|ax-y|$ defines the maximal set $H_1=\left(0,\max (M,1)\right)$, which is also a bounded subset of $\Bbb R$. Denote $h_1$ and $h_2$ the actual range sets produced by $P_1$ and $P_2$ respectively. By construction $h_1\subseteq H_1\;,\;h_2\subseteq H_2\; $.

Then we have two cases:

$$M\lt 1 \Rightarrow h_1\subseteq H_1\subset H_2\;\Rightarrow\;\sup h_1\le\sup H_1 \lt \sup H_2\;,\; \text{and} \;\sup h_2\le\sup H_2$$

$$M\gt 1 \Rightarrow h_2\subseteq H_2\subset H_1\;\Rightarrow\;\sup h_2\le\sup H_2 \lt \sup H_1\;,\; \text{and} \;\sup h_1\le\sup H_1$$

But in neither case can we infer that $\sup h_1\le \sup h_2$ or $\sup h_2\le \sup h_1$: the magnitude of the factor representing the second wrong assumption cannot determine that. The result is distribution-specific, and since the source of everything is the joint distribution, which can encode arbitrary dependence structures as long as it sums to unity, we conclude that OP’s counterexample is not an outlier but representative of the situation: anything goes.

Finally, one could think that the culprit is the distance measure used, the total variation distance. In OP’s counterexample, computing the Hellinger distance $H(P,Q) = \frac {1}{\sqrt 2} \left(\sum_i\left[\sqrt {p_i}-\sqrt {q_i}\right]^2\right)^\frac 12$ gives $H(P_1,P_3)\approx 0.13$ and $H(P_2,P_3)\approx 0.11$, so under this metric the ordering is actually reversed: $P_2$ comes out slightly closer to $P_3$. Far from rescuing the conjecture, this reinforces the conclusion that which approximation is closer depends on the specific distribution, and can even depend on the metric chosen.
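The non-monotone role of the factor $a$ can also be seen with plain numbers. Below is a small illustrative sketch (the sets of attainable values are hypothetical, chosen only for the example): depending on where the attainable values of $x$ and $y$ sit, $\sup|ax-y|$ can fall below or above $\sup|x-y|$.

```python
def sup_abs_diff(a, xs, ys):
    """sup of |a*x - y| over finite sets of attainable values, where
    x stands for P(C), y for P(C|A,B), and a for P(A)/P(A|B)."""
    return max(abs(a * x - y) for x in xs for y in ys)

# Hypothetical attainable values, for illustration only.
xs, ys = [0.8], [0.2]

base = sup_abs_diff(1.0, xs, ys)     # the P2-side supremum, sup|x - y|
smaller = sup_abs_diff(0.6, xs, ys)  # here a < 1 pulls a*x toward y
larger = sup_abs_diff(1.5, xs, ys)   # here a > 1 pushes a*x away from y
assert smaller < base < larger
print(base, smaller, larger)
```

Whether the factor shrinks or enlarges the supremum depends entirely on the location of the attainable values, which is exactly why no general ordering between the two distances can hold.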

I just found the following counterexample. Suppose $A,B,C$ are discrete variables: $A$ and $B$ can each take two values, while $C$ can take three values.

The joint distribution $P(A,B,C)$ is:

\begin{array}{cccc}
A & B & C & P(A,B,C) \\
1 & 1 & 1 & 0.1/3 \\
1 & 1 & 2 & 0.25/3 \\
1 & 1 & 3 & 0.25/3 \\
1 & 2 & 1 & 0.4/3 \\
1 & 2 & 2 & 0.25/3 \\
1 & 2 & 3 & 0.25/3 \\
2 & 1 & 1 & 0.4/3 \\
2 & 1 & 2 & 0.25/3 \\
2 & 1 & 3 & 0.25/3 \\
2 & 2 & 1 & 0.1/3 \\
2 & 2 & 2 & 0.25/3 \\
2 & 2 & 3 & 0.25/3 \\
\end{array}

So the marginal distribution $P(A,B)$ is:

\begin{array}{ccc}
A & B & P(A,B) \\
1 & 1 & 0.2 \\
1 & 2 & 0.3 \\
2 & 1 & 0.3 \\
2 & 2 & 0.2 \\
\end{array}

The marginal distributions $P(A), P(B)$ and $P(C)$ are uniform.

So we can compute that:

\begin{align}
d(P_1,P_3) &= 0.1 \\
d(P_2,P_3) &= 0.4/3
\end{align}
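These values can be double-checked mechanically. Here is a short verification sketch (not part of the original answer), hard-coding the table above and using exact rational arithmetic so no floating-point error creeps in:

```python
from fractions import Fraction as F

# The joint distribution P3 = P(A,B,C) from the table: each listed
# weight w corresponds to the probability w/3.
rows = [
    (1, 1, 1, F(1, 10)), (1, 1, 2, F(1, 4)), (1, 1, 3, F(1, 4)),
    (1, 2, 1, F(2, 5)),  (1, 2, 2, F(1, 4)), (1, 2, 3, F(1, 4)),
    (2, 1, 1, F(2, 5)),  (2, 1, 2, F(1, 4)), (2, 1, 3, F(1, 4)),
    (2, 2, 1, F(1, 10)), (2, 2, 2, F(1, 4)), (2, 2, 3, F(1, 4)),
]
P3 = {(a, b, c): w / 3 for a, b, c, w in rows}
assert sum(P3.values()) == 1

# Marginals used to build the two approximations.
PA = {a: sum(v for (x, _, _), v in P3.items() if x == a) for a in (1, 2)}
PB = {b: sum(v for (_, y, _), v in P3.items() if y == b) for b in (1, 2)}
PC = {c: sum(v for (_, _, z), v in P3.items() if z == c) for c in (1, 2, 3)}
PAB = {ab: sum(v for k, v in P3.items() if k[:2] == ab)
       for ab in {k[:2] for k in P3}}

P1 = {(a, b, c): PA[a] * PB[b] * PC[c] for (a, b, c) in P3}
P2 = {(a, b, c): PAB[a, b] * PC[c] for (a, b, c) in P3}

def tvd(p, q):
    """Total variation distance: (1/2) * sum of |p - q| over all cells."""
    return sum(abs(p[k] - q[k]) for k in p) / 2

print(tvd(P1, P3), tvd(P2, P3))  # 1/10 and 2/15, i.e. 0.1 and 0.4/3
```

With exact fractions the two distances come out as $1/10$ and $2/15$, matching the values above and confirming $d(P_1,P_3) < d(P_2,P_3)$.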
