The mean absolute deviation is:

$$\dfrac{\sum_{i=1}^{n}|x_i-\bar x|}{n}$$

The variance is: $$\dfrac{\sum_{i=1}^{n}(x_i-\bar x)^2}{n-1}$$

- So the mean deviation and the variance seem to measure the same thing, yet the variance requires squaring each difference. Why? Squaring always gives a positive value, so the sum won’t be zero, but the absolute value gives a positive value too.
- And why isn’t it $|x_i-\bar x|^2$, then? Squaring just enlarges the differences, so why do we need to do it?

A similar question is here, but mine is a little different.

Thanks.

They don’t measure the same thing. To see this, think about physical units.

Suppose the value of $x$ is measured in seconds. For example, $n$ people do a 100-meter race and the values $x_i$ are how many seconds it took each one to finish.

The formula $|x_i - \bar x|$ measures the difference of two times, so it’s also measured in seconds.

The mean absolute deviation is therefore an average of second-values, so it’s also measured in seconds.

However, the formula $(x_i - \bar x)^2$ squares the difference of two times, so it’s measured in *seconds squared*. The variance is therefore also in seconds squared. The two quantities don’t live in the same physical space of variables, so they measure different things.

The *standard deviation*, however (the square root of the variance), is again measured in seconds, so it measures something similar (at least, physically similar).

As for why we like the square-root-of-average-of-squares better than the average-of-absolute-values – the square has better mathematical properties, as shown in other answers and in the link you referred to (particularly Rich’s answer).
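To see the units argument in numbers, here is a minimal Python sketch (the finishing times are made up for illustration):

```python
import math

# Hypothetical finishing times in seconds for a 100-meter race
times = [11.2, 11.8, 12.5, 13.1, 14.4]
n = len(times)
mean = sum(times) / n

# Mean absolute deviation: an average of second-valued quantities -> seconds
mad = sum(abs(x - mean) for x in times) / n

# Variance: an average of squared differences -> seconds squared
var = sum((x - mean) ** 2 for x in times) / n

# Standard deviation: the square root brings us back to seconds
sd = math.sqrt(var)

print(mad, var, sd)  # mad and sd are comparable in units; var is not
```

Here `mad` and `sd` can sensibly be compared to the raw times, while `var` cannot.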

A late answer, just for completeness, with a different view on the thing.

You might look at your data as living in a multidimensional space, where each **subject** is a dimension and each **item** is a vector in that space, pointing from the origin to the item’s measurements over the full set of **subjects**.

Additional remark: this view of things has an additional nice flavour because it uncovers the hidden condition that the **subjects** are assumed independent of each other. That assumption is what makes the data space Euclidean; relaxing the independence condition requires changing the mathematics of the space, which then has correlated (or “oblique”) axes.

Now the distance from one vector’s arrowhead to another is just the formula for distance in Euclidean space, the square root of the sum of squared coordinate differences (from the Pythagorean theorem): $$d = \sqrt{(x_1-y_1)^2+(x_2-y_2)^2+ \cdots+(x_n-y_n)^2}$$ And the standard deviation is that value, normed by the number of subjects, when the mean vector is taken as the $y$-vector.

$$\text{sdev} = \sqrt { {(x_1- \bar x)^2 +(x_2-\bar x)^2+ \cdots +(x_n-\bar x)^2 \over n} }$$
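In code, this is just a Euclidean distance divided by $\sqrt n$ (dividing by $n$ inside the root is the same as dividing the distance by $\sqrt n$ outside it); a minimal sketch with made-up data:

```python
import math

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data vector
n = len(x)
xbar = sum(x) / n
mean_vec = [xbar] * n  # the mean vector plays the role of y

# Euclidean distance between the data vector and the mean vector
d = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, mean_vec)))

# Standard deviation: that distance, normed by sqrt(n)
sdev = d / math.sqrt(n)
print(sdev)
```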

They *don’t* measure the same thing. It is the mean absolute deviation and the *standard deviation* that measure the same kind of thing (notice the similarity of their names).

The variance is convenient because it satisfies the property that the variance of independent random variables is the sum of the variances.
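A quick simulation illustrates this (the distributions, sample size, and seed are arbitrary choices): the sample variances of two independent variables add up to roughly the variance of their sum, while the mean absolute deviations do not.

```python
import random
import statistics

random.seed(0)
N = 200_000

# Two independent random variables
x = [random.gauss(0, 2) for _ in range(N)]  # variance about 4
y = [random.gauss(0, 3) for _ in range(N)]  # variance about 9
s = [a + b for a, b in zip(x, y)]

def mad(data):
    """Mean absolute deviation from the mean."""
    m = statistics.fmean(data)
    return statistics.fmean(abs(v - m) for v in data)

# Variances add (about 13); mean absolute deviations do not
print(statistics.pvariance(s), statistics.pvariance(x) + statistics.pvariance(y))
print(mad(s), mad(x) + mad(y))
```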

First of all, $|\cdot|^2$ is exactly the same as $(\cdot)^2$ for real arguments. As you mentioned, the two measures share some characteristics, but for many problems arising in optimization with Gaussian densities, the optimal result is achieved by squaring. You might want to look at the Viterbi detector, for example, or, for another example from estimation theory, the energy detector.

One can still use the sample absolute deviation instead of the sample variance and obtain very good performance, but for the examples I gave the result will NOT be optimal.

Variance is, as you say, a measure of deviation. Or, rather, *standard deviation* (the square root of the variance) is a measure of deviation. So it’s really standard deviation and average deviation you ought to compare.

The difference is the following: If $d_i = |x_i-\bar x|$ are the absolute value deviations, then average deviation is

$$

\frac{d_1 + d_2 + \cdots + d_n}{n}

$$

while standard deviation is

$$

\sqrt{\frac{d_1^2 + d_2^2 + \cdots + d_n^2}{n}}

$$

The normal average uses what is called the *arithmetic mean*, and the standard deviation uses what is called the *quadratic mean*. It is not very difficult to show that, as long as not all the $d_i$ are equal, the standard deviation is strictly larger.

So standard deviation is more affected by outliers than is the average deviation. That is really all there is to it.
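Both facts are easy to verify numerically; a small sketch (the data values are arbitrary):

```python
import math

def avg_dev(data):
    """Arithmetic mean of the absolute deviations d_i."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def std_dev(data):
    """Quadratic mean of the d_i (dividing by n, as above)."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

# Unequal deviations: the quadratic mean is strictly larger
print(avg_dev([8, 10, 12]), std_dev([8, 10, 12]))

# All deviations equal: the two means coincide
print(avg_dev([9, 11]), std_dev([9, 11]))
```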

A similar case arises in linear regression, where the “least squares method” is used instead of, say, a (fictitious) “least absolute values method”. There, the reason is that squaring has better properties with respect to the derivative (when minimizing the variability).

Similar reasons apply in the case above, having to do with estimating the bias (of the corresponding sample measure) or with other calculations, such as determining the distribution of a sample statistic. Moreover, squaring the absolute value is the same as squaring the value itself, i.e. $$|x_i-\bar x|^2=(x_i-\bar x)^2$$ so this alteration would not make any difference.

If you don’t have a preference for exactly how you measure deviation, then you should choose the measure that’s easiest to compute with.

The standard deviation (the square root of the variance) is rather nice for doing actual computations, because the variance has all sorts of nice properties: e.g. the function defining the variance is everywhere differentiable (in fact, it’s analytic), and it is additive for independent random variables, i.e. $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ whenever $X$ and $Y$ are independent.

There is a very simple explanation for this: it allows for the calculation of analytical solutions for many interesting problems.

As others have pointed out before, $x^2$ is differentiable, whereas $|x|$ is not. Hence, in problems where quadratic terms are present, one can differentiate them to find optimal solutions analytically.

On the other hand, with $|x|$, one often has to resort to numerical schemes to handle the absolute value. A flip side of using quadratic terms is that outliers (i.e. $x$ values far from the mean) have a much greater influence on the $x^2$ terms than on $|x|$. This may be good or bad depending on your application.
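A concrete illustration of both points: the squared-deviation objective can be minimized analytically (set the derivative to zero and the mean pops out), whereas the absolute-deviation objective is minimized by the median, found by sorting rather than by calculus. A minimal sketch with made-up data:

```python
def sum_sq(data, c):
    """Sum of squared deviations from a center c."""
    return sum((x - c) ** 2 for x in data)

def sum_abs(data, c):
    """Sum of absolute deviations from a center c."""
    return sum(abs(x - c) for x in data)

data = [1.0, 2.0, 2.0, 3.0, 10.0]  # note the outlier at 10

# d/dc sum((x - c)^2) = -2 * sum(x - c) = 0  =>  c = mean (closed form)
mean = sum(data) / len(data)

# sum(|x - c|) is not differentiable at the data points; its minimizer
# is the median (the middle order statistic for odd n)
median = sorted(data)[len(data) // 2]

print(mean, sum_sq(data, mean))       # the mean minimizes the squared objective
print(median, sum_abs(data, median))  # the median minimizes the absolute one
```

Note also how the outlier drags the mean toward it, while the median stays put.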

You say the variance is $\dfrac{\sum_{i=1}^{n}(x_i-\bar x)^2}{n-1}$.

What if I told you the variance is $\dfrac{\sum_{i=1}^{n}(x_i-\bar x)^2} n$?

You can find both in textbooks. The fact is, dividing by $n-1$ rather than $n$ is properly done (if at all) ONLY when one is estimating the **population** variance from a finite **sample** $x_1,\ldots,x_n$ that is not the whole population. If $x_1,\ldots,x_n$ is the whole population and each point is equally probable, then the variance of that population is given by the **second** expression above, **not** the first.
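Most statistical software exposes both conventions; in Python’s standard `statistics` module, for instance, they are separate functions (a minimal sketch):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: divide by n (use when the data IS the whole population)
pop_var = statistics.pvariance(data)   # sum of squared deviations / 8

# Sample variance: divide by n-1 (use when estimating from a sample)
samp_var = statistics.variance(data)   # sum of squared deviations / 7

print(pop_var, samp_var)
```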

Now here’s the important point:

$$

\operatorname{var}(X_1+\cdots+X_n) = \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_n) \tag 1

$$

if $X_1,\ldots,X_n$ are independent random variables.

**That does not work with mean absolute deviation.** (It also does not work in the version with $n-1$ instead of $n$.)

Now suppose $n=1800$ and each $X_i$ is the number of “heads” seen on the $i$th coin toss, so $X_i$ is either $0$ or $1$. Then the sum is the number of “heads” in $1800$ tosses. What is the probability that that number is at least $890$ but not more than $905$? To answer that, one approximates the distribution of the number of “heads” by the normal distribution with the same expected value and the same variance. Without the identity $(1)$, one would not know what that variance is! Abraham de Moivre discovered all this in the $18$th century. And that is why standard deviations rather than mean absolute deviations are used.
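For completeness, here is that calculation sketched in Python, using the normal CDF written in terms of `math.erf`; the continuity correction is my own choice, not part of the answer above.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

n, p = 1800, 0.5
mu = n * p                           # expected number of heads: 900
sigma = math.sqrt(n * p * (1 - p))   # sqrt of the variance 450, via identity (1)

# P(890 <= heads <= 905), with a continuity correction
lo = (889.5 - mu) / sigma
hi = (905.5 - mu) / sigma
prob = normal_cdf(hi) - normal_cdf(lo)
print(round(prob, 3))
```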
