Intereting Posts

Proving that $\gcd(2^m – 1, 2^n – 1) = 2^{\gcd(m,n )} – 1$
Books of interesting mathematics aimed at mathematicians
An interesting way to visualize the Mandelbrot Set. Proofs? Simplifications? Extensions?
Relation between Borelâ€“Cantelli lemmas and Kolmogorov's zero-one law
Sparsest matrix with specified row and column sums
Example of a function continuous at only one point.
Finding how multiplication and addition behave on $\mathbb{F}_4$ without any result
Probability that no car is parked next to a car of the same type
Is there a purely algebraic proof of the Fundamental Theorem of Algebra?
Your favourite maths puzzles
Books to read to understand Terence Tao's Analytic Number Theory Papers
Motivation of stable homotopy theory
Is this $\gcd(0, 0) = 0$ a wrong belief in mathematics or it is true by convention?
Showing that $\int_{0}^{\infty} \frac{dx}{1 + x^2} = 2 \int_0^1 \frac{dx}{1 + x^2}$
arithmetic with quantum integers

I’d like to calculate a standard deviation for a very large (but known) number of sample values, with the highest accuracy possible. The number of samples is larger than can be efficiently stored in memory.

The basic variance formula is:

$\sigma^2 = \frac{1}{N}\sum (x – \mu)^2$

- Dominant term and Big Omega
- Convert a Pair of Integers to a Integer, Optimally?
- Why does this algorithm to plot implicit equations work?
- An easy reference for genetic algorithm
- $n$-Bit Strings Not Containing $010$
- The time complexity of finding a neighborhood graph provided an unordered adjacency matrix

… but this formulation depends on knowing the value of $\mu$ already.

$\mu$ can be calculated cumulatively — that is, you can calculate the mean without storing every sample value. You just have to store their sum.

But to calculate the variance, is it necessary to store every sample value? Given a stream of samples, can I accumulate a calculation of the variance, without a need for memory of each sample? Put another way, is there a formulation of the variance which doesn’t depend on foreknowledge of the exact value of $\mu$ before the whole sample set has been seen?

- Problem with an algorithm to $3$-colour the edges of cubic graphs
- What is the probability that $x_1+x_2+…+x_n \le n$?
- How can I pack $45-45-90$ triangles inside an arbitrary shape ?
- Development of a specific hardware architecture for a particular algorithm
- If $ X = \sqrt{Y_{1} Y_{2}} $, then find a multiple of $ X $ that is an unbiased estimator for $ \theta $.
- Rigorous book on bootstrapping, boosting, bagging, etc.
- Heronian triangle Generator
- Does anyone know of any open source software for drawing/calculating the area of intersection of different shapes?
- Is there a polynomial-time algorithm to find a prime larger than $n$?
- Simple explanation and examples of the Miller-Rabin Primality Test

You can keep two running counters – one for $\sum_i x_i$ and another for $\sum_i x_i^2$. Since variance can be written as

$$ \sigma^2 = \frac{1}{N} \left[ \sum_i x_i^2 – \frac{(\sum_i x_i)^2}{N} \right] $$

you can compute the variance of the data that you have seen thus far with just these two counters. Note that the $N$ here is not the total length of all your samples but only the number of samples you have observed in the past.

I’m a little late to the party, but it appears that this method is pretty unstable, but that there is a method that allows for streaming computation of the variance without sacrificing numerical stability.

Cook describes a method from Knuth, the punchline of which is to initialize $m_1 = x_1$, and $v_1 = 0$, where $m_k$ is the mean of the first $k$ values. From there,

$$

\begin{align*}

m_k & = m_{k-1} + \frac{x_k – m_{k-1}}k \\

v_k & = v_{k-1} + (x_k – m_{k-1})(x_k – m_k)

\end{align*}

$$

The mean at this point is simply extracted as $m_k$, and the variance is $\sigma^2 = \frac{v_k}{k-1}$. It’s easy to verify that it works for the mean, but I’m still working on grokking the variance.

- Find the area where dog can roam
- Index of a maximal subgroup in a finite group
- Is $1847^{2013}+2$ really a prime?
- sum of arctangent
- Noetherian ring with finitely many height $n$ primes
- How can I compare two matrices?
- Hilbert's Original Proof of the Nullstellensatz
- If $a$ and $b$ are roots of $x^4+x^3-1=0$, $ab$ is a root of $x^6+x^4+x^3-x^2-1=0$.
- Difference between a function and a graph of a function?
- vector spaces whose algebra of endomorphisms is generated by its idempotents
- Prove the AGM identity using only Hypergeometric series
- is there a property of a category that is preserved by category isomorphism, but not equivalence?
- $4$ or more type $2$ implies $3$ or less type $1$
- How many ways can $r$ nonconsecutive integers be chosen from the first $n$ integers?
- $L^p$-norm of a non-negative measurable function