# How to calculate discrepancy of a sequence

For $d\geq1$ let $I^d=[0,1)^d$ denote the $d$-dimensional half-open unit cube and consider a finite sequence $x_1,\ldots,x_N\in{I}^d$. For a subset $J\subset{I}$, let $A(J,N)$ denote the number of elements of this sequence inside $J$, i.e.
$$A(J,N)=\left|\{x_1,\ldots,x_N\}\cap{J}\right|,$$
and let $V(J)$ denote the volume of $J$. The discrepenacy of the sequence $x_1,\ldots,x_N$ is defined as
$$D_N=\sup_{J}{\left|A(J,N)-V(J)\cdot{N}\right|},$$
where the supremum is taken over all half-open subintervals $J=\prod_{j=1}^d{[0,t_j)}$, with $0\leq{t_j}\leq1$. The discrepancy thus compares the actual number of points in a given volume with the expected number of points in that volume, assuming the sequence $x_1,\ldots,x_N$ is uniformly distributed in $I^d$.

So far, so good. But say I want to calculate the discrepancy of a given sequence. Does anyone know how to? There are an infinite number of half-open subintervals $J$, making this (at least to me) non-trivial. Eventually I want to compare different low-discrepancy sequences in MATLAB, so if anyone knows a heuristic for this I am equally happy and grateful :)

#### Solutions Collecting From Web of "How to calculate discrepancy of a sequence"

To compute discrepancy, we want to consider two types of sets $J$: ones that contain an unusually low number of points for their volume, and ones that contain an unusually high number of points for their volume.

For sets of the first type, if we can increase their volume without adding any additional points, we might as well do this. If $J = \prod_{j=1}^d [0,t_j)$, then we can increase each $t_j$ without any price to pay, until either $t_j = (x_i)_j$ for some $x_i \in \{x_1, \dots, x_N\}$, or else $t_j = 1$. So we might as well assume $J$ is already of this form: something like

There are $(N+1)^d$ sets to check like this: we just check all possibilities where $t_j \in \{(x_1)_j, \dots, (x_N)_j, 1\}$, for each $j$. (Note that this also wastes some checks: sometimes, we set $t_j = (x_i)_j$ even though $x_i$ is not on the boundary of $j$, because some other coordinate is too large.)

A similar thing happens for sets of the second type, except for weird interaction between their half-openness and the supremum. If $J =\prod_{j=1}^d [0,t_j)$, we want $t_j$ to be just a tiny bit greater than $(x_i)_j$ for some $x_i \in J$, so that $J$ includes $x_i$ without wasting too much volume.

Equivalently, we can just check sets of the form $J = \prod_{j=1}^d [0,t_j]$: closed subintervals. These are not elements of the set we’re taking the supremum over, but they’re limit points: each such set has discrepancy approximated by the discrepancies of $J_\epsilon = \prod_{j=1}^d [0,t_j+\epsilon)$, as $\epsilon \to 0$. Such sets $J$ look like:

Anyway, as before, there’s only a finite number of cases to check here: we iterate over all $N^d$ possibilities where $t_j \in \{(x_1)_j, \dots, (x_N)_j\}$. These are almost exactly the same possibilities we considered in the first case, except now we include the points on the boundary when computing discrepancy.