The pseudoness of pseudorandom number generators

Is there a reasonable statistic test one can do to standard random number generators (say, one of those that come built in in Python libs) which shows they are not really random?

(By reasonable I mean to exclude silly things like «generate a looong list and see that it is periodic»)

Later: Notice I am looking for a test that fails on standard generators.

Solutions Collecting From Web of "The pseudoness of pseudorandom number generators"

The standard reference is Knuth, The Art of Computer Programming, Volume 2, Seminumerical
Algorithms. Anyone with serious interest in this topic should read Knuth’s extensive exposition here. In addition to theoretical discussion, he presents tests including frequency, serial, gap, poker, coupon collector’s, permutation, run, maximum-of-t,
collision, birthday spacings, and serial correlation.

Marsaglia’s DIEHARD suite of statistical test includes birthday spacings, overlapping permutations, ranks of 31×31 and 32×32 matrices, ranks of 6×8 matrices, monkey tests on 20-bit Words,monkey tests OPSO, OQSO, DNA, count the 1’s in a stream of bytes, count the 1’s in specific bytes, parking lot, minimum distance, random spheres, squeeze, overlapping
sums, runs, and craps.

The NIST Statistical Test Suite includes frequency, block frequency, cumulative sums, runs, long runs,Marsaglia’s rank, spectral (based on the Discrete Fourier Transform), nonoverlapping template matchings, overlapping template matchings, Maurer’s universal statistical, approximate entropy (based on the work of Pincus, Singer and Kalman), random
excursions (due to Baron and Rukhin), Lempel-Ziv complexity, linear complexity, and serial.

As for tests that fail, as Knuth mentions when discussing a typical generator in section 3.6 p. 188 “Caution: The numbers generated by ran_array fail the birthday spacings test of Section 3.3.2J, and they have other deficiencies that sometimes show up in high-resolution simulations (see exercises 3.3.2-31 and 3.3.2-35)”.

Searches for Marsaglia (George), works by him or about his PRNG tests, would uncover a lot of information. e.g.,

http://en.wikipedia.org/wiki/Diehard_tests

was found by looking for “marsaglia test”.

PRNGs in most modern software intended to do serious simulation have been
vetted to make sure they pass diverse suites of tests, so you are
unlikely to find a case where one of them fails.

Historically, one famous failure is the PRNG called ‘RANDU’.
When triads of three successive values from this generator are
plotted in the unit cube, one can see that all plot points lie in
a small number of widely separated parallel planes (Lewis, Orav and Uribe, 1986), whereas a satisfactory generator
would place points consistent with random selection throughout the cube.

That is not to say all simulations perform flawlessly.
Typically, PRNGs generate values from
UNIF(0,1). It is possible to abuse values from an excellent generator
when trying to transform to random samples from other distributions, but this is
a fault of the programming–not of the generated values.

Example: Modern software uses the Box-Muller transformation to get
two precisely standard normal random variables from two uniform ones via
log and trig functions. But before doing transcendental arithmetic was as
fast as it is today, a method of getting one standard normal from
12 uniform ones by addition and subtraction was $Z = U_1, \dots, U_{12} – 6$. This method, which
cannot produce values $|Z| > 6$ (rare, but possible), still lingers
in some texts and legacy software.

The following program in R uses this method to generate a
hundred random samples of size 5000 from a “standard normal” distribution,
of which 15 sanples fail the Shapiro-Wilk 5% level test of normality. (R uses the “Mersenne-Twister” PRNG due to Matsumoto and Nishimura (1998), which runs
through $2^{19937} – 1$ values before repeating itself and has been vetted in
tests involving up 624 dimensions without finding results inconsistent
with random behavior.)

set.seed(1234); m = 100; p.val = numeric(m); n = 5000

for(i in 1:m) { z = rowSums(matrix(runif(n*12), nrow=n))-6

p.val[i] = shapiro.test(z)$p.value }

sum(p.val < .05)