%% For double-blind review submission
\documentclass[acmlarge,review,anonymous]{acmart}\settopmatter{printfolios=true}
%% For single-blind review submission
%\documentclass[acmlarge,review]{acmart}\settopmatter{printfolios=true}
%% For final camera-ready submission
%\documentclass[acmlarge]{acmart}\settopmatter{}
%% Note: Authors migrating a paper from PACMPL format to traditional
%% SIGPLAN proceedings format should change 'acmlarge' to
%% 'sigplan,10pt'.
%% Some recommended packages.
\usepackage{booktabs} %% For formal tables:
%% http://ctan.org/pkg/booktabs
\usepackage{subcaption} %% For complex figures with subfigures/subcaptions
%% http://ctan.org/pkg/subcaption
%\newcommand\ITEM[1]{\par\noindent{\bf [#1]}}
\newenvironment{myitemize}{\itemize\divide\topsep by 4 \divide\partopsep by 4 \divide\itemsep by 6 \relax}{\enditemize}
\newdimen\figureheight \figureheight=\textheight \advance\figureheight by -30pt
\makeatletter\if@ACM@journal\makeatother
%% Journal information (used by PACMPL format)
%% Supplied to authors by publisher for camera-ready submission
\acmJournal{PACMPL}
\acmVolume{1}
\acmNumber{1}
\acmArticle{1}
\acmYear{2017}
\acmMonth{1}
\acmDOI{10.1145/nnnnnnn.nnnnnnn}
\startPage{1}
\else\makeatother
%% Conference information (used by SIGPLAN proceedings format)
%% Supplied to authors by publisher for camera-ready submission
\acmConference[PL'17]{ACM SIGPLAN Conference on Programming Languages}{January 01--03, 2017}{New York, NY, USA}
\acmYear{2017}
\acmISBN{978-x-xxxx-xxxx-x/YY/MM}
\acmDOI{10.1145/nnnnnnn.nnnnnnn}
\startPage{1}
\fi
%% Copyright information
%% Supplied to authors (based on authors' rights management selection;
%% see authors.acm.org) by publisher for camera-ready submission
\setcopyright{none} %% For review submission
%\setcopyright{acmcopyright}
%\setcopyright{acmlicensed}
%\setcopyright{rightsretained}
%\copyrightyear{2017} %% If different from \acmYear
%% Bibliography style
\bibliographystyle{ACM-Reference-Format}
%% Citation style
%% Note: author/year citations are required for papers published as an
%% issue of PACMPL.
\citestyle{acmauthoryear} %% For author/year citations
\begin{document}
%% Title information
\title[Better Splittable Pseudorandom Number Generators]{Better Splittable Pseudorandom Number Generators (and~Almost~As~Fast)}
%% [Short Title] is optional;
%% when present, will be used in
%% header instead of Full Title.
%\titlenote{with title note} %% \titlenote is optional;
%% can be repeated if necessary;
%% contents suppressed with 'anonymous'
%\subtitle{Subtitle} %% \subtitle is optional
%\subtitlenote{with subtitle note} %% \subtitlenote is optional;
%% can be repeated if necessary;
%% contents suppressed with 'anonymous'
%% Author information
%% Contents and number of authors suppressed with 'anonymous'.
%% Each author should be introduced by \author, followed by
%% \authornote (optional), \orcid (optional), \affiliation, and
%% \email.
%% An author may have multiple affiliations and/or emails; repeat the
%% appropriate command.
%% Many elements are not rendered, but should be provided for metadata
%% extraction tools.
%% Author with single affiliation.
\author{Guy L. Steele Jr.}
\orcid{0000-0002-1421-3811}
\affiliation{
\position{Software Architect}
\institution{Oracle Labs} %% \institution is required
\streetaddress{35 Network Drive~~UBUR02-313}
\city{Burlington}
\state{MA}
\postcode{01803}
\country{USA}
}
\email{guy.steele@oracle.com} %% \email is recommended
%% Paper note
%% The \thanks command may be used to create a "paper note" ---
%% similar to a title note or an author note, but not explicitly
%% associated with a particular element. It will appear immediately
%% above the permission/copyright statement.
%\thanks{with paper note} %% \thanks is optional
%% can be repeated if necessary
%% contents suppressed with 'anonymous'
%% Abstract
%% Note: \begin{abstract}...\end{abstract} environment must come
%% before \maketitle command
\begin{abstract}
We have tested and analyzed the {\sc SplitMix} pseudorandom number generator algorithm
presented by Steele, Lea, and Flood \citeyear{FAST-SPLITTABLE-PRNG}, and have discovered two additional classes
of gamma values that produce weak pseudorandom sequences. In this paper we
present a modification to the {\sc SplitMix} algorithm
that avoids all three classes of problematic gamma values,
and also a completely new algorithm for splittable pseudorandom number generators,
which we call {\sc TwinLinear}.
Like {\sc SplitMix}, {\sc TwinLinear} provides both a \emph{generate}
operation that returns one (64-bit) pseudorandom value and a \emph{split} operation
that produces a new generator instance that with very high probability
behaves as if statistically independent of all other instances.
Also like {\sc SplitMix}, {\sc TwinLinear} requires no locking or other synchronization
(other than the usual memory fence after instance initialization),
and is suitable for use with {\sc simd} instruction sets
because it has no branches or loops.
The {\sc TwinLinear} algorithm is the result of a systematic exploration
of a substantial space of nonlinear mixing functions that combine the output of
two independent generators of (perhaps not very strong) pseudorandom number sequences.
We discuss this design space and our strategy for exploring it.
We used the PractRand test suite (which has provision for failing fast)
to filter out poor candidates, then used TestU01 BigCrush to verify the quality
of candidates that withstood PractRand.
We present results of analysis and extensive testing
on {\sc TwinLinear} (using both TestU01 and PractRand). Single instances of {\sc TwinLinear}
have no known weaknesses, and {\sc TwinLinear} is significantly more robust than
{\sc SplitMix} against accidental correlation in a multithreaded setting. It is slightly more costly
than {\sc SplitMix} (10 or 11 64-bit arithmetic operations per 64 bits generated, rather than 9)
but has a shorter critical path (5 or 6 operations rather than 8).
We believe that {\sc TwinLinear} is suitable for the same sorts of applications as {\sc SplitMix},
that is, ``everyday'' scientific and machine-learning applications (but not cryptographic applications),
especially when concurrent threads or distributed processes are involved.
\end{abstract}
%% 2012 ACM Computing Classification System (CSS) concepts
%% Generate at 'http://dl.acm.org/ccs/ccs.cfm'.
\begin{CCSXML}
10003752.10010061.10010062
Theory of computation~Pseudorandomness and derandomization
500
10010147.10011777.10011778
Computing methodologies~Concurrent algorithms
500
\end{CCSXML}
\ccsdesc[500]{Theory of computation~Pseudorandomness and derandomization}
\ccsdesc[500]{Computing methodologies~Concurrent algorithms}
%% End of generated code
%% Keywords
%% comma separated list
\keywords{collections, Java, multithreading, object-oriented,
parallel computing, {\sc prng}, pseudorandom, random number generator,
recursive splitting, {\sc rng}, spliterator, splittable data structures, streams} %% \keywords is optional
%% \maketitle
%% Note: \maketitle command must come after title commands, author
%% commands, abstract environment, Computing Classification System
%% environment and commands, and keywords command.
\maketitle
\section{Introduction}\label{sec:intro}
Steele, Lea, and Flood~\citeyear{FAST-SPLITTABLE-PRNG} describe the {\sc SplitMix} algorithm for pseudorandom number generators ({\sc prng}s).
(We~refer readers to Section~1 of their paper for a general description of {\sc prng} algorithms
and the challenges that must be addressed when using them in parallel computations.)
\hbox{A 64-bit} version of the {\sc SplitMix} algorithm was deployed as class {\tt java.util.SplittableRandom} for the Java programming language in JDK8 (Oracle \citeyearNP{JDK8-SPLITTABLERANDOM}).
The basic structure of one instance (in the object-oriented sense) of this algorithm
is an object with a mutable 64-bit integer field {\tt seed} and an immutable odd 64-bit integer parameter ({\tt final} field) {\tt gamma};
to generate a new pseudorandomly chosen 64-bit integer, the object adds {\tt gamma} to {\tt seed}, then
applies a {\it mixing function} to the new value of {\tt seed}. Steele, Lea, and Flood comment on,
and offer some evidence for, the notion that certain choices of the parameter {\tt gamma} might
produce pseudorandom sequences of less than ideal quality. They therefore include code to test whether
a proposed value for {\tt gamma} had sufficient number of {\tt 01} and {\tt 10} pairs in its binary representation;
if it does not, the candidate value is exclusive-{\sc or}'d with the constant {\tt 0xaaaaaaaaaaaaaaaa},
producing a replacement value that is still odd but has a sufficient number of {\tt 01} and {\tt 10} pairs.
Their only comment about this trick relative to the quality
of the sequences produced is ``Testing shows that this appears to be effective.''
Other parts of the paper test the entire 64-bit {\sc SplitMix} algorithm, including this trick to
avoid what we will call {\it weak gamma values}, using the DieHarder test suite~\citep{DIEHARDER}.
We set out to verify the conjecture about weak gamma values and to test the quality of the {\sc SplitMix} algorithm in other ways.
In particular, we wanted to find out whether there were other classes of weak gamma values,
and we wanted to test the algorithm
with the well-known TestU01 BigCrush test suite~\citep{TESTU01,L'Ecuyer:2007:TestU01}.
For additional assurance, we decided also to use the {\tt PractRand} test suite \citep{PRACTRAND}, which is less well known
than TestU01 but has the virtue of ``failing early'' as soon as it detects an undesirable amount of bias.
We also hoped to identify an equally fast but more robust mixing function.
% We constructed a specialized test framework (coded as Java programs plus {\tt make} files)
% to use both PractRand and TestU01 to probe the behavior of {\sc SplitMix} as well as
% several dozen variants. The framework automatically distills the test suite reports and
% uses DataGraph (a Macintosh application) to present the distillations visually. Six
% months of testing using our 16-node CPU cluster (512 threads in all) has produced tens of
% thousands of data points from these measurements. Results include: (a)~Weak gamma values
% of the conjectured class do indeed result in {\sc prng}s whose output has reduced
% statistical quality. (b)~There are two other classes of weak gamma values, not defended
% against by JDK8. (c)~There is another mixing function that can be used as part of {\sc SplitMix}
% that appears to be much less susceptible to one of these two new classes of
% weak gamma values, though the reasons are not yet well understood. (d)~There are some
% {\sc prng} algorithms for which TestU01 BigCrush detects statistical anomalies but
% PractRand does not, and there are other {\sc prng} algorithms for which PractRand detects
% statistical anomalies but TestU01 BigCrush does not.
We present here a technique that can be used in {\sc SplitMix} to defend against all three classes of weak gamma value.
With this modification, we believe that
the {\sc SplitMix} algorithm used in JDK8 for class {\tt SplittableRandom} is satisfactory for many purposes, but still
has three drawbacks: (a)~its overall state space (considering all possible instances) is 127 bits, which may be on the small side
for large-scale applications; (b)~the {tt split} operation takes more time; and (c)~it is possible that there are yet other classes of weak gamma values.
Still, it is the best published design we have seen so far for a reasonably fast and completely splittable pseudorandom number generator.
Our investigations led us to consider testing a broader class of candidate {\sc prng} algorithms.
While there are many papers in the literature on the subject of constructing a {\sc prng} with a long period by combining two or more Linear Congruential Generators (LCGs), almost all such papers use some linear method of combination (presumably because this makes the subsequent mathematical analysis tractable). Typically each LCG is computed using arithmetic modulo some prime number, with a different prime number for each constituent LCG, and then the result are summed modulo some prime number. For example, the well-known MRG32k3a algorithm~\citep{MRG32k3a} has exactly this structure. One drawback of this approach is that an algorithm of this form
does not straightforwardly generate all $2^k$ possible values for a $k$-bit integer with equal probability; instead, one must typically either
settle for an approximation to true uniformity (which is perfectly acceptable for some applications)
or use some iterative method such as rejection sampling.
We present here a new {\sc prng} algorithm framework, {\sc TwinLinear}, in which all arithmetic is performed modulo a power of~2 (typically $2^{64}$), and in which the outputs of two Linear Congruential Generators are combined using a {\it nonlinear} function.
% A single LCG by itself has notorious weaknesses: (i)~the low-order bits of the result are
% not very random at all (numbering the bits with nonnegative indices such that the least
% significant bit has index 0, we may observe that bit $i$ has period $2^{i+1}$), and
% (ii)~pairs of consecutive values, when used to plot points in the x-y plane, tend to
% cluster along a small set of diagonal lines. The {\sc TwinLinear} algorithm compensates
% for this poor behavior by doing three things:
% \par\noindent (1)~using the high 32 bits from each of two 64-bit LCGs to obtain 64 bits with fairly good pseudorandomness;
% \par\noindent (2)~using the high 6 bits of one LCG (which are ``quite random'') to \emph{rotate} this combined
% result (a \emph{nonlinear} combining step borrowed from the {\sc pcg} algorithm~\citep{PCG-PREPRINT,PCG-VIDEO}); and
% \par\noindent (3)~using a final mixing step that is a miniature version of the mixing functions used in {\sc SplitMix}.
% \par\noindent
% The rotation step does not alter the number of 1-bits in a word; the mixing function
% in the third step is intended to address that problem. On the other hand, mixing functions of the form used by {\sc SplitMix}
% do not by themselves sufficiently avoid correlations between multiple generator objects used in parallel;
% testing shows that the rotation step seems to be highly effective in breaking up these correlations.
The result is a generator that is reasonably fast (almost as fast as {\sc SplitMix}, possibly the same speed on some architectures; seems to be completely immune to the problem of weak gamma values; and has a much lower probability of other sorts of unwanted statistical accidents.
Its state space (and therefore its memory footprint) is exactly twice that of {\sc SplitMix}: two mutable integer fields and two immutable odd integer parameters per instance. Part of its speed is attributable to the fact that it does not treat its state as an array, and therefore performs no indexing operations. (This fact makes it easy for a compiler to keep the state in registers during execution of inner loops.)
We present test results to demonstrate that
the {\sc TwinLinear} algorithm has no apparent statistical weaknesses in the case of a single instance,
and a significantly smaller probability than the {\sc SplitMix} algorithm of accidental correlations among multiple instances.
In Section~\ref{sec:splitmix} we review the {\sc SplitMix} algorithm and describe its weaknesses.
In Section~\ref{sec:test-framework} we describe our test framework.
In Section~\ref{sec:testing-splitmix} we report the results of testing the {\sc SplitMix} algorithm with a variety of mixing functions.
In Section~\ref{sec:twinlinear} we describe a new class of {\sc prng} algorithms, {\sc TwinLinear}, and one specific mixing function that appears to work especially well.
In Section~\ref{sec:twinlinear-weaknesses} we discuss potential weaknesses of the {\sc TwinLinear} algorithm.
In Section~\ref{sec:testing-twinlinear} we report the results of testing the {\sc SplitMix} algorithm with a variety of mixing functions.
In Section~\ref{sec:related-work} we compare to related work, and in
Section~\ref{sec:conclusions} we present some conclusions.
\section{The {\sc SplitMix} Algorithm}\label{sec:splitmix}
In Figure~\ref{fig:splitmix} we reproduce the essential parts of the {\sc SplitMix} algorithm (coded in the Java programming language)
given by Steele, Lea, and Flood~\citeyear{FAST-SPLITTABLE-PRNG} in their Figures~16 and~17.
The public constructors ensure that the private constructor is always given an odd value for the parameter {\tt gamma}.
The private method {\tt nextSeed} advances the state of the (additive) sequence generator and returns a generated 64-bit integer.
The public method {\tt nextLong} generates a pseudorandom 64-bit integer by giving the result of {\tt nextSeed} to
the method {\tt mix64}, which implements the 64-bit MurmurHash3 mixing function \citep{MURMURHASH3-FINALIZER}.
The public method {\tt nextDouble} generates a pseudorandom 64-bit floating-point value by using the 53 high-order bits
of an integer generated by {\tt nextLong} as a fixed-point fraction, multiplying it by {\tt DOUBLE{\char'137}ULP}
to produce a result in the range $[0.0, 1.0)$.
The public method {\tt split} returns a new instance of {\tt SplittableRandom} created by choosing ``at random''
an initial seed value, produced by {\tt nextLong}, and a new {\tt gamma} parameter, produced by giving the result
of {\tt nextSeed} to {\tt mixGamma}, which applies a different mixing function ({\tt mix64variant13}, which implements
the 13th mixing function described by \citet{STAFFORD-MIX-VARIANTS}) and then further modifies the value if it is determined
to be potentially weak.
\begin{figure}
\hrule
% \begin{verbatim}
% package java.util;
% import java.util.concurrent.atomic.AtomicLong;
% public final class SplittableRandom {
% private long seed; private final long gamma; // gamma must be an odd integer
% private SplittableRandom(long seed, long gamma) { // The argument "gamma" must be odd
% this.seed = seed; this.gamma = gamma; }
% private long nextSeed() { return (seed += gamma); }
% private static long mix64(long z) {
% z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
% z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
% return z ^ (z >>> 33); }
% private static long mix64variant13(long z) {
% z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
% z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
% return z ^ (z >>> 31); }
% private static long mixGamma(long z) {
% z = mix64variant13(z) | 1L;
% int n = Long.bitCount(z ^ (z >>> 1));
% if (n >= 24) z ^= 0xaaaaaaaaaaaaaaaaL;
% return z; } // This result is always odd
% private static final double DOUBLE_ULP = 1.0 / (1L << 53);
% private static final long GOLDEN_GAMMA = 0x9e3779b97f4a7c15L; // Note: this value is odd
% private static final AtomicLong defaultGen = new AtomicLong(initialSeed());
% public SplittableRandom(long seed) { this(seed, GOLDEN_GAMMA); }
% public SplittableRandom() {
% long s = defaultGen.getAndAdd(2 * GOLDEN_GAMMA);
% this.seed = mix64(s); this.gamma = mixGamma(s + GOLDEN_GAMMA); }
% public long nextLong() { return mix64(nextSeed()); }
% public double nextDouble() { return (nextLong() >>> 11) * DOUBLE_ULP; }
% public SplittableRandom split() {
% return new SplittableRandom(mix64(nextSeed()), mixGamma(nextSeed())); }
% }
% \end{verbatim}
\begin{tabbing}
{\tt package\ java.util;} \\
{\tt import\ java.util.concurrent.atomic.AtomicLong;} \\
{\tt public\ final\ class\ SplittableRandom\ {\char'173}} \\
{\tt \ \ private\ long\ seed;\ private\ final\ long\ gamma;\ \ \ \ \ \ \ \ \ \ //\ `gamma' must\ be\ an\ odd\ integer} \\
{\tt \ \ private\ SplittableRandom(long\ seed,\ long\ gamma)\ {\char'173}\ \ \ \ \ //\ The\ argument\ `gamma'\ must\ be\ odd} \\
{\tt \ \ \ \ this.seed\ =\ seed;\ this.gamma\ =\ gamma;\ {\char'175}} \\
{\tt \ \ private\ long\ nextSeed()\ {\char'173}\ return\ (seed\ +=\ gamma);\ {\char'175}} \\
{\tt \ \ private\ static\ long\ mix64(long\ z)\ {\char'173}} \\
{\tt \ \ \ \ z\ =\ (z\ {\char'136}\ (z\ >>>\ 33))\ *\ 0xff51afd7ed558ccdL;} \\
{\tt \ \ \ \ z\ =\ (z\ {\char'136}\ (z\ >>>\ 33))\ *\ 0xc4ceb9fe1a85ec53L;} \\
{\tt \ \ \ \ return\ z\ {\char'136}\ (z\ >>>\ 33);\ {\char'175}} \\
{\tt \ \ private\ static\ long\ mix64variant13(long\ z)\ {\char'173}} \\
{\tt \ \ \ \ z\ =\ (z\ {\char'136}\ (z\ >>>\ 30))\ *\ 0xbf58476d1ce4e5b9L;} \\
{\tt \ \ \ \ z\ =\ (z\ {\char'136}\ (z\ >>>\ 27))\ *\ 0x94d049bb173111ebL;} \\
{\tt \ \ \ \ return\ z\ {\char'136}\ (z\ >>>\ 31);\ {\char'175}} \\
{\tt \ \ private\ static\ long\ mixGamma(long\ z)\ {\char'173}} \\
{\tt \ \ \ \ z\ =\ mix64variant13(z)\ |\ 1L;} \\
{\tt \ \ \ \ int\ n\ =\ Long.bitCount(z\ {\char'136}\ (z\ >>>\ 1));} \\
{\tt \ \ \ \ if\ (n\ >=\ 24)\ z\ {\char'136}=\ 0xaaaaaaaaaaaaaaaaL;} \\
{\tt \ \ \ \ return\ z;\ {\char'175}\ \ \ \ //\ This\ result\ is\ always\ odd} \\
{\tt \ \ private\ static\ final\ double\ DOUBLE{\char'137}ULP\ =\ 1.0\ /\ (1L\ <<\ 53);} \\
{\tt \ \ private\ static\ final\ long\ GOLDEN{\char'137}GAMMA\ =\ 0x9e3779b97f4a7c15L;\ \ //\ Note:\ this\ value\ is\ odd} \\
{\tt \ \ private\ static\ final\ AtomicLong\ defaultGen\ =\ new\ AtomicLong(initialSeed());} \\
{\tt \ \ public\ SplittableRandom(long\ seed)\ {\char'173}\ this(seed,\ GOLDEN{\char'137}GAMMA);\ {\char'175}} \\
{\tt \ \ public\ SplittableRandom()\ {\char'173}} \\
{\tt \ \ \ \ long\ s\ =\ defaultGen.getAndAdd(2\ *\ GOLDEN{\char'137}GAMMA);} \\
{\tt \ \ \ \ this.seed\ =\ mix64(s);\ this.gamma\ =\ mixGamma(s\ +\ GOLDEN{\char'137}GAMMA);\ {\char'175}} \\
{\tt \ \ public\ long\ nextLong()\ {\char'173}\ return\ mix64(nextSeed());\ {\char'175}} \\
{\tt \ \ public\ double\ nextDouble()\ {\char'173}\ return\ (nextLong()\ >>>\ 11)\ *\ DOUBLE{\char'137}ULP;\ {\char'175}} \\
{\tt \ \ public\ SplittableRandom\ split()\ {\char'173}} \\
{\tt \ \ \ \ return\ new\ SplittableRandom(mix64(nextSeed()),\ mixGamma(nextSeed()));\ {\char'175}} \\
{\tt {\char'175}}
\end{tabbing}
\removelastskip
\hrule\vskip-1ex
\caption{Pertinent portions of class {\tt SplittableRandom}}\label{fig:splitmix}
\end{figure}
\section{Two New Classes of Weak Gamma Value for SplitMix}\label{sec:two-new-classes}
We already knew the conjecture of Steele, Lea, and Flood that gamma values
with few {\tt 01} and {\tt 10} pairs might generate sequences so weak that the mixing function could not compensate.
Such gamma values tend to produce sequences in which consecutive values are the same in many bit positions.
From this we inferred a more general conjecture:
having one or more equally spaced subsequences, such that consecutive values within each subsequence are the same in many bit positions,
might give the mixing function the same trouble. This suggested that gamma values of the form $\frac{2^{64}}{k}$ (for some small integer $k$)
might also produce very weak sequences, because most of the high order bits within the overall generated sequence would have a repeating pattern with period $k$, and the mixing function might not be able to mask this correlation.\footnote{Indeed, we now realize that
it suggests that there may be weak gamma values of the form
$\frac{w}{k}$ where $w$ is itself a weak gamma value and $k$ is a small integer. We have not yet thoroughly
tested this even more general idea.}
Our other conjecture was that a weak gamma value might produce sequences such that
consecutive values are transformed by some initial part of the mixing function computation
into values that are the same in many bit positions. This suggested that
gamma values of the form $m(2^{s})+1)$, where $s$ is the shift distance in the first shift-and-{\sc xor} step
of the mixing function, might prove to be weak, because results of the operation {\tt z~{\char'136}~(z~>>>~$s$)} on successive values of the seed will tend to have the same low-order bits, and the remainder of the mixing function might not be able to mask this correlation.
We set out to confirm or disprove these two new conjectures.
\section{Our Testing Framework}\label{sec:test-framework}
We built a small testing framework to control thousands of test runs of multiple {\sc prng} algorithms, using both the TestU01 BigCrush test suite and the PractRand test suite. Nearly all the tests\footnote{\label{note:weak}A very small fraction of the tests were run on a Macintosh Pro with
two 2.8 GHz quad-core Xeon processors. This was done to validate the testing software before reserving time on the big cluster. The results of these initial runs constituted valid measurements and were retained.}
were performed on a cluster of 16 nodes, each with two sockets, each with an E5-2660 2.2Ghz Intel Xeon processor (each having eight cores collectively supporting 16 threads). Therefore 512 threads can execute simultaneously. We made no attempt to parallelize the TestU01 and BigCrush test suites; instead, we used {\tt make} files to generate thousands of jobs at a time. Each {\tt make} file describes one batch of test runs. Each {\tt make} file includes code to find out which of the 16 nodes it is being run on, so that a different subset of the batch of test runs will be run on each node. The use of {\tt make} files allowed a very simple form of crash recovery: simply a matter of re-issuing the {\tt make} command.
Each individual run tests the behavior of one {\sc prng} algorithm, starting it from one specific state and testing the statistical quality of its output stream. While BigCrush and PractRand differ in the kinds of statistical tests they employ and the way they report the results of their analysis, they are alike in four key ways:
\begin{myitemize}
\item There is a simple way to code new {\sc prng} algorithms in C (or C++) and link them into the test suite. (This strategy means there is no I/O overhead for piping the {\sc prng} output stream into the test suite.)
\item Each reports results by printing text to ``standard output''. This report includes statistical information and also an indication of the total amount of CPU time (user execution time) consumed by the test.
\item Each has a command-line interface that allows specification of which {\sc prng} algorithm to test.
\item The same command-line interface does not allow a complete specification of the initial state of the {\sc prng}, but does allow specification of a 64-bit \emph{seed} from which the initial state can be constructed, and the construction code can be user-specified and bundled with the code for the {\sc prng} algorithm itself.
\end{myitemize}
For our purposes, this last point meant that we had to design a series of specifications for using a single 64-bit integer to construct a wide variety of initial states. These specifications are presented in later sections.
The printed output of each individual test run is captured to a separate file. This file contains a date/time stamp at the start and the end (as produced by the Unix {\tt date} command), with the printed output of the test suite in between.
Once a batch of test runs has been executed and their printed reports captured to individual files, a custom ``distillation'' program (coded in Java) goes through the results and reduces each file to a data structure that can be summarized by a small row of numbers. (There are actually two such distillation programs, fairly similar in structure, one for BigCrush and one for PractRand.) The distillation program then writes a master summary file, a set of chart-producing data files, and a script file. The master summary file contains a description of the precise set of test runs in that batch, all of the summary results, and the total amount of user CPU time consumed by all the test runs. Each chart-producing data file is a {\tt csv} (comma-separated values) file containing the data needed to produce one chart. The script file is a Unix shell script that uses the Macintosh application DataGraph to construct one {\tt pdf} file from each of the chart-producing data files.
The overall workflow for a batch of test runs therefore looks something like this:
\begin{myitemize}
\item Execute the {\tt make} file (using the command ``{\tt make -j 32}'') on each node of the cluster.
\item Copy the directory containing all the resulting report files to the Macintosh Pro.
\item Run the distillation program (either ``{\tt java DistillTestU01}'' or ``{\tt java DistillPractRand}''), \hfil\break giving it a directory name (such as ``{\tt twotwin}'') as an argument.
\item Use ``{\tt chmod +x}'' to cause the script file produced by the distillation program to be executable.
\item Run the script file (a typical invocation would be ``{\tt ./PractRand{\char'137}twotwin{\char'137}makegraphs}'').
\end{myitemize}
and the result is a set of {\tt pdf} files, each containing one chart. Examples appear later in this paper.
Each chart is a two-dimensional grid of size up to 60 rows by 40 columns, where each grid entry is a square, roughly $\frac{1}{8}''\times\frac{1}{8}''$,
representing the result of one test run. Thus each chart can report the results of up to 2400 test runs. Different charts are organized in different ways, depending on the specific batch of test runs performed, but for each of the two test suites each individual entry is presented in the same way.
The report from a single test run includes the results of many individual statistical tests (typically many dozens); these results are distilled into a single symbol and/or number to produce a chart entry.
\subsection{Distilling TestU01 BigCrush Reports}
The TestU01 BigCrush test suite runs 106 individual tests~\cite[function {\tt bbattery{\char'137}BigCrush}, pp.\ 148--152]{TESTU01-DOC}, computing 160 test statistics and $p$-values~\citep{L'Ecuyer:2007:TestU01}. A single test run typically prints about 110 kilobytes of information; at the end is either the message ``{\tt All tests were passed}'' or a list of {\it anomalies}, that is, tests whose $p$-values were outside the range $[0.001,0.999]$.
\newbox\quuxbox
\setbox\quuxbox\hbox{\includegraphics{/Users/gls/TestRandom/quux/TestU01_symbol_key.pdf}}
\setbox\quuxbox\hbox to 0pt{\hskip\textwidth\hskip-\parindent
\smash{\hbox to 0pt{\hss\raise 1.3em\hbox{\lower\ht\quuxbox\box\quuxbox}}}\hss}\relax
\leavevmode\parshape 10 0in 4.3in 0in 4.3in 0in 4.3in 0in 4.3in 0in 4.3in 0in 4.3in 0in 4.3in 0in 4.3in 0in 4.3in 0in \textwidth\relax
\box\quuxbox\relax
The distillation software for BigCrush test runs distills the list of anomalies for each test run into a pair of integers $(f, c)$ (a \emph{failure level} and a \emph{count}) in this manner:
If a test run file is missing, then $(f, c) = (-1, 0)$.
If a test run file is present but is incomplete or malformed, then $(f, c) = (-2, 0)$
(this can happen if a test run was terminated before completion).
If a test run file is present and all tests were passed, then $(f, c) = (0, 0)$.
Otherwise, the test run file was present and well-formed but reported one or more anomalies.
Each anomaly is categorized according to $p$-value into one of five failure levels, 1~through~5;
then $f$ is the highest failure level among all anomalies for the test run, and $c$ is the number of anomalies having that highest failure level.
For charting purposes, the pair of integers $(f,c)$ is then reduced to a symbol and/or number.
The
idea is that the
worse the results, the more ink on the page. These symbols were designed to make it easy to eyeball a chart and quickly recognize patterns of success and failure.
% \begin{figure}
% \hrule
% \hbox to \textwidth{\hss(a)\qquad\includegraphics{/Users/gls/TestRandom/quux/TestU01_symbol_key.pdf}\hss\hss
% (b)\qquad\includegraphics{/Users/gls/TestRandom/quux/PractRand_symbol_key.pdf}\hss}
% \hrule\vskip-1ex
% \caption{Symbols used in presenting results of test runs on pseudorandom number generators}\label{fig:symbols}
% \end{figure}
\subsection{Distilling PractRand Reports}
The PractRand test suite runs for an indefinite amount of time, normally producing intermediate reports after processing $2^m$ bytes of generated pseudorandom values for all integer values of $m$ starting with $m=27$. (We chose to provide command-line arguments that cause additional reports to be produced after processing
$0.375\times 2^{40}$, $0.75\times 2^{40}$, $1.25\times 2^{40}$, $1.5\times 2^{40}$, $1.75\times 2^{40}$, $2.25\times 2^{40}$, $2.5\times 2^{40}$, $2.75\times 2^{40}$, $3\times 2^{40}$, $3.25\times 2^{40}$, $3.5\times 2^{40}$, and $3.75\times 2^{40}$ bytes. We also provide command-line arguments that terminate the test run either after the first report that prints ``{\tt FAIL}'' or after testing 4 terabytes of data, whichever comes first.) For a report produced after processing $2^m$ bytes of generated pseudorandom values, PractRand computes $4m-56$ separate statistics; thus the first report (for $m=27$) reports 52 test results, and the report for $m=42$ (4 terabytes) reports 112 test results.
\setbox\quuxbox\hbox{\includegraphics{/Users/gls/TestRandom/quux/PractRand_symbol_key.pdf}}
\setbox\quuxbox\hbox to 0pt{\hskip\textwidth\hskip-\parindent
\smash{\hbox to 0pt{\hss\raise 0.3em\hbox{\lower\ht\quuxbox\box\quuxbox}}}\hss}\relax
\leavevmode\parshape 1 0in 3.9in\relax
\box\quuxbox\relax
A single test run that gets all the way to 4 terabytes typically prints about 5 kilobytes of information.
For each anomaly reported, PractRand prints not only a $p$-value but also a word or phrase describing that $p$-value;
in increasing order of severity, they are
{\tt unusual},
{\tt suspicious},
{\tt SUSPICIOUS},
{\tt very suspicious},
{\tt VERY SUSPICIOUS}, and
{\tt FAIL}. (It may further print a varying number of exclamation points after the word ``{\tt FAIL}'' but we chose to ignore those: failure is failure.)
We relied on these nonnumerical descriptions in distilling the reports.
\leavevmode\parshape 1 0in 3.9in\relax
The distillation software for PractRand test runs distills a set of anomalies into a pair of integers $(f, c)$ (a \emph{failure level} and a \emph{count}) in a manner not too different from the strategy used for BigCrush; these are then similarly reduced to a symbol and/or number.
Again, the idea is that the worse the results, the more ink on the page.
\section{Testing the SplitMix Algorithm}\label{sec:testing-splitmix}
We tested the {\sc SplitMix} algorithm using both BigCrush and PractRand,
and using a range of gamma values. We also tested variants of the {\sc SplitMix} algorithm that use other mixing functions.
Our three goals: (a)~comparing the strengths and weaknesses of
BigCrush and PractRand as test suites, (b)~comparing the strengths and weaknesses of the various mixing functions,
and (c)~determining whether the three classes of gamma values are indeed weak.
The BigCrush test suite is oriented toward testing double-precision floating-point values rather than 64-bit integers. Therefore we always used BigCrush in three different modes,
indicated by the letters {\bf f}, {\bf r}, and {\bf u}:
\begin{description}
% \item[f~~] Each generated 64-bit integer value is used to produce one IEEE double value by using the high 53 bits of the 64-bit integer value.
% \item[r~~] Each generated 64-bit integer value is used to produce one IEEE double value by using the high 53 bits of the {\it reverse} of the 64-bit integer value (so that, for example, the low-order bit of the 64 becomes the high-order bit of the 53).
% \item[u~~] Each generated 64-bit integer value is used to produce two IEEE double values by using first the low 32 bits, and then the high 32 bits, of the 64-bit integer value, adding 21 lower-order 0-bits to each to make 53.
\item[f~~] The high 53 bits of each generated 64-bit integer value is used to produce one double value.
\item[r~~] The reverse of the low 53 bits of each generated 64-bit integer value is used to produce one double value.
\item[u~~] Each generated 64-bit integer value is used to produce two double values by using first the low 32 bits, and then the high 32 bits, of the 64-bit integer value, adding 21 low-order 0-bits to each to make 53.
\end{description}
As it turned out, results were similar across all three modes. In this paper, we present data for only the {\bf u} mode.
The PractRand test suite is oriented toward testing 64-bit integer values, and includes tests specifically designed to probe weakness in the low-order bits, so we used this test suite directly on the generated 64-bit values.
We considered mixing functions of this form:
\begin{verbatim}
ulonglong SplittableRandom_name_Mixer(ulonglong z) {
z = (z ^ (z >> s1)) * m1;
z = (z ^ (z >> s2)) * m2;
return z ^ (z >> s3); }
\end{verbatim}
\begin{figure}
\hrule
\begin{center}
\begin{tabular}{@{}l@{\qquad}r@{\qquad}r@{\qquad}r@{\qquad}r@{\qquad}r@{}}
name & {\tt s1} & {\tt s2} & {\tt s3} & \multicolumn{1}{c}{\tt m1} & \multicolumn{1}{c}{\tt m2} \\ \hline
{\tt murmurhash3} & {\tt 33} & {\tt 33} & {\tt 33} & {\tt 0xff51afd7ed558ccdull} & {\tt 0xc4ceb9fe1a85ec53ull} \\
{\tt lecuyer32} & {\tt 32} & {\tt 32} & {\tt 32} & {\tt 0x106689d45497fdb5ull} & {\tt 0x3b91f78bdac4c89dull} \\
{\tt anneal32a} & {\tt 32} & {\tt 32} & {\tt 32} & {\tt 0xd6b0153091232c93ull} & {\tt 0x2e4df9428a87832dull} \\
{\tt anneal32b} & {\tt 32} & {\tt 32} & {\tt 32} & {\tt 0xd753249aa6fce2c7ull} & {\tt 0xd9a9f6e3314b8bb5ull} \\
{\tt anneal32c} & {\tt 32} & {\tt 32} & {\tt 32} & {\tt 0x387310ae4936362full} & {\tt 0xf9a9476328e05711ull} \\
{\tt lea} & {\tt 32} & {\tt 32} & {\tt 32} & {\tt 0xdaba0b6eb09322e3ull} & {\tt 0xdaba0b6eb09322e3ull} \\
{\tt stafford01} & {\tt 31} & {\tt 27} & {\tt 33} & {\tt 0x7fb5d329728ea185ull} & {\tt 0x81dadef4bc2dd44dull} \\
{\tt stafford02} & {\tt 33} & {\tt 31} & {\tt 31} & {\tt 0x64dd81482cbd31d7ull} & {\tt 0xe36aa5c613612997ull} \\
{\tt stafford03} & {\tt 31} & {\tt 30} & {\tt 33} & {\tt 0x99bcf6822b23ca35ull} & {\tt 0x14020a57acced8b7ull} \\
{\tt stafford04} & {\tt 33} & {\tt 28} & {\tt 32} & {\tt 0x62a9d9ed799705f5ull} & {\tt 0xcb24d0a5c88c35b3ull} \\
{\tt stafford05} & {\tt 31} & {\tt 29} & {\tt 30} & {\tt 0x79c135c1674b9addull} & {\tt 0x54c77c86f6913e45ull} \\
{\tt stafford06} & {\tt 31} & {\tt 27} & {\tt 30} & {\tt 0x69b0bc90bd9a8c49ull} & {\tt 0x3d5e661a2a77868dull} \\
{\tt stafford07} & {\tt 30} & {\tt 26} & {\tt 32} & {\tt 0x16a6ac37883af045ull} & {\tt 0xcc9c31a4274686a5ull} \\
{\tt stafford08} & {\tt 30} & {\tt 28} & {\tt 31} & {\tt 0x294aa62849912f0bull} & {\tt 0x0a9ba9c8a5b15117ull} \\
{\tt stafford09} & {\tt 32} & {\tt 29} & {\tt 32} & {\tt 0x4cd6944c5cc20b6dull} & {\tt 0xfc12c5b19d3259e9ull} \\
{\tt stafford10} & {\tt 30} & {\tt 32} & {\tt 33} & {\tt 0xe4c7e495f4c683f5ull} & {\tt 0xfda871baea35a293ull} \\
{\tt stafford11} & {\tt 27} & {\tt 28} & {\tt 32} & {\tt 0x97d461a8b11570d9ull} & {\tt 0x02271eb7c6c4cd6bull} \\
{\tt stafford12} & {\tt 29} & {\tt 26} & {\tt 33} & {\tt 0x3cd0eb9d47532dfbull} & {\tt 0x63660277528772bbull} \\
{\tt stafford13} & {\tt 30} & {\tt 27} & {\tt 31} & {\tt 0xbf58476d1ce4e5b9ull} & {\tt 0x94d049bb133111ebull} \\
{\tt stafford14} & {\tt 30} & {\tt 29} & {\tt 31} & {\tt 0x4be98134a5976fd3ull} & {\tt 0x3bc0993a5ad19a13ull}
\end{tabular}
\end{center}
\hrule\vskip-1ex
\caption{Parameters for mixing functions used while testing the {\sc SplitMix} algorithm}\label{fig:splitmix-mixers}
\end{figure}
We considered twenty sets of values for the parameters {\tt s1}, {\tt s2}, {\tt s3}, {\tt m1}, and {\tt m2},
as shown in Fig.~\ref{fig:splitmix-mixers}. These include the 64-bit MurmurHash3 mixing function \citep{MURMURHASH3-FINALIZER}
and the 14 mixing functions described by \citet{STAFFORD-MIX-VARIANTS}.
For each set of values, we used either no rotation step, or a left-rotation step (adding ``{\tt rotl}'' to the name):
\begin{verbatim}
ulonglong lh = z & 0x00000000FFFFFFFFull;
ulonglong hh = z & 0xFFFFFFFF00000000ull;
z = hh | ((((lh << 32) | lh) << ((z >> 32) & 0x1f)) >> 32);
\end{verbatim}
or a right-rotation step (adding ``{\tt rotr}'' to the name):
\begin{verbatim}
ulonglong lh = z & 0x00000000FFFFFFFFull;
ulonglong hh = z & 0xFFFFFFFF00000000ull;
z = hh | ((((lh << 32) | lh) >> ((z >> 32) & 0x1f)) & 0x00000000FFFFFFFFull);
\end{verbatim}
where in each of these rotation steps, the low 32 bits of {\tt z} are rotated by an amount determined by the low 5 bits of the high 32 bits of {\tt z}.
(We were inspired by \citet{PCG-PREPRINT} to try the rotation variants.)
We also studied 18 other mixing functions that were produced by an alternate simulated annealing process.
They are all similar in structure to the function shown above, but omit the shift-{\sc xor} step that uses {\tt s1}.
In addition, the five functions
{\tt anneal1LLXX} through {\tt anneal5LLXX} add left-rotation steps both before and after the multiplication by {\tt m1};
the six functions {\tt anneal1LXLX} through {\tt anneal6LXLX} add a left-rotation step before the multiplication by {\tt m1}
and before the multiplication by {\tt m2}; and the seven functions
{\tt anneal1XLX} through {\tt anneal7XLX} add a left-rotation step only before the multiplication by {\tt m2}.
For every mixer with ``{\tt anneal}'' in its name,
the parameters {\tt s1} (if relevant), {\tt s2}, {\tt s3}, {\tt m1}, and {\tt m2} were determined by a simulated-annealing search.
\begin{figure}
\hrule
\begin{center}
\begin{tabular}{@{}l@{\hskip 2em}l@{\hskip 2em}l@{}}
$\hbox{\tt 8000000000000001}=(1/2)2^{64}$ & $\hbox{\tt 1745D1745D1745D1}=(1/11)2^{64}$ & $\hbox{\tt 6DB6DB6DB6DB6DB7}=(3/7)2^{64}$ \\
$\hbox{\tt 5555555555555555}=(1/3)2^{64}$ & $\hbox{\tt 1555555555555555}=(1/12)2^{64}$ & $\hbox{\tt 9249249249249249}=(4/7)2^{64}$ \\
$\hbox{\tt 4000000000000001}=(1/4)2^{64}$ & $\hbox{\tt 1381381381381381}=(1/13)2^{64}$ & $\hbox{\tt 45D1745D1745D175}=(3/11)2^{64}$ \\
$\hbox{\tt 3333333333333333}=(1/5)2^{64}$ & $\hbox{\tt 1249249249249249}=(1/14)2^{64}$ & $\hbox{\tt 5D1745D1745D1745}=(4/11)2^{64}$ \\
$\hbox{\tt 2AAAAAAAAAAAAAAB}=(1/6)2^{64}$ & $\hbox{\tt 1111111111111111}=(1/15)2^{64}$ & $\hbox{\tt 3B13B13B13B13B13}=(3/13)2^{64}$ \\
$\hbox{\tt 2492492492492493}=(1/7)2^{64}$ & $\hbox{\tt 1000000000000001}=(1/16)2^{64}$ & $\hbox{\tt 3813813813813813}=(23/105)2^{64}$ \\
$\hbox{\tt 2000000000000001}=(1/8)2^{64}$ & $\hbox{\tt 0F0F0F0F0F0F0F0F}=(1/17)2^{64}$ & $\hbox{\tt DB8DB8DB8DB8DB8D}=(3512/4095)2^{64}$ \\
$\hbox{\tt 1C71C71C71C71C71}=(1/9)2^{64}$ & $\hbox{\tt 0E38E38E38E38E39}=(1/18)2^{64}$ \\
$\hbox{\tt 1999999999999999}=(1/10)2^{64}$ & $\hbox{\tt 0D79435E50D79435}=(1/19)2^{64}$ \\
\end{tabular}
\end{center}
\hrule\vskip-1ex
\caption{Gamma values for testing: Fractional group}\label{fig:gamma-fractional}
\end{figure}
\begin{figure}
\hrule
\begin{center}
% \begin{tabular}{@{}l@{\hskip 2em}l@{\hskip 2em}l@{\hskip 2em}l@{}}
% {\tt 0000000000000001} & {\tt 0000002000000001} & {\tt 2000000100000001} & {\tt 4000008000000001} \\
% {\tt 0000000000000003} & {\tt 0000004000000001} & {\tt 2000000200000001} & {\tt 5000000000000001} \\
% {\tt 0000000000000005} & {\tt 0000008000000001} & {\tt 2000000400000001} & {\tt 5500000000000001} \\
% {\tt 0000000000000009} & {\tt 0000010000000001} & {\tt 2000000800000001} & {\tt 5550000000000001} \\
% {\tt 0000000000FFFFFF} & {\tt 0000FFFFFFFF0001} & {\tt 2000001000000001} & {\tt 5555000000000001} \\
% {\tt 0000000008000001} & {\tt 0001000100000001} & {\tt 2000002000000001} & {\tt 5555500000000001} \\
% {\tt 0000000010000001} & {\tt 0001000200000001} & {\tt 2000004000000001} & {\tt FFFF7F7FFFF7FFF7} \\
% {\tt 0000000020000001} & {\tt 0001000400000001} & {\tt 2000008000000001} & {\tt FFFFFF0000000001} \\
% {\tt 0000000040000001} & {\tt 0001000800000001} & {\tt 4000000100000001} & {\tt FFFFFF7FFFF7FFF7} \\
% {\tt 0000000080000001} & {\tt 0001001000000001} & {\tt 4000000200000001} & {\tt FFFFFFFFFFF7FFF7} \\
% {\tt 0000000100000001} & {\tt 0001002000000001} & {\tt 4000000400000001} & {\tt FFFFFFFFFFFFFFF7} \\
% {\tt 0000000200000001} & {\tt 0001004000000001} & {\tt 4000000800000001} & {\tt FFFFFFFFFFFFFFFB} \\
% {\tt 0000000400000001} & {\tt 0001008000000001} & {\tt 4000001000000001} & {\tt FFFFFFFFFFFFFFFD} \\
% {\tt 0000000800000001} & {\tt 0040000000000001} & {\tt 4000002000000001} & {\tt FFFFFFFFFFFFFFFF} \\
% {\tt 0000001000000001} & {\tt 0040000008000001} & {\tt 4000004000000001}
% \end{tabular}
\begin{tabular}{@{}l@{\hskip 1.35em}l@{\hskip 1.35em}l@{\hskip 1.35em}l@{\hskip 1.35em}l@{}}
{\tt 0000000000000001} & {\tt 0000000400000001} & {\tt 0001001000000001} & {\tt 2000004000000001} & {\tt 5550000000000001} \\
{\tt 0000000000000003} & {\tt 0000000800000001} & {\tt 0001002000000001} & {\tt 2000008000000001} & {\tt 5555000000000001} \\
{\tt 0000000000000005} & {\tt 0000001000000001} & {\tt 0001004000000001} & {\tt 4000000100000001} & {\tt 5555500000000001} \\
{\tt 0000000000000009} & {\tt 0000002000000001} & {\tt 0001008000000001} & {\tt 4000000200000001} & {\tt FFFF7F7FFFF7FFF7} \\
{\tt 0000000000FFFFFF} & {\tt 0000004000000001} & {\tt 0040000000000001} & {\tt 4000000400000001} & {\tt FFFFFF0000000001} \\
{\tt 0000000008000001} & {\tt 0000008000000001} & {\tt 0040000008000001} & {\tt 4000000800000001} & {\tt FFFFFF7FFFF7FFF7} \\
{\tt 0000000010000001} & {\tt 0000010000000001} & {\tt 2000000100000001} & {\tt 4000001000000001} & {\tt FFFFFFFFFFF7FFF7} \\
{\tt 0000000020000001} & {\tt 0000FFFFFFFF0001} & {\tt 2000000200000001} & {\tt 4000002000000001} & {\tt FFFFFFFFFFFFFFF7} \\
{\tt 0000000040000001} & {\tt 0001000100000001} & {\tt 2000000400000001} & {\tt 4000004000000001} & {\tt FFFFFFFFFFFFFFFB} \\
{\tt 0000000080000001} & {\tt 0001000200000001} & {\tt 2000000800000001} & {\tt 4000008000000001} & {\tt FFFFFFFFFFFFFFFD} \\
{\tt 0000000100000001} & {\tt 0001000400000001} & {\tt 2000001000000001} & {\tt 5000000000000001} & {\tt FFFFFFFFFFFFFFFF} \\
{\tt 0000000200000001} & {\tt 0001000800000001} & {\tt 2000002000000001} & {\tt 5500000000000001}
\end{tabular}
\end{center}
\hrule\vskip-1ex
\caption{Gamma values for testing: Sparse group}\label{fig:gamma-sparse}
\end{figure}
\subsection{The Main Tests}
In the main tests (named {\tt maintests}), we tested several different groups of gamma values. The first group (the \emph{fractional group}, Fig.~\ref{fig:gamma-fractional}) contains values of the form $\frac{2^{64}}{k}$ or $\frac{j2^{64}}{k}$ for small integers $k$ and $j$,
with the low-order bit forced to be 1 (so that the gamma value will be odd, as required by the {\sc SplitMix} algorithm). Note that all gamma values are given in hexadecimal. (The last two values, for which $j$ and $k$ are not very small after all, are control values included for testing purposes.)
The second group (the \emph{sparse group}, Fig.~\ref{fig:gamma-sparse}) contains values that have relatively few {\tt 01} or {\tt 10} transitions.
The third group (the \emph{shift group}, Fig.~\ref{fig:gamma-shift}) contains values of the form $m(2^s+1)$ for $s$ ranging from $26$ to $33$.
The fourth group (the \emph{control group}, Fig.~\ref{fig:gamma-control}) contains values thought to be ``truly random''; they are 64-bit values obtained from HotBits~\citep{HOTBITS}.
Sample results for BigCrush are shown in Figures~\ref{fig:TestU01_maintests_u_splitmix_Seq_graph_0} and
\ref{fig:TestU01_maintests_u_splitmix_Seq_graph_1}. The X-axis lists mixing functions tested,
and the Y-axis lists gamma values.
In Figure~\ref{fig:TestU01_maintests_u_splitmix_Seq_graph_0} we see that indeed most of the mixers have trouble with the fractional group
(except for the two control values $(23/105)2^{64}$ and $(3512/4095)2^{64}$,
for which {\tt murmurhash3} through {\tt stafford14} are just fine).
It is interesting that {\tt stafford05}, {\tt stafford06}, and {\tt anneal1LXLX} through {\tt anneal6LXLX} do well with all gamma values
in the fractional group except those that are also sparse. All mixers have trouble with gamma values in the sparse group.
As expected, the shift group does indeed cause problems for some of the mixers, including the principal one used in JDK8 ({\tt murmurhash3}),
but not for {\tt stafford05}, {\tt stafford06}, {\tt stafford10}, {\tt stafford11}, {\tt stafford13}, {\tt stafford14},
{\tt anneal1LLXX} through {\tt anneal5LLXX}, and {\tt anneal1LXLX} through {\tt anneal6LXLX}.
Mixers {\tt murmurhash3} through {\tt stafford14} do perfectly well with the control group.
Note that these two charts contains areas for which data files were missing or malformed;
we chose not to spend machine time to fill in these blanks because we had already learned
what we wanted to know: we had confirmed that all three conjectured classes of weak gamma values
can cause {\tt SplitMix} to fail the BigCrush test suite.
(Partial results from PractRand provide further confirmation;
see Figures~\ref{fig:PractRand_maintests_splitmix_Seq_graph_0} and \ref{fig:PractRand_maintests_splitmix_Seq_graph_1}
in Appendix~\ref{app:charts}.)
\subsection{The Variant Tests}
The variant tests (named {\tt variants}) probe the behavior for gamma values surrounding values
of the form $\frac{2^{64}}{k}$ or $\frac{j2^{64}}{k}$ for small integers $k$ and $j$.
For each value $g$ in Fig.~\ref{fig:gamma-variants},
the variant tests include a set of gamma values of the form $((g \pm 2^s) \mid 1)$ for $1 \leq s \leq 8$,
to test a conjecture that the weakness of gamma values declines as they become more distant
from members of the fractional group.
Sample results for BigCrush are shown in Figure~\ref{fig:TestU01_variants_u_splitmix_Seq_graph_3}, and
corresponding results for PractRand are shown in Figure~\ref{fig:PractRand_variants_splitmix_Seq_graph_3},
both in Appendix~\ref{app:charts}. The predicted effect was indeed observed, and test results were worse
for values surrounding fractions with very small denominators $k$. The PractRand results also suggest
that while fractional gamma values and values near them can be problematic, they are not as bad as sparse gamma values:
PractRand does not declare failure for fractional gamma values such as in Fig.~\ref{fig:gamma-fractional}
until over 128~GB of generated values have been tested, and does not declare failure for
nearby variant values until 4~TB of generated values have been tested.
\begin{figure}
\hrule
\begin{center}
\begin{tabular}{@{}l@{\qquad\qquad}l@{}}
{\tt 00000132D4004CB5}\quad shift distance 26 & {\tt 0000132D40004CB5}\quad shift distance 30 \\
{\tt 000003A09C00E827}\quad shift distance 26 & {\tt 00003A09C000E827}\quad shift distance 30 \\
{\tt 00000265A8004CB5}\quad shift distance 27 & {\tt 0000265A80004CB5}\quad shift distance 31 \\
{\tt 000007413800E827}\quad shift distance 27 & {\tt 000074138000E827}\quad shift distance 31 \\
{\tt 000004CB50004CB5}\quad shift distance 28 & {\tt 00004CB500004CB5}\quad shift distance 32 \\
{\tt 00000E827000E827}\quad shift distance 28 & {\tt 0000E8270000E827}\quad shift distance 32 \\
{\tt 00000996A0004CB5}\quad shift distance 29 & {\tt 0000996A00004CB5}\quad shift distance 33 \\
{\tt 00001D04E000E827}\quad shift distance 29 & {\tt 0001D04E0000E827}\quad shift distance 33
\end{tabular}
\end{center}
\hrule\vskip-1ex
\caption{Gamma values for testing: Shift group}\label{fig:gamma-shift}
\end{figure}
\begin{figure}
\hrule
\begin{center}
\begin{tabular}{@{}l@{\hskip 2em}l@{\hskip 2em}l@{\hskip 2em}l@{}}
{\tt 0A18B8E7AC904503} &
{\tt 9E13DEEA6A5D1D9B} &
{\tt BF56F43B89525AA1} &
{\tt 63F304E7AA9C5BFD}
\end{tabular}
\end{center}
\hrule\vskip-1ex
\caption{Gamma values for testing: Sparse group}\label{fig:gamma-control}
\end{figure}
\begin{figure}
\hrule
\begin{center}
\begin{tabular}{@{}l@{\hskip 2em}l@{\hskip 2em}l@{}}
$\hbox{\tt 5555555555555555}=(1/3) 2^{64}$ & $\hbox{\tt 0F0F0F0F0F0F0F0F}=(1/17) 2^{64}$ & $\hbox{\tt 6DB6DB6DB6DB6DB7}=(3/7) 2^{64}$ \\
$\hbox{\tt 3333333333333333}=(1/5) 2^{64}$ & $\hbox{\tt 0D79435E50D79435}=(1/19) 2^{64}$ & $\hbox{\tt 9249249249249249}=(4/7) 2^{64}$ \\
$\hbox{\tt 2492492492492492}=(1/7) 2^{64}$ & $\hbox{\tt 0C30C30C30C30C30}=(1/21) 2^{64}$ & $\hbox{\tt 45D1745D1745D175}=(3/11) 2^{64}$ \\
$\hbox{\tt 1C71C71C71C71C71}=(1/9) 2^{64}$ & $\hbox{\tt 0B21642C8590B216}=(1/23) 2^{64}$ & $\hbox{\tt 5D1745D1745D1745}=(4/11) 2^{64}$ \\
$\hbox{\tt 1745D1745D1745D1}=(1/11) 2^{64}$ & $\hbox{\tt 0A3D70A3D70A3D70}=(1/25) 2^{64}$ & $\hbox{\tt 3B13B13B13B13B13}=(3/13) 2^{64}$ \\
$\hbox{\tt 1381381381381381}=(1/13) 2^{64}$ & $\hbox{\tt 097B425ED097B425}=(1/27) 2^{64}$ & $\hbox{\tt 4EC4EC4EC4EC4EC4}=(4/13) 2^{64}$ \\
$\hbox{\tt 1111111111111111}=(1/15) 2^{64}$ & $\hbox{\tt 08D3DCB08D3DCB08}=(1/29) 2^{64}$
\end{tabular}
\end{center}
\hrule\vskip-1ex
\caption{Gamma values for variant testing are $((g \pm 2^s) \mid 1)$ where $g$ is a value in this table}\label{fig:gamma-variants}
\end{figure}
\subsection{The Round-Robin Tests}
There are two sets of \emph{round-robin} tests (named {\tt rr8d} and {\tt rr8c}).
In each set of tests, rather than using just one instance of the {\sc SplitMix} algorithm,
eight instances are used, and their results are used in round-robin fashion;
that is, the eight output streams are interleaved to produce a single output stream,
with each instance contributing every eighth result.
The \emph{distant} round-robin tests {\tt rr8d} use eight instances whose gamma values
are actually generated randomly (using the given putative ``gamma value'' as a seed)
and therefore should be quite distant from each other in the sense of Hamming distance;
that is, any two of them differ in many bit positions.
The \emph{close} round-robin tests {\tt rr8c} use eight instances whose initial seed values are identical
and whose gamma values
are identical except for at most three bits; the eight gamma values are generated from the given gamma value
by systematically replacing the three bits corresponding to the 1-bits in the mask {\tt 000000000000000E}
with all eight possible patterns for those three bits.
Sample results for PractRand are shown in Figures~\ref{fig:PractRand_maintests_splitmix_rr8d_graph_0}
and~\ref{fig:PractRand_maintests_splitmix_rr8c_graph_0},
both in Appendix~\ref{app:charts}. Perhaps unsurprisingly, no problems are found with the distant
round-robin tests. The close round-robin tests fail in some cases---though not all---when the given
gamma value is already weak, but never fail when the given gamma value is not weak.
\subsection{Conclusions Regarding the SplitMix Algorithm}
We have confirmed that there are three distinct classes of weak gamma values for the {\sc SplitMix algorithm}
(and, as we indicated in footnote~ref{note:weak}, speculated that there is a fourth).
They are easily defended against: if multiplying the proposed gamma value by any odd integer (including 1) that is smaller than (say)~32
causes it either to be sparse (having fewer than 24 {\tt 01} and {\tt 10} transitions)
or to have the property that
the first shift-and-{\sc xor} step of the mixing function causes too many (say more than $\frac{3s}{4}$)
of the $s$ low-order bits to be zero, then reject that candidate and try again.
Doing this may ensure that each individual instance of {\sc SplitMix} produces a good pseudorandom sequence;
but it does have some cost.
The round-robin tests give some confidence
that a modest collection of {\sc SplitMix} instances with randomly chosen gamma values
is very likely to behave as if they were statistically independent.
However, because there are ``only'' $2^{63}$ distinct choices of gamma value,
if the size of the collection is very large, the likelihood of all their gamma values being distinct may
not be comfortably small.
\section{The TwinLinear Algorithm}\label{sec:twinlinear}
At this point in our investigations we reasoned that perhaps there would be less burden
on the mixing function if a better initial sequence generator were used.
We took inspiration from two sources: from the MRG32k3a algorithm \citep{MRG32k3a} we took the idea of
using multiple linear congruential generators as initial sources,
and from {\sc pcg} \citep{PCG-PREPRINT} we borrowed the idea of using rotate instructions as
a part of nonlinear mixing.
The idea behind the {\sc TwinLinear} framework is to use just two linear congruential generators and mix their outputs.
It is well known that the low-order bits of {\sc lcg} output are not very random, but the high-order bits are fairly good.
Therefore we first mix the two outputs by using bitwise {\sc xor} to combine them \emph{after}
rotating one of them so as to swap its halves. The second step is to rotate this result by a ``random'' distance
determined by the highest-order bits of one {\sc lcg} output. The third step is to further mix the bits
of this rotated result by using a multiply step and a shift-and-{\sc xor} step.
The specific case that we have tested thoroughly uses 64-bit arithmetic.
It has 254 bits of internal state in the form of four 64-bit integers,
two of which are required to be odd. We will call the four 64-bit integers s1, s2, g1, and g2;
g1 and g2 must be odd. Once values have been chosen for s1 and s2 and g1 and g2 for
any instance of the {\sc prng}, s1 and s2 represent mutable state that may be altered
whenever a \emph{generate} or \emph{split} operation is performed, but g1 and g2, once chosen, are
unchanging for that instance. Thus, as with {\sc SplitMix}, one may regard instances of
a {\sc TwinLinear} class as members of a {\sc prng} \emph{family},
distinguished by parameters g1 and g2.
In addition, we make use of three 64-bit fixed integer constants a1, a2, and a3
that are identical for all instances.
For these we have chosen (guided by the tables provided by~\citet{LECUYER-TABLES}) the values
\begin{flushleft}
\qquad\qquad a1 = 3202034522624059733
\qquad a2 = 3935559000370003845
\qquad a3 = 2685821657736338717 \\
\end{flushleft}
The {\sc TwinLinear} algorithm uses two distinct linear congruential generators,
one whose state is s1 with parameter g1, and one whose state is s2 with parameter g2.
For every \emph{generate} step of the overall {\sc TwinLinear} algorithm,
each of the two linear congruential generators is used to generate a 64-bit value,
and then the two 64-bit values are mixed to produce a single 64-bit result.
The overall technique
for performing a \emph{generate} operation is described by this pseudocode:
\begin{tabbing}
\relax$\langle$1$\rangle$\qquad r := (s1 {\sc rotateleft} 32) {\sc xor} s2\\
\relax$\langle$2$\rangle$\qquad r := r {\sc rotateleft} (s1 {\sc shiftright} 58) \\
\relax$\langle$3$\rangle$\qquad r := r $\times$ a3 \\
\relax$\langle$4$\rangle$\qquad s1 := (a1 $\times$ s1 + g1) mod $2^{64}$ \\
\relax$\langle$5$\rangle$\qquad s2 := (a2 $\times$ s2 + g2) mod $2^{64}$ \\
\relax$\langle$6$\rangle$\qquad return (r {\sc xor} (r {\sc shiftright} 32))
\end{tabbing}
Line~$\langle$4$\rangle$ advances the state of the first linear congruential generator;
line~$\langle$5$\rangle$ likewise advances the state of the second linear congruential generator.
These are placed {\it after} the uses of s1 and s2 so that their execution may be overlapped
or interleaved with other computation.
Line~$\langle$1$\rangle$ combines s1 and s2 \emph{nonlinearly} by first permuting the
bits of s1 (using a {\sc rotateleft} operation) and then using a bitwise {\sc xor} operation
on the result of the {\sc rotateleft} operation and s2, to produce a new value r.
Line~$\langle$2$\rangle$ performs a further nonlinear combination step by using the high 6 bits of s1
(obtained by using a {\sc shiftright} operation on s1 for a distance of 58 bit positions)
to determine a number of bit positions by which r is to be rotated by a {\sc rotateleft} operation.
Lines~$\langle$3$\rangle$ and~$\langle$6$\rangle$ accomplish a final mixing on the value of r by first
multiplying it by a3 and then performing a standard \emph{xorshift} step.
Note that line~$\langle$6$\rangle$ contains two operations {\sc xor} and {\sc shiftright},
but on architectures that allow direct addressing of the two halves of a 64-bit register
as if they were two 32-bit registers, it may be possible to implement the computation
in line~$\langle$6$\rangle$ as a single 32-bit {\sc xor} instruction.
\begin{figure}
\hrule
% \begin{verbatim}
% class TwinLinear {
% private long s1, s2; private final long g1, g2; // `g1' and `g2' must be odd
% public TwinLinear(long s1, long s1, long g1, long g2) {
% this.s1 = s1; this.s2 = s2; this.g1 = g1 | 1; this.g2 = g2 | 1; }
% public long nextLong() {
% long r = Long.rotateLeft(s1, 32) ^ s2; long t = s1 >>> 58;
% r = Long.rotateLeft(r, (int)t); r *= 2685821657736338717L;
% s1 = s1 * 3202034522624059733L + g1; s2 = s2 * 3935559000370003845L + g2;
% return (r ^ (r >>> 32)); }
% private static final double DOUBLE_ULP = 1.0 / (1L << 53);
% public double nextDouble() { return (nextLong() >>> 11) * DOUBLE_ULP; }
% public TwinLinear split() {
% return new TwinLinear(nextLong(), nextLong(), nextLong(), nextLong()); }
% }
% \end{verbatim}
\begin{tabbing}
{\tt class\ TwinLinear\ {\char'173}} \\
{\tt \ \ private\ long\ s1,\ s2;\ private\ final\ long\ g1,\ g2;\ \ \ \ \ \ \ \ \ \ \ \ \ \ //\ `g1'\ and\ `g2'\ must\ be\ odd} \\
{\tt \ \ public\ TwinLinear(long\ s1,\ long\ s1,\ long\ g1,\ long\ g2)\ {\char'173}} \\
{\tt \ \ \ \ this.s1\ =\ s1;\ this.s2\ =\ s2;\ this.g1\ =\ g1\ |\ 1;\ this.g2\ =\ g2\ |\ 1;\ {\char'175}} \\
{\tt \ \ public\ long\ nextLong()\ {\char'173}} \\
{\tt \ \ \ \ long\ r\ =\ Long.rotateLeft(s1,\ 32)\ {\char'136}\ s2;\ long\ t\ =\ s1\ >>>\ 58;} \\
{\tt \ \ \ \ r\ =\ Long.rotateLeft(r,\ (int)t);\ r\ *=\ 2685821657736338717L;} \\
{\tt \ \ \ \ s1\ =\ s1\ *\ 3202034522624059733L\ +\ g1;\ s2\ =\ s2\ *\ 3935559000370003845L\ +\ g2;} \\
{\tt \ \ \ \ return\ (r\ {\char'136}\ (r\ >>>\ 32));\ {\char'175}} \\
{\tt \ \ private\ static\ final\ double\ DOUBLE{\char'137}ULP\ =\ 1.0\ /\ (1L\ <<\ 53);} \\
{\tt \ \ public\ double\ nextDouble()\ {\char'173}\ return\ (nextLong()\ >>>\ 11)\ *\ DOUBLE{\char'137}ULP;\ {\char'175}} \\
{\tt \ \ public\ TwinLinear\ split()\ {\char'173}} \\
{\tt \ \ \ \ return\ new\ TwinLinear(nextLong(),\ nextLong(),\ nextLong(),\ nextLong());\ {\char'175}} \\
{\tt {\char'175}}
\end{tabbing}
\hrule\vskip-1ex
\caption{Java code for the TwinLinear algorithm (with the {\tt RXRRMX} mixer)}\label{fig:twinlinear}
\end{figure}
Java code for the {\sc TwinLinear} algorithm appears in Figure~\ref{fig:twinlinear},
which may be compared with Figure~\ref{fig:splitmix}.
The {\tt split} operation simply performs
four {\tt nextLong} operations to obtain four 64-bit integers,
then calls the constructor (which forces the last two integers to be odd
by using a bitwise {\sc or} operation with the integer constant 1).
\begin{figure}
\hrule
\begin{tabbing}
\hskip 0.9in\=\kill
{\tt twoRXR}\>swap halves of $x$, {\sc xor} $x$ into $y$, rotate $y$ by $x$ \\
{\tt twoRXRR}\>{\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoXRXR}\>{\sc xor} $y$ into $x$, swap halves of $x$, {\sc xor}$x$ into $y$, rotate $y$ by $x$ \\
{\tt twoXRXRR}\>{\sc xor} $y$ into $x$, {\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoRARR}\>add swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoXRARR}\>{\sc xor} $y$ into $x$, add swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoARXRR}\>add $y$ into $x$, {\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoARARR}\>add $y$ into $x$, add swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoRXRRM}\>{\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoXRXRRM}\>{\sc xor} $y$ into $x$, {\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3 \\
{\tt twoRARRM}\>add swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3 \\
{\tt twoXRARRM}\>{\sc xor} $y$ into $x$, add swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3 \\
{\tt twoARXRRM}\>add $y$ into $x$, {\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3 \\
{\tt twoARARRM}\>add $y$ into $x$, add swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3 \\
{\tt two6XR}\>rotate $x$ left by 6, {\sc xor} $x$ into $y$, rotate $y$ by $x$ \\
{\tt two6AR}\>rotate $x$ left by 6, add $x$ into $y$, rotate $y$ by $x$ \\
{\tt twoVXR}\>reverse bits of $x$, {\sc xor} $x$ into $y$, rotate $y$ by $x$ \\
{\tt twoVAR}\>reverse bits of $x$, add $x$ into $y$, rotate $y$ by $x$ \\
{\tt twoVXF}\>reverse bits of $x$, {\sc xor} $x$ into $y$, bitflip $y$ by $x$ \\
{\tt twoVAF}\>reverse bits of $x$, add $x$ into $y$, bitflip $y$ by $x$ \\
{\tt twoRX6F}\>{\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, bitflip $y$ by $x$ \\
{\tt twoRA6F}\>add swapped halves of $x$ into $y$, shift $x$ right by 58, bitflip $y$ by $x$ \\
{\tt twoSXRR}\>{\sc xor} high half of $x$ into low half of $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoSIRR}\>copy high half of $x$ into low half of $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoSZRR}\>{\sc xor} high half of $y$ into low half of $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoSXRRM}\>{\sc xor} high half of $x$ into low half of $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3 \\
{\tt twoRXRRMX}\>{\sc xor} swapped halves of $x$ into $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3, \\ \>\qquad {\sc xor} high half of $y$ into low half of $y$ \\
{\tt twoSXRRMX}\>{\sc xor} high half of $x$ into low half of $y$, shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3, \\ \>\qquad {\sc xor} high half of $y$ into low half of $y$ \\
{\tt twoRRMX}\>shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a3, {\sc xor} high half of $y$ into low half of $y$ \\
{\tt twoRRNX}\>shift $x$ right by 58, rotate $y$ by $x$, multiply $y$ by a4, {\sc xor} high half of $y$ into low half of $y$ \\
{\tt twoR4XRR}\>{\sc xor} ($x$ rotated left by 4) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoR8XRR}\>{\sc xor} ($x$ rotated left by 8) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoR12XRR}\>{\sc xor} ($x$ rotated left by 12) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
\hphantom{\tt twoR1}$\smash{\lower0.3ex\hbox to 0pt{\hss$\vdots$\hss}}$\>\hphantom{{\sc xor} ($x$ rotated left by 2}$\smash{\lower0.3ex\hbox to 0pt{\hss$\vdots$\hss}}$ \\
{\tt twoR24XRR}\>{\sc xor} ($x$ rotated left by 24) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoR28XRR}\>{\sc xor} ($x$ rotated left by 28) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoR36XRR}\>{\sc xor} ($x$ rotated left by 36) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoR40XRR}\>{\sc xor} ($x$ rotated left by 40) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
\hphantom{\tt twoR1}$\smash{\lower0.3ex\hbox to 0pt{\hss$\vdots$\hss}}$\>\hphantom{{\sc xor} ($x$ rotated left by 2}$\smash{\lower0.3ex\hbox to 0pt{\hss$\vdots$\hss}}$ \\
{\tt twoR56XRR}\>{\sc xor} ($x$ rotated left by 56) into $y$, shift $x$ right by 58, rotate $y$ by $x$ \\
{\tt twoR60XRR}\>{\sc xor} ($x$ rotated left by 60) into $y$, shift $x$ right by 58, rotate $y$ by $x$
\end{tabbing}
\hrule\vskip-1ex
\caption{Mixing functions used while testing the {\sc TwinLinear} algorithm}\label{fig:twinlinear-mixers}
\end{figure}
\begin{figure}
\hrule
\begin{tabbing}
\hskip 1.5em\=\hskip 2.5em\=\kill
The {\tt twopair} tests use seed {\tt 0xPPQNRSXXXXXYYYYY} in a complex way to initialize two {\sc TwinLinear} instances: \\
Low bit of {\tt YYYYY} indicates which of two initial state values to use for both generators in both instances. \\
{\tt XXXXX | 1} initializes {\tt g1} for the first {\sc lcg} of each instance. \\
{\tt YYYYY | 1} initializes {\tt g2} for the second {\sc lcg} of each instance. \\
If high bit of {\tt Q} is {\tt 1}, multiply both these values by {\tt 1181783497276652981}, just to get nonzero bits up high. \\
Low bits of {\tt Q} specify perturbation of one or both gamma values, or of a state value, for second instance only: \\
\>{\tt Q} = {\tt 0}\>perturb first gamma value (rotation distance is {\tt 2R+1}) \\
\>{\tt Q} = {\tt 1}\>perturb second gamma value (rotation distance is {\tt 2R+1}) \\
\>{\tt Q} = {\tt 2}\>perturb both gamma values in same way (rotation distance is {\tt 2R+1}) \\
\>{\tt Q} = {\tt 3}\>perturb both gamma values but in different ways (rotation distances are {\tt 2R+1} and {\tt 2R+9}) \\
\>{\tt Q} = {\tt 4}\>perturb first gamma value (rotation distance is {\tt PP}) \\
\>{\tt Q} = {\tt 5}\>perturb second gamma value (rotation distance is {\tt PP}) \\
\>{\tt Q} = {\tt 6}\>perturb first state value (rotation distance is {\tt PP}) \\
\>{\tt Q} = {\tt 7}\>perturb second state value (rotation distance is {\tt PP}) \\
(To ``perturb'' is to invert {\tt N} bits, namely those at positions $(i\times\langle\hbox{rotation distance}\rangle)\bmod 64$ for $1 \leq i \leq \hbox{\tt N}$.) \\
{\tt S} indicates how to jump forward the state values of the generators of the second instance only: \\
\>{\tt S} = {\tt 0}\>jump first state value by {\tt 2**PP} steps \\
\>{\tt S} = {\tt 1}\>jump second state value by {\tt 2**PP} steps \\
\>{\tt S} = {\tt 2}\>jump both state values by {\tt 2**PP} steps \\
\>{\tt S} = {\tt 3}\>jump both state values but in different ways (first by {\tt 2**PP} steps, second by {\tt 2**(PP+2)} steps) \\
\>{\tt S} = {\tt 4}\>do not jump either state value \\
\>{\tt S} = {\tt 5}\>jump first state value by {\tt 2**PP + R} steps \\
\>{\tt S} = {\tt 6}\>jump second state value by {\tt 2**PP + R} steps \\
\>{\tt S} = {\tt 7}\>jump both state values by {\tt 2**PP + R} steps \\
\>{\tt S} = {\tt 8}\>jump first state value by {\tt PP} steps \\
\>{\tt S} = {\tt 9}\>jump second state value by {\tt PP} steps \\
\>{\tt S} = {\tt A}\>jump both state values by {\tt PP} steps \\
\>{\tt S} = {\tt B}\>jump both state values but in different ways (first by {\tt PP} steps, second by {\tt PP+2} steps) \\
\>{\tt S} = {\tt C}\>unused \\
\>{\tt S} = {\tt D}\>jump first state value by {\tt R} steps \\
\>{\tt S} = {\tt E}\>jump second state value by {\tt R} steps \\
\>{\tt S} = {\tt F}\>jump both state values by {\tt R} steps \\
Typically $1 \leq \hbox{\tt N} \leq 15$ and $0 \leq \hbox{\tt P} \leq 19$; all combinations of {\tt P} and {\tt Q} and {\tt R} thus produce 1200 tests.
\end{tabbing}
\hrule\vskip-1ex
\caption{How states of two instances of {\tt TwinLinear} are initialized for {\tt twopair} tests}\label{fig:twopair-parameters}
\end{figure}
\begin{figure}
\hrule
\begin{tabbing}
\hskip 1.5em\=\hskip 2.5em\=\kill
The {\tt twotwin} tests use seed {\tt 0xPQRSTUVVWWXXXYYY} in a complex way to initialize two {\sc TwinLinear} instances: \\
{\tt XXX | 1} initializes {\tt g1} for both instances, and {\tt YYY | 1} initializes {\tt g2} for both instances. \\
If low bit of {\tt XXX} is {\tt 1}, multiply {\tt g1} for both instances by {\tt 1181783497276652981} (make high bits nonzero). \\
If low bit of {\tt YYY} is {\tt 1}, multiply {\tt g2} for both instances by {\tt 2685821657736338717} (make high bits nonzero).\\
There is a fixed 16-entry ``{\tt PS} table''; the first entry is {\tt 0}, and the others are random 64-bit integers from Hotbits. \\
There is a fixed 16-entry ``{\tt QT} table''; the first entry is {\tt 0}, and the others are random 64-bit integers from Hotbits. \\
{\tt P} selects one of 16 values for {\tt s1} for both instances from the {\tt PS} table. \\
{\tt Q} selects one of 16 values for a temporary value {\tt M} from the {\tt QT} table. \\
Bit {\tt VV} of {\tt M} is set to {\tt 1}, and all bits of {\tt M} below that point are cleared. (If $\hbox{\tt VV} \geq 64$, then {\tt M} becomes 0.) \\
{\tt S} selects one of 16 values for {\tt s2} for both instances from the {\tt PS} table. \\
{\tt T} selects one of 16 values for a temporary value {\tt N} from the {\tt QT} table. \\
Bit {\tt WW} of {\tt N} is set to {\tt 1}, and all bits of {\tt N} below that point are cleared. (If $\hbox{\tt WW} \geq 64$, then {\tt N} becomes 0.) \\
For the second instance only, {\tt s1} is jumped forward {\tt M+R} steps, and {\tt s2} is jumped forward {\tt N+U} steps.
\end{tabbing}
\hrule\vskip-1ex
\caption{How states of two instances of {\tt TwinLinear} are initialized for {\tt twotwin} tests}\label{fig:twotwin-parameters}
\end{figure}
\section{Possible Weaknesses of the TwinLinear Algorithm}\label{sec:twinlinear-weaknesses}
As for the {\sc SplitMix} algorithm, we tried to predict analytically possible weaknesses of the {\sc TwinLinear}
algorithm so that we could focus additional testing on such cases.
Qualitatively speaking, the {\sc SplitMix} algorithm consists of an extremely weak (but very fast) sequence generator
followed by a mixing function that needs to be strong enough to compensate.
In contrast, the {\sc TwinLinear} algorithm begins with two generators each of which is known to be
fairly good---or at least ``not completely terrible.'' Our particular choices of multipliers a1 and a2
produce two linear congruential generators with very good spectral properties,
no matter what additive values g1 and g2 are used. There is good reason to believe that
no matter what values of g1 and g2 are chosen and no matter where in the state cycle each
of the two generators is started, if one regards only the 32 high-order bits of each generated value,
the two generated sequences will appear to be fairly uncorrelated. But it's not quite good enough
just to take the high-order 32 bits from each generator and concatenate them to form 64-bit values.
Therefore we also use a mixing function. Testing as shown that the particular one we have chosen
appears to be quite good, and we have found no weaknesses in sequences generated by a single
instance of {\sc TwinLinear}, no matter what the choice of parameter values g1 and g2.
But if we consider using multiple instances of the {\sc TwinLinear} algorithm together---for example,
using two such instances and interleaving their outputs---there are additional opportunities for
correlation. It might be that two such generators have similar initial states; but if their g1 and g2
values differ, then their states will soon become quite dissimilar.
But suppose that two such instances have the same value of g1,
or the same value of g2, or both? Let the initial state values be s1a and s2a (for the first instance of TwinLinear) and s1b and s2b
(for the second instance). If the distance n1 between s1a and s1b along the state cycle of the first {\sc lcg}
is of the form $(2m_1+1)2^k_1$, then the low-order $k_1$ bits of s1a and s1b will always be the same, and similarly for
s2a and s2b. Therefore we conjectured that the mixing function, if it is ``barely good enough'' in most cases,
might encounter trouble in the case of interleaving the output of two instances with the same g1 and g2,
with that trouble increasing as either $k_1$ or $k_2$ increases. We set out to test this conjecture.
\section{Testing the TwinLinear Algorithm}\label{sec:testing-twinlinear}
We tested the general {\sc TwinLinear} framework using both BigCrush and PractRand, using a number of distinct mixing functions,
and using a variety of techniques for initializing the two states and two gamma values from a variety of 64-bit seeds.
Again we had three goals in mind: comparing the strengths and weaknesses of
BigCrush and PractRand as test suites, comparing the strengths and weaknesses of the various mixing functions,
and trying to predict poor behavior and perhaps uncover unexpected poor behavior.
Brief characterizations of the mixing functions we used in our tests of {\sc TwinLinear} are shown
in Fig.~\ref{fig:twinlinear-mixers} (each mixes a value $x$ into a second value $y$, which becomes the result).
Initial tests showed that each proposed mixer was either ``good enough'' or ``really bad''
(an example chart is Fig.~\ref{fig:TestU01_twotests_u_twolcg_seq_graph_0} in Appendix~\ref{app:charts}).
Further, more intensive testing (both single-instance and round-robin) led us to focus on mixer {\tt twoRXRRMX}.
We set up a series of tests (called ``{\tt twopair}'') that would use interleaved output from two instances of {\sc TwinLinear}
using the {\tt twoRXRRMX} mixer, with the initial state of the two instances governed by a set of parameters
that could be packed into a single 64-bit integer command-line argument (see Fig.~\ref{fig:twopair-parameters}).
This complex encoding allowed us to explore the extent to which the test suites would detect biases
in the output if the initial states and/or gamma parameters of the two instances were correlated
in various ways.
(An example chart is Fig.~\ref{fig:PractRand_twopair_twolcg_twoRXRRMX_pair__0000100001_graph_0} in Appendix~\ref{app:charts}.)
We were able to get the PractRand tests to fail only for cases where the two instances had identical {\tt gamma} values
for their first {\sc lcg} \emph{and} identical {\tt gamma} values for their second {\sc lcg}.
It was at this point that we formulated the conjecture presented in the previous section.
We set up another series of tests (called ``{\tt twotwin}'') that would, again, use interleaved output from two instances of {\sc TwinLinear}
using the {\tt twoRXRRMX} mixer, but exploring a different initial states, specifically to test
the conjecture that if the gamma values are identical, then output quality depends on the relative phases of the corresponding {\sc lcg}
states. The initial state of the two instances is governed by a different set of parameters
packed into a single 64-bit integer command-line argument (see Fig.~\ref{fig:twotwin-parameters}).
The results confirm the conjecture. See, for example, Fig.~\ref{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_0_twin__000000_graph_0}
and~\ref{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_0_twin__000000_graph_1}; their X-axes together span the complete
range 63--0 of possible numbers {\tt VV} of trailing zeroes in n1, and their Y-axes describe various initializations.
The horizontal black bands reflect precisely those cases in which corresponding gamma values are identical,
and it may be observed that the the failures reported by BigCrush become less severe as one progresses from left to right.
(In Appendix~\ref{app:charts}, similar results for n2 are shown in
~Fig.~\ref{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_1_twin__000000_graph_0}
and~\ref{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_1_twin__000000_graph_1}.
Corresponding results reported by PractRand for the same test cases
appear in
Fig.~\ref{fig:PractRand_twotwin_twolcg_twoRXRRMX_0_twin__000000_graph_0},
\ref{fig:PractRand_twotwin_twolcg_twoRXRRMX_0_twin__000000_graph_1},
\ref{fig:PractRand_twotwin_twolcg_twoRXRRMX_1_twin__000000_graph_0},
and~\ref{fig:PractRand_twotwin_twolcg_twoRXRRMX_1_twin__000000_graph_1}.)
\section{Miscellaneous Observations}
In hopes of constructing a somewhat smaller and more efficient version of {\sc TwinLinear},
we have also studied a variant which we call {\sc SmallBig}; it makes use of one 32-bit {\sc lcg}
and one 64-bit {\sc lcg}. However, we have not yet found a version of {\sc SmallBig}
that is sufficiently robust when tested. We conjecture that something special happens
when going from 32-bit arithmetic to 64-bit arithmetic: when the game is to satisfy
BigCrush and PractRand, perhaps a 32-bit {\sc lcg} really
isn't ``random enough'' to succeed with this approach, but a 64-bit {\sc lcg} is.
We also studied a large set of mixing functions similar to those already described,
but in which a rotation step whose distance is determined by some part of the state
is replaced with a flip step. In a rotation of $2^n$ bits by distance $s$,
bit $i$ is moved to position $(i+s)\bmod 2^n$; in a ``flip'' of a group of $2^n$ bits by parameter $s$,
bit $i$ is moved to position $i \hbox{\sc xor} s$. What is the value of a rotation step?
\citet[\S 6.3.3]{PCG-PREPRINT} suggests that a random rotation ``ensures that all the bits are full period'';
a simpler intuition is that a random rotation gives every result bit an equal chance of being drawn
from every bit position of the input. By there criteria, a flip should be every bit as good as a rotation,
and yet our testing shows that mixing functions that use flips are in every case much worse than otherwise
identical mixing functions that use rotations. We hope that future work may illuminate why.
We conducted yet another group of tests (called ``{\tt twomulti}'') capable of interleaving the outputs of up to
256 instances of {\sc TwinLinear}, and did a large number of runs that interleaved 16 instances.
We omit details for lack of space, but remark that the results were consistent with the model
that instances behave as if statistically independent if they differ in at least one of the two {\tt gamma}
parameters.
We expended just over a thread-century of computing time running TestU01 and PractRand.
A breakdown appears in Fig.~\ref{fig:cputime}. Over a quarter of the time was spent on the
{\tt twotwin} tests that, on the one hand, characterize correlations between {\sc TwinLinear}
instances whose corresponding {\tt gamma} parameters are identical and, on the other hand,
appear to confirm that differing in at least one {\tt gamma} parameter
confers statistical independence.
\begin{figure}
\begin{tabular}{@{\hskip0.6em}llr@{\hskip0.6em}|@{\hskip0.6em}llr@{\hskip0.6em}|@{\hskip0.6em}llr@{\hskip0.6em}}
\hline
TestU01 & {\tt maintests} & 10.45 yr & PractRand & {\tt maintests} & 6.33 yr & PractRand & {\tt twopair} & 17.06 yr \\
TestU01 & {\tt variants} & 15.70 yr & PractRand & {\tt variants} & 2.80 yr & PractRand & {\tt twopair{\char'137}alt} & 0.22 yr \\
TestU01 & {\tt twotests} & 9.21 yr & PractRand & {\tt twotests} & 7.88 yr & PractRand & {\tt twotwin} & 27.71 yr \\
TestU01 & {\tt smallbig} & 1.04 yr & PractRand & {\tt smallbig} & 0.16 yr & PractRand & {\tt twomulti} & 5.79 yr
\\ \hline
\end{tabular}
\hbox to \textwidth{\hss{\vrule width 0pt height 2.3ex}Grand total CPU time used: 104.35 years\hss}
\caption{CPU time (in years) used for {\sc prng} testing (summed over all threads, typically 384 to 512 running simultaneously)}\label{fig:cputime}
\end{figure}
\section{Other Related Work}\label{sec:related-work}
We refer the reader to the Related Work section provided by Steele, Lea, and Flood~\citeyear{FAST-SPLITTABLE-PRNG}
for an overview of related work prior to theirs.
\citet{PCG-PREPRINT} describes the {\sc pcg} (Permuted Congruential Generator) family of {\sc prng} algorithms.
The basic idea is to start with a (single) Linear Congruential Generator, then apply a mixing function,
typically consisting of two or three steps, each step being either shift-and-{\sc xor}, multiplication
by a carefully chosen odd constant, or a bit-rotation of some part of the state by a distance specified
by some other part of the state.
\citet{VIGNA-XOROSHIRO} provides a comparison of the quality and speed of a number of recent {\sc prng}
algorithms, including (64-bit) {\sc SplitMix}, {\sc pcg}, and three xorshift-style algorithms that he has developed,
building on the work of \citet{MARSAGLIA-XORWOW}:
{\tt xorshift1024*} \citep{VIGNA-TOMS-2016},
{\tt xorshift128+} \citep{VIGNA-JCAM-2017}, and
{\tt xoroshiro128+}, all of which are very fast (but, like all xorshift generators, have a period that is 1~less than a power of~2,
and therefore the value 0 is delivered with slightly less probability than any other value).
All three do quite well when tested with BigCrush.
The {\tt xorshift1024*} algorithm has period $2^{1024}$; like most {\tt PRNG} algorithms with that much state,
it uses array indexing to access the state.
The {\tt xorshift128+} algorithm uses just 128 bits of state;
its source code is written in terms of array indexing, which might prevent a C compiler
from keeping the state in registers, but because all indices used are constants ({\tt 0} or {\tt 1}),
it could easily be rewritten to avoid use of array indexing in the same manner as {\tt TwinLinear}.
The {\tt xoroshiro128+} algorithm is similar to {\tt xorshift128+}, but uses even fewer
operations, and two of them are rotates rather than shifts.
All three are provided with jump functions that can quickly advance the state by approximately the square root of the period
($2^{512}$ steps for {\tt xorshift1024*}, $2^{64}$ steps for {\tt xorshift128+} or {\tt xoroshiro128+}),
making it easy to create many instances that traverse different parts of the cycle for use by multiple threads;
however, no {\tt split} operation is provided.
\section{Conclusions and Future Work}\label{sec:conclusions}
At the end of their paper, Steele, Lea, and Flood~\citeyear{FAST-SPLITTABLE-PRNG} commented:
``It would be a delightful outcome if, in the end, the best way to split off a new {\sc prng}
is indeed simply to \hbox{`pick one at random.'}$\,$''
Perhaps we have now achieved that:
our testing suggests that if the four arguments to the {\tt TwinLinear} constructor
are themselves chosen uniformly at random---with no need to filter out any ``weak values''---then the interleaved outputs of two generators
constructed in this way will pass the TestU01 BigCrush test suite~\cite{L'Ecuyer:2007:TestU01,TESTU01}
and also the PractRand test suite~\cite{PRACTRAND} with probability exceeding $1-2^{126}$.
(This is well in excess of the sensitivity of the test suites themselves.)
More specifically, we conjecture from testing (but this has not been proven mathematically) that the interleaved output of two such generators ought
\emph{always} pass the test suites if either their g1 values differ or their g2 values differ (or both).
% Moreover, in many cases such interleaved output passes the test suites even
% if the two LCGs have exactly the same g1 and g2 values;
% roughly speaking, if n1 is the number of times the first LCG of one {\sc prng} must be advanced to make the s1 fields
% identical, and n2 is the number of times the second LCG of one {\sc prng} must be advanced to make the s2 fields identical,
% then the interleaved output is likely to pass the test suites if neither n1 nor n2 has more than about 8 trailing 0-bits.
% Our testing also indicates that the interleaved output of more than two such objects likewise passes both test suites
% provided that no two have identical {\tt gamma} parameters.
We consider TestU01 BigCrush to be the current gold standard for final testing of any {\sc prng}
algorithm before deployment. However, we found PractRand to be an extremely useful additional tool
for two purposes: experimental exploration (because it fails fast on poor {\sc prng} algorithms)
and evaluating relative degrees of weakness (because the length to which a tested sequence must grow before
failure is reported appears to be a more sensitive and repeatable metric than the $p$-value calculated for a sequence
of fixed length). An algorithm that passes PractRand at the 4~GB threshold is worthy of final testing with BigCrush.
The {\sc SplitMix} algorithm used in JDK8 has 127 bits of state (of which 64 are updated
per 64 bits generated) and uses 9 arithmetic operations
per 64 bits generated, but the {\sc TwinLinear} algorithm uses 254 bits of state (of which 128 are updated
per 64 bits generated) and uses 11 arithmetic operations (or possibly 10, on some architectures) per 64 bits generated.
For applications in which it is desired to have a significantly smaller probability of
statistical correlations among multiple generators being used by parallel tasks,
especially when it is desirable to create new generator instances on the fly (for example, when forking new threads),
{\sc TwinLinear} may be very attractive.
%% Acknowledgments
\begin{acks} %% acks environment is optional
%% contents suppressed with 'anonymous'
XXXX
\end{acks}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_maintests_u_splitmix_Seq_graph_0.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}maintests{\char'137{u}\char'137}splitmix{\char'137}Seq{\char'137}graph{\char'137}0}}
\label{fig:TestU01_maintests_u_splitmix_Seq_graph_0}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_maintests_u_splitmix_Seq_graph_1.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}maintests{\char'137{u}\char'137}splitmix{\char'137}Seq{\char'137}graph{\char'137}1}}
\label{fig:TestU01_maintests_u_splitmix_Seq_graph_1}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_twotwin_u_twolcg_twoRXRRMX_0_twin__000000_graph_0.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}twotwin{\char'137{u}\char'137}twolcg{\char'137}twoRXRRMX{\char'137}0{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}0}}
\label{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_0_twin__000000_graph_0}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_twotwin_u_twolcg_twoRXRRMX_0_twin__000000_graph_1.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}twotwin{\char'137{u}\char'137}twolcg{\char'137}twoRXRRMX{\char'137}0{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}1}}
\label{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_0_twin__000000_graph_1}
\end{figure}
\clearpage
\bibliography{paper}
\newpage
%% Appendix
\appendix
\null
\vskip 1in
\begin{center}
\Huge\bf APPENDIX
\end{center}
\vskip 1in
In this Appendix we present additional measurement charts
and details about processes used to distill reports for both
TestU01 BigCrush and PractRand test runs.
\vskip 2in
\section{Additional Measurement Charts}\label{app:charts}
On the following pages we present 14 additional measurement charts. They will not all fit into
the final conference paper, but we wish to allow reviewers to check our summary descriptions of what we have measured.
\newpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_maintests_splitmix_Seq_graph_0.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}maintests{\char'137}splitmix{\char'137}Seq{\char'137}graph{\char'137}0.pdf}}
\label{fig:PractRand_maintests_splitmix_Seq_graph_0.pdf}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_maintests_splitmix_Seq_graph_1.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}maintests{\char'137}splitmix{\char'137}Seq{\char'137}graph{\char'137}1.pdf}}
\label{fig:PractRand_maintests_splitmix_Seq_graph_1.pdf}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_variants_u_splitmix_Seq_graph_3.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}variants{\char'137{u}\char'137}splitmix{\char'137}Seq{\char'137}graph{\char'137}3}}
\label{fig:TestU01_variants_u_splitmix_Seq_graph_3}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_variants_splitmix_Seq_graph_3}
\end{center}
\caption{{\tt PractRand{\char'137}variants{\char'137}splitmix{\char'137}Seq{\char'137}graph{\char'137}3}}
\label{fig:PractRand_variants_splitmix_Seq_graph_3}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_maintests_splitmix_rr8d_graph_0.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}maintests{\char'137}splitmix{\char'137}rr8d{\char'137}graph{\char'137}0}}
\label{fig:PractRand_maintests_splitmix_rr8d_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_maintests_splitmix_rr8c_graph_0.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}maintests{\char'137}splitmix{\char'137}rr8c{\char'137}graph{\char'137}0}}
\label{fig:PractRand_maintests_splitmix_rr8c_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_twotests_u_twolcg_seq_graph_0}
\end{center}
\caption{{\tt TestU01{\char'137}twotests{\char'137{u}\char'137}twolcg{\char'137}seq{\char'137}graph{\char'137}0}}
\label{fig:TestU01_twotests_u_twolcg_seq_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_twopair_twolcg_twoRXRRMX_pair__0000100001_graph_0.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}twopair{\char'137}twolcg{\char'137}twoRXRRMX{\char'137}pair{\char'137}{\char'137}0000100001{\char'137}graph{\char'137}0}}
\label{fig:PractRand_twopair_twolcg_twoRXRRMX_pair__0000100001_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_twotwin_u_twolcg_twoRXRRMX_1_twin__000000_graph_0.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}twotwin{\char'137{u}\char'137}twolcg{\char'137}twoRXRRMX{\char'137}1{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}2}}
\label{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_1_twin__000000_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/TestRandom/quux/TestU01_twotwin_u_twolcg_twoRXRRMX_1_twin__000000_graph_1.pdf}
\end{center}
\caption{{\tt TestU01{\char'137}twotwin{\char'137{u}\char'137}twolcg{\char'137}twoRXRRMX{\char'137}1{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}3}}
\label{fig:TestU01_twotwin_u_twolcg_twoRXRRMX_1_twin__000000_graph_1}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_twotwin_twolcg_twoRXRRMX_0_twin__000000_graph_0.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}twotwin{\char'137}twolcg{\char'137}twoRXRRMX{\char'137}0{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}0}}
\label{fig:PractRand_twotwin_twolcg_twoRXRRMX_0_twin__000000_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_twotwin_twolcg_twoRXRRMX_0_twin__000000_graph_1.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}twotwin{\char'137}twolcg{\char'137}twoRXRRMX{\char'137}0{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}1}}
\label{fig:PractRand_twotwin_twolcg_twoRXRRMX_0_twin__000000_graph_1}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_twotwin_twolcg_twoRXRRMX_1_twin__000000_graph_0.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}twotwin{\char'137}twolcg{\char'137}twoRXRRMX{\char'137}1{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}0}}
\label{fig:PractRand_twotwin_twolcg_twoRXRRMX_1_twin__000000_graph_0}
\end{figure}
\clearpage
\begin{figure}
\begin{center}
\includegraphics[height=\figureheight]{/Users/gls/PractRand/quux/PractRand_twotwin_twolcg_twoRXRRMX_1_twin__000000_graph_1.pdf}
\end{center}
\caption{{\tt PractRand{\char'137}twotwin{\char'137}twolcg{\char'137}twoRXRRMX{\char'137}1{\char'137}twin{\char'137}{\char'137}000000{\char'137}graph{\char'137}1}}
\label{fig:PractRand_twotwin_twolcg_twoRXRRMX_1_twin__000000_graph_1}
\end{figure}
\clearpage
\section{Details about Distilling BigCrush Reports}
The TestU01 BigCrush test suite runs 106 individual tests~\cite[pp.\ 148--152]{TESTU01-DOC}, computing 160 test statistics and $p$-values~\cite{L'Ecuyer:2007:TestU01}. A single test run typically prints about 110 kilobytes of information, which are summarized at the end in one of two forms. Here is an example of a summary produced when no significant anomalies are detected:
\begin{quotation}
\begin{tt}\catcode'137=11
\begin{tabbing}
========= Summary results of BigCrush ========= \\
\\
Version:~~~~~~~~~~TestU01 1.2.3 \\
Generator:~~~~~~~~u_xorgamma_stafford08_seq \\
Number of statistics:~~160 \\
Total CPU time:~~~05:12:44.09 \\
\\
All tests were passed
\end{tabbing}
\end{tt}
\end{quotation}
Here is an example of a summary produced when, say, just one significant anomaly is detected:
\begin{quotation}
\begin{tt}\catcode'137=11
\begin{tabbing}
========= Summary results of BigCrush ========= \\
\\
Version:~~~~~~~~~~TestU01 1.2.3 \\
Generator:~~~~~~~~u_xorgamma_stafford07_seq \\
Number of statistics:~~160 \\
Total CPU time:~~~05:55:20.65 \\
The following tests gave p-values outside [0.001, 0.9990]: \\
(eps~~means a value < 1.0e-300): \\
(eps1 means a value < 1.0e-15): \\
\\
~~~~~~~Test~~~~~~~~~~~~~~~~~~~~~~~~~~p-value \\
---------------------------------------------- \\
42~~Permutation, t = 7~~~~~~~~~~~~~~2.7e-4 \\
---------------------------------------------- \\
All other tests were passed
\end{tabbing}
\end{tt}
\end{quotation}
This is a typical result when a {\sc prng} is fundamentally sound but one test ``accidentally'' falls outside the statistically desirable range, which is bound to happen by chance every so often.
Now, here is an example of a summary produced when many significant anomalies are detected:
\begin{quotation}
\begin{tt}\catcode'137=11
\begin{tabbing}
========= Summary results of BigCrush ========= \\
\\
Version:~~~~~~~~~~TestU01 1.2.3 \\
Generator:~~~~~~~~u_splitmix_stafford01_seq \\
Number of statistics:~~160 \\
Total CPU time:~~~06:01:33.54 \\
The following tests gave p-values outside [0.001, 0.9990]: \\
(eps~~means a value < 1.0e-300): \\
(eps1 means a value < 1.0e-15): \\
\\
~~~~~~~Test~~~~~~~~~~~~~~~~~~~~~~~~~~p-value \\
---------------------------------------------- \\
10~~CollisionOver, t = 14~~~~~~~~~~1.6e-15 \\
12~~CollisionOver, t = 21~~~~~~~~~~~2.5e-4 \\
21~~BirthdaySpacings, t = 16~~~~~~~3.3e-24 \\
27~~SimpPoker, r = 27~~~~~~~~~~~~~~~~eps~~ \\
58~~AppearanceSpacings, r = 27~~~~~1 - eps1 \\
69~~MatrixRank, L=1000, r=26~~~~~~~~0.9995 \\
75~~RandomWalk1 H (L=50, r=25)~~~~~~8.4e-8 \\
96~~HammingIndep, L=30, r=27~~~~~~1.0e-138 \\
102~~Run of bits, r = 27~~~~~~~~~~~~~5.6e-4 \\
---------------------------------------------- \\
All other tests were passed
\end{tabbing}
\end{tt}
\end{quotation}
This is a typical result when a {\sc prng} is fundamentally unsound.
Note that a $p$-value may be reported in one of five forms: a floating-point value,
{\tt eps}, {\tt 1~-~eps}, {\tt eps1}, or {\tt 1~-~eps1}. Tests 10, 21, and especially 96 have extraordinarily tiny $p$-values.
Test 27 also has a tiny $p$-value, but we are not told exactly what it is, only that it is less than $10^{-300}$.
Test 58 has a $p$-value that is very close to $1$; we are not told exactly how close, only that the difference
is less than $10^{-15}$.
We will refer to each test explicitly listed before the phrase ``{\tt All other tests were passed}'' as an \emph{anomaly} of the test run.
The distillation software for BigCrush test runs distills a set of anomalies into a pair of integers $(f, c)$ (a \emph{failure level} and a \emph{count}) in this manner:
\begin{myitemize}
\item If a test run file is missing for some reason, then $(f, c) = (-1, 0)$.
\item If a test run file is present but is incomplete or otherwise malformed for some reason, then $(f, c) = (-2, 0)$.
\item If a test run file is present and all tests were passed, then $(f, c) = (0, 0)$.
\item Otherwise, the test run file was present and well-formed but reported anomalies.
\begin{myitemize}
\item The reported $p$-value for each anomaly is transformed into a floating-point number as follows:
\begin{myitemize}
\item A $p$-value listed as {\tt eps} or {\tt 1~-~eps} becomes {\tt 1.0e-300}.
\item A $p$-value listed as {\tt eps1} or {\tt 1~-~eps1} becomes {\tt 1.0e-15}.
\item A $p$-value already listed as a floating-point value greater than {\tt 0.5} is subtracted from {\tt 1.0}.
\item A $p$-value already listed as a floating-point value not greater than {\tt 0.5} remains as is.
\end{myitemize}
\item Next, the floating-point $p$-value is categorized into one of five \emph{failure levels}:
\begin{myitemize}
\item If it is less than {\tt 1.0e-300}, it is failure level $5$.
\item Otherwise, if it is less than {\tt 1.0e-15}, it is failure level $4$.
\item Otherwise, if it is less than {\tt 1.0e-6}, it is failure level $3$.
\item Otherwise, if it is less than {\tt 1.0e-4}, it is failure level $2$.
\item Otherwise, it must be less than {\tt 1.0e-3}, and it is failure level $1$.
\end{myitemize}
\item Next, let $f$ be the highest failure level among all anomalies for the test run is identified, and let $c$ be the number of anomalies having that highest failure level.
\end{myitemize}
\end{myitemize}
For charting purposes, the pair of integers $(f,c)$ is then reduced to a symbol and/or number in this way:
\begin{myitemize}
\item $(0, c)$ becomes whitespace (no anomalies).
\item $(1, c)$ becomes the number $c$ (minor anomalies, of which many are expected when doing thousands of tests).
\item $(2, c)$ becomes the number $c$ within a circle.
\item $(3, c)$ becomes the number $c$ within a square.
\item $(4, c)$ becomes the number $c$ in white on a solid black circle.
\item $(5, c)$ becomes the number $c$ in white on a solid black square.
\item $(-1, c)$ becomes a light gray square (indicating missing data).
\item $(-2, c)$ becomes $\times$ on a light gray square (indicating malformed data).
\end{myitemize}
\newpage
\section{Details about Distilling PractRand Reports}
The PractRand test suite runs for an indefinite amount of time, normally producing intermediate reports after processing $2^m$ bytes of generated pseudorandom values for all integer values of $m$ starting with $m=27$. (We have chosen to provide command-line arguments that cause additional reports to be produced after processing
$0.375\times 2^{40}$, $0.75\times 2^{40}$, $1.25\times 2^{40}$, $1.5\times 2^{40}$, $1.75\times 2^{40}$, $2.25\times 2^{40}$, $2.5\times 2^{40}$, $2.75\times 2^{40}$, $3\times 2^{40}$, $3.25\times 2^{40}$, $3.5\times 2^{40}$, and $3.75\times 2^{40}$ bytes. We also provide a command-line arguments that terminate the test run either after the first report that prints ``{\tt FAIL}'' or after testing 4 terabytes of data, whichever comes first.) For an intermediate report produced after processing $2^m$ bytes of generated pseudorandom values, PractRand computes $4m-56$ separate statistics; thus the first report (for $m=27$) reports 52 test results, and the report for $m=42$ (4 terabytes) reports 112 test results.
A single test run that gets all the way to 4 terabytes typically prints about 5 kilobytes of information. Here is an example of a run that was terminated after examining 64~gigabytes of generated values:
\begin{quotation}
\begin{tt}\catcode'136=11 \catcode'137=11
\begin{tabbing}
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 128 megabytes (2^27 bytes), time= 2.6 seconds \\
~~Test Name~~~~~~~~~~~~~~~~~~~~~Raw~~~~~~~~~Processed~~~~~Evaluation \\
~~[Low4/64]BCFN(2+1,13-5)~~~~~~~R=~~+9.9~~~~p =~~4.8e-4~~~unusual~~~~~~~~~ \\
~~...and 51 test result(s) without anomalies \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 256 megabytes (2^28 bytes), time= 5.9 seconds \\
~~no anomalies in 56 test result(s) \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 512 megabytes (2^29 bytes), time= 11.6 seconds \\
~~no anomalies in 60 test result(s) \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 1 gigabyte (2^30 bytes), time= 22.2 seconds \\
~~no anomalies in 64 test result(s) \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 2 gigabytes (2^31 bytes), time= 42.6 seconds \\
~~no anomalies in 68 test result(s) \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 4 gigabytes (2^32 bytes), time= 82.4 seconds \\
~~no anomalies in 72 test result(s) \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 8 gigabytes (2^33 bytes), time= 161 seconds \\
~~no anomalies in 76 test result(s) \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 16 gigabytes (2^34 bytes), time= 319 seconds \\
~~Test Name~~~~~~~~~~~~~~~~~~~~~Raw~~~~~~~~~Processed~~~~~Evaluation \\
~~Gap-16:B~~~~~~~~~~~~~~~~~~~~~~R=~~+5.8~~~~p =~~1.9e-4~~~suspicious~~~~~~ \\
~~...and 79 test result(s) without anomalies \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 32 gigabytes (2^35 bytes), time= 631 seconds \\
~~Test Name~~~~~~~~~~~~~~~~~~~~~Raw~~~~~~~~~Processed~~~~~Evaluation \\
~~Gap-16:B~~~~~~~~~~~~~~~~~~~~~~R=~~+8.3~~~~p =~~1.2e-6~~~very suspicious \\
~~[Low4/64]DC6-9x1Bytes-1~~~~~~~R=~~+5.7~~~~p =~~3.7e-3~~~unusual~~~~~~~~~ \\
~~...and 82 test result(s) without anomalies \\
\\
rng=splitmix_anneal32a_seq, seed=0x5555555555555555 \\
length= 64 gigabytes (2^36 bytes), time= 1253 seconds \\
~~Test Name~~~~~~~~~~~~~~~~~~~~~Raw~~~~~~~~~Processed~~~~~Evaluation \\
~~Gap-16:A~~~~~~~~~~~~~~~~~~~~~~R= +10.0~~~~p =~~1.0e-6~~~very suspicious \\
~~Gap-16:B~~~~~~~~~~~~~~~~~~~~~~R= +17.5~~~~p =~~1.1e-10~~~~FAIL !~~~~~~~~ \\
~~[Low1/64]DC6-9x1Bytes-1~~~~~~~R=~~-4.2~~~~p =1-5.0e-3~~~unusual~~~~~~~~~ \\
~~...and 85 test result(s) without anomalies
\end{tabbing}
\end{tt}
\end{quotation}
This is a typical result when a {\sc prng} is fundamentally unsound. Note that PractRand reports not only a $p$-value for each anomaly but also a word or phrase assessing that $p$-value; we choose to rely on these nonnumerical assessments in distilling the reports. The complete set of assessments used, in increasing order of severity, is
{\tt unusual},
{\tt suspicious},
{\tt SUSPICIOUS},
{\tt very suspicious},
{\tt VERY SUSPICIOUS}, and
{\tt FAIL}. (PractRand may further print a varying number of exclamation points after the word ``{\tt FAIL}'' but we choose to ignore those.)
The distillation software for PractRand test runs distills a set of anomalies into a pair of integers $(f, c)$ (a \emph{failure level} and a \emph{count}) in a manner not too different from the strategy used for BigCrush:
\begin{myitemize}
\item If a test run file is missing for some reason, then $(f, c) = (-1, 0)$.
\item If a test run file is present but is incomplete or otherwise malformed for some reason, then $(f, c) = (-2, 0)$.
\item If a test run file is present and no anomalies were observed after processing 4 terabytes of generated pseudorandom data, then $(f, c) = (0, 0)$.
\item Otherwise, the test run file was present and well-formed but reported one or more anomalies.
\begin{myitemize}
\item Each anomaly is categorized into one of 11 \emph{failure levels}:
\begin{myitemize}
\item If the assessment was {\tt FAIL}:
\begin{myitemize}
\item If the assessment occurred for $m < 31$, it is failure level $11$.
\item Otherwise, if the assessment occurred for $m < 34$, it is failure level $10$.
\item Otherwise, if the assessment occurred for $m < 37$, it is failure level $9$.
\item Otherwise, if the assessment occurred for $m < 40$, it is failure level $8$.
\item Otherwise, if the assessment occurred for $m < 42$, it is failure level $7$.
\item Otherwise, it is failure level $6$.
\end{myitemize}
\item If the assessment was {\tt VERY SUSPICIOUS}, it is failure level $5$.
\item If the assessment was {\tt very suspicious}, it is failure level $4$.
\item If the assessment was {\tt SUSPICIOUS}, it is failure level $3$.
\item If the assessment was {\tt suspicious}, it is failure level $2$.
\item If the assessment was {\tt unusual}, it is failure level $1$.
\end{myitemize}
\item Next, let $f$ be the highest failure level among all anomalies for the test run, and let $c$ be the number of anomalies having that highest failure level.
\end{myitemize}
\end{myitemize}
For charting purposes, the pair of integers $(f,c)$ is then reduced to a symbol and/or number in this way:
\begin{myitemize}
\item $(0, c)$ becomes whitespace (no anomalies).
\item $(1, c)$ becomes the number $c$ (minor anomalies; many are expected when doing thousands of tests).
\item $(2, c)$ becomes the number $c$ within a circle.
\item $(3, c)$ becomes the number $c$ within a square.
\item $(4, c)$ becomes the number $c$ in white within a solid dark gray circle.
\item $(5, c)$ becomes the number $c$ in white within a solid dark gray square.
\item $(6, c)$ becomes the number $c$ in white within a solid black diamond.
\item $(7, c)$ becomes the number $c$ in white within a solid black diamond within a circle.
\item $(8, c)$ becomes the number $c$ in white within a solid black diamond within a square.
\item $(9, c)$ becomes the number $c$ in white on a solid black circle.
\item $(10, c)$ becomes the number $c$ in white on a solid black circle within a square.
\item $(11, c)$ becomes the number $c$ in white on a solid black square.
\item $(-1, c)$ becomes a light gray square (indicating missing data).
\item $(-2, c)$ becomes $\times$ on a light gray square (indicating malformed data).
\end{myitemize}
\end{document}