How to generate multiple pseudorandom number generators?

I’m translating some code from another framework into pytorch and they have a specific use-case of generating some data while training the model.

So, the scenario goes like this:

rngs_reserve = 10_000 # 10K
random.seed(seed)
np.random.seed(seed)
rng_seq = PRNGSequence(seed)
rng_seq.reserve(rngs_reserve) # reserve 10000 PRNG keys

# training loop
for step in steps: # run for 10K steps
    data = sample_batch(next(rng_seq), ....) # this uses random generation e.g. torch.randint(generator=rng)
    output = model(data)
    .
    .
    .

As you can see, rng_seq is an iterator that returns a new pseudorandom key on each call to next(); that key is then passed to the function that generates random numbers, e.g., torch.randint(generator=rng).
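
For context, here is a rough, self-contained sketch of the pattern I mean, using plain jax.random as a stand-in for the original framework's PRNGSequence (as far as I can tell, reserve() just pre-splits a batch of keys up front, so I have left it out):

import jax

# one master key that gets split on every step, so each step
# receives its own fresh subkey (this plays the role of next(rng_seq))
master_key = jax.random.PRNGKey(0)

for step in range(3):
    master_key, step_key = jax.random.split(master_key)
    batch = jax.random.randint(step_key, shape=(4,), minval=0, maxval=100)
    print(step, batch)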

So my attempt at converting this into pytorch is the following:

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

# training loop
for step in steps: # run for 10K steps
    rng = torch.Generator()
    data = sample_batch(rng, ....) # this uses random generation e.g. torch.randint(generator=rng)
    output = model(data)
    .
    .
    .

Questions:
Does this approach faithfully follow the original one?
The goal is to generate diverse data samples during training so that we minimize overlap between samples generated on the fly. Is there a better way to achieve this?

Hi Kirk!

The short story is that you really don’t want to use multiple pseudorandom
number generators. (Doing so can perversely make your numbers less
pseudorandom.) For most use cases, it will likely be best to use pytorch's
global random number generator, which pytorch's various random functions
call automatically.

(Distributed processing is a case where it makes sense to have multiple
random number generators running on multiple processors.)

To answer your specific question: Yes, you can do something like this, but
you need to initialize each instance of rng. You can call rng.seed() to
initialize each rng instance to some sort of “random” seed, or you can call
rng.manual_seed (seed) to initialize rng to your specific choice for seed.
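
For example (the shape and bounds below are just placeholders):

import torch

rng = torch.Generator()

rng.seed()              # initialize with some sort of "random" seed
# -- or --
rng.manual_seed(1234)   # initialize with your specific choice of seed

x = torch.randint(0, 100, (4,), generator=rng)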

Yes, this is the issue. Unless you use a nuanced scheme for initializing your
individual rngs, you do risk getting "overlap between samples generated on
the fly."

Yes, don’t use multiple rngs (that you would have to initialize very carefully).
Just use pytorch’s global random number generator that (if well designed)
will not have “overlap” between subsequent sets of samples.
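
Concretely, something like this, where your sample_batch would just call torch.randint (and friends) without passing a generator argument (the shapes and ranges here are placeholders):

import torch

torch.manual_seed(2023)   # seed the global generator once

for step in range(10_000):
    # no generator argument: these draws come from pytorch's global generator
    data = torch.randint(0, 100, (32,))
    # ... feed data to your model ...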

Best.

K. Frank

Hi KFrank, thanks for your reply.

So, when you mention pytorch's global random number generator, are you referring to torch.manual_seed() or something else? Could you please clarify with an MWE?

If we do end up using pytorch's global rng, wouldn't that create samples with overlaps if we sample lots of numbers, e.g., millions?

Another way I was thinking of doing this was to use local random number generators. For instance, set your seeds in the script for reproducibility as usual, and then use a different seed for rng on each iteration of the training loop.

MWE of main.py

import os
import random

import numpy as np
import torch

# set the seed for different random functions
os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)


# training loop
rng = torch.Generator() # is it better to have one instance here, or to create a new one inside the for loop?
for step in steps: # run for 10K or 10M steps
    rng.manual_seed(step)
    data = sample_batch(rng, ....) # this uses random generation e.g. torch.randint(generator=rng)

Here seed != step for every step, i.e., the global seed is never equal to any of the per-step seeds, which I hoped would avoid overlap.

Hi Kirk!

Again, the short story is that people have invested a lot of effort into
implementing really good random number generators. Use what they’ve
done, as provided to you out-of-the-box by pytorch.

Pytorch’s official documentation is mostly silent about the details of its
random number generators.

What appears to be happening is that when you load pytorch (or maybe
when you first access some random numbers), pytorch instantiates a
Generator, initialized with some sort of a “random” seed, and makes it
available “globally.”

That is, all of pytorch’s functions that use random numbers call this
Generator to get their random numbers (unless you tell them to do
otherwise by passing in a generator argument).

So:

x = torch.randn (8)
y = torch.randperm (3)
z = torch.nn.Linear (2, 5)

all consume random numbers from the global Generator (in the order
they are called).

When you call torch.manual_seed (some_seed), you are reinitializing
this global Generator with some_seed.
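
You can check both points directly (reseeding reproduces the same draws, and passing a generator argument leaves the global stream untouched):

import torch

torch.manual_seed(42)
a = torch.randn(3)                # consumes the global stream

torch.manual_seed(42)             # reinitialize the global Generator
g = torch.Generator()
g.manual_seed(7)
_ = torch.randn(3, generator=g)   # uses g, leaves the global stream alone
b = torch.randn(3)                # continues the freshly reseeded global stream

print(torch.equal(a, b))          # True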

As far as I can tell, pytorch uses, on the cpu*, a Mersenne twister algorithm
for its pseudorandom number generator, most likely the so-called “MT19937.”
This generator has a period of 2**19937 - 1. So the sequence it produces
contains lots and lots of millions of samples. To get overlaps, you would
have to sample more than 2**19937 - 1 numbers. (And if you did, you
would be in the soup with virtually all random number generators. Plus, your
application would take “a while” to run.)

If, instead, you reseed a Generator with the step number on every iteration
(as in your second snippet), you will dramatically increase the probability of
overlaps. (In practice, you will probably be fine, but why go to the extra work
just to do something that is worse, even if it still works in practice?)

Note that using multiple copies of Generator doesn’t create more random
numbers – it just gives you different chunks of the length-(2**19937 - 1)
sequence generated by a single copy of Generator. That sequence
won’t overlap until you’ve exhausted its full length. Your multiple copies
have no guarantee that they won’t overlap (but it’s unlikely that they will
unless you have lots of individual Generators or extract lots of numbers
from them).
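
The degenerate case makes this concrete: two Generators given the same seed hand you exactly the same chunk of that sequence:

import torch

g1 = torch.Generator().manual_seed(0)
g2 = torch.Generator().manual_seed(0)

a = torch.randn(5, generator=g1)
b = torch.randn(5, generator=g2)
print(torch.equal(a, b))   # True: complete overlap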

*) On the gpu, I believe that pytorch uses the so-called “philox” algorithm.
A philox stream should have a period of at least 2**128 – still pretty long,
although much shorter than that of MT19937 – but can produce at least
2**64 independent (“non-overlapping”) streams, making it suitable for
running multiple generators on multiple gpus.
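
For completeness, here is how you would get per-device generators explicitly (which algorithm sits behind each one is, as noted above, an implementation detail):

import torch

g_cpu = torch.Generator()                    # cpu generator (Mersenne twister, as best I can tell)
g_cpu.manual_seed(1234)

if torch.cuda.is_available():
    g_gpu = torch.Generator(device='cuda')   # gpu generator (philox, I believe)
    g_gpu.manual_seed(1234)
    x = torch.randn(3, device='cuda', generator=g_gpu)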

Best.

K. Frank