I’m translating some code from another framework into PyTorch, and it has a specific use case: generating some data on the fly while training the model.
The scenario goes like this:
rngs_reserve = 10_000
random.seed(seed)
np.random.seed(seed)
rng_seq = PRNGSequence(seed)
rng_seq.reserve(rngs_reserve)  # reserve 10,000 PRNG keys up front

# training loop
for step in steps:  # run for 10,000 steps
    data = sample_batch(next(rng_seq), ...)  # uses random generation internally, e.g. torch.randint(generator=rng)
    output = model(data)
    ...
As you can see, rng_seq is an iterator that returns a new pseudorandom key on every call to next(), and that key is then passed to the function that generates the random numbers, e.g. torch.randint(generator=rng).
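For context, here is a rough PyTorch-flavoured sketch of how such a key sequence behaves. This is only an illustration, not the original framework's implementation: the class name TorchPRNGSequence and the way each per-key seed is derived from the base seed and a running counter are my own assumptions.

import torch

class TorchPRNGSequence:
    """Hypothetical stand-in for PRNGSequence: each call to next() yields a
    freshly seeded torch.Generator whose seed is derived deterministically
    from the base seed and a running counter."""

    def __init__(self, seed: int):
        self.seed = seed
        self.counter = 0

    def __iter__(self):
        return self

    def __next__(self) -> torch.Generator:
        g = torch.Generator()
        # derive a distinct, reproducible seed for every key in the sequence
        g.manual_seed(self.seed * 1_000_003 + self.counter)
        self.counter += 1
        return g

    def reserve(self, n: int) -> None:
        # no-op in this sketch; the original framework pre-allocates n keys here
        pass

# usage: each key is an independently seeded generator
rng_seq_example = TorchPRNGSequence(0)
key = next(rng_seq_example)
x = torch.randint(0, 10, (4,), generator=key)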
So my attempt at converting this to PyTorch is the following:
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

# training loop
for step in steps:  # run for 10,000 steps
    rng = torch.Generator()  # a fresh generator is created for every step
    data = sample_batch(rng, ...)  # uses random generation internally, e.g. torch.randint(generator=rng)
    output = model(data)
    ...
Questions:
1. Does this approach faithfully follow the original one?
2. The goal is to generate diverse data samples during training, so that the overlap between samples generated on the fly is minimized. Is there a better way to achieve this?