Why does pytorch have different devices of the random number generator, if no default one is given?

potatolige · March 24, 2022, 3:12pm

Hi,

I am facing an issue caused by the random number generator.

My code runs on the GPU in most cases, and I am too lazy to set the device manually every time I initialize a tensor through torch.Tensor(...), so I set the default tensor type to torch.cuda.FloatTensor with torch.set_default_tensor_type('torch.cuda.FloatTensor').

Then I found that all random number generation operations, such as torch.randperm(3), return a GPU tensor, even if I did not set the random number generator’s device to cuda. Very cool!

However, since my dataset is not very big, I decided to keep it in GPU memory and use a dataloader to load minibatches. Then I found that if I do not pass a cuda generator, generator=torch.Generator(device="cuda"), when I initialize the dataloader, it uses the cpu as the default device and raises an exception when I draw a minibatch from the loader: RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'.

So why is this the case, i.e. why is cuda not taken as the default generator’s device type? Why is the behaviour here different from, e.g., the torch.randperm(3) case?

Besides, since I want to control the randomness and reproduce my results, I always fix the random seed in each run, e.g. to 1234. According to Controlling sources of randomness, calling torch.manual_seed(0) sets the seed to 0 for both the CPU and GPU generators in one line. But this seed does not influence the GPU generator of the dataloader: I need to explicitly create a GPU generator, set its seed manually, and then initialize the dataloader. Otherwise, I cannot make sure that the dataloader’s seed is the same as the one used by other operations like torch.randperm(3).
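A minimal sketch of that workaround, assuming the default tensor type is left unchanged (CPU), so the generator is created on the CPU; with a CUDA default tensor type, as described above, the generator would presumably need device="cuda" instead. The helper names make_loader and epoch_order are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(seed: int) -> DataLoader:
    """Build a shuffling DataLoader whose randomness comes from its own seeded generator."""
    dataset = TensorDataset(torch.arange(10))
    # Assumption: default tensor type is CPU, so the generator lives on the CPU too.
    generator = torch.Generator()
    generator.manual_seed(seed)
    return DataLoader(dataset, batch_size=5, shuffle=True, generator=generator)


def epoch_order(loader: DataLoader) -> list:
    """Return the sample order of one full pass over the loader."""
    return torch.cat([batch[0] for batch in loader]).tolist()


# Two loaders seeded identically should shuffle identically.
order_a = epoch_order(make_loader(1234))
order_b = epoch_order(make_loader(1234))
```

Because the loader draws from its own generator, the shuffle order depends only on the seed passed to that generator, not on an earlier torch.manual_seed call.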

Since there is a workaround, I did not open an issue on GitHub. But it breaks the elegance of setting the randomness globally and is an exception to what the official tutorial describes. So I would like to hear any suggestions you have.

Best,
Bruce

srishti-git1110 · October 5, 2022, 3:17pm

Hi,
I was working on something related and came across this post - nice one!

Could you please elaborate on what exactly you meant by saving the data in GPU memory?
Did you do that in the __init__() or the __getitem__() method of the Dataset class, or did you mean something else?

ptrblck · October 5, 2022, 3:34pm

From my experience: setting the default tensor type to a CUDATensor was never properly working for all methods and some operations were breaking, just as a heads-up in case you want to use the same approach.

potatolige · October 5, 2022, 3:40pm

Hi, I meant that I just keep all my data in one big tensor with device=“cuda”. I found this is faster than keeping it in a cpu tensor, taking a batch out of it, and then changing the device to “cuda”.
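A rough sketch of what keeping the whole dataset in one tensor on the device can look like, here drawing shuffled minibatches by plain indexing rather than through a dataloader. The minibatches helper and the toy data are made up for illustration, and the code falls back to the CPU when no GPU is available.

```python
import torch

# Use the GPU when present; otherwise the same code runs on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy dataset of 6 samples with 2 features each, stored on the chosen device.
data = torch.arange(12, dtype=torch.float32, device=device).reshape(6, 2)

# The generator is created on the same device as the data and seeded explicitly.
generator = torch.Generator(device=device)
generator.manual_seed(1234)


def minibatches(data, batch_size, generator):
    """Yield shuffled minibatches by indexing the on-device tensor directly."""
    perm = torch.randperm(data.shape[0], generator=generator, device=data.device)
    for start in range(0, data.shape[0], batch_size):
        yield data[perm[start:start + batch_size]]


batches = list(minibatches(data, 4, generator))
```

Each batch is already on the same device as the data, so no host-to-device copy happens per batch.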

I think that in the latest version, PyTorch 1.12, the usage of the dataloader has changed, and it is no longer possible to pass a generator to the dataloader’s init function. So I guess the bug has been fixed somehow.