Hi,
I am facing an issue caused by the random number generator.
Since my code runs on the GPU in most cases, and I am too lazy to set the device manually every time I initialize a tensor through torch.Tensor(...), I set the default tensor type to torch.cuda.FloatTensor with torch.set_default_tensor_type('torch.cuda.FloatTensor').
Then I found that all random number generation operations, such as torch.randperm(3), return a GPU tensor, even though I did not set the random number generator’s device to cuda. Very cool!
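A quick way to check this behaviour is sketched below. It assumes the setup described above and falls back to the CPU when no GPU is present, so the device comparison is made against whatever the current default is. Newer PyTorch releases may emit a deprecation warning for torch.set_default_tensor_type.

```python
import torch

# Only switch the default tensor type when a GPU is actually present,
# so the sketch still runs on a CPU-only machine.
if torch.cuda.is_available():
    torch.set_default_tensor_type("torch.cuda.FloatTensor")

# Device that factory functions currently default to.
default_device = torch.empty(0).device.type

# No generator and no device are passed here.
perm = torch.randperm(3)

print(default_device, perm.device.type)
```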
However, since my dataset is not very big, I decided to keep it in GPU memory and use a dataloader to load minibatches. I then found that if I do not pass a cuda generator, generator=torch.Generator(device="cuda"), when I initialize the dataloader, it uses the CPU as the default device and raises an exception when I draw a minibatch from the loader: RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'.
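A minimal sketch of the setup that avoids the exception, assuming a toy TensorDataset (the shapes and values are made up for illustration) and a fallback to the CPU when no GPU is available:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fall back to the CPU so the sketch also runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    # Same global default as in the post.
    torch.set_default_tensor_type("torch.cuda.FloatTensor")

# Hypothetical toy dataset that lives entirely on the chosen device.
features = torch.arange(20, dtype=torch.float32, device=device).reshape(10, 2)
labels = torch.arange(10, device=device)
dataset = TensorDataset(features, labels)

# Generator created on the same device as the default tensor type.
gen = torch.Generator(device=device)
gen.manual_seed(1234)

loader = DataLoader(dataset, batch_size=4, shuffle=True, generator=gen)

seen_labels = []
for x, y in loader:
    seen_labels.extend(y.tolist())

print(sorted(seen_labels))
```

Leaving out the generator argument while the default tensor type is cuda is the case that raised the RuntimeError for me.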
So why is this the case, i.e. why does the dataloader not use cuda as the default device for its generator? Why is the behaviour here different from, e.g., the torch.randperm(3) case?
Besides, since I want to control the randomness and reproduce my results, I always fix the random seed in each run, e.g. to 1234. According to the "Controlling sources of randomness" documentation, a single call to torch.manual_seed(0) sets the seed to 0 for both the CPU and GPU generators. But this seed does not affect the GPU generator of the dataloader. I have to explicitly create a GPU generator, seed it manually, and then initialize the dataloader with it. Otherwise, I cannot make sure that the dataloader uses the same seed as other operations such as torch.randperm(3).
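The workaround can be sketched roughly as follows. The helper names make_loader and epoch_order are hypothetical, the dataset is a toy one, and the code falls back to the CPU when no GPU is available:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

SEED = 1234
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    torch.set_default_tensor_type("torch.cuda.FloatTensor")


def make_loader(seed):
    """Hypothetical helper: seed the global RNGs and the loader's own generator."""
    torch.manual_seed(seed)               # global generators
    gen = torch.Generator(device=device)  # separate generator for the dataloader
    gen.manual_seed(seed)                 # has to be seeded explicitly
    data = torch.arange(10, device=device)
    return DataLoader(TensorDataset(data), batch_size=5, shuffle=True, generator=gen)


def epoch_order(loader):
    """Hypothetical helper: flatten one epoch into a list of sample values."""
    return [int(v) for (batch,) in loader for v in batch]


# Two loaders built with the same seed should shuffle identically.
order_a = epoch_order(make_loader(SEED))
order_b = epoch_order(make_loader(SEED))
print(order_a == order_b)
```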
Since the issue can be worked around, I did not file it as a GitHub issue. But it breaks the elegance of setting the randomness globally, and it is an exception to what the official tutorial describes. So I would like to hear any suggestions you have.
Best,
Bruce