Hello,
I am training a model on 2 GPUs. In the `__getitem__` of my custom Dataset, pre-processing is done on the GPU to speed things up.
When I set num_workers to 0, the memory usage of my two GPUs is the same: 22k/32k.
However, when I set num_workers=2 with the following code:
trainloader = DataLoader(train, batch_size=4, shuffle=True, num_workers=2, persistent_workers=True)
the memory usage of my first GPU becomes 31k/32k, while the memory usage of my second GPU remains 22k/32k.
Do you have any idea why?
EDIT: When I print the device used in the Dataset, only GPU 0 ever shows up. GPU 1 is never used.
array = np.load(path)
tensor = torch.from_numpy(array).to(self.device)
print(tensor.device) # shows only cuda:0
where self.device is initialized in `__init__` as:
self.device = torch.device('cuda')
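For context on why this matters (a hedged sketch, not a diagnosis): `torch.device('cuda')` carries no explicit GPU index, so PyTorch resolves it to the *current* device, which defaults to `cuda:0` in every process unless something (e.g. `torch.cuda.set_device` or a DistributedDataParallel launcher setting the rank's device) changes it. Worker processes spawned by the DataLoader inherit that default too. A minimal check:

```python
import torch

# A device string without an index has index=None; tensors moved to it
# land on the *current* CUDA device, which is cuda:0 by default.
dev = torch.device('cuda')
print(dev.index)   # None -> resolves to the current device (cuda:0 unless changed)

# An explicit index pins allocations to a specific GPU instead.
dev1 = torch.device('cuda:1')
print(dev1.index)  # 1
```

So if the intent is to spread pre-processing across both GPUs, the Dataset would need to be given an explicit per-device `torch.device('cuda:0')` / `torch.device('cuda:1')` (or have the current device set per process) rather than the bare `'cuda'` string.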