I am training a network with two GPUs.
I found many topics saying that pinned memory can help improve training speed a lot, but when I use pinned memory I don't see any speedup. The following is my code.
1. Make the DataLoader return batches placed in pinned memory by passing pin_memory=True to its constructor:
data_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=self.opt.batchSize,  # batchSize is 6
    shuffle=bool(self.opt.shuffle_data),
    num_workers=int(self.opt.nThreads),
    pin_memory=True)
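For reference, here is a minimal, self-contained sketch of step 1 (the toy dataset and the sizes are just for illustration; on a CPU-only build PyTorch skips the actual pinning with a warning):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the real one (illustrative only).
toy_dataset = TensorDataset(torch.randn(24, 3), torch.randn(24, 1))

# pin_memory=True makes the DataLoader allocate each returned batch in
# page-locked (pinned) host memory, so the later host-to-GPU copy can be
# done by DMA and can overlap with computation.
loader = DataLoader(
    toy_dataset,
    batch_size=6,
    shuffle=True,
    num_workers=2,
    pin_memory=True)

# Collect the batch shapes just to show the loader works as usual;
# on a CUDA build each x would satisfy x.is_pinned() here.
shapes = [x.shape for x, _ in loader]
```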
2. Call the pin_memory() method on the tensor and pass an additional async=True argument to the GPU copy:
# data is a torch.FloatTensor coming from the DataLoader
# self.input_A and self.input_B are torch.cuda.FloatTensor
input_A = data.pin_memory()
input_B = data.pin_memory()
self.input_A.resize_(input_A.size()).copy_(input_A, async=True)
self.input_B.resize_(input_B.size()).copy_(input_B, async=True)
self.real_A = Variable(self.input_A)
self.fake_B = self.netG.forward(self.real_A)
self.real_B = Variable(self.input_B)
--------start training D and G network-------------
Is my code wrong? Does anyone have any ideas, or could you give me an example showing how to use pinned memory? Thanks a lot.