In every training loop, I use a DataLoader to load a batch of images on the CPU and then move it to the GPU like this:
```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

batchsize = 64
train_dataset = datasets.CIFAR10(...)  # dataset arguments elided
train_loader = DataLoader(train_dataset, batch_size=batchsize,
                          shuffle=True, num_workers=2)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(epoch):
    for batch_index, data in enumerate(train_loader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # ... forward/backward pass ...
```
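One standard way to speed up exactly this per-batch copy is pinned host memory plus asynchronous transfers. Below is a minimal sketch of that idea; the `TensorDataset` of random tensors is a hypothetical stand-in for CIFAR10 so the snippet is self-contained:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the CIFAR10 dataset: 256 random 3x32x32 images.
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# pin_memory=True keeps each batch in page-locked host memory, so the
# later .to(device, non_blocking=True) copy can overlap with computation.
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)

for inputs, targets in loader:
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```

This does not avoid the CPU-to-GPU copy, but it can hide much of its cost behind GPU compute.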
I know that I can wrap the DataLoader and call .to(device) in advance instead of calling it for every training batch. But .to(device) is itself time-consuming: transferring a tensor from CPU to GPU is much slower than creating the tensor directly on the GPU, isn't it?
(In the snippet below, randomTensorA is created on the CPU and transferred to the GPU with .to(); randomTensorB is created directly on the GPU.)
```python
import time
import torch

shape = [300, 300, 300]

a = time.time()
for _ in range(100):
    randomTensorA = torch.randn(shape).to(torch.device('cuda'))
b = time.time()
print('Elapsed Time: %f' % (b - a))

a = time.time()
for _ in range(100):
    randomTensorB = torch.randn(shape, device='cuda')
b = time.time()
print('Elapsed Time: %f' % (b - a))
# Note: CUDA kernels launch asynchronously, so calling
# torch.cuda.synchronize() before each time.time() reading
# would make these timings more accurate.
```
```
Elapsed Time: 24.316857
Elapsed Time: 1.658716
```
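As an aside, because CUDA kernel launches are asynchronous, timings like the ones above can be skewed unless the device is synchronized before reading the clock. A sketch of a more careful measurement follows; `timed` is a helper I made up, and the shape and iteration count are smaller than in the original so the sketch runs quickly:

```python
import time
import torch

def timed(fn, iters=20):
    # Synchronize before starting and after finishing so that any queued
    # asynchronous CUDA kernels are actually included in the measurement.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time() - start

device = "cuda" if torch.cuda.is_available() else "cpu"
shape = [100, 100, 100]  # smaller than the original benchmark, for speed
t_copy = timed(lambda: torch.randn(shape).to(device))    # create on CPU, copy
t_direct = timed(lambda: torch.randn(shape, device=device))  # create on device
print('copy: %f  direct: %f' % (t_copy, t_direct))
```

On a CPU-only machine the two paths are equivalent, so a meaningful gap only appears when CUDA is available.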
So, is there any way to have the DataLoader load the dataset directly onto the GPU? Please let me know, thanks.
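To make the question concrete, something like the following is what I have in mind: for a dataset that fits in GPU memory, copy it to the device once and then slice batches on-device, with no per-batch transfer. This is only a sketch; `preload_batches` is a hypothetical helper, not a torch API, and the random tensors stand in for a real dataset:

```python
import torch

def preload_batches(images, labels, device, batch_size):
    # Hypothetical helper: copy the full dataset to the device once, then
    # yield GPU-resident batches by on-device indexing, so no per-batch
    # host-to-device transfer happens inside the training loop.
    images = images.to(device)
    labels = labels.to(device)
    perm = torch.randperm(images.shape[0], device=device)  # shuffle on-device
    for start in range(0, images.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        yield images[idx], labels[idx]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
images = torch.randn(256, 3, 32, 32)  # stand-in data
labels = torch.randint(0, 10, (256,))
batches = list(preload_batches(images, labels, device, batch_size=64))
```

The trade-off is obvious: this only works when the whole dataset (and any augmentation) fits in GPU memory, which CIFAR10 does but larger datasets may not.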