In every training loop, I use a DataLoader to load a batch of images on the CPU and then move it to the GPU like this:
```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

batchsize = 64
train_dataset = datasets.CIFAR10(...)  # dataset arguments elided
train_loader = DataLoader(train_dataset, batch_size=batchsize,
                          shuffle=True, num_workers=2)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(epoch):
    for batch_index, data in enumerate(train_loader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        # ... forward/backward pass ...
```
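One standard way to speed up exactly this per-batch copy is pinned host memory plus asynchronous transfers. Below is a minimal sketch of that idea; the `TensorDataset` of random tensors is a hypothetical stand-in for CIFAR10 so the snippet is self-contained:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the CIFAR10 dataset: 256 random 3x32x32 images.
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# pin_memory=True keeps each batch in page-locked host memory, so the
# later .to(device, non_blocking=True) copy can overlap with computation.
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)

for inputs, targets in loader:
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```

This does not avoid the CPU-to-GPU copy, but it can hide much of its cost behind GPU compute.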
I know that I can wrap the DataLoader and call .to(device) in advance instead of calling it for every training batch. But .to(device) is itself time-consuming: transferring a tensor from CPU to GPU is much slower than creating the tensor directly on the GPU, isn't it?
(In the snippet below, randomTensorA is created on the CPU and transferred to the GPU with .to(); randomTensorB is created directly on the GPU.)
```python
import time
import torch

shape = [300, 300, 300]

a = time.time()
for _ in range(100):
    randomTensorA = torch.randn(shape).to(torch.device('cuda'))
b = time.time()
print('Elapsed Time: %f' % (b - a))

a = time.time()
for _ in range(100):
    randomTensorB = torch.randn(shape, device='cuda')
b = time.time()
print('Elapsed Time: %f' % (b - a))
# Note: CUDA kernels launch asynchronously, so calling
# torch.cuda.synchronize() before each time.time() reading
# would make these timings more accurate.
```
```
Elapsed Time: 24.316857
Elapsed Time: 1.658716
```
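As an aside, because CUDA kernel launches are asynchronous, timings like the ones above can be skewed unless the device is synchronized before reading the clock. A sketch of a more careful measurement follows; `timed` is a helper I made up, and the shape and iteration count are smaller than in the original so the sketch runs quickly:

```python
import time
import torch

def timed(fn, iters=20):
    # Synchronize before starting and after finishing so that any queued
    # asynchronous CUDA kernels are actually included in the measurement.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time() - start

device = "cuda" if torch.cuda.is_available() else "cpu"
shape = [100, 100, 100]  # smaller than the original benchmark, for speed
t_copy = timed(lambda: torch.randn(shape).to(device))    # create on CPU, copy
t_direct = timed(lambda: torch.randn(shape, device=device))  # create on device
print('copy: %f  direct: %f' % (t_copy, t_direct))
```

On a CPU-only machine the two paths are equivalent, so a meaningful gap only appears when CUDA is available.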
So, is there any way to have the DataLoader load the dataset directly onto the GPU? Please let me know, thanks.
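To make the question concrete, something like the following is what I have in mind: for a dataset that fits in GPU memory, copy it to the device once and then slice batches on-device, with no per-batch transfer. This is only a sketch; `preload_batches` is a hypothetical helper, not a torch API, and the random tensors stand in for a real dataset:

```python
import torch

def preload_batches(images, labels, device, batch_size):
    # Hypothetical helper: copy the full dataset to the device once, then
    # yield GPU-resident batches by on-device indexing, so no per-batch
    # host-to-device transfer happens inside the training loop.
    images = images.to(device)
    labels = labels.to(device)
    perm = torch.randperm(images.shape[0], device=device)  # shuffle on-device
    for start in range(0, images.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        yield images[idx], labels[idx]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
images = torch.randn(256, 3, 32, 32)  # stand-in data
labels = torch.randint(0, 10, (256,))
batches = list(preload_batches(images, labels, device, batch_size=64))
```

The trade-off is obvious: this only works when the whole dataset (and any augmentation) fits in GPU memory, which CIFAR10 does but larger datasets may not.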