Since I started using the PyTorch DataLoader, I've had performance problems with .cuda(). My dataset consists of 250,000 .npy files, each containing a NumPy array of shape 33x27. I'm using the DataLoader as follows:
import torch.utils.data as utils  # assumed alias, since utils.DataLoader is used below

# list containing all file paths
train_file_paths = getPaths(self.dir_training)
trainDataSet = IterDataset(feature_path=train_file_paths)
train_loader = utils.DataLoader(dataset=trainDataSet, batch_size=32, shuffle=False,
                                num_workers=16, pin_memory=True)
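For reference, with batch_size=32 the default collate function stacks 32 samples into one tensor per batch. The sketch below reproduces that with a hypothetical in-memory RandomNpyLikeDataset (random float64 arrays instead of the real .npy files, same slicing as the IterDataset shown further down, and num_workers=0 so it runs anywhere). It shows that each feature batch is 32x33x26 and stays float64 on its way to .cuda().

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class RandomNpyLikeDataset(Dataset):
    """Hypothetical stand-in for IterDataset: random 33x27 float64 arrays, label in column 0."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        feature = np.random.rand(33, 27)  # float64, like np.load on a default-dtype file
        return feature[:, 1:], feature[0, 0]


loader = DataLoader(RandomNpyLikeDataset(100), batch_size=32, shuffle=False, num_workers=0)
feature, labels = next(iter(loader))

# default_collate turns the NumPy samples into tensors and stacks them along a new batch dim
print(feature.shape, feature.dtype, labels.shape)
```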
On each iteration, my training loop fetches a new batch (batch size 32) and moves it to the GPU with .cuda(). The model is moved to the GPU once, at the beginning of the script.
from torch.autograd import Variable  # deprecated wrapper; plain tensors work since PyTorch 0.4

for i, (feature, labels) in enumerate(train_loader):
    feature = Variable(feature.cuda(), requires_grad=True)
    labels = Variable(labels.cuda(), requires_grad=True)
    outputs = model(feature.float())
    loss = criterion(outputs, labels.long())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
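One way to check where the time really goes is to call torch.cuda.synchronize() before reading the clock. CUDA kernels are launched asynchronously, so without synchronizing, time spent by the GPU on forward/backward can be charged to the next blocking call, such as a host-to-device copy. The sketch below is hypothetical: the timed helper, the tiny nn.Linear model and the random tensors (33x26 features, 10 classes) stand in for the real model and data, and it falls back to the CPU when no GPU is present.

```python
import time

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def sync():
    # Wait for all queued GPU work so each timing covers only its own step
    if device.type == "cuda":
        torch.cuda.synchronize()


def timed(label, fn, timings):
    """Run fn() and add its synchronized wall-clock time to timings[label]."""
    sync()
    start = time.perf_counter()
    out = fn()
    sync()
    timings[label] = timings.get(label, 0.0) + time.perf_counter() - start
    return out


# Hypothetical stand-ins for the real model, loss and optimizer
model = nn.Sequential(nn.Flatten(), nn.Linear(33 * 26, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def train_step(loss):
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


timings = {}
for _ in range(5):
    feature_cpu = torch.randn(32, 33, 26)
    labels_cpu = torch.randint(0, 10, (32,))
    feature, labels = timed(
        "copy", lambda: (feature_cpu.to(device), labels_cpu.to(device)), timings
    )
    outputs = timed("forward", lambda: model(feature), timings)
    loss = timed("loss", lambda: criterion(outputs, labels), timings)
    timed("backward+step", lambda: train_step(loss), timings)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

If the copy time drops sharply once the synchronize calls are in place, the time cProfile attributes to .cuda() was most likely waiting for earlier GPU work rather than copying. Since the loader already uses pin_memory=True, passing non_blocking=True to .cuda() is another thing worth trying; whether it helps in this setup is untested.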
My Dataset class looks like this (the first column of each array holds the label):
import numpy as np
from torch.utils.data import Dataset

class IterDataset(Dataset):
    def __init__(self, feature_path):
        self.feature_path = feature_path

    def __len__(self):
        return len(self.feature_path)

    def __getitem__(self, index):
        feature = np.load(self.feature_path[index])
        X = feature[:, 1:]   # features: columns 1..26
        y = feature[0, 0]    # label: first column of the first row
        # checking for NaN
        if np.isnan(X).any():
            print('NAN ' + self.feature_path[index])
        return X, y
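To make the slicing in __getitem__ concrete, here is a small sketch on a synthetic 33x27 array (random values standing in for one real .npy file; split_sample and the label value 3.0 are made up for illustration). It shows that X keeps 26 feature columns and that y is read from the first row only, so the dataset assumes every row of a file carries the same label. It also shows the per-sample size in float64 versus float32.

```python
import numpy as np


def split_sample(feature):
    """Same slicing as IterDataset.__getitem__: column 0 is the label, the rest are features."""
    X = feature[:, 1:]   # all 33 rows, columns 1..26 -> shape (33, 26)
    y = feature[0, 0]    # label read from the first row only
    return X, y


# Synthetic stand-in for one .npy file (random values, hypothetical label 3.0 in every row)
rng = np.random.default_rng(0)
sample = rng.random((33, 27))   # float64, NumPy's default floating dtype
sample[:, 0] = 3.0

X, y = split_sample(sample)
X32 = X.astype(np.float32)      # what a float32 return value would look like

print(X.shape, X.dtype, y)      # (33, 26) float64 3.0
print(X.nbytes, X32.nbytes)     # 6864 3432
```

Returning X.astype(np.float32) from __getitem__ would halve the bytes moved per batch and make the .float() call after .cuda() unnecessary; this is a suggestion, not something measured here.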
The cProfile output for the first 1000 batches:

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     2023   13.745    0.007   13.745    0.007  {method 'cuda' of 'torch._C._TensorBase' objects}
     1000    1.931    0.002    1.931    0.002  {method 'run_backward' of 'torch._C._EngineBase' objects}
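A rough back-of-the-envelope check of these numbers, assuming the .npy files hold float64 (8 bytes per value, NumPy's default) and that the roughly 2000 .cuda() calls are the feature and label copies of 1000 batches:

```python
# Back-of-the-envelope check of the cProfile numbers.
# Assumption: the .npy files hold float64 (8 bytes per value).
BATCH = 32
ROWS, FEATURE_COLS = 33, 26      # 27 columns minus the label column
BYTES_PER_VALUE = 8              # float64

feature_bytes = BATCH * ROWS * FEATURE_COLS * BYTES_PER_VALUE  # one feature batch
label_bytes = BATCH * BYTES_PER_VALUE                          # one label batch
bytes_per_batch = feature_bytes + label_bytes

cuda_calls, cuda_total_s = 2023, 13.745  # from the cProfile output
batches = 1000

per_call_ms = cuda_total_s / cuda_calls * 1e3
implied_mb_per_s = batches * bytes_per_batch / cuda_total_s / 1e6

print(f"{feature_bytes} bytes per feature batch, {bytes_per_batch} bytes per batch")
print(f"{per_call_ms:.2f} ms per .cuda() call, implied rate {implied_mb_per_s:.1f} MB/s")
```

This gives about 0.22 MB per batch and an implied rate of roughly 16 MB/s, far below what a PCIe 3.0 x16 link (as used by the GTX 1080) can move. If the dtype assumption holds, the copy itself is unlikely to explain 13.7 s; the blocking .cuda() call may instead be waiting for previously queued GPU work to finish.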
Hardware/software I'm using:
- CUDA version: 10.1
- GPU: 2x GeForce GTX 1080
If anybody has an idea why .cuda() takes so much time, I would appreciate any help.