Hi,
I’m working with AffectNet, which is about 450K images, totalling about 56 GB.
I’ve got a Titan Xp, and I’ve successfully created a custom Dataset loader.
However, my training appears to be very slow, because for each mini-batch the input/output tensors are copied from host to GPU all over again.
I’d like to try uploading some of the dataset images to the GPU up front, since it has about 12 GB of VRAM. Problem is that this is working extremely slowly. I’ve tested with CUDA 9.0 and now with CUDA 9.1.
The loop is something very simple such as:
for idx in range(len(self.labels.rows)):
    if torch.cuda.memory_allocated() < MAX_GPU_MEM:
        pair = self.__getitem__(idx)
        # note: non_blocking=True only overlaps the copy with compute
        # if the source tensor is in pinned (page-locked) host memory
        in_tensor = pair[0].cuda(non_blocking=True).half()
        out_tensor = pair[1].cuda(non_blocking=True).half()
        self.data.append([in_tensor, out_tensor])
    else:
        print("GPU nearly maxed out")
        break
print("in GPU RAM:", len(self.data))
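One variant I might try is pinning the host tensors before the copy, since `non_blocking=True` is only asynchronous from pinned memory. A minimal sketch of the idea (the shapes and the `pairs` list are made-up stand-ins for my real dataset items):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical preprocessed CPU tensor pairs (shapes are placeholders).
pairs = [(torch.randn(3, 224, 224), torch.randn(1)) for _ in range(4)]

data = []
for img, label in pairs:
    if device == "cuda":
        # pin_memory() moves the tensor into page-locked host RAM,
        # the prerequisite for a truly asynchronous host-to-device copy.
        img, label = img.pin_memory(), label.pin_memory()
    data.append([img.to(device, non_blocking=True).half(),
                 label.to(device, non_blocking=True).half()])

print(len(data), data[0][0].dtype)
```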
I can’t find a method to pre-allocate more GPU memory beforehand. I admit that some of the time is spent on pre-processing and transforming those images, but the rate at which the GPU RAM fills up is phenomenally slow.
Is there a way to upload them all together in a batch?
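One thing I might try: if all images end up the same shape after preprocessing, they could be stacked into one big CPU tensor and moved in a single transfer, amortising the per-copy overhead. A sketch, with placeholder shapes and counts:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical list of preprocessed images, all the same shape.
images = [torch.randn(3, 224, 224) for _ in range(16)]

# One stack + one transfer instead of 16 individual host-to-device copies.
batch = torch.stack(images)       # shape: (16, 3, 224, 224)
batch = batch.to(device).half()   # single host-to-device copy

print(batch.shape, batch.dtype)
```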
EDIT: maybe my issue is the pre-processing after all… I’ll try to profile the code
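A quick way to check, timing the preprocessing separately from the transfer (`fake_preprocess` here is just a stand-in for my real loading/transform pipeline):

```python
import time
import torch

def fake_preprocess(idx):
    # Stand-in for the real image loading + transforms.
    return torch.randn(3, 224, 224), torch.randn(1)

device = "cuda" if torch.cuda.is_available() else "cpu"
t_pre = t_copy = 0.0

for idx in range(32):
    t0 = time.perf_counter()
    img, label = fake_preprocess(idx)
    t1 = time.perf_counter()
    img = img.to(device).half()
    label = label.to(device).half()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the copy so the timing is real
    t2 = time.perf_counter()
    t_pre += t1 - t0
    t_copy += t2 - t1

print(f"preprocess: {t_pre:.4f}s, copy: {t_copy:.4f}s")
```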