During the first training epoch, the program got "killed" after 957/2354 batches, with no other message.
The DataLoader was using the default settings.
The code ran fine when I ran it on a smaller dataset.
I'm training on a GPU.
I did call optimizer.zero_grad() at the beginning of each batch inside the DataLoader loop, roughly like the sketch below.
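For context, the loop is structured along these lines. This is a minimal, self-contained sketch with toy stand-ins (the tiny TensorDataset, nn.Linear model, loss, optimizer, and batch size are placeholders, not my real code), just to show where zero_grad() sits and that the DataLoader keeps its default arguments:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for my real dataset/model, only to show the loop structure.
train_dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # default num_workers etc.

model = nn.Linear(10, 2).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1):
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.cuda(), targets.cuda()

        optimizer.zero_grad()              # gradients cleared at the start of every batch
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
```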
Is this a GPU memory issue?
Do I need to call torch.cuda.empty_cache() after each batch?
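For reference, here is a minimal sketch of how I could watch GPU memory from inside the loop, and where the empty_cache() call would go; the helper name and the 100-batch logging interval are placeholders I made up:

```python
import torch

def log_gpu_memory(batch_idx, every=100):
    """Print allocated/cached GPU memory every `every` batches (interval is arbitrary)."""
    if batch_idx % every == 0:
        allocated = torch.cuda.memory_allocated() / 1024 ** 2
        cached = torch.cuda.memory_cached() / 1024 ** 2  # renamed memory_reserved() in later PyTorch
        print(f"batch {batch_idx}: allocated {allocated:.1f} MiB, cached {cached:.1f} MiB")

# Inside the training loop, after optimizer.step():
#     log_gpu_memory(batch_idx)
#     torch.cuda.empty_cache()  # the call I'm asking about; frees unused cached blocks
#                               # so they show up as free outside PyTorch
```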
Here are the versions I am using:
# Name          Version   Build                             Channel
pytorch         1.2.0     py3.6_cuda9.2.148_cudnn7.6.2_0    pytorch
torchvision     0.4.0     py36_cu92                         pytorch
Please let me know if you have any suggestions. Thank you.