Hi,
I am running a CNN-HRNN model: the CNN is a pre-trained VGG16 encoder, and the HRNN is a hierarchical RNN language model for image description generation. I load the encoder on multiple GPUs like this: model.Encoder = torch.nn.DataParallel(model.Encoder, device_ids=[0,1,2,7]).
When I use a DataLoader on a small part of my dataset (about 4,500 samples), training runs fine with batch size 160. But when I load a larger portion (more than 12,000 samples) with the same batch size of 160, it runs out of GPU memory after several iterations. Why?
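One possibility I considered (an assumption, since the dataset contents aren't shown here): the larger dataset may contain longer captions, and if each batch is padded to its longest sequence, peak activation memory grows with that maximum length even though the batch size is unchanged. A rough back-of-the-envelope sketch with hypothetical sizes:

```python
def padded_batch_floats(batch_size, max_len, hidden_size):
    """Rough count of float32 activations a padded RNN batch holds.

    Hypothetical sizes; real usage also includes gradients, the
    backward graph, and cuDNN workspace, so treat this as a lower bound.
    """
    return batch_size * max_len * hidden_size

# A batch padded to 30 tokens vs. one padded to 60 tokens:
small = padded_batch_floats(160, 30, 512)
large = padded_batch_floats(160, 60, 512)
print(large / small)  # → 2.0: doubling the longest caption doubles activation memory
```

So a few unusually long samples in the bigger split could be enough to push a previously fine batch size over the limit.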
I am getting this error:
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.v1.py", line 472, in <module>
    main()
  File "train.v1.py", line 213, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "train.v1.py", line 328, in train
    loss.backward()
  File "/home/bbbian/local/anaconda/lib/python2.7/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/bbbian/local/anaconda/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
Here is some of my training code:
for i, data in enumerate(train_loader):
    img = Variable ...
    inputs = ...
    word_outputs, sent_outputs = model(img, inputs)
    wordRNN_loss = criterion[0](word_outputs, inputs[:, :, 1:], inputs_mask[:, :, 1:])
    sentRNN_loss = criterion[1](sent_outputs, inputs_num)
    wordRNN_losses.update(wordRNN_loss.data[0], data[0].size(0))
    sentRNN_losses.update(sentRNN_loss.data[0], data[0].size(0))
    # combined loss
    loss = sentRNN_loss * opts.lambda_sent + wordRNN_loss * opts.lambda_word
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
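If long sequences do turn out to be the culprit, one mitigation I'm considering (a sketch, not part of my current code) is to group samples of similar length together before batching, so each batch is only padded to its own maximum rather than to the longest caption in the dataset:

```python
def bucket_by_length(lengths, batch_size):
    """Group sample indices into batches of similar sequence length,
    so that per-batch padding (and peak GPU memory) stays small.

    lengths: list of sequence lengths, one per sample.
    Returns a list of index batches.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

lengths = [5, 30, 7, 29, 6, 31]
print(bucket_by_length(lengths, batch_size=2))
# → [[0, 4], [2, 3], [1, 5]]: short captions batch with short, long with long
```

These index batches could then be fed to the DataLoader via a custom sampler, though I haven't tried that yet.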