CNN + LSTM gradient

Hello everyone,

I have a problem that requires both a CNN and an RNN, trainable end-to-end.
I started by using a dataloader on sequences of images that get one tag at the end. The batch size is always one, so each example is a sequence of image tensors plus a single tag. I want to pass the images through a CNN, and then use each 1 x n_features
feature vector as one input step for the RNN.
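
To make the shapes concrete, this is roughly what one example looks like (a minimal sketch; all sizes are made up):

    import torch

    # one training example: a sequence of images plus a single tag
    seq_len, channels, height, width = 100, 3, 224, 224
    imgs = torch.randn(seq_len, channels, height, width)  # sequence stacked like a batch

    # after the CNN, every image becomes one feature vector, so the whole
    # sequence is (seq_len, n_features); each row is the 1 x n_features
    # input for one RNN timestep
    n_features = 2048
    cnn_features = torch.randn(seq_len, n_features)  # stand-in for the real CNN output
    tag = 5                                          # the single tag for the sequence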

The forward pass seems to work, but when I call loss.backward(), I get:

    loss.backward()
      File "/scratch/Dimitris/software/pytorch/pytorchenv/lib/python2.7/site-packages/torch/autograd/variable.py", line 146, in backward
        self._execution_engine.run_backward((self,), (gradient,), retain_variables)
      File "/scratch/Dimitris/software/pytorch/pytorchenv/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 175, in backward
        update_grad_input_fn(self._backend.library_state, input, grad_output, grad_input, *gi_args)
    RuntimeError: out of range at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensor.c:23

The images are loaded into a single tensor, and I pretend the sequence length is the batch size before putting the whole tensor through the CNN:

    for img in imgs_in_play:
        # t is my torchvision transform (PIL image -> C x H x W tensor)
        img_tensor = t(PIL.Image.open(img))
        img_tensor.unsqueeze_(0)   # add a leading "batch" dimension: 1 x C x H x W
        imgs_raw.append(img_tensor)

    # stack the sequence along dim 0: seq_len x C x H x W
    imgs = torch.cat(imgs_raw, 0)

Then the CNN output is fed into the RNN one timestep at a time:

    imgs = torch.squeeze(imgs, 0)
    imgs = Variable(imgs.cuda())

    # run the whole "batch" (really the sequence) through the CNN at once
    output = vp.model(imgs)
    output.data.unsqueeze_(1)   # seq_len x 1 x 2048

    vp.rnn_model.zero_grad()
    hidden_rnn = vp.rnn_model.init_hidden()

    # feed the feature vectors to the RNN one timestep at a time
    for i in range(output.size()[0]):
        output_rnn, hidden_rnn = vp.rnn_model(output[i], hidden_rnn)
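
For context, vp.rnn_model is a small per-timestep module with an init_hidden() helper. It looks roughly like this (a hypothetical reconstruction, not my exact code; the hidden size and the decoder are placeholders):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    class TagRNN(nn.Module):
        # sketch of a per-timestep LSTM module with an init_hidden()
        # helper, in the spirit of vp.rnn_model (sizes are placeholders)
        def __init__(self, n_features=2048, hidden_size=256, n_tags=13):
            super(TagRNN, self).__init__()
            self.hidden_size = hidden_size
            self.cell = nn.LSTMCell(n_features, hidden_size)
            self.decoder = nn.Linear(hidden_size, n_tags)

        def forward(self, x, hidden):
            # x: (1, n_features) -- one timestep for a batch of one
            h, c = self.cell(x, hidden)
            return self.decoder(h), (h, c)

        def init_hidden(self):
            return (Variable(torch.zeros(1, self.hidden_size)),
                    Variable(torch.zeros(1, self.hidden_size)))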

I imagine that concatenating all the images into one fake batch before giving them to the CNN model causes trouble when the RNN tries to build its gradients, but I am not sure what the proper way of connecting the ~100 CNN forward passes to the inputs of the RNN is.
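
What I have in mind would look something like this (a sketch, assuming a plain nn.LSTM and placeholder sizes; cnn_out stands in for vp.model(imgs)):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    seq_len, n_features, hidden_size = 100, 2048, 256
    lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size)

    cnn_out = Variable(torch.randn(seq_len, n_features))  # stand-in for vp.model(imgs)

    # reshape to (seq_len, batch=1, n_features) with view() instead of
    # mutating .data in place, so the autograd graph stays connected
    rnn_in = cnn_out.view(seq_len, 1, n_features)

    h0 = Variable(torch.zeros(1, 1, hidden_size))
    c0 = Variable(torch.zeros(1, 1, hidden_size))
    rnn_out, _ = lstm(rnn_in, (h0, c0))   # rnn_out: (seq_len, 1, hidden_size)
    last_step = rnn_out[-1]               # features for the final tag prediction, (1, hidden_size)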

All tensor shapes are as expected: the final CNN output before the RNN is (sequence length x 1 x 2048), and output_rnn contains 13 tag predictions (one tag per sequence).

Any help is appreciated!

EDIT: the convnet does not seem to be the problem, since feeding in random input tensors of the same shape gives the exact same error.
EDIT2: I was unsqueezing both the output_rnn tensor and the target_tensor, but output_rnn should not have been unsqueezed.
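
In other words, the broken and fixed calls were roughly these (a reconstruction; the criterion and the shapes are my assumptions: 13 tag classes, NLLLoss over log-probabilities):

    import torch
    import torch.nn as nn
    from torch.autograd import Variable

    criterion = nn.NLLLoss()
    scores = Variable(torch.randn(1, 13), requires_grad=True)
    log_probs = nn.LogSoftmax(dim=1)(scores)         # prediction, (1, n_tags)
    target_tensor = Variable(torch.LongTensor([3]))  # target, (1,)

    # broken: an extra unsqueeze made the prediction 3-D, and the
    # criterion's backward then indexed out of range:
    #   loss = criterion(log_probs.unsqueeze(0), target_tensor)

    # fixed: the prediction stays (1, n_tags)
    loss = criterion(log_probs, target_tensor)
    loss.backward()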

Dimitris