Getting "error invalid argument 4: out of range" on backprop

tetratrio · January 20, 2018, 4:58pm

Hey!

I’m testing out the PyTorch implementation of FlowNet2 by fitsumreda and would like to backprop through the network to get the gradients with respect to the two input images. The loss I’m backpropagating is the MSE between the flow predicted for the inputs and a constant target flow.

When calling backward() I get the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)

<ipython-input-21-9531bba54470> in transfer_style(style_imgs, content_imgs, steps, optimizer_type, lr, weights, batch_norm, distance_type, return_intermediate, blur_interv, blur_kernel_size, blur_sigma_sq)
     79                 loss = loss_fn(style,content_param,weights,distance_type)
     80                 loss += flow_loss(content_param)
---> 81                 loss.backward()
     82                 optimizer.step()
     83                 est.tick('Step {}/{}, loss: {}'.format(step+1,steps,loss.data.cpu()[0]))

~/.local/lib/python3.5/site-packages/torch/autograd/variable.py in backward(self, gradient, retain_graph, create_graph, retain_variables)
    165                 Variable.
    166         """
--> 167         torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
    168 
    169     def register_hook(self, hook):

~/.local/lib/python3.5/site-packages/torch/autograd/__init__.py in backward(variables, grad_variables, retain_graph, create_graph, retain_variables)
     97 
     98     Variable._execution_engine.run_backward(
---> 99         variables, grad_variables, retain_graph)
    100 
    101 

RuntimeError: invalid argument 4: out of range at /pytorch/torch/lib/THC/generic/THCTensor.c:450

How do I interpret this error? Usually it is very easy to understand what went wrong in PyTorch thanks to the excellent error messages, but since this error is raised in compiled C code, I can’t figure out what is wrong.

Worth mentioning is that the forward pass works perfectly and produces correct results. Calling backward() also works if I remove the flownet loss.

If I had more free time I would spend it on debugging, but this is something I’m doing as a hobby and free time is limited at the moment.

It might also just be me who is not using the flownet2 implementation correctly.

I’m using Python 3.5 and PyTorch 0.3, installed via pip.

Thanks in advance

richard · January 23, 2018, 4:59pm

Based on the code, it looks like a narrow operation has incorrect arguments. However, it’s strange that this only shows up in your backward pass but not your forward pass, so it could be a bug with pytorch (or with the backward pass, if it was custom defined).
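To make the "narrow has incorrect arguments" idea concrete, here is a minimal pure-Python sketch of the kind of bounds check a narrow (slice along one dimension) typically performs. The function name `check_narrow` and the exact conditions are assumptions for illustration, not PyTorch's actual code; I am also only guessing that "argument 4" in the message refers to the slice length.

```python
def check_narrow(dim_size, start, length):
    """Hypothetical stand-in for the bounds check behind a narrow call.

    dim_size: size of the tensor along the narrowed dimension
    start:    first index of the slice
    length:   number of elements to keep
    """
    if not (0 <= start < dim_size):
        raise ValueError("start out of range")
    # Assumed to correspond to the "out of range" check on the slice length.
    if not (length > 0 and start + length <= dim_size):
        raise ValueError("length out of range")
    return (start, start + length)


# A slice that fits inside a dimension of size 10.
print(check_narrow(10, 2, 4))

# The same slice applied to a smaller buffer is rejected. A custom backward
# that allocates a gradient with the wrong shape could fail in this way
# even though the forward pass works.
try:
    check_narrow(3, 2, 4)
except ValueError as err:
    print("rejected:", err)
```

If something like this is what is happening, comparing the shapes of the tensors a custom layer saves in forward with the ones it slices in backward would be a reasonable first thing to check.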

tetratrio · January 25, 2018, 6:15pm

I’m going to try to figure this out when I have time. I’m not too familiar with the C code that PyTorch uses, so thanks for suggesting what the error could be!
As the model uses custom layers, the backward implementation might be flawed somewhere. The custom network (flownet2) was written for Python 2.7 and PyTorch 0.2, which is not what I am using, so maybe that could also be the source of the error?

I will update this thread if I find time to go through all the code.

richard · January 25, 2018, 8:52pm

You can try gdb-ing through it. Here’s a quick guide.