RuntimeError: start (0) + length (0) exceeds dimension size (0)

I am running a model for QA ranking. This model uses 2 LSTMs and a series of attention mechanisms. It works for one dataset, but when I run it on another dataset, I get a weird-looking error:

RuntimeError: start (0) + length (0) exceeds dimension size (0). (narrow at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/ATen/native/TensorShape.cpp:157)

I don’t understand what the error means, so I am unable to debug it. I also don’t understand why it works for one dataset but not for the other.

I am using 4 GPUs and torch.nn.parallel.data_parallel() in my code for data parallelism. It should not be an out-of-memory issue, since only 50% of each of the 4 GPUs is utilized with the minimum batch size necessary for training.

Can someone please tell me how to solve this issue?

This is probably caused by your tensor shapes: somewhere, narrow is being called on an empty tensor. Try using a debugger and/or printing out the tensor shapes to see where things go wrong.

Additionally, we have added proper empty-tensor support, which will be available in the next release, so the error message will be nicer. That probably isn’t the root of your problem, though.

I am printing the tensor shapes, and they are all as expected. The code works fine when I run it on 1 GPU, but with multiple GPUs I get this error. Also, the error happens on the last batch of training; all the previous batches work fine. I am printing the gradient of the embedding layer, and it shows a non-zero gradient as well.
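Since the failure only shows up on the last batch and only with several GPUs, one thing worth checking (an assumption, not something confirmed in this thread) is how that smaller final batch gets divided among the devices. The sketch uses `torch.chunk` along dimension 0 purely as an illustration of splitting a batch four ways: a short batch yields fewer, smaller pieces than there are devices. If that turns out to be the trigger, `DataLoader`'s `drop_last=True` skips the incomplete final batch. The dataset size (1030) and batch size (32) are made up.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

NUM_DEVICES = 4  # matches the 4 GPUs mentioned in the question

def split_sizes(batch_size, num_chunks=NUM_DEVICES):
    # Illustration only: split `batch_size` rows along dim 0 with torch.chunk
    # and report how many rows each piece receives.
    return [piece.shape[0] for piece in torch.arange(batch_size).chunk(num_chunks)]

print(split_sizes(32))  # [8, 8, 8, 8]
print(split_sizes(5))   # [2, 2, 1]: only 3 pieces for 4 devices

# Hypothetical dataset of 1030 samples: with batch_size=32 the last batch would
# hold only 6 samples. drop_last=True discards that incomplete batch.
dataset = TensorDataset(torch.randn(1030, 10))
loader = DataLoader(dataset, batch_size=32, drop_last=True)
print(len(loader))  # 32
```

Dropping the last batch loses a handful of samples per epoch, which is usually negligible; shuffling keeps it from always being the same samples.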

I can get the same error on a single GPU. In my case, I generated an empty batch record.
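If empty records are the cause, one option is to filter them out while building each batch. Below is a minimal sketch of a custom `collate_fn`, assuming (hypothetically) that each record is a 1-D `LongTensor` of token ids; it drops zero-length records before padding and returns `None` when nothing is left, so the training loop can skip that batch.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_skip_empty(records):
    """Hypothetical collate_fn: each record is a 1-D LongTensor of token ids.
    Zero-length records are dropped; returns None if the whole batch is empty."""
    kept = [r for r in records if r.numel() > 0]
    if not kept:
        return None
    lengths = torch.tensor([r.numel() for r in kept])
    padded = pad_sequence(kept, batch_first=True, padding_value=0)
    return padded, lengths

batch = [torch.tensor([4, 7, 1]), torch.tensor([], dtype=torch.long), torch.tensor([2, 9])]
padded, lengths = collate_skip_empty(batch)
print(padded.shape, lengths.tolist())  # torch.Size([2, 3]) [3, 2]
print(collate_skip_empty([torch.tensor([], dtype=torch.long)]))  # None
```

Usage: pass it as `DataLoader(dataset, batch_size=..., collate_fn=collate_skip_empty)` and add `if batch is None: continue` at the top of the training loop.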

Hi, I got the same error. Do you know how to resolve this now?

Check that your tensors are not empty, as this error is apparently thrown if narrow encounters an empty tensor.
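A quick way to do that check is to walk the whole batch before the forward pass and report every empty tensor by its position. The helper `find_empty_tensors` below is hypothetical; it handles nested tuples, lists and dicts, and the example batch keys (`question`, `answer`) are made up.

```python
import torch

def find_empty_tensors(obj, path="batch"):
    """Return a description (path and shape) of every zero-element tensor inside
    a nested structure of tensors, tuples, lists and dicts."""
    if isinstance(obj, torch.Tensor):
        return [f"{path} {tuple(obj.shape)}"] if obj.numel() == 0 else []
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, (tuple, list)):
        items = enumerate(obj)
    else:
        return []  # ignore non-tensor leaves such as ints or strings
    found = []
    for key, value in items:
        found.extend(find_empty_tensors(value, f"{path}[{key!r}]"))
    return found

batch = {"question": torch.zeros(4, 12), "answer": torch.zeros(0, 30)}
print(find_empty_tensors(batch))  # ["batch['answer'] (0, 30)"]
```

Calling it right before `model(batch)` and raising when the list is non-empty turns the opaque narrow error into a message that names the offending input.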