I am running a model for QA ranking that uses two LSTMs and a series of attention mechanisms. It works on one dataset, but when I run it on another dataset, I get a strange error:
RuntimeError: start (0) + length (0) exceeds dimension size (0). (narrow at /opt/conda/conda-bld/pytorch_1532579805626/work/aten/src/ATen/native/TensorShape.cpp:157)
I don’t understand what the error means, so I am unable to debug it. I also don’t understand why the model works on one dataset but not on the other.
I am using 4 GPUs and torch.nn.parallel.data_parallel() in my code for data parallelism. It should not be an out-of-memory issue, since only 50% of each of the 4 GPUs is utilized at the minimum batch size necessary for training.
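Since the message reports a dimension of size 0, one check that might help narrow this down is to scan each batch for empty tensors before it reaches data_parallel(). Below is a minimal sketch of that idea, not code from my model: the helper find_empty_dims, the FakeTensor stand-in, and the input names "question" and "answer" are all hypothetical, and it assumes the inputs expose a .shape attribute the way PyTorch tensors do.

```python
from collections import namedtuple


def find_empty_dims(named_inputs):
    """Return (name, shape) for every input that has a dimension of size 0.

    Hypothetical helper: `named_inputs` maps a label to any object with a
    `.shape` attribute (for example a torch.Tensor).
    """
    problems = []
    for name, tensor in named_inputs.items():
        shape = tuple(tensor.shape)
        if 0 in shape:
            problems.append((name, shape))
    return problems


# Stand-in for a tensor so the sketch runs without PyTorch or GPUs.
FakeTensor = namedtuple("FakeTensor", ["shape"])

# Hypothetical batch: the second input has an empty sequence dimension.
batch = {
    "question": FakeTensor(shape=(8, 20)),
    "answer": FakeTensor(shape=(8, 0)),
}

print(find_empty_dims(batch))  # [('answer', (8, 0))]
```

If a check like this flags a batch only for the second dataset, that would at least point to the data (for example, an empty question or answer) rather than to memory.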
Can someone please tell me how to solve this issue?