loss.backward() causes RuntimeError: invalid argument (no error in version 0.1.12)

If I use batch_size=1, my Skip-gram implementation raises the errors below (the code works fine on PyTorch 0.1.12, but 0.3.0 raises a RuntimeError):

  • CPU version:
Traceback (most recent call last):
  File "/home/zarzen/Dev/sgns/train.py", line 32, in <module>
    train()
  File "/home/zarzen/Dev/sgns/train.py", line 25, in train
    loss.backward()
  File "/home/zarzen/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/zarzen/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: invalid argument 1: expected 3D tensor, got 4D at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/TH/generic/THTensorMath.c:1630
  • GPU version:
# traceback occurs at the same place as in the CPU version
RuntimeError: invalid argument 6: expected 3D tensor at /Users/zarzen/Dev/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:442

I have uploaded the code to GitHub: https://github.com/zarzen/sgns. To reproduce the bug, set batch_size=1 in train.py. To run the code, run preprocess.py first and then train.py.

If I set batch_size greater than 1, the implementation works fine. I know batch_size == 1 is not a reasonable setting, but if data_length % batch_size == 1, the last batch produced by the DataLoader has size 1, which triggers the problem. An easy fix would be to check for this situation and adjust batch_size (or drop the last batch) accordingly, for example as sketched below. What I really want to know is whether my code logic contains a more fundamental issue.
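
For example, one easy workaround could look like this (just a sketch, assuming the installed DataLoader supports the drop_last option; dataset and batch_size are placeholder names):

# Hypothetical workaround: drop the last incomplete batch so its size never becomes 1.
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)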

Following is the model snippet.
(The input data of the skip-gram model is a list of (target, context) pairs:
[ (w_t1, w_c1),
  (w_t2, w_c2),
  … ])

import numpy as np
import torch as t
import torch.nn as nn
from torch import FloatTensor, LongTensor
from torch.autograd import Variable


class EmbeddingNN(nn.Module):
  """ single hidden layer embedding model"""
  def __init__(self, voc_size, emb_size=300, init_with=None):
    super(EmbeddingNN, self).__init__()
    padding_idx = 0
    self.voc_size = voc_size
    self.emb_size = emb_size
    self.iembeddings = nn.Embedding(self.voc_size, self.emb_size)
    self.oembeddings = nn.Embedding(self.voc_size, self.emb_size)
    # pylint: disable=no-member
    if init_with is not None:
      assert init_with.shape == (voc_size, emb_size)
      self.iembeddings.weight = nn.Parameter(FloatTensor(init_with))
    else:
      self.iembeddings.weight = nn.Parameter(FloatTensor(voc_size, emb_size).uniform_(-1, 1))
    self.oembeddings.weight = nn.Parameter(FloatTensor(voc_size, emb_size).uniform_(-1, 1))
    # pylint: enable=no-member
    self.iembeddings.weight.requires_grad = True
    self.oembeddings.weight.requires_grad = True


  def forward(self, data):
    """ forward pass: return input-side embedding vectors"""
    return self.forward_i(data)


  def forward_i(self, data):
    """ get input vectors"""
    idxs = Variable(LongTensor(data))
    idxs = idxs.cuda() if self.iembeddings.weight.is_cuda else idxs
    return self.iembeddings(idxs)


  def forward_o(self, data):
    """ get output vectors"""
    idxs = Variable(LongTensor(data))
    idxs = idxs.cuda() if self.oembeddings.weight.is_cuda else idxs
    return self.oembeddings(idxs)


  def get_emb_dim(self):
    return self.emb_size


class SkipGram(nn.Module):
  """ skip-gram model with negative sampling"""

  def __init__(self, emb_nn, n_negs=64, weights=None):
    super(SkipGram, self).__init__()
    self.emb_model = emb_nn
    self.voc_size = emb_nn.voc_size  # vocabulary size, used as the range for uniform negative sampling
    self.n_negs = n_negs
    self.neg_sample_weights = None
    if weights is not None:
      wf = np.power(weights, 0.75) # pylint: disable=no-member
      wf = wf / wf.sum()
      self.neg_sample_weights = FloatTensor(wf)


  def forward(self, data):
    """ data is a list of pairs"""
    batch_size = len(data[0])
    iwords = data[0]
    owords = data[1]
    if self.neg_sample_weights is not None:
      # pylint: disable=no-member
      nwords = t.multinomial(self.neg_sample_weights,
                             batch_size * self.n_negs,
                             replacement=True).view(batch_size, -1)
    else:
      nwords = FloatTensor(batch_size, self.n_negs).uniform_(0, self.voc_size - 1).long()

    ivectors = self.emb_model.forward_i(iwords).unsqueeze(2)
    ovectors = self.emb_model.forward_o(owords).unsqueeze(1)
    nvectors = self.emb_model.forward_o(nwords).neg()  # negated so sigmoid() yields the negative-sampling term

    # pylint: disable=no-member
    oloss = t.bmm(ovectors, ivectors).squeeze().sigmoid().log()
    nloss = t.bmm(nvectors, ivectors).squeeze().sigmoid().log().view(-1, 1, self.n_negs).sum(2).mean(1)
    return -(oloss + nloss).mean()
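
For reference, the intermediate shapes in forward should be roughly as follows (B = batch_size, E = emb_size, K = n_negs):

# ivectors: (B, E, 1)   after unsqueeze(2)
# ovectors: (B, 1, E)   after unsqueeze(1)
# nvectors: (B, K, E)
# t.bmm(ovectors, ivectors) -> (B, 1, 1); squeeze() drops every size-1 dimension,
#   so with B == 1 the result loses its batch dimension entirely
# t.bmm(nvectors, ivectors) -> (B, K, 1)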

Environment info:
pytorch: 0.3.0
python: 3.6.0 (anaconda3-4.3.0)

Thanks!

It’s a little hard to tell what’s going on here. It would be great if you could find, from your code, a minimal script that demonstrates the RuntimeError.

I think what’s happening is that when batch_size = 1, you might need to unsqueeze the first dimension of your data. For example, if your data consists of N batches of tensors of size C, then with batch_size=1 the DataLoader might give you a tensor of size (C,) instead of (1, C).
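
For example (x here is just a hypothetical tensor standing in for one of your inputs):

import torch

x = torch.randn(5)   # a tensor of size (C,) = (5,), i.e. no batch dimension
x = x.unsqueeze(0)   # now size (1, C) = (1, 5), with an explicit batch dimension of 1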

Hi Richard,

Thanks for the reply.
I have checked the size of the batches produced by the DataLoader, and I think there is no problem there.
When batch_size == 1, the DataLoader produces a batch as [LongTensor([x1]), LongTensor([y1])]; for batch_size > 1, e.g. batch_size == 2, the produced batch is [LongTensor([x1, x2]), LongTensor([y1, y2])].
I have also checked the sizes of the vectors after the embedding lookup.

Since the code runs fine on the older PyTorch version, I am wondering whether this is a bug.

Could be a bug, but I know a few things were changed between the two versions. It’s hard to tell without a minimal example.

One other thing that would be helpful is to provide a backtrace of the RuntimeError. Something like:

gdb python
catch throw
run <script name>
backtrace

Sorry for the late reply.
Here is the full backtrace I could get with your commands:

(gdb) catch throw
Catchpoint 1 (throw)
(gdb) run train.py 
Starting program: /home/zarzen/anaconda3/bin/python train.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffb36dd700 (LWP 11650)]
[New Thread 0x7fffb2edc700 (LWP 11651)]
[New Thread 0x7fffb26db700 (LWP 11652)]
[New Thread 0x7fffabeda700 (LWP 11654)]
[New Thread 0x7fffab6d9700 (LWP 11655)]
[New Thread 0x7fffa2ed8700 (LWP 11656)]
Traceback (most recent call last):
  File "train.py", line 32, in <module>
    train()
  File "train.py", line 25, in train
    loss.backward()
  File "/home/zarzen/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/zarzen/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: invalid argument 1: expected 3D tensor, got 4D at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/TH/generic/THTensorMath.c:1630
[Thread 0x7fffa2ed8700 (LWP 11656) exited]
[Thread 0x7fffab6d9700 (LWP 11655) exited]
[Thread 0x7fffabeda700 (LWP 11654) exited]
[Thread 0x7fffb26db700 (LWP 11652) exited]
[Thread 0x7fffb2edc700 (LWP 11651) exited]
[Thread 0x7fffb36dd700 (LWP 11650) exited]
[Inferior 1 (process 11646) exited with code 01]

Hi all, I observe the same issue.

It occurs in loss.backward() when the batch size of the last batch produced by the DataLoader becomes 1:

loss torch.Size([1]) torch.Size([64, 3, 256, 256]) torch.Size([64, 1])
… (same line repeated for each full batch of 64) …
loss torch.Size([1]) torch.Size([1, 3, 256, 256]) torch.Size([1, 1])
Traceback (most recent call last):

… I omitted messages here … note: the batch size is now 1

  in train_model3
    loss.backward()
  File "/home/binder/entwurf6/virtuals/pytorch_py3/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/binder/entwurf6/virtuals/pytorch_py3/lib/python3.5/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: invalid argument 6: expected 3D tensor at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:442

What happens if you change all instances of bmm to matmul?

Sorry for the late reply.

How can I change bmm to matmul?
I have two matrices A (batch_size, 300) and B (batch_size, 300), and I want to compute a row-wise dot product. In NumPy I can use the einsum API for this; in PyTorch the only way I know is bmm with unsqueeze.

Another way is to compute A*B.T and then take the values on the diagonal.
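
Concretely, the operation I mean could be written in any of these ways (just a sketch, with torch imported as t as in the snippet above; A and B are hypothetical (batch_size, 300) tensors):

dots = t.bmm(A.unsqueeze(1), B.unsqueeze(2)).squeeze()   # bmm with unsqueeze
dots = (A * B).sum(1)                                    # elementwise product, then row-wise sum
dots = t.diag(t.mm(A, B.t()))                            # A*B.T, then take the diagonal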

torch.matmul does the same thing as torch.bmm: it does batch matrix multiplication. What happens when you replace all instances of torch.bmm with torch.matmul?
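
For example, the two loss lines in the snippet above would become (a sketch of the suggested change, keeping the same variable names):

oloss = t.matmul(ovectors, ivectors).squeeze().sigmoid().log()
nloss = t.matmul(nvectors, ivectors).squeeze().sigmoid().log().view(-1, 1, self.n_negs).sum(2).mean(1)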

Yes, that fixed the issue.
Thanks!