Backward size mismatch error when adding the bias term


I am trying to implement a batch matrix multiplication like the first equation in this image.

The weight and bias are defined as parameters in the model. I expand the bias term so that it covers the entire batch.

import torch

def batch_matmul_bias(seq, weight, bias, nonlinearity=''):
    s = None
    bias_dim = bias.size()
    for i in range(seq.size(0)):
        # multiply one sample of the batch by the weight matrix
        _s = torch.mm(seq[i], weight)
        _s_bias = _s + bias.expand(bias_dim[0], _s.size()[0])
        print(_s_bias.size())
        if(nonlinearity=='tanh'):
            _s_bias = torch.tanh(_s_bias)
        _s_bias = _s_bias.unsqueeze(0)
        if(s is None):
            s = _s_bias
        else:
            s = torch.cat((s,_s_bias),0)
    return s.squeeze()

The forward pass works, but when doing the backward pass, I am getting a size mismatch error.

RuntimeError: sizes do not match at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.7_1485448159614/work/torch/lib/THC/generated/../generic/

Can you help me fix it?

Thank you.

Is there any reason why you do not use builtin functions?
I guess you could do something along the lines of:

import torch.nn.functional as F

def batch_matmul_bias(seq, weight, bias, nonlinearity=''):
  s = F.linear(seq, weight, bias)
  if nonlinearity=='tanh':
    s = F.tanh(s)
  return s

I just looked at the API carefully, and it looks like Linear supports batched samples, and the bias is optional as well. This will save a lot of time. Thank you!
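For anyone comparing the two approaches, here is a minimal sketch of how a single F.linear call lines up with the per-sample torch.mm loop. It assumes a recent PyTorch release; the sizes and the names batched, looped, and no_bias are made up for illustration. Note that F.linear expects the weight as (out_features, in_features) and computes x @ weight.T + bias, so the weight is transposed relative to the loop in the question.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, seq_len, d_in, d_out = 4, 5, 3, 2   # hypothetical sizes
seq = torch.randn(batch, seq_len, d_in)
weight = torch.randn(d_out, d_in)          # F.linear expects (out_features, in_features)
bias = torch.randn(d_out)

# One call handles the whole batch: seq @ weight.T + bias
batched = F.linear(seq, weight, bias)

# Reference: explicit loop with torch.mm, one sample at a time
looped = torch.stack([torch.mm(seq[i], weight.t()) + bias for i in range(batch)])

# The bias argument is optional
no_bias = F.linear(seq, weight)
```

If the loop and the builtin disagree in your own code, the weight orientation is the first thing worth checking.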


Be careful, because these functions support only batch mode.
So if your input is not a batch, don’t forget to use .unsqueeze(0) to turn it into a batch of 1 element.
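A small sketch of that advice, with made-up sizes and variable names: add a leading batch dimension with unsqueeze(0), run the layer, then drop the dimension again. As far as I know, recent PyTorch releases also accept an unbatched input to F.linear, so this step may be optional there, but it is harmless either way.

```python
import torch
import torch.nn.functional as F

weight = torch.randn(2, 3)       # (out_features, in_features), hypothetical sizes
bias = torch.randn(2)
sample = torch.randn(3)          # a single sample, no batch dimension

batch_of_one = sample.unsqueeze(0)           # shape (1, 3)
out = F.linear(batch_of_one, weight, bias)   # shape (1, 2)
single_out = out.squeeze(0)                  # back to shape (2,)
```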


I managed to fix it. For future reference, I was sloppy and did not reshape the bias term properly: I forgot to transpose the expanded bias so that its shape matches _s.

_s_bias = _s + bias.expand(bias_dim[0], _s.size()[0]).transpose(0,1)
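To make the shapes concrete, here is a sketch of the fix with made-up sizes. It assumes the bias is stored as a (d, 1) column, which is my reading of the expand() call and not something stated in the thread. In PyTorch versions that support broadcasting, the same sum can be written without expand at all.

```python
import torch

torch.manual_seed(0)
n, d = 5, 3                       # hypothetical: n rows in _s, d output features
_s = torch.randn(n, d)
bias = torch.randn(d, 1)          # assumed shape, inferred from the expand() call

expanded = bias.expand(d, n)                 # (d, n): wrong orientation for _s
fixed = _s + expanded.transpose(0, 1)        # (n, d) + (n, d)

# With broadcasting, a (1, d) bias is stretched over the n rows automatically
broadcast = _s + bias.view(1, d)
```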

Thank you for the wonderful effort that you’ve put in here. It makes debugging a lot easier.


Can you explain this?

Is there a straightforward way to reshape while using add_module()?

ATM I have to resort to

class RESHAP(nn.Module):
	def __init__(self, nz):
		super(RESHAP, self).__init__()
		self.nz = nz

	def forward(self, input):
		# reshape to (batch, nz, 1, 1)
		return input.view(-1, self.nz, 1, 1)

	def __repr__(self):
		return self.__class__.__name__ + ' ()'

main.add_module('initial.{0}.reshape'.format(nz//4), RESHAP(nz//4))

@Veril if you don’t use a Sequential, you can reshape with .view() inside your forward function.
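A minimal sketch of what that could look like. The module name Generator, the layers, and the sizes are all hypothetical and only there to make the example runnable; the point is that the view call sits directly in forward, so no wrapper module is needed.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):   # hypothetical module, for illustration only
    def __init__(self, nz):
        super().__init__()
        self.nz = nz
        self.fc = nn.Linear(nz, nz)
        self.deconv = nn.ConvTranspose2d(nz, 8, kernel_size=4)

    def forward(self, x):
        x = self.fc(x)
        x = x.view(-1, self.nz, 1, 1)   # reshape inline to (batch, nz, 1, 1)
        return self.deconv(x)

out = Generator(nz=16)(torch.randn(2, 16))
```

If you do want to stay inside a Sequential, a small wrapper module like the one above in the thread is still a reasonable approach.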

@Sandeep42 it’s a bug that was fixed 2 days ago (gradients weren’t viewed properly). The problem is that the basic ops like +, *, - and / don’t check the sizes of their inputs, only that their numbers of elements match. The result always has the size of the first operand, which explains your screenshot.

I think we should add some strict size checking there anyway (once we add broadcasting) :confused:
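For later readers: in recent PyTorch releases, which do have broadcasting, the behaviour looks like the sketch below. Two operands with the same number of elements but incompatible shapes are rejected, while shapes that are broadcastable get stretched to a common size. The variable names are made up for the example.

```python
import torch

a = torch.ones(2, 3)
b = torch.ones(3, 2)      # same number of elements, different shape

try:
    a + b
    raised = False
except RuntimeError:
    raised = True          # (2, 3) and (3, 2) are not broadcastable

col = torch.arange(3.0).view(3, 1)
row = torch.arange(3.0).view(1, 3)
grid = col + row           # broadcasts to shape (3, 3)
```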

I am looking forward to that. The fact that you can just interactively work with Tensors, and debug them is already a great thing. Thanks!