How to resolve a RuntimeError related to the torch.cat() function?

I have an attention decoder whose forward function is as follows.

def forward(self, input, hidden, encoder_outputs):
    embedded = self.embedding(input).view(1, 1, -1)
    embedded = self.drop(embedded)
    attn_weights = F.softmax(self.attn(torch.cat((embedded[0], hidden[0]), 1)))

Whenever the forward function gets called, I get the following error.

RuntimeError: inconsistent tensor sizes at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487344852722/work/torch/lib/THC/generic/THCTensorMath.cu:141

How can I resolve this problem? It seems to be associated with torch.cat(). Please help.

As discussed in this thread: How can I print the shape of a tensor inside the forward function?

You can add print statements to figure out which tensor shapes are causing torch.cat() to fail.

Thanks for your reply. I printed the shape and found the following.

embedded = self.embedding(input).view(1, 1, -1)
embedded = self.drop(embedded)
print(embedded[0].size(), hidden[0].size())

I am getting torch.Size([1, 300]) torch.Size([1, 1, 300]). Why am I getting a [1, 300] shape for the embedded tensor even though I have used the view method as view(1, 1, -1)?

According to the docs, the Embedding layer returns a tensor of shape (N, W, embedding_dim), where N is the mini-batch size and W is the number of indices to extract per mini-batch. After performing the view(1, 1, -1) operation on that, you get a tensor of shape (1, 1, N x W x embedding_dim). It is important to note that this is a 3-dimensional tensor. But since you are calling embedded[0].size(), the indexing selects along the first dimension and leaves you with the remaining two dimensions, which explains the torch.Size([1, 300]) you see in your print statements. Hope this helps!
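Here is a minimal sketch of the shapes involved, assuming a made-up vocabulary size of 10 and a 300-dimensional embedding, run in a recent PyTorch (the original thread used 0.1.9, where tensors had to be wrapped in Variable first):

import torch
from torch import nn

embedding = nn.Embedding(10, 300)   # hypothetical sizes: vocab of 10, 300-dim embeddings
input = torch.LongTensor([[1]])     # shape (N=1, W=1)

embedded = embedding(input)         # (N, W, embedding_dim) = (1, 1, 300)
embedded = embedded.view(1, 1, -1)  # still (1, 1, 300): a 3-D tensor

print(embedded.size())              # torch.Size([1, 1, 300])
print(embedded[0].size())           # torch.Size([1, 300]) -- indexing drops the first dimension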


Then can you tell me why hidden[0].size() works fine? hidden is the output of torch.nn.LSTM, which I thought was also a 3-D tensor, but whenever I try to print hidden.size(), I get an error that says 'tuple' object has no attribute 'size'. Where am I making the mistake?

It is probably because you are using an LSTM. PyTorch's implementation returns both h_n and c_n (the hidden state and cell state for the last time step) in the hidden variable as a tuple. In comparison, a GRU would just return h_n. As a result, for LSTMs, hidden[0] gives you h_n.
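A rough sketch contrasting the two return types (the sizes below are arbitrary examples, and it is again written against a recent PyTorch rather than the Variable-based 0.1.9 API):

import torch
from torch import nn

lstm = nn.LSTM(input_size=300, hidden_size=300)
gru = nn.GRU(input_size=300, hidden_size=300)
x = torch.randn(1, 1, 300)   # (seq_len, batch, input_size)

output, hidden = lstm(x)     # hidden is the tuple (h_n, c_n)
print(type(hidden))          # a tuple, so hidden.size() raises AttributeError
print(hidden[0].size())      # torch.Size([1, 1, 300]) -- this is h_n

output, hidden = gru(x)      # hidden is just h_n
print(hidden.size())         # torch.Size([1, 1, 300])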

Now it makes sense! Thanks.