Here attn_weights is [1, 700] and encoder_outputs is [1, 512]. The problem should be with encoder_outputs: I am passing a single sentence, and for that sentence the encoder returns a matrix of 512 values, so encoder_outputs has shape [1, 512]. If I do bmm with these shapes, that is when the error comes. @ptrblck
Are you sure the dimension error is coming from this line of code?
I’m not sure I understand the shapes correctly. Your encoder_outputs seems to have a shape of [17, 0]?
That is because of the dimensions, I think: for bmm the shapes should be (b×n×m) and (b×m×p), and only then will it give (b×n×p) as the output shape. Here attn_weights is [1, 700] and encoder_outputs is [1, 512], but it would have to be of shape [700, 512] for the bmm to work correctly. @ptrblck
Would it work if you unsqueeze attn_weights in dim2 and encoder_outputs in dim1? Based on your description this should yield your desired output shape.
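A minimal sketch of that suggestion, assuming attn_weights has shape [1, 700] and encoder_outputs has shape [1, 512] as described above (the tensors here are random placeholders):

```python
import torch

attn_weights = torch.randn(1, 700)     # [batch, seq_len]
encoder_outputs = torch.randn(1, 512)  # [batch, hidden]

# bmm needs 3-D inputs: (b, n, m) @ (b, m, p) -> (b, n, p).
# Unsqueeze attn_weights at dim 2  -> [1, 700, 1]
# and encoder_outputs at dim 1     -> [1, 1, 512].
out = torch.bmm(attn_weights.unsqueeze(2), encoder_outputs.unsqueeze(1))
print(out.shape)  # torch.Size([1, 700, 512])
```

With both tensors lifted to 3-D, the batch dimension (1) matches and the inner dimensions line up (1 and 1), yielding the [1, 700, 512] output.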
Everything works fine on a single GPU and on the CPU, but on multi-GPU, while the batch is being split correctly, the problem arises when initialising the hidden state, because I initialized it to a max length. How could I set this to the correct shape dynamically when running on multiple GPUs?
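One common way to handle this (a hedged sketch, not the poster's actual model; the module and sizes below are illustrative) is to derive the hidden state's batch dimension from the input inside forward, rather than pre-allocating it with a fixed max length. With nn.DataParallel each replica receives only its chunk of the batch, so x.size(0) is the right size per device:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Hypothetical minimal module with assumed sizes.
    def __init__(self, input_size=32, hidden_size=64):
        super().__init__()
        self.hidden_size = hidden_size
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Derive the batch size from the chunk this replica received,
        # instead of a fixed max length computed before the split.
        batch_size = x.size(0)
        h0 = torch.zeros(1, batch_size, self.hidden_size, device=x.device)
        return self.gru(x, h0)

model = Encoder()
out, h = model(torch.randn(4, 10, 32))  # works for any batch size
print(out.shape)  # torch.Size([4, 10, 64])
```

Because the hidden state is created inside forward with the replica's own batch size and device, the same code runs unchanged on CPU, single GPU, or when wrapped in nn.DataParallel.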
In a similar setup with multi-GPU and RNN+Attention I ran into the same error as @Kishore_G. After going line by line I have also narrowed it down to the torch.bmm line.
In my case I am multiplying the attention scores of shape torch.Size([2852, 59, 1]) with the initial vectors of shape torch.Size([2852, 59, 100]), where the dimensions represent (batch_size, sentence_size, vector_dim).
I think I found the error by looking at the bmm docs. You have to swap the dimensions:
input and mat2 must be 3-D tensors each containing the same number of matrices.
If input is a (b×n×m) tensor, mat2 is a (b×m×p) tensor, out will be a (b×n×p) tensor.
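Applied to the shapes above, that swap can be sketched like this (random tensors stand in for the real scores and vectors):

```python
import torch

scores = torch.randn(2852, 59, 1)     # (batch, sentence, 1)
vectors = torch.randn(2852, 59, 100)  # (batch, sentence, dim)

# Transpose scores to (batch, 1, sentence) so the inner dims match:
# (b, 1, 59) @ (b, 59, 100) -> (b, 1, 100),
# i.e. a weighted sum over the sentence positions.
context = torch.bmm(scores.transpose(1, 2), vectors)
print(context.shape)  # torch.Size([2852, 1, 100])
```

Without the transpose, bmm sees inner dimensions 1 and 59, which do not match and raise the size-mismatch error from earlier in the thread.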