Training seq2seq translation with multi-GPU gives the error: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:492

This is the attention decoder structure:

AttnDecoderRNN(
(embedding): Embedding(79586, 256, padding_idx=2)
(attn): Linear(in_features=512, out_features=700, bias=True)
(attn_combine): Linear(in_features=512, out_features=256, bias=True)
(dropout): Dropout(p=0.1)
(gru): GRU(256, 256)
(out): Linear(in_features=256, out_features=79586, bias=True)
)
and the encoder structure is:

EncoderRNN(
(embedding): Embedding(93553, 256, padding_idx=2)
(gru): GRU(256, 256)
)
@ptrblck

The error occurs in the line below:

attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                         encoder_outputs.unsqueeze(0))

Here attn_weights has shape [1, 700] and encoder_outputs has shape [1, 512]. The problem should be with encoder_outputs: I am passing a single sentence, and for that sentence the encoder gives a matrix of 512 values, so its shape is [1, 512], and only when I do the bmm with that value does the error come up. @ptrblck
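To make the shapes concrete, here is a minimal repro with random tensors of exactly those sizes (nothing else from my model matters here), which fails in the same way:

import torch

attn_weights = torch.randn(1, 700)      # [1, 700]
encoder_outputs = torch.randn(1, 512)   # [1, 512]

# unsqueeze(0) turns each into a batch of a single matrix:
# [1, 1, 700] and [1, 1, 512]. bmm then tries to multiply a 1x700
# matrix with a 1x512 matrix, and the inner dimensions (700 vs. 1)
# do not match.
try:
    torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
except RuntimeError as e:
    print(e)  # size-mismatch error; exact wording depends on the PyTorch version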

Are you sure the dimension error is coming from this line of code?
I’m not sure I understand the shapes correctly. Your encoder_outputs seems to have a shape of [17, 0]?

That is because of the dimensions, I think: for bmm the shapes have to be (b×n×m) and (b×m×p), and only then will it give (b×n×p) as the output shape. Here attn_weights is [1, 700] and encoder_outputs is [1, 512], but it would have to be [700, 512] for the bmm to work correctly. @ptrblck

RuntimeError: invalid argument 2: wrong matrix size, batch1: 1x700, batch2: 1x512 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:2312

This is the error I got recently, and it comes from the line


attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                         encoder_outputs.unsqueeze(0))

Would it work if you unsqueeze attn_weights in dim2 and encoder_outputs in dim1? Based on your description this should yield your desired output shape.
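Something like this sketch, using the shapes you posted (whether [1, 700, 512] is really the attention output you want depends on your model):

import torch

attn_weights = torch.randn(1, 700)     # [1, 700]
encoder_outputs = torch.randn(1, 512)  # [1, 512]

# attn_weights.unsqueeze(2)     -> [1, 700, 1]
# encoder_outputs.unsqueeze(1)  -> [1, 1, 512]
# bmm([1, 700, 1], [1, 1, 512]) -> [1, 700, 512]
attn_applied = torch.bmm(attn_weights.unsqueeze(2),
                         encoder_outputs.unsqueeze(1))
print(attn_applied.shape)  # torch.Size([1, 700, 512])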

Everything works fine on a single GPU and on the CPU, but with multiple GPUs the batch size is split correctly while the problem arises in initialising the hidden layer, because there I initialised it to a max_length. How could I set it to the correct shape dynamically when the work is allocated across multiple GPUs?
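What I think is needed is something like the sketch below, where the hidden state is created inside forward from the incoming batch, so each nn.DataParallel replica sizes it from its own chunk. This is only a rough sketch of the idea: the class and argument names are made up from the structure posted above, and batch_first=True is an assumption so that DataParallel's default split along dim 0 actually hits the batch dimension.

import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size=256, padding_idx=2):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size, padding_idx=padding_idx)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, input_seq):
        # input_seq: [batch, seq_len]; batch may be only a chunk of the
        # global batch when DataParallel scatters it across GPUs, so the
        # initial hidden state is built here from input_seq.size(0)
        # instead of being pre-allocated in the training loop.
        batch_size = input_seq.size(0)
        hidden = torch.zeros(1, batch_size, self.hidden_size,
                             device=input_seq.device)
        embedded = self.embedding(input_seq)          # [batch, seq_len, hidden]
        outputs, hidden = self.gru(embedded, hidden)  # outputs: [batch, seq_len, hidden]
        return outputs, hidden

With encoder = nn.DataParallel(EncoderRNN(93553)).cuda(), each replica then builds a hidden state that matches the chunk of the batch it receives.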

In a similar setup with multi-GPU and RNN + attention I ran into the same error as @Kishore_G. After going through it line by line I have also narrowed it down to the torch.bmm line.

In my case I am multiplying the attention scores of shape torch.Size([2852, 59, 1]) with the initial vectors of shape torch.Size([2852, 59, 100]), where the dimensions represent (batch_size, sentence_size, vector_dim).

I think I found the cause by looking at the bmm docs. You have to swap the dimensions:

input and mat2 must be 3-D tensors each containing the same number of matrices.
If input is a (b×n×m) tensor, mat2 is a (b×m×p) tensor, out will be a (b×n×p) tensor.
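In code, with my shapes, the fix looks roughly like this (random tensors, just to show the shape handling):

import torch

batch_size, sentence_size, vector_dim = 2852, 59, 100
attn_scores = torch.randn(batch_size, sentence_size, 1)        # [2852, 59, 1]
vectors = torch.randn(batch_size, sentence_size, vector_dim)   # [2852, 59, 100]

# Swap the last two dims of the scores so the inner dimensions match:
# [2852, 1, 59] x [2852, 59, 100] -> [2852, 1, 100]
context = torch.bmm(attn_scores.transpose(1, 2), vectors)
print(context.shape)  # torch.Size([2852, 1, 100])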