Well, I am unsqueezing a third dimension, so my tensor is sized [batch_size x seq_len x 1], and I send a mask along with it with dimensions [batch_size*nheads, seq_len, seq_len].

This gets me the error:

AssertionError: Expected `attn_mask` shape to be (5, 8, 8) but got torch.Size([40, 100, 100])

Note that here nheads is 5, batch_size is 8, and seq_len is 100, so the expected size is actually [nheads, batch_size, batch_size], but that cannot be right.

If I don’t unsqueeze and send in my data as [batch_size x seq_len], and my mask as [nheads*batch_size, seq_len], then I run into:

RuntimeError: The shape of the 2D attn_mask is torch.Size([40, 100]), but should be (8, 8).

Here again the “correct” size is [batch_size x batch_size].
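For reference, here is a minimal sketch (using the sizes from above, and a hypothetical all-False boolean mask, i.e. "attend everywhere") of the two mask shapes I understand nn.TransformerEncoder to accept: a 2D mask of (seq_len, seq_len) shared across the batch, or a 3D mask of (batch_size*nheads, seq_len, seq_len):

```python
import torch
import torch.nn as nn

batch_size, seq_len, d_model, nhead = 8, 100, 100, 5

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)

# Batched input: (batch_size, seq_len, d_model) with batch_first=True
src = torch.rand(batch_size, seq_len, d_model)

# 2D mask: (seq_len, seq_len), shared across the whole batch
mask_2d = torch.zeros(seq_len, seq_len, dtype=torch.bool)
out = encoder(src, mask=mask_2d)

# 3D mask: (batch_size * nhead, seq_len, seq_len), one per head and sample
mask_3d = torch.zeros(batch_size * nhead, seq_len, seq_len, dtype=torch.bool)
out = encoder(src, mask=mask_3d)
```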

Here is the updated version of my current data collator:

```
import torch
from transformers import DefaultDataCollator, default_data_collator

class MyDataCollator(DefaultDataCollator):
    def __init__(self, model, mask_prob=0.15):
        self.model_nhead = model.nhead
        self.d_model = model.d_model
        self.mask_prob = mask_prob

    def __call__(self, input):
        batch = default_data_collator(input)
        # Mask should be sized [batch_size * nhead, seq_len, seq_len], but that doesn't work
        batch['src'] = batch['src'].unsqueeze(2)
        batch_size = batch['src'].shape[0]  # This is now 8 due to the low memory of my laptop
        seq_len = batch['src'].shape[1]     # This is 500; however, the embedding transforms it into d_model
        mask = torch.rand(self.model_nhead * batch_size, self.d_model, self.d_model) < self.mask_prob
        batch['src_mask'] = mask
        return batch
```
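Just as a sketch of what the comment in the collator actually asks for (hypothetical stand-in sizes, since the real values come from the model), the mask would be built from seq_len rather than d_model:

```python
import torch

nhead, batch_size, seq_len, mask_prob = 5, 8, 100, 0.15

# Boolean mask with the shape the comment describes:
# (batch_size * nhead, seq_len, seq_len)
mask = torch.rand(nhead * batch_size, seq_len, seq_len) < mask_prob
print(mask.shape)  # torch.Size([40, 100, 100])
```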

Also, your comment on whether I send in batched inputs intrigued me, so I followed it into the PyTorch code. I found that the error comes from inside an if statement which checks whether my data is 2- or 3-dimensional, and, interestingly, from the 2D branch. So even though I send in batched inputs, PyTorch somehow detects them as unbatched.

I found this in the `torch/nn/functional.py` file, in the function `_mha_shape_check`.
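As far as I can tell, that check decides batched vs. unbatched purely from the tensor's dimensionality; a minimal sketch mirroring that logic (my own simplified re-statement, not the actual torch source):

```python
import torch

def is_batched_like_mha_shape_check(query: torch.Tensor) -> bool:
    # Mirrors the core of torch's _mha_shape_check: a 3-D query is
    # treated as batched, a 2-D query as unbatched.
    if query.dim() == 3:
        return True
    elif query.dim() == 2:
        return False
    raise AssertionError(
        f"query should be unbatched 2D or batched 3D, got {query.dim()}-D"
    )

print(is_batched_like_mha_shape_check(torch.rand(8, 100, 1)))  # True  (batched)
print(is_batched_like_mha_shape_check(torch.rand(8, 100)))     # False (unbatched)
```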

But then I printed out the shape of my input tensor right before passing it to the nn.TransformerEncoder, and it was the correct [batch_size x seq_len x 1].

Also, since my first dimension is the batch dimension, I am passing the batch_first=True parameter to the nn.TransformerEncoderLayer. However, it seemed to have no effect, so I tested it by swapping the axes of my input so I had [seq_len x batch_size x 1]. That passed the shape check but then failed later on a different one:

AssertionError: was expecting embedding dimension of 100, but got 1

I honestly don’t know what the hell is going on with this one, but I thought I would share it; maybe it helps.
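For what it's worth, that last assertion seems to say the encoder wants the last dimension of the input to equal d_model (100 here), while my input has feature size 1. A minimal sketch that satisfies the shape checks, with a hypothetical nn.Linear projection standing in for whatever embedding the model applies:

```python
import torch
import torch.nn as nn

d_model, nhead, batch_size, seq_len = 100, 5, 8, 100

# The encoder expects the LAST dim to equal d_model, so a scalar-per-position
# input of shape (batch, seq, 1) has to be projected/embedded to d_model first.
proj = nn.Linear(1, d_model)  # hypothetical projection for scalar features
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)

src = torch.rand(batch_size, seq_len, 1)  # raw input, feature dim 1
out = encoder(proj(src))                  # (batch, seq, d_model)
print(out.shape)  # torch.Size([8, 100, 100])
```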