Masked_fill operates weirdly

I’m studying the Annotated Transformer (http://nlp.seas.harvard.edu/2018/04/03/attention.html) on my own, and

I’m trying to use the masked_fill operation as follows:

scores = scores.masked_fill(mask == 0, -1e9)

The input and the mask vary in size because of the data format (text sentences).

It works fine for the first input (data: [torch.cuda.FloatTensor of size 1x8x21x21 (GPU 0)]),

but for the second input (data: [torch.cuda.FloatTensor of size 1x8x9x9 (GPU 0)]) it gives the following error message:

RuntimeError: The expanded size of the tensor (9) must match the existing size (8) at non-singleton dimension 3. at /pytorch/torch/lib/TH/generic/THTensor.c:309

I thought it was caused by the data type, but changing that did not help.

Where did I go wrong? Any suggestions would be very helpful.

From the error message, it is a size mismatch at dimension 3 (zero-indexed, so the last dimension of your 4-D tensor), where one tensor has size 8 and the other has size 9.
I would print the sizes of both tensors before the operation to check the dimensions.
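
As a rough sketch of that check, the helper below compares two shapes the way an expand would: aligned from the right, every mask dimension must be either 1 or equal to the matching scores dimension. The name first_mask_mismatch is hypothetical, and the mask shapes in the usage lines are assumptions for illustration, since the thread only gives the size of scores. It takes plain tuples, so in the real code you could pass tuple(mask.shape) and tuple(scores.shape) just before the masked_fill call.

```python
def first_mask_mismatch(mask_shape, scores_shape):
    """Return (dim, mask_size, scores_size) for the first dimension where the
    mask cannot be expanded to the scores shape, or None if it can.

    Hypothetical helper: it only mirrors the "size 1 or equal" expansion rule
    and is not part of PyTorch.
    """
    if len(mask_shape) > len(scores_shape):
        raise ValueError("mask has more dimensions than scores")
    # Align the two shapes from the right, as broadcasting does.
    offset = len(scores_shape) - len(mask_shape)
    for i, mask_size in enumerate(mask_shape):
        scores_size = scores_shape[i + offset]
        if mask_size != 1 and mask_size != scores_size:
            return (i + offset, mask_size, scores_size)
    return None


# scores is 1x8x9x9 as in the second input; both mask shapes are assumed.
print(first_mask_mismatch((1, 1, 1, 9), (1, 8, 9, 9)))  # None
print(first_mask_mismatch((1, 1, 1, 8), (1, 8, 9, 9)))  # (3, 8, 9)
```

The second call reports dimension 3 with sizes 8 and 9, which matches the numbers in the error message, so a mask whose last dimension is 8 would be one plausible cause.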

Thank you for your reply. I found the problem while this post was held for review because it was my first post!

The mask size had been changed by debugging code I inserted a while ago.