Unexpected NaN from TransformerEncoder forward function

Hi,

I'm getting unexpected NaN values in certain cells of the output tensor, and I'd like to understand why. In particular, since I pass src_key_padding_mask so the padded positions are ignored, why do the NaN cells in the input still affect the output? Also, are the padded input cells supposed to be NaN, or what is the best practice for filling them?

Example Input:

import torch

in_embed = torch.tensor(
    [[[0.0000,  1.1115],
      [0.0015,  1.1115],
      [0.0015,  1.1115]],
     [[0.9552,  0.0000],
      [float('nan'), float('nan')],
      [float('nan'), float('nan')]],
     [[1.0119, -0.4620],
      [float('nan'), float('nan')],
      [float('nan'), float('nan')]],
     [[0.0463, -1.2204],
      [float('nan'), float('nan')],
      [float('nan'), float('nan')]]]
)
mask = torch.tensor(
    [[0., -float('inf'), -float('inf'), -float('inf')],
     [0., 0., -float('inf'), -float('inf')],
     [0., 0., 0., -float('inf')],
     [0., 0., 0., 0.]]
)
seqmasks = torch.tensor(
    [[False, False, False, False],
     [False, True, True, True],
     [False, True, True, True]]
)

Calling TransformerEncoder:

tel = torch.nn.TransformerEncoderLayer(d_model=2, nhead=2, dim_feedforward=8)
bert = torch.nn.TransformerEncoder(tel, num_layers=6)
out_embed = bert(src=in_embed, mask=mask, src_key_padding_mask=seqmasks)

Real Output:

tensor([[[-1.0000, 1.0000],
[ nan, nan],
[ nan, nan]],
[[ 1.0000, -1.0000],
[ nan, nan],
[ nan, nan]],
[[ 1.0000, -1.0000],
[ nan, nan],
[ nan, nan]],
[[ 1.0000, -1.0000],
[ nan, nan],
[ nan, nan]]], grad_fn=<...>)

Expected Output (something like this):

tensor([[[-1.0000, 1.0000],
[-1.0000, 1.0000],
[-1.0000, 1.0000]],
[[ 1.0000, -1.0000],
[ nan, nan],
[ nan, nan]],
[[ 1.0000, -1.0000],
[ nan, nan],
[ nan, nan]],
[[ 1.0000, -1.0000],
[ nan, nan],
[ nan, nan]]], grad_fn=<...>)
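In case it helps to reproduce: here is a minimal variant of the same call where I fill the padded cells with zeros instead of NaN, and use a boolean causal mask (True = position not attended) so both masks have the same dtype. This is only a sketch of what I'd expect the "safe" setup to look like, assuming padding values are meant to be ignored:

```python
import torch

torch.manual_seed(0)

# Same shape (seq_len=4, batch=3, d_model=2), but zeros in the padded cells.
in_embed = torch.tensor(
    [[[0.0000,  1.1115],
      [0.0015,  1.1115],
      [0.0015,  1.1115]],
     [[0.9552,  0.0000],
      [0.0000,  0.0000],   # zero padding instead of NaN
      [0.0000,  0.0000]],
     [[1.0119, -0.4620],
      [0.0000,  0.0000],
      [0.0000,  0.0000]],
     [[0.0463, -1.2204],
      [0.0000,  0.0000],
      [0.0000,  0.0000]]]
)

# Boolean causal mask: True above the diagonal means "do not attend".
mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)

# Same padding mask as above: True marks a padded position.
seqmasks = torch.tensor(
    [[False, False, False, False],
     [False, True,  True,  True],
     [False, True,  True,  True]]
)

tel = torch.nn.TransformerEncoderLayer(d_model=2, nhead=2, dim_feedforward=8)
bert = torch.nn.TransformerEncoder(tel, num_layers=6)
bert.eval()  # disable dropout for a deterministic check

with torch.no_grad():
    out_zero = bert(src=in_embed, mask=mask, src_key_padding_mask=seqmasks)

# With zero padding, every query position still has at least one
# unmasked key, so no softmax row is fully -inf and no NaN appears.
print(torch.isnan(out_zero).any())
```

With this version I get no NaN anywhere in the output, which makes me think the NaN input cells (rather than the masks) are what poisons the result, since the feedforward and LayerNorm sublayers see the raw input regardless of the attention masks.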