I’m looking for suggestions on how to debug the following issue, where the output size of an `nn.TransformerEncoder` does not match the input. As part of a larger network, I have an `nn.TransformerEncoder` layer containing a single `nn.TransformerEncoderLayer`. It’s created as:
```python
enc_layers = nn.TransformerEncoderLayer(
    16, 2,
    dim_feedforward=32, dropout=0.0, activation='gelu',
    batch_first=True,
)
self.encoder = nn.TransformerEncoder(enc_layers, 1)
```
In the `forward` method, the snippet where this is called is:
```python
xs = xs.view(-1, 20, 16)
print(xs.size())
xsa = self.encoder(xs, src_key_padding_mask=mask)
print(xsa.size())
```
I’m getting:
```
torch.Size([1024, 20, 16])
torch.Size([1024, 14, 16])
```
The strange thing is that when I run a single batch in isolation in a notebook, the sizes match and everything looks fine, but when I run the training script I see the size mismatch. I’ve printed the sizes of all of the model’s parameter tensors in both cases, and they match. The mask size is `torch.Size([1024, 20])`.
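For reference, this is roughly the standalone setup I’ve been testing with in the notebook. The mask construction here is just a guess at a representative case, since in the real code the mask comes from the data pipeline:

```python
import torch
import torch.nn as nn

enc_layers = nn.TransformerEncoderLayer(
    16, 2,
    dim_feedforward=32, dropout=0.0, activation='gelu',
    batch_first=True,
)
encoder = nn.TransformerEncoder(enc_layers, 1)

xs = torch.randn(1024, 20, 16)
# Hypothetical padding mask (True = padded position); the real mask is
# built elsewhere in the training code, so this is only an approximation.
mask = torch.zeros(1024, 20, dtype=torch.bool)
mask[:, 14:] = True

xsa = encoder(xs, src_key_padding_mask=mask)
print(xs.size(), xsa.size())
```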
I’ve also used `python -m pdb` to drop in and inspect things when the training script fails at the next step because of the unexpected tensor size. When I re-pass `xs` to `self.encoder` from the debugger, I get the expected size, so this is really mysterious to me.
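Concretely, this is roughly what I re-evaluated at the post-mortem pdb prompt (with `xs` and `mask` still in scope), and it came back with the expected `[1024, 20, 16]` shape:

```python
# roughly what I ran at the (Pdb) prompt, post-mortem, inside forward()
self.encoder(xs, src_key_padding_mask=mask).size()
```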
I’m using torch 2.0.0 on a CUDA GPU, but the same thing happens when running on the CPU.
Is there a situation where an `nn.TransformerEncoder`’s output could have a different size than its input? Any thoughts or suggestions on how to debug this further?