My model feeds a batch of sequences into nn.Transformer, and the transformer output is then fed into nn.Linear. The input sequences have different lengths, so before feeding them into nn.Transformer I pad the sequences in each batch to the same length adaptively, using a collate_fn in the DataLoader.
For example:
- batch 1: the max sequence length in this batch is 10, so each sequence is padded with 0s to length 10
- batch 2: the max sequence length in this batch is 12, so each sequence is padded with 0s to length 12
- batch 3: the max sequence length in this batch is 15, so each sequence is padded with 0s to length 15
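A minimal sketch of such a collate_fn, assuming each sample is a (sequence, label) pair where the sequence is a [seq_len, d_input] float tensor (that layout and the mask construction are my assumptions, not necessarily how you structured your data):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_fn(batch):
    # batch: list of (sequence, label) pairs, where each sequence is a
    # [seq_len, d_input] float tensor with a different seq_len
    sequences, labels = zip(*batch)
    lengths = torch.tensor([s.size(0) for s in sequences])
    # Pad with 0 up to the longest sequence in this batch; the default
    # layout matches nn.Transformer's: [max_len, batch_size, d_input]
    padded = pad_sequence(sequences, padding_value=0.0)
    # True at padded positions; can be passed as src_key_padding_mask
    padding_mask = torch.arange(padded.size(0))[None, :] >= lengths[:, None]
    return padded, padding_mask, torch.stack(labels)
```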
Now I have a problem: the output shape of the transformer is [seq_len, batch_size, d_input], so the input dimension of nn.Linear is dynamic, because the sequence length differs from batch to batch.
I have seen some posts mention that nn.AdaptiveAvgPool can be used to solve this problem. Now I have two candidate solutions and some questions:
- Transpose the transformer output to [batch_size, d_input, seq_len], use nn.AdaptiveAvgPool1d to pool it to [batch_size, d_input, fix_num], then reshape to [batch_size, -1] and feed it into nn.Linear (see the sketch after this list). Does this make sense? The sequence lengths are different, but we pool them down to a fixed number?
- Set a fixed maximum length, e.g. 30. Some sequences have length 25-30, but most have length 5-10, so the input would consist mostly of 0s (padding values). It would contain too much useless information!
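For concreteness, here is a minimal sketch of option 1, assuming a classification head; d_input=64, fix_num=4, and num_classes=2 are placeholder choices, not values from my actual model:

```python
import torch
import torch.nn as nn

class PooledHead(nn.Module):
    """Pool the variable-length transformer output to a fixed size."""
    def __init__(self, d_input: int, fix_num: int = 4, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(fix_num)
        self.fc = nn.Linear(d_input * fix_num, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch_size, d_input] (nn.Transformer's default layout)
        x = x.permute(1, 2, 0)        # -> [batch_size, d_input, seq_len]
        x = self.pool(x)              # -> [batch_size, d_input, fix_num]
        x = x.reshape(x.size(0), -1)  # -> [batch_size, d_input * fix_num]
        return self.fc(x)

# Two batches with different seq_len produce the same output shape:
head = PooledHead(d_input=64)
print(head(torch.randn(10, 32, 64)).shape)  # torch.Size([32, 2])
print(head(torch.randn(15, 32, 64)).shape)  # torch.Size([32, 2])
```

One caveat I can see: the average pooling also runs over the padded positions, which may dilute the features; masking out padding before pooling might matter.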
Has anyone compared these two methods, or does anyone have another good suggestion?