I’m doing machine translation with PyTorch. Because I use a CNN and max pooling in the decoder, the decoder reduces the number of time steps of the target sentence. Now I want to upsample the output to restore the time steps, so that it has the same length as the original target sentence. I find this hard to implement because the target sentences have different lengths. Here is my model class:
    class Archi(nn.Module):
        def __init__(self, src_vocab_size, ref_vocab_size, d_model=512,
                     num_block=3, heads=8, dropout=0.1):
            super().__init__()
            self.encoder = Encoder(src_vocab_size=src_vocab_size, d_model=d_model,
                                   num_block=num_block, heads=heads, dropout=dropout)
            self.decoder = Decoder(ref_vocab_size=ref_vocab_size, d_model=d_model,
                                   num_block=num_block, heads=heads, dropout=dropout)
            self.to_vocab = nn.Linear(d_model, ref_vocab_size)
            self.to_original_length = nn.Linear(????????)

        def forward(self, src, ref, src_mask, ref_mask):
            enc_outputs = self.encoder(src=src, mask=src_mask)
            dec_outputs = self.decoder(ref=ref, enc_outputs=enc_outputs, mask=ref_mask)
            output = self.to_vocab(dec_outputs)
            output = self.to_original_length(output)
            return output
Let’s say I have two target sentences: the first has length 10 and the second has length 15. The decoder produces representations of length 5 and 7 respectively, so I want to use a Linear layer to project the time dimension back to 10 and 15. But since the target sentences have different lengths, I can’t define the Linear layer in the __init__ function, because its input and output sizes are fixed at construction and later sentences would cause dimension errors. Is there a way to handle this without passing the target sentence length into the __init__ function?
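To make the goal concrete, here is a rough sketch of one possible direction that avoids a fixed-size layer: interpolating each sequence along the time axis to its own target length with `torch.nn.functional.interpolate`. The helper name `upsample_to_lengths` and the `short_lengths` / `target_lengths` arguments are my own placeholders, and I’m assuming the decoder output is a zero-padded tensor of shape (batch, time, d_model). Linear interpolation has no learned weights, so this only restores the length; it is not equivalent to a trained projection.

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence


def upsample_to_lengths(dec_outputs, short_lengths, target_lengths):
    """Stretch each sequence along the time axis to its own target length.

    dec_outputs:    (batch, max_short_len, d_model), padded decoder output (assumed layout)
    short_lengths:  valid (unpadded) length of each sequence after pooling
    target_lengths: original length of each target sentence
    Returns a zero-padded tensor of shape (batch, max(target_lengths), d_model).
    """
    upsampled = []
    for seq, short_len, target_len in zip(dec_outputs, short_lengths, target_lengths):
        # F.interpolate expects (batch, channels, time), so move time to the last axis.
        valid = seq[:short_len].transpose(0, 1).unsqueeze(0)  # (1, d_model, short_len)
        stretched = F.interpolate(valid, size=target_len, mode="linear", align_corners=False)
        upsampled.append(stretched.squeeze(0).transpose(0, 1))  # (target_len, d_model)
    # Pad the variable-length results back into a single batch tensor.
    return pad_sequence(upsampled, batch_first=True)


# Two sentences pooled to lengths 5 and 7, restored to lengths 10 and 15.
dec_outputs = torch.randn(2, 7, 512)
restored = upsample_to_lengths(dec_outputs, short_lengths=[5, 7], target_lengths=[10, 15])
print(restored.shape)  # torch.Size([2, 15, 512])
```

In `forward`, something like this would presumably go before `self.to_vocab`, so the interpolation works on the `d_model` features rather than on the vocabulary logits, with the lengths passed in as extra arguments to `forward` instead of `__init__`.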