@david.waterworth Heyo, just to clarify: what exactly was the fix, David? Was it the removal of
for p in model.parameters():
    if p.dim() > 1:
        torch.nn.init.xavier_normal_(p)
If you did remove the code above, then did you replace it with something else? Thanks again.
If you could post your full code somewhere, that would be great too!
@mathematicsofpaul Yes, I just removed those lines and let each layer do its own initialisation. I think that code came from an earlier example.
I'll try and post the full code somewhere when I get a chance.
@david.waterworth That is interesting, because nn.Transformer() also has its own initialization that does a very similar thing. Are you sure that the model.parameters() initializer was the only thing you changed?
From the nn.Transformer() source code:

def _reset_parameters(self):
    r"""Initiate parameters in the transformer model."""
    for p in self.parameters():
        if p.dim() > 1:
            xavier_uniform_(p)
Kind Regards
Yes - the difference is that the initialiser I removed was in my train.py file, and my 'model' class contains the embedding layers, positional encoding, transformer and fully connected output.
So the code I removed was applying Xavier initialisation to layers other than the transformer itself, which I suspect is wrong. I copied my train loop from elsewhere.
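To make that concrete, here is a rough sketch of the structure I mean (not my exact code - the class and names below are just placeholders): the loop I had in train.py touched every 2-D parameter of the wrapper model, so it re-initialised the embedding and output layers as well, while nn.Transformer already applies its own Xavier init when it is constructed.

import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Rough sketch of the wrapper: embeddings + transformer + output head (positional encoding omitted)."""
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_size, d_model)
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        # nn.Transformer runs its own _reset_parameters() (xavier_uniform_) in __init__
        self.transformer = nn.Transformer(d_model=d_model)
        self.fc_out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt, tgt_mask=None):
        out = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=tgt_mask)
        return self.fc_out(out)

model = Seq2SeqTransformer(vocab_size=10000)

# The loop I removed overwrote *all* of these parameters, not just the transformer's:
# for p in model.parameters():
#     if p.dim() > 1:
#         torch.nn.init.xavier_normal_(p)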
What do you mean by a shifted SOS token?
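Is it the usual convention of prepending SOS to the target and dropping the last token, so the decoder predicts each position from the previous ones? Something like this (just my guess - a minimal sketch with made-up token ids):

import torch

SOS_IDX, EOS_IDX = 1, 2                     # made-up special token ids
tgt = torch.tensor([[5, 6, 7, EOS_IDX]])    # toy target batch: (batch, seq_len)

# "Shift right": prepend SOS and drop the last token to build the decoder input
sos = torch.full((tgt.size(0), 1), SOS_IDX, dtype=tgt.dtype)
decoder_input = torch.cat([sos, tgt[:, :-1]], dim=1)   # [[SOS, 5, 6, 7]]

# The loss is then computed against the unshifted target: [[5, 6, 7, EOS]]
labels = tgt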
Hi, have you solved this repeating token problem? I have been working on it for a long time but still haven't made any progress. Thanks in advance if you could offer some help… this problem really confuses me.