How to use nn.TransformerDecoder() at inference time

@david.waterworth Heyo, just to clarify: what exactly was the fix? Was it the removal of this loop?

    for p in model.parameters():
        if p.dim() > 1:
            torch.nn.init.xavier_normal_(p)

If you did remove the code above, did you replace it with something else? Thanks again.

If you could post your full code somewhere, that would be great too!

@mathematicsofpaul Yes, I just removed those lines and let each layer do its own initialisation. I think that code came from an earlier example.

I’ll try and post the full code somewhere when I get a chance

@david.waterworth That is interesting, because nn.Transformer() also has its own initialization that does a very similar thing. Are you sure that the model.parameters() initializer was the only thing you changed?

From the nn.Transformer() Source Code

    def _reset_parameters(self):
        r"""Initiate parameters in the transformer model."""

        for p in self.parameters():
            if p.dim() > 1:
                xavier_uniform_(p)

Kind Regards :slight_smile:

Yes. The difference is that the initialiser I removed was in my own file, and my ‘model’ class contains the embedding layers, positional encoding, transformer and fully connected output layer.

So the code I removed was applying Xavier initialisation to layers other than the transformer itself, which I suspect is wrong. I had copied my training loop from elsewhere.
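
For anyone following along, here is a minimal sketch of the situation described above. It is not the original code from this thread: the wrapper class `Seq2SeqModel`, its layer names and all the sizes are hypothetical, chosen only for illustration. It shows that the removed loop also re-initialises the embedding and output layers, since it walks every parameter of the wrapper rather than just the transformer's.

```python
import torch
import torch.nn as nn


class Seq2SeqModel(nn.Module):
    """Hypothetical wrapper: embedding + nn.Transformer + output projection."""

    def __init__(self, vocab_size=100, d_model=32, nhead=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        # nn.Transformer already calls its own _reset_parameters() in its
        # constructor, which applies xavier_uniform_ to its weight matrices.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=1, num_decoder_layers=1,
            dim_feedforward=64, batch_first=True,
        )
        self.fc_out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        out = self.transformer(self.embedding(src), self.embedding(tgt))
        return self.fc_out(out)


torch.manual_seed(0)
model = Seq2SeqModel()

# Snapshot the embedding weights as produced by the layer's default init.
emb_default = model.embedding.weight.detach().clone()

# The loop discussed in the thread: it touches every parameter with more
# than one dimension, including the embedding and the output layer.
for p in model.parameters():
    if p.dim() > 1:
        nn.init.xavier_normal_(p)

emb_changed = not torch.equal(emb_default, model.embedding.weight)
print("embedding re-initialised by the loop:", emb_changed)
print("embedding std, default init:", emb_default.std().item())
print("embedding std, after loop:  ", model.embedding.weight.std().item())
```

nn.Embedding draws its default weights from a standard normal, so the Xavier loop shrinks their scale considerably for these sizes. Whether that explains the behaviour reported here is only a guess; dropping the loop simply leaves each layer with its own default initialisation.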

What do you mean by shifted SOS token?

Hi, have you solved this repeating token problem? I have been working on it for a long time but have not made any progress. Thanks in advance if you can offer some help. This problem really has me confused.