@david.waterworth Heyo, just to clarify: what exactly was the fix, David? Was it the removal of
for p in model.parameters():
    if p.dim() > 1:
        torch.nn.init.xavier_normal_(p)
If you did remove the code above, then did you replace it with something else? Thanks again.
If you could post your full code somewhere, that would be great too!
@mathematicsofpaul Yes, I just removed those lines and let each layer do its own initialisation. I think that code came from an earlier example.
I'll try and post the full code somewhere when I get a chance.
@david.waterworth That is interesting, because nn.Transformer() also has its own initialization that does a very similar thing. Are you sure that the model.parameters() initializer was the only thing you changed?
From the nn.Transformer() source code:

def _reset_parameters(self):
    r"""Initiate parameters in the transformer model."""
    for p in self.parameters():
        if p.dim() > 1:
            xavier_uniform_(p)
Kind Regards
Yes - the difference is that the initialiser I removed was in my train.py file, and my 'model' class contains the embedding layers, positional encoding, transformer and fully connected output.
So the code I removed was applying Xavier initialisation to layers other than the transformer itself, which I suspect is wrong. I copied my train loop from elsewhere.
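To make that concrete, here is a rough sketch of the structure I mean (not my exact code - the class and names below are just placeholders): the loop I had in train.py touched every 2-D parameter of the wrapper model, so it re-initialised the embedding and output layers as well, while nn.Transformer already applies its own Xavier init when it is constructed.

import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Rough sketch of the wrapper: embeddings + transformer + output head (positional encoding omitted)."""
    def __init__(self, vocab_size, d_model=512):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_size, d_model)
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        # nn.Transformer runs its own _reset_parameters() (xavier_uniform_) in __init__
        self.transformer = nn.Transformer(d_model=d_model)
        self.fc_out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt, tgt_mask=None):
        out = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=tgt_mask)
        return self.fc_out(out)

model = Seq2SeqTransformer(vocab_size=10000)

# The loop I removed overwrote *all* of these parameters, not just the transformer's:
# for p in model.parameters():
#     if p.dim() > 1:
#         torch.nn.init.xavier_normal_(p)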
What do you mean by a shifted SOS token?
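Is it the usual convention of prepending SOS to the target and dropping the last token, so the decoder predicts each position from the previous ones? Something like this (just my guess - a minimal sketch with made-up token ids):

import torch

SOS_IDX, EOS_IDX = 1, 2                     # made-up special token ids
tgt = torch.tensor([[5, 6, 7, EOS_IDX]])    # toy target batch: (batch, seq_len)

# "Shift right": prepend SOS and drop the last token to build the decoder input
sos = torch.full((tgt.size(0), 1), SOS_IDX, dtype=tgt.dtype)
decoder_input = torch.cat([sos, tgt[:, :-1]], dim=1)   # [[SOS, 5, 6, 7]]

# The loss is then computed against the unshifted target: [[5, 6, 7, EOS]]
labels = tgt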
Hi, have you solved this repeating token problem? I have been working on it for a long time but still haven't made any progress. Thanks in advance if you could offer some help… this problem really confuses me.