I'm using learned positional embeddings in my transformer, created with the following:
self.positional_embedding = nn.parameter.Parameter(
torch.zeros((sequence_len, d_model)), requires_grad=True
)
For embeddings, linear layers, and convolutions, it seems like PyTorch (well, I'm using PyTorch Lightning) automatically initializes the weight tensors with random values.
But for this positional embedding, PyTorch does not initialize the tensor with random values (i.e., if I call print(self.positional_embedding) inside the forward method, the output is all 0s).
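Here is a minimal standalone repro of what I'm seeing (the sizes and the extra Linear layer are just made up for comparison, they're not from my actual model):

import torch
import torch.nn as nn

sequence_len, d_model = 16, 32  # hypothetical sizes, only for this repro

linear = nn.Linear(d_model, d_model)
positional_embedding = nn.parameter.Parameter(
    torch.zeros((sequence_len, d_model)), requires_grad=True
)

print(linear.weight)           # random, non-zero values
print(positional_embedding)    # all zeros, exactly the tensor I passed in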
Does this mean I did something wrong and this embedding will not be updated during backprop, or is everything still fine and this is just a special case where I need to manually initialize the embedding myself?
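In case it's relevant, this is what I was planning to try for manual initialization, a small random normal init (the 0.02 standard deviation is just a guess on my part, not something I took from a paper). Would this be a reasonable approach?

self.positional_embedding = nn.Parameter(
    torch.randn(sequence_len, d_model) * 0.02  # small random values instead of zeros
)
# or, alternatively, fill the existing zero-initialized parameter in place:
# nn.init.normal_(self.positional_embedding, mean=0.0, std=0.02)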