I'm using learned positional embeddings in my transformer, created with the following:
self.positional_embedding = nn.parameter.Parameter(
torch.zeros((sequence_len, d_model)), requires_grad=True
)
For embeddings, linear layers, and convolutions, it seems like PyTorch (well, I'm using PyTorch Lightning) automatically initializes the weight tensors with random values.
But for this positional embedding, PyTorch does not initialize the tensor with random values (i.e., if I call print(self.positional_embedding) inside the forward method, the output is all 0s).
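Here is a minimal standalone repro of what I'm seeing (the sizes and the extra Linear layer are just made up for comparison, they're not from my actual model):

import torch
import torch.nn as nn

sequence_len, d_model = 16, 32  # hypothetical sizes, only for this repro

linear = nn.Linear(d_model, d_model)
positional_embedding = nn.parameter.Parameter(
    torch.zeros((sequence_len, d_model)), requires_grad=True
)

print(linear.weight)           # random, non-zero values
print(positional_embedding)    # all zeros, exactly the tensor I passed in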
Does this mean I did something wrong and this embedding will not be updated during backprop, or is everything still fine and this is just a special case where I need to manually initialize the embedding myself?
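In case it's relevant, this is what I was planning to try for manual initialization, a small random normal init (the 0.02 standard deviation is just a guess on my part, not something I took from a paper). Would this be a reasonable approach?

self.positional_embedding = nn.Parameter(
    torch.randn(sequence_len, d_model) * 0.02  # small random values instead of zeros
)
# or, alternatively, fill the existing zero-initialized parameter in place:
# nn.init.normal_(self.positional_embedding, mean=0.0, std=0.02)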