Weight initialization


I’m a bit confused about weight initialization. In my neural network I use: BatchNorm1d, Conv1d, ELU, MaxPool1d, Linear, Dropout and Flatten.

Now I think only Conv1D, Linear and ELU have weights right? In particular:

Conv1D: Has weights for the weighted sum it uses.
ELU: Has alpha as a weight
Linear: Weights represent basically the transformation matrix

Question 1:
Now all those weights need to be set to something in the beginning. I know that for symmetrical activation functions one uses Xavier, and for things like ReLU (and I guess ELU) one uses Kaiming to set these weights. Correct?

Question 2:
What is used for the weights for Linear?

Question 3:
What is used for Conv1D weights?

Question 4:
What are the default weights set by PyTorch? I guess they are:

  • Linear: alpha: float = 1.
  • Conv1D: U(-sqrt(k), sqrt(k)) with k = groups / (Cin * kernel size), where groups = 1 by default.
  • ELU: alpha = 1.0


Question 5:
Do people set the weights only at the beginning, or are there use cases where one does it during training?

Question 6:
What is the correct way to initialize weights?

Thanks in advance

In general, I highly recommend looking into PyTorch’s documentation, e.g. information about weight initialisation for torch.nn.Linear can be found under the Variables section of its documentation page. The same information can also be found in the source code: see the reset_parameters() method of the Linear class.
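As a rough way to check the documented defaults yourself, the sketch below lists which of the layers from the question actually carry learnable parameters and compares the freshly constructed Linear and Conv1d weights against the bound sqrt(k) given in the docs. The layer sizes are made up for illustration and are not taken from the original post.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical layer sizes, chosen only for illustration.
layers = {
    "BatchNorm1d": nn.BatchNorm1d(8),
    "Conv1d": nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3),
    "ELU": nn.ELU(),
    "MaxPool1d": nn.MaxPool1d(2),
    "Linear": nn.Linear(32, 4),
    "Dropout": nn.Dropout(0.5),
    "Flatten": nn.Flatten(),
}

# Names of the learnable parameters per layer (empty list = nothing to initialise).
param_names = {
    name: [n for n, _ in layer.named_parameters()]
    for name, layer in layers.items()
}
for name, names in param_names.items():
    print(f"{name:12s} {names}")

# Documented default bound sqrt(k):
#   Linear: k = 1 / in_features
#   Conv1d: k = groups / (C_in * kernel_size), with groups = 1 here
linear_bound = math.sqrt(1 / 32)
conv_bound = math.sqrt(1 / (8 * 3))

linear_max = layers["Linear"].weight.abs().max().item()
conv_max = layers["Conv1d"].weight.abs().max().item()
print(f"Linear: max |w| = {linear_max:.4f}, bound = {linear_bound:.4f}")
print(f"Conv1d: max |w| = {conv_max:.4f}, bound = {conv_bound:.4f}")
```

Note that with this check ELU shows no learnable parameters (its alpha is a fixed hyperparameter), while BatchNorm1d does have a weight and a bias when affine=True, which is the default.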

For the correct way of initialising weights, see torch.nn.init.
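One common pattern (a sketch, not the only way) is to write a small function that picks an initialiser per layer type and pass it to Module.apply, which calls it on every submodule. The model below and the name init_weights are hypothetical. Using the "relu" gain for ELU is an assumption on my side, since calculate_gain has no dedicated ELU entry.

```python
import torch
from torch import nn


def init_weights(module: nn.Module) -> None:
    """Re-initialise Conv1d and Linear layers; leave all other layers at their defaults."""
    if isinstance(module, (nn.Conv1d, nn.Linear)):
        # Kaiming (He) uniform init. The "relu" gain is used as a stand-in
        # for ELU here, which is an assumption, not a documented ELU setting.
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Hypothetical toy model using the layer types from the question.
model = nn.Sequential(
    nn.Conv1d(4, 8, kernel_size=3),  # length 16 -> 14
    nn.BatchNorm1d(8),
    nn.ELU(),
    nn.MaxPool1d(2),                 # length 14 -> 7
    nn.Flatten(),                    # 8 channels * 7 steps = 56 features
    nn.Dropout(0.5),
    nn.Linear(8 * 7, 2),
)

# apply() visits every submodule (and the container itself) once.
model.apply(init_weights)

x = torch.randn(5, 4, 16)  # (batch, channels, length)
out = model(x)
print(out.shape)
```

This is normally done once, right after constructing the model and before training starts.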

An example with Conv2d would be:

conv = torch.nn.Conv2d(16, 33, 3)
torch.nn.init.xavier_uniform_(conv.weight)
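For reference, torch.nn.init.xavier_uniform_ fills a tensor in place with values from U(-a, a), where a = gain * sqrt(6 / (fan_in + fan_out)) and gain defaults to 1. The sketch below applies it to the same Conv2d layer and checks the weights against that bound. The fan computation is written out by hand for illustration.

```python
import math
import torch

torch.manual_seed(0)

conv = torch.nn.Conv2d(16, 33, 3)
# Trailing underscore = in-place: conv.weight itself is overwritten.
torch.nn.init.xavier_uniform_(conv.weight)

# For a conv layer, fan_in and fan_out both include the kernel area.
kernel_area = 3 * 3
fan_in = 16 * kernel_area    # in_channels * kernel area
fan_out = 33 * kernel_area   # out_channels * kernel area
bound = math.sqrt(6.0 / (fan_in + fan_out))

max_abs = conv.weight.abs().max().item()
print(f"bound = {bound:.4f}, max |w| = {max_abs:.4f}")
```

The bias is not touched by this call and keeps its default initialisation.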

Oh I did check all that, that’s how I came up with the above information in the post. I actually ended up doing what you did, I was just very unsure what people usually do and how.

Also, this is outdated, by the way. One should use xavier_uniform_() (with the trailing underscore).

You are correct, I’ve fixed it

I am also interested in knowing the answers to those questions…