I’m a bit confused about weight initialization. In my neural network I use: BatchNorm1d, Conv1d, ELU, MaxPool1d, Linear, Dropout and Flatten.
Now I think only Conv1d, Linear and ELU have weights, right? In particular:
Conv1D: Has weights for the weighted sum it uses.
ELU: Has alpha as a weight
Linear: The weights basically represent the transformation matrix.
Now all those weights need to be set to something in the beginning. I know that for symmetric activation functions one uses Xavier initialization, and for things like ReLU (and I guess ELU) one uses Kaiming. Correct?
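To make sure I'm asking the right thing, this is how I understand the two schemes would be applied manually (just a sketch; the layer sizes are made up):

```python
import torch.nn as nn

layer = nn.Linear(128, 64)

# Xavier/Glorot: meant for symmetric activations such as tanh or sigmoid
nn.init.xavier_uniform_(layer.weight)

# Kaiming/He: meant for ReLU-family activations (ReLU, and presumably ELU)
nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)
```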
What is used for the weights for Linear?
What is used for Conv1d weights?
What are the default weights set by PyTorch? I guess they are:
- Linear: U(-sqrt(k), sqrt(k)) with k = 1 / in_features
- Conv1d: U(-sqrt(k), sqrt(k)) with k = groups / (Cin * kernel size), where groups = 1 by default.
- ELU: alpha = 1.0
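To check my guesses empirically, I tried inspecting freshly constructed layers (a small sketch; the layer sizes are arbitrary):

```python
import math
import torch.nn as nn

# Linear: default weights should lie in U(-sqrt(k), sqrt(k)) with k = 1 / in_features
lin = nn.Linear(100, 10)
bound = math.sqrt(1 / lin.in_features)
print(lin.weight.min().item() >= -bound and lin.weight.max().item() <= bound)  # True

# Conv1d: k = groups / (in_channels * kernel_size), groups = 1 by default
conv = nn.Conv1d(in_channels=16, out_channels=8, kernel_size=3)
bound = math.sqrt(conv.groups / (conv.in_channels * conv.kernel_size[0]))
print(conv.weight.min().item() >= -bound and conv.weight.max().item() <= bound)  # True
```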
Do people set the weights only at the beginning, or are there use cases where one does it during training?
What is the correct way of initializing weights?
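For context, here is the pattern I'm currently considering for re-initializing my layers after construction (a minimal sketch; the layer sizes and the choice of Kaiming for ELU are my own assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical model with the layer types from my network; sizes are made up
# and assume an input of shape (batch, 1, 100).
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5),
    nn.ELU(),
    nn.MaxPool1d(2),
    nn.Flatten(),
    nn.Linear(16 * 48, 10),
)

def init_weights(m):
    # Re-initialize only the layers that have weight tensors;
    # treating ELU as ReLU-family, hence Kaiming.
    if isinstance(m, (nn.Conv1d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)  # applies init_weights to every submodule
```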
Thanks in advance