Is it a good idea to initialize the weights directly in the __init__ function by looping over the layers?
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # ... define layers here ...
        for m in self.modules():
            if isinstance(m, nn.Conv3d):
                # in-place init; kaiming_normal (without the trailing underscore) is deprecated
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
            elif isinstance(m, (nn.BatchNorm3d, nn.BatchNorm2d)):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        # ... forward pass ...
        return x
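If it helps, here is a rough way to sanity-check that such a loop actually touched the layers (a hypothetical concrete example of the same pattern; the conv/batch-norm shapes are arbitrary):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 8, kernel_size=3)
        self.bn = nn.BatchNorm3d(8)
        for m in self.modules():
            if isinstance(m, nn.Conv3d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
            elif isinstance(m, (nn.BatchNorm3d, nn.BatchNorm2d)):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        return self.bn(self.conv(x))

net = TinyNet()
print(torch.all(net.bn.weight == 1))   # tensor(True)
print(net.conv.weight.std().item())    # roughly sqrt(2 / fan_out)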
Could you clarify why we don't use the .data attribute anymore with nn.init? The documentation for the init functions still refers to input Tensors rather than Parameters.
(Also, as an aside: I lurk on the PyTorch forums a lot; thank you for all your extremely helpful responses.)
Kaiming uniform would initialise the weights with variance 2 / fan_in.
However, with a=math.sqrt(5) (the default used by nn.Linear and the conv layers), the initialisation ends up with variance 1 / (3 * fan_in), which does not correspond to any standard initialisation scheme.
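If you want to convince yourself of those numbers, here is a quick empirical check (just a sketch; the tensor shape is arbitrary):

import math
import torch
import torch.nn as nn

fan_in = 512
w = torch.empty(4096, fan_in)

# Default behaviour of nn.Linear / nn.Conv*d: kaiming_uniform_ with a=sqrt(5)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
print(w.var().item(), 1 / (3 * fan_in))   # both roughly 1 / (3 * fan_in)

# Plain Kaiming uniform with a=0, i.e. the ReLU gain sqrt(2)
nn.init.kaiming_uniform_(w, a=0)
print(w.var().item(), 2 / fan_in)         # both roughly 2 / fan_in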
I hope you are well. I want to train my classifier with 10 ensemble members. The difference between the ensemble members is the order of the subjects I used for creating the data. I want to be sure that the weight initialization is different for each member; the members run in parallel jobs, independent of each other.
I just define my model and use it directly. Can I be sure that the weight initialization is different (i.e. randomly different) for each ensemble member?
Yes, as long as you don't set the random seed before initializing each module, the parameters will be different. You can check (some or all of them) via print(modelX.my_layer.param), where X denotes the current model, and you will see that the same parameter has different values after initializing all models.
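For example, a minimal sketch of that check (SmallNet and its layer are made up just for illustration):

import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

# No manual seeding, so each instance draws fresh random weights
model1 = SmallNet()
model2 = SmallNet()

print(model1.fc.weight[0, :5])
print(model2.fc.weight[0, :5])
print(torch.equal(model1.fc.weight, model2.fc.weight))  # False (almost surely)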
What difference would it make if we don't set nonlinearity='relu' while the layer's nonlinearity is ReLU, given that the default nonlinearity is leaky_relu?
calculate_gain uses the specified nonlinearity, as seen here:
Return the recommended gain value for the given nonlinearity function.
The values are as follows:
================= ====================================================
nonlinearity      gain
================= ====================================================
Linear / Identity :math:`1`
Conv{1,2,3}D      :math:`1`
Sigmoid           :math:`1`
Tanh              :math:`\frac{5}{3}`
ReLU              :math:`\sqrt{2}`
Leaky Relu        :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}`
SELU              :math:`\frac{3}{4}`
================= ====================================================
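Note that kaiming_normal_/kaiming_uniform_ pass their a argument (default 0) as the negative slope, and the leaky_relu gain with slope 0 equals the relu gain sqrt(2), so in that case omitting nonlinearity='relu' makes no practical difference. A quick check (tensor shapes are arbitrary):

import torch
import torch.nn as nn

print(nn.init.calculate_gain('relu'))             # sqrt(2) ~ 1.414
print(nn.init.calculate_gain('leaky_relu', 0.0))  # also sqrt(2), since the slope is 0
print(nn.init.calculate_gain('leaky_relu', 0.2))  # smaller: sqrt(2 / (1 + 0.2**2))

w1 = torch.empty(64, 128)
w2 = torch.empty(64, 128)
nn.init.kaiming_normal_(w1, nonlinearity='relu')
nn.init.kaiming_normal_(w2)  # default: nonlinearity='leaky_relu', a=0 -> same gain
print(w1.std().item(), w2.std().item())  # both roughly sqrt(2 / 128)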
Hi, I'm not sure about the custom initialization; I haven't tried this yet. I've been sticking to Xavier initialisation for almost all applications, but I will give this a try to see how it compares with Xavier!
Hey, this is the implementation of Xavier init from the PyTorch source code. I also read that Xavier works better with symmetric functions like sigmoid or tanh, so I experimented with it (using ReLU) and became confident that using Xavier without sigmoid or tanh is not a good approach. I'm sharing what I recently learned with you.
If you got better results with Xavier init together with a non-symmetric activation (compared to not using Xavier at all), let me know; I would be glad to hear it. In my case, applying a sigmoid in my small example gave me more reliable gradients, and you can check it yourself.
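For reference, this is roughly how you would pair Xavier init with the gain for a given activation (a minimal sketch; the layer size and the choice of activations are placeholders):

import torch.nn as nn

layer = nn.Linear(128, 64)

# Xavier/Glorot init tuned for a tanh (symmetric) activation
nn.init.xavier_uniform_(layer.weight, gain=nn.init.calculate_gain('tanh'))
nn.init.zeros_(layer.bias)

# With ReLU, Kaiming init is the usual choice instead:
# nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')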