How should I initialize the weights of nn.ConvTranspose2d? Like nn.Conv2d? Is there anything special about this in PyTorch?
Another question: does PyTorch require manual weight initialization, or do its layers initialize automatically? In other words, if I don't initialize the weights or biases, are they all zero or random values?
```python
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
```
From this I infer that the bias is automatically initialized with random values. Is that right?
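You can check this directly: a freshly constructed layer is never left at zero, because the constructor calls `reset_parameters()`. A small sketch (the specific layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# A fresh Conv2d: weight is Kaiming-uniform, bias is uniform in
# +/- 1/sqrt(fan_in), where fan_in = in_channels * kH * kW.
conv = nn.Conv2d(3, 16, kernel_size=3)

assert conv.weight.abs().sum().item() > 0  # not all-zero
assert conv.bias.abs().sum().item() > 0    # bias is not zero either

fan_in = 3 * 3 * 3
bound = 1.0 / fan_in ** 0.5
assert conv.bias.abs().max().item() <= bound  # bias ~ U(-bound, bound)
```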
The work I want to do is to build an FCN based on a Caffe FCN model, and now I want to initialize the network as follows.
My initialization:
```python
import math

import numpy as np
import torch
import torch.nn as nn

def weights_initG(m):
    for p in m.modules():
        if isinstance(p, nn.Conv2d):
            n = p.kernel_size[0] * p.kernel_size[1] * p.out_channels
            p.weight.data.normal_(0, math.sqrt(2. / n))
        elif isinstance(p, nn.BatchNorm2d):
            p.weight.data.normal_(1.0, 0.02)
            p.bias.data.fill_(0)
        elif isinstance(p, nn.ConvTranspose2d):
            # Build a 2-D bilinear upsampling kernel
            n = p.kernel_size[1]
            factor = (n + 1) // 2
            if n % 2 == 1:
                center = factor - 1
            else:
                center = factor - 0.5
            og = np.ogrid[:n, :n]
            weights_np = (1 - abs(og[0] - center) / factor) * \
                         (1 - abs(og[1] - center) / factor)
            # copy_ broadcasts this 2-D kernel to every (in, out) channel pair
            p.weight.data.copy_(torch.from_numpy(weights_np))
```
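One caveat with the snippet above: broadcasting the 2-D kernel fills every (in, out) channel pair with the same bilinear filter. The FCN-style bilinear filler instead puts the filter only on the channel diagonal, so each output channel upsamples its matching input channel. A sketch of that variant (the function name `bilinear_kernel` is my own):

```python
import numpy as np
import torch
import torch.nn as nn

def bilinear_kernel(in_channels, out_channels, kernel_size):
    # Same 2-D bilinear filter as above ...
    factor = (kernel_size + 1) // 2
    if kernel_size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:kernel_size, :kernel_size]
    filt = (1 - abs(og[0] - center) / factor) * \
           (1 - abs(og[1] - center) / factor)
    # ... but placed only on the channel diagonal; cross-channel
    # weights stay zero. Shape matches ConvTranspose2d.weight
    # (in_channels, out_channels, kH, kW) for groups=1.
    weight = np.zeros((in_channels, out_channels, kernel_size, kernel_size),
                      dtype=np.float32)
    for i in range(min(in_channels, out_channels)):
        weight[i, i] = filt
    return torch.from_numpy(weight)

# Hypothetical usage with a 21-class FCN head:
deconv = nn.ConvTranspose2d(21, 21, 4, stride=2, bias=False)
deconv.weight.data.copy_(bilinear_kernel(21, 21, 4))
```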
Question one: how should I initialize the biases of the conv and deconv layers?
Question two: since PyTorch image data is in [0, 1] while Caffe image data is in [0, 255], does the weight initialization method differ from Caffe's in any way?
@chenyuntc Does PyTorch require manual weight initialization, or do its layers initialize automatically? I noticed there is a .reset_parameters() in the base _ConvNd class, but I didn't see where this function is called.
The parameters are initialized automatically. If you want to use a specific initialization strategy, take a look at torch.nn.init. I'll need to add that to the docs.
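For example, the hand-rolled loop earlier in the thread could be rewritten with the in-place helpers from torch.nn.init; `kaiming_normal_` with `mode='fan_out'` reproduces the `n = kH * kW * out_channels` He initialization. A sketch (layer sizes are arbitrary):

```python
import torch.nn as nn
import torch.nn.init as init

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        # Equivalent to normal_(0, sqrt(2 / (kH * kW * out_channels)))
        init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            init.constant_(m.bias, 0.0)
    elif isinstance(m, nn.BatchNorm2d):
        init.constant_(m.weight, 1.0)
        init.constant_(m.bias, 0.0)

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
net.apply(weights_init)  # .apply() calls weights_init on every submodule
```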
Hello, thank you very much for your help. May I ask another question? The code above uses a normal distribution to initialize the weights. If I want normally distributed values restricted to a certain range, how can that be done? Do I need to write a custom distribution function?
In other words, a truncated normal distribution.
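Recent PyTorch releases ship `torch.nn.init.trunc_normal_` for exactly this. For older versions, one option is rejection resampling: draw from a normal and redraw anything outside the range. A sketch (the helper name and the default bounds of two standard deviations are my own choices):

```python
import torch

def truncated_normal_(tensor, mean=0.0, std=0.02, a=-0.04, b=0.04):
    # Redraw every entry that falls outside [a, b] until none remain.
    with torch.no_grad():
        tensor.normal_(mean, std)
        while True:
            bad = (tensor < a) | (tensor > b)
            if not bad.any():
                return tensor
            tensor[bad] = torch.empty(int(bad.sum()),
                                      device=tensor.device).normal_(mean, std)

t = torch.empty(1000)
truncated_normal_(t)
```

Note that `a` and `b` should not be placed too far out in the tails, or the resampling loop will take many iterations to finish.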
Can anyone please explain how exactly this snippet works?
I'm not able to work out what the m.weight.data.normal_ line does.
Also, if I set bias=False when defining my network, is it still necessary to call m.bias.data.fill_(0) explicitly?
torch.no_grad() disables gradient calculation, so all operations inside the block won't be tracked by Autograd.
Using the underlying .data attribute will most likely work in this case.
However, I consider it dangerous, as it might silently introduce errors in the gradient calculations.
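A minimal sketch of the safer pattern: mutate the parameters inside a `torch.no_grad()` block instead of going through `.data`, so Autograd never sees the writes (layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3)

# Initialize in place under no_grad rather than via .data; in-place ops
# on leaf parameters are allowed here and are not recorded by Autograd.
with torch.no_grad():
    conv.weight.normal_(0.0, 0.02)
    conv.bias.zero_()

# The parameters still require gradients for training afterwards.
assert conv.weight.requires_grad
```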