How should I initialize the weights of nn.ConvTranspose2d? The same way as nn.Conv2d? Is there anything special about this in PyTorch?
Another question: does PyTorch require manual weight initialization, or do its layers initialize themselves automatically? In other words, if I don't initialize the weight or bias, will they be all zeros or random values?
import math
import torch.nn as nn

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        # kernel_size is a tuple, so take both spatial dims
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
From this I infer that the bias will be initialized with random values automatically. Is that right?
What I want to do is build an FCN based on a Caffe FCN model, and now I want to initialize the network as follows:
import numpy as np

for p in m.modules():
    if isinstance(p, nn.Conv2d):
        n = p.kernel_size[0] * p.kernel_size[1] * p.out_channels
        p.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(p, nn.ConvTranspose2d):
        # build the bilinear upsampling kernel used by the Caffe FCN
        k = p.kernel_size[0]
        factor = (k + 1) // 2
        if k % 2 == 1:
            center = factor - 1
        else:
            center = factor - 0.5
        og = np.ogrid[:k, :k]
        weights_np = (1 - abs(og[0] - center) / factor) * \
                     (1 - abs(og[1] - center) / factor)
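Copying weights_np into the deconv layer then still requires broadcasting it over the channel dimensions. A hedged sketch, assuming in_channels == out_channels and one bilinear kernel per channel, as in the Caffe FCN net surgery:

import torch

w = torch.zeros(p.in_channels, p.out_channels, k, k)
for c in range(p.in_channels):
    w[c, c] = torch.from_numpy(weights_np).float()
p.weight.data.copy_(w)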
Question one: how should I initialize the bias of the conv and deconv layers?
Question two: since PyTorch image data is in [0, 1] while Caffe image data is in [0, 255], does the weight initialization method differ from Caffe's in any way?
Have a look at the DCGAN example:
classname = m.__class__.__name__
if classname.find('Conv') != -1:
    m.weight.data.normal_(0.0, 0.02)
elif classname.find('BatchNorm') != -1:
    m.weight.data.normal_(1.0, 0.02)
    m.bias.data.fill_(0)
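In the DCGAN example this snippet is the body of a weights_init(m) function, which is then applied recursively to every submodule via Module.apply. A minimal sketch (the Sequential model here is just a placeholder):

import torch.nn as nn

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.apply(weights_init)  # calls weights_init on every submodule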
It should work.
@chenyuntc Does PyTorch require manual weight initialization, or do its layers initialize automatically? I noticed there is .reset_parameters() in the base _ConvNd class, but I didn't see where this function is called.
The parameters are initialized automatically. If you want to use a specific initialization strategy, take a look at torch.nn.init. I'll need to add that to the docs.
reset_parameters() is called in the layer's __init__ (e.g. in _ConvNd), so every layer gets a default initialization as soon as it is constructed.
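For instance, a short sketch using torch.nn.init (the layer and the Kaiming scheme are just illustrative choices):

import torch.nn as nn
import torch.nn.init as init

conv = nn.Conv2d(3, 16, kernel_size=3)
init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')
if conv.bias is not None:
    init.constant_(conv.bias, 0.)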
Hello, thank you very much for your help. May I ask another question: the code above uses the normal distribution to initialize the weights. If I want normally distributed values restricted to a certain range, how can that be done? Do I need to write a custom distribution function?
In other words, a truncated normal distribution.
Maybe just use clamp: sample from the normal distribution and then clamp the values into the desired range.
OK, thank you! By the way, if I want to clamp to [-1, -0.1] and [0.1, 1], how would I do that?
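One possible sketch (my own illustration, not an answer from the thread): keep the sign and clamp the magnitude into [0.1, 1], which pushes every value into [-1, -0.1] or [0.1, 1]:

import torch

w = torch.empty(10, 10).normal_(0, 0.5)
w = w.sign() * w.abs().clamp(0.1, 1.0)  # exact zeros stay zero, but occur with probability 0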
Can anyone please explain how exactly this snippet works?
I'm not able to understand how the m.weight.data.normal_ line works.
Also, if I set bias=False while defining my network, is it still necessary to call m.bias.data.fill_(0) explicitly?
.normal_ fills the tensor in place with values drawn from the normal distribution using the specified mean and standard deviation (docs).
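In isolation, that line behaves like this small sketch (the shape and std are arbitrary):

import torch

w = torch.empty(4, 4)
w.normal_(mean=0.0, std=0.02)  # overwrites w in place with samples from N(0, 0.02^2)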
If you set bias=False, you don't have to and in fact cannot call m.bias.data.fill_(0), since bias will be None.
Note that I would recommend wrapping the parameter manipulations in a torch.no_grad() block and avoiding the .data attribute directly:
lin = nn.Linear(10, 10, bias=False)
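The rest of that example would look something like this sketch (the fill value is just illustrative):

import torch
import torch.nn as nn

lin = nn.Linear(10, 10, bias=False)
with torch.no_grad():
    lin.weight.fill_(0.01)  # in-place write that Autograd does not track
print(lin.bias)             # None, because bias=False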
@ptrblck thank you! This helped me a lot.
Just one question: what is the difference between using torch.no_grad() and using .data?
torch.no_grad will disable gradient calculation, so that all operations in this block won’t be tracked by Autograd.
Using the underlying .data attribute will most likely work in this case. However, I consider it dangerous, as it might silently introduce errors in the gradient calculations.
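A small sketch of that failure mode (my own illustration, not from the thread): mutating a tensor through .data between the forward and backward pass silently corrupts the gradient, while the same in-place change done without .data is caught by Autograd:

import torch

x = torch.ones(3, requires_grad=True)
y = x * x            # Autograd saves x for the backward pass
x.data.add_(1.0)     # bypasses Autograd's version tracking
y.sum().backward()
print(x.grad)        # tensor([4., 4., 4.]) -- wrong, d(x*x)/dx was 2 at forward time

# Without .data, Autograd notices the in-place modification and raises
# a RuntimeError during backward instead of returning wrong gradients:
# with torch.no_grad():
#     x.add_(1.0)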