Initialize the weights of nn.ConvTranspose2d

How should I initialize the weights of nn.ConvTranspose2d? The same way as nn.Conv2d? Is there anything special about this in PyTorch?

Another question: does PyTorch require manual weight initialization, or do PyTorch layers initialize themselves automatically? That is, if I don't initialize the weights or bias, will they be all zeros or random values?

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        # Kaiming/He initialization: zero-mean normal with std = sqrt(2 / fan_out)
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        # BatchNorm: scale to 1, shift to 0
        m.weight.data.fill_(1)
        m.bias.data.zero_()
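
For reference, that is the Kaiming/He fan-out scheme; a quick sanity check of the standard deviation it produces (the layer size below is chosen just for illustration):

import math

n = 3 * 3 * 64             # fan-out of a 3x3 conv with 64 output channels
std = math.sqrt(2. / n)    # ~0.059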

From this I infer that the bias will automatically be initialized with random values. Is that right?

What I want to do is build an FCN based on a Caffe FCN model, and now I want to initialize the network. My initialization is as follows:

import math
import numpy as np
import torch
import torch.nn as nn

def weights_initG(m):
    for p in m.modules():
        if isinstance(p, nn.Conv2d):
            # Kaiming/He initialization based on fan-out
            n = p.kernel_size[0] * p.kernel_size[1] * p.out_channels
            p.weight.data.normal_(0, math.sqrt(2. / n))
        elif isinstance(p, nn.BatchNorm2d):
            p.weight.data.normal_(1.0, 0.02)
            p.bias.data.fill_(0)
        elif isinstance(p, nn.ConvTranspose2d):
            # bilinear upsampling kernel (assumes a square kernel)
            n = p.kernel_size[1]
            factor = (n + 1) // 2
            if n % 2 == 1:
                center = factor - 1
            else:
                center = factor - 0.5
            og = np.ogrid[:n, :n]
            weights_np = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
            # copy_ broadcasts the (n, n) kernel across all channel pairs
            p.weight.data.copy_(torch.from_numpy(weights_np))
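
Since weights_initG already loops over m.modules() itself, I call it directly on the network instead of going through .apply (a usage sketch; build_fcn is just a stand-in for however the FCN is constructed):

netG = build_fcn()     # hypothetical constructor for the FCN
weights_initG(netG)    # walks netG.modules() internally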

Question one: how should I initialize the biases of the conv and deconv layers?
Question two: since PyTorch image data is in [0, 1] while Caffe image data is in [0, 255], does the weight initialization method differ from Caffe's at all?

Have a look at the DCGAN example:

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # matches both Conv2d and ConvTranspose2d
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

netG.apply(weights_init)

It should work.
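
For reference, Module.apply(fn) calls fn on the module itself and on every submodule, so the name check inside weights_init runs once per layer. A tiny sketch on a toy model (the layer sizes are just for illustration):

import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
net.apply(weights_init)   # visits the Sequential, the Conv2d and the BatchNorm2d in turn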


@chenyuntc Does PyTorch require manual weight initialization, or do PyTorch layers initialize themselves automatically? I noticed there is .reset_parameters() in the base _ConvNd class, but I didn't see where this function is called.


The parameters are initialized automatically. If you want to use a specific initialization strategy, take a look at torch.nn.init. I'll need to add that to the docs.
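
For example, the manual loop from earlier could be written with the torch.nn.init helpers along these lines (a sketch; model stands in for your own network):

import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

model.apply(init_weights)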


reset_parameters() should be called in __init__.
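
To see what the automatic initialization gives you, you can inspect a freshly constructed layer; in recent PyTorch versions Conv2d.reset_parameters uses a Kaiming-uniform weight and a small uniform range for the bias, so neither is all zeros:

import torch.nn as nn

conv = nn.Conv2d(3, 16, 3)
print(conv.weight.mean(), conv.weight.std())  # non-zero random values
print(conv.bias)                              # also randomly initialized, not zero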

Hello, thank you very much for your help. Let me ask another question about the code above, which uses the normal distribution to initialize the weights: if I want normally distributed values restricted to a certain range, how can that be done? Do I need to write a custom distribution function?
In other words, a truncated normal distribution.

Maybe just use clamp_:

m.weight.data.normal_(1.0, 0.02).clamp_(min=0, max=2)
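
More recent PyTorch versions also provide torch.nn.init.trunc_normal_, which redraws values that fall outside [a, b] instead of clipping them; if your version has it, something like this gives a proper truncated normal (a sketch, with parameters chosen to match the example above):

import torch.nn as nn

nn.init.trunc_normal_(m.weight, mean=1.0, std=0.02, a=0.0, b=2.0)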

OK, thank you! By the way, what if I want to clamp to [-1, -0.1] and [0.1, 1]? How would I do that?

a.clamp_(min=-1, max=1)                                   # restrict to [-1, 1]
a[a.abs() < 0.1] = torch.sign(a[a.abs() < 0.1]) * 0.1     # push tiny values out to +/-0.1
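
A quick check of what those two lines do on a small tensor (values chosen just for illustration):

import torch

a = torch.tensor([-0.05, 0.5, -2.0])
a.clamp_(min=-1, max=1)                                   # -> [-0.05, 0.5, -1.0]
a[a.abs() < 0.1] = torch.sign(a[a.abs() < 0.1]) * 0.1     # -> [-0.10, 0.5, -1.0]
print(a)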

Can anyone please explain how exactly this snippet works?
I'm not able to understand how the m.weight.data.normal_ line works.
Also, if I set bias=False when defining my network, is it still necessary to call m.bias.data.fill_(0) explicitly?

.normal_ fills the tensor in-place with values drawn from the normal distribution using the specified mean and standard deviation (docs).

If you set bias=False, you don’t have to and in fact cannot call m.bias.data, since bias will be None.

Note that I would recommend wrapping the parameter manipulations in a torch.no_grad() block and avoiding the .data attribute directly:

import torch
import torch.nn as nn

lin = nn.Linear(10, 10, bias=False)

with torch.no_grad():
    lin.weight.normal_(0.0, 1.0)  # re-initialize without Autograd tracking

@ptrblck thank you! This helped me a lot.
Just one question: what is the difference between using torch.no_grad() and using .data?

torch.no_grad will disable gradient calculation, so all operations inside the block won't be tracked by Autograd.
Using the underlying .data attribute will most likely work in this case as well.
However, I consider it dangerous, as it might silently introduce errors in the gradient calculations.
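
A small sketch of the recommended pattern (assuming the usual torch/nn imports): the in-place re-initialization inside no_grad is not recorded by Autograd, while later forward and backward passes are tracked as usual.

import torch
import torch.nn as nn

lin = nn.Linear(4, 4)

with torch.no_grad():
    lin.weight.fill_(0.5)      # not tracked by Autograd

out = lin(torch.randn(2, 4)).sum()
out.backward()                 # gradients are computed as usual
print(lin.weight.grad.shape)   # torch.Size([4, 4])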
