Initialize the weights of nn.ConvTranspose2d

How should I initialize the weights of nn.ConvTranspose2d? The same way as nn.Conv2d? Is there anything special about this in PyTorch?

Another question: does PyTorch require manual weight initialization, or do PyTorch layers initialize themselves automatically? That is, if I don't initialize the weights or bias, will they be all zeros or random values?

for m in self.modules():
    if isinstance(m, nn.Conv2d):
        # Kaiming/He initialization: zero-mean normal with std = sqrt(2 / fan_out)
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        # BatchNorm: scale to 1, shift to 0
        m.weight.data.fill_(1)
        m.bias.data.zero_()
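
For reference, that is the Kaiming/He fan-out scheme; a quick sanity check of the standard deviation it produces (the layer size below is chosen just for illustration):

import math

n = 3 * 3 * 64             # fan-out of a 3x3 conv with 64 output channels
std = math.sqrt(2. / n)    # ~0.059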

From this I infer that the bias will automatically be initialized with random values. Is that right?

What I want to do is build an FCN based on a Caffe FCN model, and now I want to initialize the network. My initialization is as follows:

import math
import numpy as np
import torch
import torch.nn as nn

def weights_initG(m):
    for p in m.modules():
        if isinstance(p, nn.Conv2d):
            # Kaiming/He initialization based on fan-out
            n = p.kernel_size[0] * p.kernel_size[1] * p.out_channels
            p.weight.data.normal_(0, math.sqrt(2. / n))
        elif isinstance(p, nn.BatchNorm2d):
            p.weight.data.normal_(1.0, 0.02)
            p.bias.data.fill_(0)
        elif isinstance(p, nn.ConvTranspose2d):
            # bilinear upsampling kernel (assumes a square kernel)
            n = p.kernel_size[1]
            factor = (n + 1) // 2
            if n % 2 == 1:
                center = factor - 1
            else:
                center = factor - 0.5
            og = np.ogrid[:n, :n]
            weights_np = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
            # copy_ broadcasts the (n, n) kernel across all channel pairs
            p.weight.data.copy_(torch.from_numpy(weights_np))
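
Since weights_initG already loops over m.modules() itself, I call it directly on the network instead of going through .apply (a usage sketch; build_fcn is just a stand-in for however the FCN is constructed):

netG = build_fcn()     # hypothetical constructor for the FCN
weights_initG(netG)    # walks netG.modules() internally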

Question one: how should I initialize the biases of the conv and deconv layers?
Question two: since PyTorch image data is in [0, 1] while Caffe image data is in [0, 255], does the weight initialization method differ from Caffe's at all?

Have a look at the DCGAN example:

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # matches both Conv2d and ConvTranspose2d
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

netG.apply(weights_init)

It should work.
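
For reference, Module.apply(fn) calls fn on the module itself and on every submodule, so the name check inside weights_init runs once per layer. A tiny sketch on a toy model (the layer sizes are just for illustration):

import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
net.apply(weights_init)   # visits the Sequential, the Conv2d and the BatchNorm2d in turn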


@chenyuntc Does PyTorch require manual weight initialization, or do PyTorch layers initialize themselves automatically? I noticed there is .reset_parameters() in the base _ConvNd class, but I didn't see where this function is called.


The parameters are initialized automatically. If you want to use a specific initialization strategy, take a look at torch.nn.init. I'll need to add that to the docs.
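
For example, the manual loop from earlier could be written with the torch.nn.init helpers along these lines (a sketch; model stands in for your own network):

import torch.nn as nn

def init_weights(m):
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

model.apply(init_weights)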


reset_parameters() should be called in __init__.
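
To see what the automatic initialization gives you, you can inspect a freshly constructed layer; in recent PyTorch versions Conv2d.reset_parameters uses a Kaiming-uniform weight and a small uniform range for the bias, so neither is all zeros:

import torch.nn as nn

conv = nn.Conv2d(3, 16, 3)
print(conv.weight.mean(), conv.weight.std())  # non-zero random values
print(conv.bias)                              # also randomly initialized, not zero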

Hello, thank you very much for your help. Let me ask another question about the code above, which uses the normal distribution to initialize the weights: if I want normally distributed values restricted to a certain range, how can that be done? Do I need to write a custom distribution function?
In other words, a truncated normal distribution.

Maybe just use clamp_:

m.weight.data.normal_(1.0, 0.02).clamp_(min=0, max=2)
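
More recent PyTorch versions also provide torch.nn.init.trunc_normal_, which redraws values that fall outside [a, b] instead of clipping them; if your version has it, something like this gives a proper truncated normal (a sketch, with parameters chosen to match the example above):

import torch.nn as nn

nn.init.trunc_normal_(m.weight, mean=1.0, std=0.02, a=0.0, b=2.0)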

OK, thank you! By the way, what if I want to clamp to [-1, -0.1] and [0.1, 1]? How would I do that?

a.clamp_(min=-1, max=1)                                   # restrict to [-1, 1]
a[a.abs() < 0.1] = torch.sign(a[a.abs() < 0.1]) * 0.1     # push tiny values out to +/-0.1
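
A quick check of what those two lines do on a small tensor (values chosen just for illustration):

import torch

a = torch.tensor([-0.05, 0.5, -2.0])
a.clamp_(min=-1, max=1)                                   # -> [-0.05, 0.5, -1.0]
a[a.abs() < 0.1] = torch.sign(a[a.abs() < 0.1]) * 0.1     # -> [-0.10, 0.5, -1.0]
print(a)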

Can anyone please explain how exactly this snippet works?
I'm not able to understand how the m.weight.data.normal_ line works.
Also, if I set bias=False when defining my network, is it still necessary to call m.bias.data.fill_(0) explicitly?

.normal_ fills the tensor in-place with values drawn from the normal distribution using the specified mean and standard deviation (docs).

If you set bias=False, you don’t have to and in fact cannot call m.bias.data, since bias will be None.

Note that I would recommend wrapping the parameter manipulations in a torch.no_grad() block and avoiding the .data attribute directly:

import torch
import torch.nn as nn

lin = nn.Linear(10, 10, bias=False)

with torch.no_grad():
    lin.weight.normal_(0.0, 1.0)  # re-initialize without Autograd tracking

@ptrblck thank you! This helped me a lot.
Just one question: what is the difference between using torch.no_grad() and using .data?

torch.no_grad will disable gradient calculation, so all operations inside the block won't be tracked by Autograd.
Using the underlying .data attribute will most likely work in this case as well.
However, I consider it dangerous, as it might silently introduce errors in the gradient calculations.
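
A small sketch of the recommended pattern (assuming the usual torch/nn imports): the in-place re-initialization inside no_grad is not recorded by Autograd, while later forward and backward passes are tracked as usual.

import torch
import torch.nn as nn

lin = nn.Linear(4, 4)

with torch.no_grad():
    lin.weight.fill_(0.5)      # not tracked by Autograd

out = lin(torch.randn(2, 4)).sum()
out.backward()                 # gradients are computed as usual
print(lin.weight.grad.shape)   # torch.Size([4, 4])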
