How are layer weights and biases initialized by default?

I was wondering how layer weights and biases are initialized by default. E.g. if I create a linear layer with torch.nn.Linear(5, 100), how are the weights and biases for this layer initialized by default?


Linear layers are initialized with

stdv = 1. / math.sqrt(self.weight.size(1))
self.weight.data.uniform_(-stdv, stdv)
if self.bias is not None:
    self.bias.data.uniform_(-stdv, stdv)

See also here.
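Concretely, for torch.nn.Linear(5, 100) the bound works out to 1/sqrt(5). A minimal sketch (assuming a recent PyTorch; it reproduces that legacy scheme by hand rather than relying on whatever the current default is):

```python
import math
import torch
import torch.nn as nn

# Reproduce the legacy default init by hand: uniform in
# [-stdv, stdv] with stdv = 1 / sqrt(fan_in).
layer = nn.Linear(5, 100)
stdv = 1. / math.sqrt(layer.weight.size(1))  # 1 / sqrt(5) ~ 0.447

with torch.no_grad():
    layer.weight.uniform_(-stdv, stdv)
    if layer.bias is not None:
        layer.bias.uniform_(-stdv, stdv)

print(bool(layer.weight.abs().max() <= stdv))  # True
```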


Thanks! So it depends on the layer you use?


The layers are initialized in some way after creation. E.g. the conv layer is initialized like this.
However, it’s a good idea to use a suitable init function for your model.
Have a look at the init functions.
You can apply the weight inits like this:

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)


So it won’t throw any error if I forget to initialize some conv layers?

Yes, it won’t throw any errors. Depending on your problem, training could be trickier.


Is there a way to alter this code for a situation where you have nn.Conv2d layers whose bias can be on or off depending on their position in the network?

e.g. you have a first Conv2d with a bias term but then a later Conv2d with no bias term.

As the following will return an error:

    if isinstance(m, nn.Conv2d(bias=True):

You could use a condition to check, if bias was set:

if isinstance(m, nn.Conv2d):
    if m.bias:

If I try that I get the following error when using:

def weight_init(m):
    if isinstance(m, torch.nn.Conv2d) or isinstance(m, torch.nn.Linear):
        if m.bias: 
RuntimeError: bool value of Tensor with more than one value is ambiguous

Sorry for the misleading code. It should be: if m.bias is not None:
Also, xavier_uniform will fail on bias, as it has less than 2 dimensions, so that fan_in and fan_out cannot be computed.
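That failure is easy to reproduce (a quick illustration):

```python
import torch
import torch.nn as nn

bias = torch.empty(8)  # 1-D, like a conv or linear bias
try:
    nn.init.xavier_uniform_(bias)
    failed = False
except ValueError:
    # fan_in and fan_out cannot be computed for tensors
    # with fewer than 2 dimensions
    failed = True

print(failed)  # True
```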

if isinstance(m, nn.Conv2d):
    if m.bias is not None:
        nn.init.zeros_(m.bias)
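Putting the corrected check together, a minimal sketch (layer sizes are illustrative) that applies xavier_uniform_ to the weight and zeros_ to the bias only where a bias exists:

```python
import torch.nn as nn

def weight_init(m):
    # xavier needs >= 2-D tensors to compute fan_in/fan_out,
    # so it is only applied to the weight; the 1-D bias is zeroed.
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, bias=True),
    nn.ReLU(),
    nn.Conv2d(8, 8, 3, bias=False),  # no bias -> m.bias is None
)
model.apply(weight_init)
```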

No no not at all, I should have been able to work that out for myself :joy:

Okay, that seems to be okay, except that I get the following error from the .zeros_ call:

AttributeError: module 'torch.nn.init' has no attribute 'zeros_'

Perhaps it’s an outdated attribute?

I think it was introduced in the latest release, i.e. 0.4.1.
I would recommend updating to it, or in case that’s not possible at the moment for whatever reason, you could use:

with torch.no_grad():
    m.bias.zero_()

Ahh I’m using 0.4.0 so I will update to the newest version.

That code you sent works on 0.4.0 though which is great, thanks for your help as always! :slight_smile:


Hi ptrblck,

I just wanted to follow up on this: if you were to use nn.Conv2d( ... , bias=True), presumably the bias would be zeroed, would it not? Because True != None in Python…

Therefore, you must either use bias=False, or not pass any bias argument to nn.Conv2d? Does this sound right?

If you don’t want to use the bias, you should set bias=False during the instantiation of the layer.
Are you somehow referring to its initialization? In my example I set the bias to zeros, if it’s available.
It’s still a learnable and used parameter in case you are wondering if the bias is useless afterwards.

Sorry, my mistake, I understand now. I meant bias=False in my first sentence above, but I was concerned that, because in Python False is not None, the code would somehow try to apply some bias initialisation to the layer even if you set it to False. But I assume bias=False trumps the bias initialisation, so you are left with no bias, which is what you want.

Sorry for the confusion. In the construction of the conv layer you pass bias as a bool value (code).
If it is set to True (or anything that returns True in the line of code), self.bias will be initialized to the nn.Parameter.
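A quick way to see this (sizes are illustrative):

```python
import torch.nn as nn

conv_with_bias = nn.Conv2d(3, 8, kernel_size=3, bias=True)
conv_no_bias = nn.Conv2d(8, 8, kernel_size=3, bias=False)

print(conv_with_bias.bias is None)  # False: bias is an nn.Parameter
print(conv_no_bias.bias is None)    # True: no bias parameter at all
```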

@ptrblck, in this script, will the xavier algorithm be applied to all layers or only to nn.Conv2d?

It depends on the condition you are using.
In the scripts I’ve posted in this thread, I used if isinstance(m, nn.Conv2d), so it’ll only be applied to nn.Conv2d layers.

You can of course add more conditions to it for other layers/parameters etc.
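For instance, a sketch (standard layers assumed; adapt the init choices to your model) that extends the condition to nn.Linear and nn.BatchNorm2d:

```python
import torch.nn as nn

def weights_init(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        # BatchNorm parameters are 1-D, so xavier would fail here;
        # use the conventional ones/zeros init instead.
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(
    nn.Conv2d(3, 4, 3),
    nn.BatchNorm2d(4),
    nn.Flatten(),
    nn.Linear(4, 2),
)
model.apply(weights_init)
```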


@ptrblck, thank you. I have a UNet model; do I need to put an if condition for each layer type?