Weight initialization

How can I initialize variables, with Xavier initialization for example?


Hi @Hamid,

I think you can extract the network’s parameters with params = list(net.parameters()) and then apply whatever initialisation you like.
If you need to apply the initialisation to a specific module, say conv1, you can extract its parameters with conv1Params = list(net.conv1.parameters()). The kernels will be in conv1Params[0] and the bias terms in conv1Params[1].
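As a minimal sketch of that approach (assuming a toy Net with a single conv1; the fan computation and bound follow the Glorot uniform scheme, applied by hand):

```python
import math
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)

net = Net()

# Pull out conv1's parameters: [0] is the kernel tensor, [1] the bias
conv1Params = list(net.conv1.parameters())
weight, bias = conv1Params[0], conv1Params[1]

# Xavier/Glorot uniform: bound = sqrt(6 / (fan_in + fan_out)),
# where the fans count connections through each kernel patch
fan_in = weight.size(1) * weight.size(2) * weight.size(3)   # 3 * 5 * 5
fan_out = weight.size(0) * weight.size(2) * weight.size(3)  # 16 * 5 * 5
bound = math.sqrt(6.0 / (fan_in + fan_out))
weight.data.uniform_(-bound, bound)
bias.data.zero_()
```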


Another possibility can be found in the examples, here. This function specifies how the weights should be initialised, and the weights are modified in this line.


Another initialization example from PyTorch Vision resnet implementation.


@Atcold Can you give an example of what you mean? Thanks!

@Kalamaya, I believe @fmassa’s is a cleaner solution.
You traverse all Modules and, when __class__.__name__ matches, you initialise the parameters however you prefer.
My method presupposes that you know the order of the Modules in the _modules OrderedDict.

Does what I am saying make sense? If it does not, I can try again with an example.

@Atcold Yes, an example will help here, since I am still traversing unfamiliar territory… :slight_smile: thank you for your help in advance! Much obliged. If it helps, I am basically trying to initialize the conv and fully connected layers that I have. (I’d like to do Xavier, or fan_in/fan_out scaling, etc.) Thanks!

You first define a name-check function, which applies the initialisation selectively.

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        xavier(m.weight.data)
        xavier(m.bias.data)

Then you traverse the whole set of Modules.

net = Net() # generate an instance network from the Net class
net.apply(weights_init) # apply weight init

And this is it. You just need to define the xavier() function.
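A minimal sketch of such a xavier() function (an assumption, not the canonical implementation; it handles 2-D tensors and draws from the Glorot uniform distribution):

```python
import math
import torch

def xavier(tensor):
    # Glorot/Bengio uniform init: sample from U(-a, a)
    # with a = sqrt(6 / (fan_in + fan_out))
    fan_out = tensor.size(0)  # rows
    fan_in = tensor.size(1)   # columns
    a = math.sqrt(6.0 / (fan_in + fan_out))
    tensor.uniform_(-a, a)

w = torch.zeros(20, 10)
xavier(w)
```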


A less Lua-like way of doing that is to check whether a module is an instance of a given class. This is the recommended way:

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        xavier(m.weight.data)
        xavier(m.bias.data)

@Atcold another thing, accessing members prefixed with an underscore is not recommended. They’re internal and subject to change without notice. If you want an iterator over modules, use .modules() (searches recursively) or .children() (only one level deep).
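To illustrate the difference between the two iterators, a small sketch with a nested container (module names made up):

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.Sequential(nn.Conv2d(8, 8, 3), nn.ReLU()),
)

# .modules() walks the whole tree recursively (including net itself) ...
convs = [m for m in net.modules() if isinstance(m, nn.Conv2d)]

# ... while .children() stops at the first level
top_level = list(net.children())
```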


Thanks guys.
I am trying to apply weight initialization to a fully connected network (nn.Linear). However, I need the fan_out and fan_in of this layer; by fan_out and fan_in I mean the number of output and input neurons, respectively. How can I access them?

@Hamid, you can check the size of the weight matrix.

size = m.weight.size() # returns a tuple
fan_out = size[0] # number of rows
fan_in = size[1] # number of columns
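Putting this together for a linear layer, one possible sketch (xavier_linear is a made-up name; note that normal_ takes a standard deviation, not a variance):

```python
import math
import torch.nn as nn

def xavier_linear(m):
    # nn.Linear stores its weight as (out_features, in_features)
    fan_out, fan_in = m.weight.size()
    std = math.sqrt(2.0 / (fan_in + fan_out))
    m.weight.data.normal_(0.0, std)

layer = nn.Linear(10, 20)
xavier_linear(layer)
```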

@apaszke, thanks for the heads-up! I’m still new to the Python world…

Edit: applied @apaszke’s fix.


A small note: .size() is also defined on Variables, so there is no need to unpack .data; m.weight.size() works too.

@Hamid, are you trying to ask something? I am not sure I understand.
Could you also please format your code with three backticks and the word python, so that I can read what you posted?


@Atcold, by checking isinstance(m, nn.Linear) it would apply to the linear modules, correct?
If I call the weight initialization once, will it be applied to all layers?
I have residual modules, each with 2 linear layers, and then several of these modules.

my code:

import numpy as np
import torch.nn as nn

def weight_init(m):
    if isinstance(m, nn.Linear):
        size = m.weight.size()
        fan_out = size[0]  # number of rows
        fan_in = size[1]  # number of columns
        std = np.sqrt(2.0 / (fan_in + fan_out))  # normal_ expects a std, not a variance
        m.weight.data.normal_(0.0, std)

class Residual(nn.Module):
    def __init__(self, dropout, shape, negative_slope, BNflag=False):
        super(Residual, self).__init__()
        self.linear1 = nn.Linear(shape[0], shape[1])
        self.linear2 = nn.Linear(shape[1], shape[0])
        self.dropout = nn.Dropout(dropout)
        self.BNflag = BNflag
        self.batch_normalization = nn.BatchNorm1d(shape[0])
        self.leakyRelu = nn.LeakyReLU(negative_slope=negative_slope, inplace=False)

    def forward(self, X):
        x = X
        if self.BNflag:
            x = self.batch_normalization(x)
        x = self.leakyRelu(x)
        x = self.dropout(x)
        x = self.linear1(x)
        if self.BNflag:
            x = self.batch_normalization(x)
        x = self.leakyRelu(x)
        x = self.dropout(x)
        x = self.linear2(x)
        x = torch.add(x, X)
        return x
class FullyCN(nn.Module):
    def __init__(self, args):
        super(FullyCN, self).__init__()
        self.numlayers = args.sm_num_hidden_layers
        self.learning_rate = args.sm_learning_rate
        self.dropout = args.sm_dropout_prob
        self.BNflag = args.sm_bn
        self.shape = [args.sm_input_size, args.sm_num_hidden_units]
        self.res = Residual(self.dropout, self.shape, args.sm_act_param, self.BNflag)
        self.res_outputs = []

    def forward(self, X):
        for i in range(self.numlayers):
            X = self.res(X)
            self.res_outputs.append(X)
        return self.res_outputs[-1]

Sorry about the confusion.


Yup. All linear layers.

Sure, it will apply the initialisation to every Module that is an instance of nn.Linear.

But I call weight_init only once for the class, while the linear layers are invoked in a for loop (i.e., there are multiple sets of variables).

net = Residual() # generate an instance network from the Residual class
net.apply(weight_init) # apply weight init

I’m not too sure what you’re doing with FullyCN


The apply function will search recursively for all the modules inside your network, and will call the function on each of them. So all Linear layers you have in your model will be initialized using this one call.
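To illustrate that recursion, a toy sketch (Block and Model are made-up names; the constant fill is just to make the effect visible):

```python
import torch.nn as nn

class Block(nn.Module):
    def __init__(self):
        super(Block, self).__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 4)

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.blocks = nn.Sequential(Block(), Block(), Block())

visited = []

def weight_init(m):
    if isinstance(m, nn.Linear):
        m.weight.data.fill_(0.5)
        visited.append(m)

model = Model()
model.apply(weight_init)  # recurses into every submodule, however deeply nested
```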


@Atcold, in FullyCN I use several residual modules. In another piece of code, I pass data to FullyCN, which returns the corresponding output via its forward function.