Weight initialization

How can I initialize the weights, say with Xavier initialization?

23 Likes

Hi @Hamid,

I think you can extract the network’s parameters with params = list(net.parameters()) and then apply whatever initialisation you like.
If you need to apply the initialisation to a specific module, say conv1, you can extract its parameters with conv1Params = list(net.conv1.parameters()). You will find the kernels in conv1Params[0] and the bias terms in conv1Params[1].
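
For instance, a minimal sketch of that idea (the Net class, layer sizes and the normal/zero fills below are just placeholders for whatever initialisation you prefer):

import torch.nn as nn

class Net(nn.Module):  # hypothetical network, just for illustration
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3)
        self.fc1 = nn.Linear(16, 10)

net = Net()
params = list(net.parameters())              # all parameters of the network
conv1Params = list(net.conv1.parameters())   # parameters of conv1 only

conv1Params[0].data.normal_(0.0, 0.02)  # the kernels: fill in place
conv1Params[1].data.zero_()             # the bias terms: zero them out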

6 Likes

Another possibility, which is present in the examples, can be found here. This function specifies how the weights should be handled, and the weights are modified in this line.

12 Likes

Another initialization example, from the PyTorch Vision ResNet implementation.

7 Likes

@Atcold Can you give an example of what you mean? Thanks!

@Kalamaya, I believe @fmassa’s is the cleaner solution.
You traverse all Modules and, when __class__.__name__ matches, you initialise the parameters however you prefer.
My method presupposes that you know the order of the Modules in the _modules OrderedDict().

Does what I am saying make sense? If not, I can try to explain better with an example.

@Atcold Yes, an example would help here since I am still traversing unfamiliar territory… :slight_smile: Thank you for your help in advance! Much obliged. If it helps, I am basically trying to initialize the conv and fully connected layers that I have (I’d like to do Xavier, or fan_in / fan_out, etc.). Thanks!

You first define your name-check function, which selectively applies the initialisation.

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        xavier(m.weight.data)
        xavier(m.bias.data)

Then you traverse the whole set of Modules.

net = Net()             # create an instance of the network from the Net class
net.apply(weights_init) # apply the weight initialisation

And this is it. You just need to define the xavier() function.
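
If it helps, here is a minimal sketch of what xavier() could look like, assuming a recent PyTorch where torch.nn.init is available. Note that Xavier is only defined for tensors with at least two dimensions, so the 1-D bias would typically be zeroed instead of passed to xavier():

import torch.nn.init as init

def xavier(param):
    init.xavier_uniform_(param)  # fill in place with the Xavier/Glorot uniform distribution

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        xavier(m.weight.data)   # Xavier for the kernels
        m.bias.data.zero_()     # biases are 1-D, so zero them instead of using xavier()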

14 Likes

A less Lua-like way of doing that would be to check whether a module is an instance of a given class. This is the recommended way:

def weights_init(m):
    if isinstance(m, nn.Conv2d):
        xavier(m.weight.data)
        xavier(m.bias.data)

@Atcold another thing: accessing members prefixed with an underscore is not recommended. They’re internal and subject to change without notice. If you want an iterator over modules, use .modules() (searches recursively) or .children() (only one level).
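
As a quick illustration of the difference (a sketch with a hypothetical two-level model; the layer sizes and the normal fill are placeholders):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 20),
    nn.Sequential(nn.Linear(20, 20), nn.ReLU()),
)

print(len(list(model.children())))  # 2: the top-level Linear and the inner Sequential
print(len(list(model.modules())))   # 5: outer Sequential, both Linears, inner Sequential, ReLU

for m in model.modules():           # recursive traversal, the same one net.apply() uses
    if isinstance(m, nn.Linear):
        m.weight.data.normal_(0.0, 0.02)  # placeholder initialisation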

29 Likes

Thanks guys.
I am trying to apply weight initialization to a fully connected network (nn.Linear). However, I need the fan_out and fan_in of this layer. By fan_out and fan_in I mean the number of output and input neurons, respectively. How can I access them?

@Hamid, you can check the size of the weight matrix.

size = m.weight.size() # returns a tuple
fan_out = size[0] # number of rows
fan_in = size[1] # number of columns
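
Putting the two sizes to use for Xavier (a sketch; the layer sizes are hypothetical, and sqrt(2 / (fan_in + fan_out)) is the Glorot normal standard deviation):

import math
import torch.nn as nn

m = nn.Linear(100, 50)             # hypothetical layer: fan_in = 100, fan_out = 50
fan_out, fan_in = m.weight.size()  # weight has shape (out_features, in_features)
std = math.sqrt(2.0 / (fan_in + fan_out))
m.weight.data.normal_(0.0, std)    # Xavier/Glorot normal initialisation
m.bias.data.zero_()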

@apaszke, thanks for the heads-up! I’m still new to the Python world…

Edit: applied @apaszke’s fix.

3 Likes

A small note - .size() is also defined on Variables, so no need to unpack the data. m.weight.size() will work too.

@Hamid, are you trying to ask something? I am not sure I understand.
Could you also please format your code with three backticks and the word python, so that I can read what you posted?

1 Like

@Atcold, by checking isinstance(m, nn.Linear), it would apply only to Linear modules, correct?
If I call the weight initialization once, will it be applied to all layers?
I have residual modules, each containing 2 linear layers, and then several of these modules.

My code:

import numpy as np
import torch
import torch.nn as nn


def weight_init(m):
    if isinstance(m, nn.Linear):
        size = m.weight.size()
        fan_out = size[0]  # number of rows
        fan_in = size[1]   # number of columns
        std = np.sqrt(2.0 / (fan_in + fan_out))  # Xavier/Glorot standard deviation
        m.weight.data.normal_(0.0, std)


class Residual(nn.Module):
    def __init__(self, dropout, shape, negative_slope, BNflag=False):
        super(Residual, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.linear1 = nn.Linear(shape[0], shape[1])
        self.linear2 = nn.Linear(shape[1], shape[0])
        self.BNflag = BNflag
        # one batch-norm per feature size, since linear1 changes the width
        self.batch_normalization1 = nn.BatchNorm1d(shape[0])
        self.batch_normalization2 = nn.BatchNorm1d(shape[1])
        self.leakyRelu = nn.LeakyReLU(negative_slope=negative_slope, inplace=False)

    def forward(self, X):
        x = X
        if self.BNflag:
            x = self.batch_normalization1(x)
        x = self.leakyRelu(x)
        x = self.dropout(x)
        x = self.linear1(x)
        if self.BNflag:
            x = self.batch_normalization2(x)
        x = self.leakyRelu(x)
        x = self.dropout(x)
        x = self.linear2(x)
        x = torch.add(x, X)
        return x


class FullyCN(nn.Module):
    def __init__(self, args):
        super(FullyCN, self).__init__()
        self.numlayers = args.sm_num_hidden_layers
        self.learning_rate = args.sm_learning_rate
        self.dropout = args.sm_dropout_prob
        self.BNflag = args.sm_bn
        self.shape = [args.sm_input_size, args.sm_num_hidden_units]
        self.res = Residual(self.dropout, self.shape, args.sm_act_param, self.BNflag)
        self.res.apply(weight_init)  # apply the initialization to the residual block
        self.res_outputs = []

    def forward(self, X):
        self.res_outputs.append(self.res(X))
        for i in range(self.numlayers):
            self.res_outputs.append(self.res(self.res_outputs[-1]))
        return self.res_outputs[-1]
2 Likes

Sorry about the confusion.

Correct.

Yup. All linear layers.

Sure, it will apply the initialisation to each Module that belongs to the class nn.Linear.

But I have called weight_init only once for the class, while I call the linear layers in a for loop (i.e., there are multiple sets of variables).

net = Residual()        # generate an instance of the Residual network
net.apply(weights_init) # apply the weight initialisation

I’m not too sure what you’re doing with FullyCN, though.

1 Like

The apply function will search recursively for all the modules inside your network, and will call the function on each of them. So all Linear layers you have in your model will be initialized using this one call.
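
For example, with a hypothetical nested model (a sketch reusing the isinstance-based weights_init pattern from above; sizes and the normal fill are placeholders):

import torch.nn as nn

def weights_init(m):
    if isinstance(m, nn.Linear):
        m.weight.data.normal_(0.0, 0.02)  # placeholder initialisation
        m.bias.data.zero_()

model = nn.Sequential(
    nn.Sequential(nn.Linear(10, 20), nn.ReLU()),  # nested sub-module
    nn.Linear(20, 5),
)
model.apply(weights_init)  # both Linear layers are initialised, including the nested one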

9 Likes

@Atcold, in FullyCN I use several residual modules. In another piece of code, I pass data to FullyCN, which returns the corresponding output via its forward function.