Defining weights of a custom layer as parameters


(Sriharsha Annamaneni) #1
class Mask(nn.Module):
	def __init__(self):
		super(Mask, self).__init__()
		self.weight = torch.nn.Parameter(data=torch.Tensor(outC, inC, kernel_size, 
			kernel_size), requires_grad=True)
		stddev = 1.0/math.sqrt(inC*kernel_size*kernel_size)
		self.weight.data.uniform_(-stddev, stddev)
		self.weight = self.weight.contiguous()
		self.func()

	def func(self):
		...

	def forward(self, inp):
		masked_wt = self.weight.mul(self.mask.cuda())
		return torch.nn.functional.conv2d(inp, masked_wt, stride=self.stride, padding=self.padding)



class Model(nn.Module):
	def __init__(self, inC=3, outC=32, kernel_size=1, stride=1, padding=0,
		groups=2, bias=None):
		super(Model, self).__init__()
		self.Mask = Mask(inC, outC, kernel_size, stride, groups=groups, bias=bias)
	def forward(self,x):
		x = self.Mask(x)
		return x

I initialised Mask.weight as an nn.Parameter, and I'm using this module in a larger network. After training, when I print all the parameters using the named_parameters() iterator, the custom layer's weights do not appear.


#2

I've deleted some unnecessary parts of your model and just checked whether Mask.weight is among the parameters, and it seems to work.
Also, I’ve formatted the code in your post. You can post code enclosed in three `.

import torch
import torch.nn as nn


class Mask(nn.Module):
    def __init__(self):
        super(Mask, self).__init__()
        self.weight = torch.nn.Parameter(data=torch.Tensor(1, 1, 1, 1), requires_grad=True)

        self.weight.data.uniform_(-1, 1)

    def forward(self, x):
        masked_wt = self.weight.mul(1)
        return masked_wt


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.Mask = Mask()

    def forward(self, x):
        # Call the registered submodule instance, not the Mask class itself.
        x = self.Mask(x)
        return x

model = Model()

for name, param in model.named_parameters():
    print(name, param)
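
For reference, here is a minimal sketch of how such a masked layer could be written so that its weight shows up in named_parameters(). The class name MaskedConv2d, the block-diagonal mask, and the channel sizes are assumptions made for illustration and are not taken from the original post. The mask is stored as a buffer here, so it follows the module to the GPU without calling .cuda() in forward.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedConv2d(nn.Module):
    """Hypothetical masked convolution: trainable weight, fixed 0/1 mask."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, groups=2):
        super().__init__()
        self.stride = stride
        self.padding = padding
        # Assigning an nn.Parameter as an attribute registers it with the module.
        self.weight = nn.Parameter(
            torch.empty(out_channels, in_channels, kernel_size, kernel_size))
        stddev = 1.0 / math.sqrt(in_channels * kernel_size * kernel_size)
        nn.init.uniform_(self.weight, -stddev, stddev)

        # Block-diagonal mask that emulates a grouped convolution (an assumption).
        mask = torch.zeros(out_channels, in_channels, 1, 1)
        out_per_group = out_channels // groups
        in_per_group = in_channels // groups
        for g in range(groups):
            mask[g * out_per_group:(g + 1) * out_per_group,
                 g * in_per_group:(g + 1) * in_per_group] = 1.0
        # A buffer is saved in state_dict and moved by .to()/.cuda(), but not trained.
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask,
                        stride=self.stride, padding=self.padding)


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.masked = MaskedConv2d(4, 8, kernel_size=1)

    def forward(self, x):
        return self.masked(x)


model = Model()
param_names = [name for name, _ in model.named_parameters()]
print(param_names)  # ['masked.weight']
```

Only the weight is listed as a parameter; the mask appears in model.state_dict() under masked.mask but is never handed to the optimizer.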

(Sriharsha Annamaneni) #3

Thanks for the help these past few weeks, but I still have a doubt. I built two versions of MobileNet, Model A and Model B. Model A is built using Mask (which is dense) for the pointwise convolutions with groups=2, and Model B is built using nn.Conv2d with groups=2. I passed a random tensor of shape [1,3,32,32] to Model A and copied the weights of Model A to Model B (in this fashion: [[1,0; 0,1]] --> [1;1]). I get the same output for both models. I loaded the trained weights and it is still giving the same error.

I thought the validation accuracy should be the same for both, but on the CIFAR-10 dataset I get 85.9% for Model A and 10% for Model B.

When I disable Net.eval(), I get the same validation accuracy for both models: 81.3%.

Case II:
I passed a random tensor as input to both models, with ModelA.eval() and ModelB.eval() called after loading the weights from the state_dict. The two models now give different outputs, whereas the outputs of ModelA and ModelB were the same before initialising ModelA and ModelB.

What is the reason behind this? Can I still say ModelA and ModelB are behaving the same?

Update

I am currently removing the BatchNorm weights and training the model. I will validate the model and let you know. But I do not understand why BatchNorm should cause this error.


#4

As far as I understand, you are somehow copying weights between modelA and modelB. Was one model trained and the other randomly initialized?
BatchNorm layers come with weights and a bias (gamma and beta in the paper) as well as with the running statistics (running_mean and running_var).
If you forget to copy the running stats, this could be the issue.
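
A minimal sketch of this effect on a single layer, assuming two standalone nn.BatchNorm2d modules (bn_a and bn_b are hypothetical names, not the models from this thread). Copying only weight and bias leaves the eval-mode outputs different; copying running_mean and running_var as well makes them match.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn_a = nn.BatchNorm2d(3)
bn_b = nn.BatchNorm2d(3)

# "Train" bn_a for a few steps so its running statistics leave their defaults.
bn_a.train()
for _ in range(10):
    bn_a(torch.randn(8, 3, 4, 4) * 2.0 + 1.0)

# Copy only the learnable affine parameters (gamma and beta).
with torch.no_grad():
    bn_b.weight.copy_(bn_a.weight)
    bn_b.bias.copy_(bn_a.bias)

bn_a.eval()
bn_b.eval()
x = torch.randn(1, 3, 4, 4)
same_before = torch.allclose(bn_a(x), bn_b(x))

# Copy the running statistics, which are buffers rather than parameters.
with torch.no_grad():
    bn_b.running_mean.copy_(bn_a.running_mean)
    bn_b.running_var.copy_(bn_a.running_var)
same_after = torch.allclose(bn_a(x), bn_b(x))

print(same_before, same_after)  # False True
```

In training mode BatchNorm normalises with the statistics of the current batch instead of the running ones, which would be consistent with both models agreeing when eval() is not called.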


(Sriharsha Annamaneni) #5

You’re correct. I’m not copying running_mean and running_var to Model B.

I am storing the weights as described in the PyTorch ImageNet tutorial:

'state_dict': model.state_dict()

I added a print statement after prec=validate(testloader, model, criterion):

print(model.training)

It printed False.

#2

I loaded the weights into Model A as shown in the tutorials:

checkpoint = torch.load('./result/Mobilenet/grad_best.pth.tar')
Dict = checkpoint['state_dict']
Model.load_state_dict(Dict)

I'm copying the weights from Model A to Model B. For the BatchNorm weights:

Net.model[i][1].weight.data.copy_(Model.model[i][1].weight.data)
Net.model[i][1].bias.data.copy_(Model.model[i][1].bias.data)

Question

  1. Does state_dict() save the running statistics? If not, how can I save the running BatchNorm statistics?

  2. When I copy the BatchNorm weights layer by layer using copy_, does it also copy the running BatchNorm statistics? If not, how can I copy the running statistics to Model B?
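
Both questions can be checked directly on a single layer. This is a small sketch, assuming a recent PyTorch version and two hypothetical standalone modules bn_src and bn_dst: the state_dict of a BatchNorm layer lists the running statistics next to weight and bias, and load_state_dict transfers all of them in one call.

```python
import torch
import torch.nn as nn

bn_src = nn.BatchNorm2d(4)
bn_dst = nn.BatchNorm2d(4)

# The state_dict holds parameters and buffers, including the running statistics.
keys = list(bn_src.state_dict().keys())
print(keys)

# One forward pass in training mode updates the running statistics of bn_src.
bn_src.train()
bn_src(torch.randn(16, 4, 8, 8) + 3.0)

# load_state_dict copies weight, bias and the running statistics together.
bn_dst.load_state_dict(bn_src.state_dict())
buffers_match = (
    torch.equal(bn_dst.running_mean, bn_src.running_mean)
    and torch.equal(bn_dst.running_var, bn_src.running_var)
)
print(buffers_match)  # True
```

Copying .weight and .bias with copy_ touches only those two tensors, so the running statistics would need either their own copy_ calls or a load_state_dict on the layer.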


(Ryan Cv) #6

Hi ptrblck, in the mask layer, self.weight has a shape of (1,1,1,1). For my application, I want the mask to be (Batch, Channel, Height, Width) = (32, 1, 28, 28), but the values of the mask (the mask itself has shape 1x28x28) are identical for all 32 samples in the batch.
Question 1: I want to update the mask values at each iteration using backpropagation, just as mini-batch SGD updates the parameters. I tried using torch.cat() to get 32 identical copies of the mask, but that seems like a clumsy approach. Could you tell me how to implement it?

Question 2: How can I guarantee that only the mask is updated during backpropagation?
Thanks.


#7

I think you can just add a batch dimension of 1 to your mask and let broadcasting do the magic.
If you don’t want the gradient to be backpropagated to the input, you might need to detach it with mask * x.detach().
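
A minimal sketch of this suggestion, assuming a hypothetical LearnableMask module and the (32, 1, 28, 28) input shape from the question. No dedicated broadcasting function is called; the ordinary * operator expands the leading dimension of size 1 to the batch size. Passing only the mask to the optimizer is one way to make sure nothing else is updated.

```python
import torch
import torch.nn as nn


class LearnableMask(nn.Module):
    """Hypothetical module holding one mask shared by every sample in the batch."""

    def __init__(self, height=28, width=28):
        super().__init__()
        # Leading batch dimension of 1; broadcasting expands it to the batch size.
        self.mask = nn.Parameter(torch.ones(1, 1, height, width))

    def forward(self, x):
        # x.detach() stops gradients from flowing back into whatever produced x.
        return self.mask * x.detach()


layer = LearnableMask()
# Only the mask is handed to the optimizer, so only the mask can be updated.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(32, 1, 28, 28, requires_grad=True)
out = layer(x)
out.sum().backward()
optimizer.step()

print(out.shape)              # torch.Size([32, 1, 28, 28])
print(layer.mask.grad.shape)  # torch.Size([1, 1, 28, 28])
print(x.grad)                 # None
```

The gradient of the mask is summed over the batch dimension automatically, so a single 1x1x28x28 parameter receives the contribution of all 32 samples without torch.cat().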


(Ryan Cv) #8

For broadcasting, I didn't find any function in torch that does this. Could you tell me which function you mean?