Defining weights of a custom layer as parameters

class Mask(nn.Module):
    def __init__(self, inC, outC, kernel_size, stride=1, padding=0, groups=1, bias=None):
        super(Mask, self).__init__()
        self.stride = stride
        self.padding = padding
        self.weight = torch.nn.Parameter(data=torch.Tensor(outC, inC, kernel_size,
            kernel_size), requires_grad=True)
        stddev = 1.0 / math.sqrt(inC * kernel_size * kernel_size)
        self.weight.data.uniform_(-stddev, stddev)
        self.weight = self.weight.contiguous()
        self.func()

    def func(self):
        ...

    def forward(self, inp):
        # self.mask is assumed to be created in func()
        masked_wt = self.weight.mul(self.mask.cuda())
        return torch.nn.functional.conv2d(inp, masked_wt, stride=self.stride, padding=self.padding)



class Model(nn.Module):
    def __init__(self, inC=3, outC=32, kernel_size=1, stride=1, padding=0,
                 groups=2, bias=None):
        super(Model, self).__init__()
        self.Mask = Mask(inC, outC, kernel_size, stride, groups=groups, bias=bias)

    def forward(self, x):
        x = self.Mask(x)
        return x

I initialised Mask.weight as an nn.Parameter and I'm using this module in a larger network. After training, when I print all the parameters using the named_parameters() iterator, the custom layer weights are not appearing.


I’ve deleted some unnecessary parts of your model and just checked whether Mask.weight shows up in the parameters, and it seems to work.
Also, I’ve formatted the code in your post. You can post code by wrapping it in three backticks (```).

class Mask(nn.Module):
    def __init__(self):
        super(Mask, self).__init__()
        self.weight = torch.nn.Parameter(data=torch.Tensor(1, 1, 1, 1), requires_grad=True)
        
        self.weight.data.uniform_(-1, 1)
        
    
    def forward(self, x):
        masked_wt = self.weight.mul(1)
        return masked_wt


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.Mask = Mask()

    def forward(self,x):
        x = self.Mask(x)
        return x

model = Model()

for name, param in model.named_parameters():
    print(name, param)

Thanks for the help these past few weeks, but I still have a doubt. I built Model A and Model B of MobileNet. Model A is built using Mask for the pointwise convolutions with groups=2 (which is dense), and Model B is built using nn.Conv2d with groups=2. I passed a random tensor of shape [1, 3, 32, 32] to Model A and copied the weights of Model A to Model B ([1, 0; 0, 1] --> [1; 1], in this fashion). I am getting the same output for both models. I loaded the trained weights and it is still giving the same error.

I thought the validation accuracy should be the same for both, but I am getting 85.9% for Model A and 10% for Model B on the CIFAR-10 dataset.

When I disable Net.eval(), I get the same validation accuracy of 81.3% for both models.

Case II:
I passed a random tensor as input to the models with ModelA.eval() and ModelB.eval() called after loading the weights from the state_dict. I'm getting different outputs for the two models, whereas the outputs of ModelA and ModelB were the same before initialising them with the loaded weights.

What is the reason behind this? Can I still say ModelA and ModelB are behaving the same?

Update

I'm currently removing the BatchNorm weights and training the model. I will validate it and let you know. But I do not understand why BatchNorm should cause this error.

As far as I understand, you are somehow copying weights between modelA and modelB. Was one model trained and the other randomly initialized?
BatchNorm layers come with weights and a bias (gamma and beta in the paper) as well as with the running statistics (running_mean and running_var).
If you forget to copy the running stats, this could be the issue.
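For illustration, a minimal sketch of copying one BatchNorm layer between two models (bn_a and bn_b are just placeholder modules, not your actual layers); the running statistics are buffers, so copying weight and bias alone does not cover them:

import torch.nn as nn

bn_a = nn.BatchNorm2d(32)   # stands in for a trained BatchNorm layer
bn_b = nn.BatchNorm2d(32)   # stands in for the layer receiving the weights

# learnable affine parameters (gamma and beta in the paper)
bn_b.weight.data.copy_(bn_a.weight.data)
bn_b.bias.data.copy_(bn_a.bias.data)

# running statistics are buffers and have to be copied explicitly as well
bn_b.running_mean.copy_(bn_a.running_mean)
bn_b.running_var.copy_(bn_a.running_var)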

You’re correct. I’m not copying running_mean and running_var to Model B.

I am storing the weights as mentioned in the PyTorch ImageNet tutorials:

'state_dict': model.state_dict()

I wrote a print statement after prec = validate(testloader, model, criterion):

print(model.training)

It returned False.


I loaded the weights into ModelA as given in the tutorials:

checkpoint = torch.load('./result/Mobilenet/grad_best.pth.tar')
Dict = checkpoint['state_dict']
Model.load_state_dict(Dict)

I’m copying the weights from Model A to Model B; for the BatchNorm weights:

Net.model[i][1].weight.data.copy_(Model.model[i][1].weight.data)
Net.model[i][1].bias.data.copy_(Model.model[i][1].bias.data)

Questions

  1. Does state_dict() save the running statistics? If not, how do I save the running BatchNorm statistics?

  2. When I copy the BatchNorm weights layer-wise using copy_, does it also copy the running statistics? If not, how do I copy the running statistics to Model B?

Hi ptrblck, for the mask layer, self.weight has a shape of (1, 1, 1, 1). For my application, I want the mask to be (Batch, Channel, Height, Width) = (32, 1, 28, 28), but the values of the mask (the mask itself actually has shape 1x28x28) are identical for every sample in the batch (all 32 of them).
Question 1: I want to update the mask values at each iteration using backpropagation, just like mini-batch SGD updates the other parameters. I tried using torch.cat() to get 32 identical copies of the mask, but that seems like a clumsy way to do it. Could you tell me how to implement it?

Question 2: How can I guarantee that only the mask is updated during backpropagation?
Thanks.

I think you can just add a batch dimension of 1 into your mask and let broadcasting do the magic. :wink:
If you don’t want the gradient to be backpropagated to the input, you might need to detach it with mask * x.detach().
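Something along these lines, as a minimal sketch (the module name and shapes are just taken from your example):

import torch
import torch.nn as nn

class BroadcastMask(nn.Module):
    def __init__(self):
        super(BroadcastMask, self).__init__()
        # a single mask of shape (1, 1, 28, 28); the leading dimension of 1
        # is broadcast over all samples in the batch automatically
        self.mask = nn.Parameter(torch.randn(1, 1, 28, 28))

    def forward(self, x):
        # x has shape (32, 1, 28, 28); no torch.cat needed
        # detach x if the gradient should only flow into the mask
        return self.mask * x.detach()

x = torch.randn(32, 1, 28, 28)
out = BroadcastMask()(x)
print(out.shape)  # torch.Size([32, 1, 28, 28])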

For broadcasting, I didn’t find any function in torch for this. Could you tell me which function you mean for broadcasting?

If I have defined a mask layer like this, how should I add it to an existing torchvision model, say, add the mask after some layer of resnet50?

It depends a bit on your use case, but I think the most flexible and general approach would be to derive a custom class from resnet50 and to override the forward method, applying the mask after the appropriate layer.

Let me know if this workflow would work for you or if you plan to apply the mask in another way.
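As a rough sketch of that idea (here the backbone is simply wrapped rather than subclassed, and the mask shape and the insertion point after layer1 are assumptions made for illustration):

import torch
import torch.nn as nn
from torchvision import models

class MaskedResNet50(nn.Module):
    def __init__(self):
        super(MaskedResNet50, self).__init__()
        self.backbone = models.resnet50(pretrained=False)
        # for a 224x224 input, layer1 outputs activations of shape (N, 256, 56, 56)
        self.mask = nn.Parameter(torch.ones(1, 256, 56, 56))

    def forward(self, x):
        m = self.backbone
        x = m.maxpool(m.relu(m.bn1(m.conv1(x))))
        x = m.layer1(x)
        x = x * self.mask          # apply the mask after layer1
        x = m.layer2(x)
        x = m.layer3(x)
        x = m.layer4(x)
        x = m.avgpool(x)
        return m.fc(torch.flatten(x, 1))

model = MaskedResNet50()
out = model(torch.randn(1, 3, 224, 224))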

It sounds like a feasible idea, I will try it! Thanks :grin:

The hook technique can’t change the forward value, right?

It should be possible if you don’t detach the tensor.
Are you seeing any issues using forward hooks?
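For reference, a small sketch of a forward hook that changes the output of a conv layer (in reasonably recent PyTorch versions, a forward hook that returns a tensor replaces the module's output; the mask here is just a constant for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
mask = torch.rand(1, 8, 4, 4)   # assumed to match the conv output shape

def mask_hook(module, inp, out):
    # returning a value from a forward hook replaces the module's output
    return out * mask

handle = conv.register_forward_hook(mask_hook)

x = torch.randn(1, 3, 4, 4)
out = conv(x)        # the output has already been multiplied by the mask
handle.remove()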

Nope. So I think using a forward hook to add attention after some conv layer seems feasible and quick.

I guess this post might be related to our discussion!

In this setup, does the weight tensor have to be initialized with some values?

If you are working with trainable parameters, then you would usually initialize them using a method from nn.init or any other random distribution.
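For instance (a tiny sketch; the parameter shape is arbitrary):

import math
import torch
import torch.nn as nn

w = nn.Parameter(torch.empty(32, 3, 3, 3))
nn.init.kaiming_uniform_(w, a=math.sqrt(5))   # one of the nn.init schemes
# or any other distribution, e.g.
# nn.init.normal_(w, mean=0.0, std=0.01)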

I’m not sure if I understand the question correctly, so please feel free to add more information in case I missed the point. :wink:

I want to build a custom layer using a Parameter object. The layer applies some matrix multiplications to the input using the Parameter object (see part of the code below). My question is: do I have to initialize the Parameter with values in the constructor, or will it be handled implicitly by PyTorch?

class MuLayer(nn.Module):

    def __init__(self, comp, features):
        super(MuLayer, self).__init__()
        self.w = torch.nn.Parameter(data=torch.Tensor(comp, features), requires_grad=True)

    def forward(self, y, x):
        x.mm(torch.transpose(self.w, 0, 1))
        ...

You should either use a factory method (e.g. torch.randn(size), which would create a tensor with values sampled from the normal distribution) or initialize your parameter manually.
torch.Tensor will use uninitialized memory and will thus contain random values, and it might also contain invalid values (NaN, Inf, etc.).
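Applied to the layer above, a hedged sketch might look like this (the forward is simplified, and the manual-initialization bounds are just one common choice, not a requirement):

import math
import torch
import torch.nn as nn

class MuLayer(nn.Module):
    def __init__(self, comp, features):
        super(MuLayer, self).__init__()
        # factory method: values sampled from the standard normal distribution
        self.w = nn.Parameter(torch.randn(comp, features))
        # alternatively, allocate and then initialize manually:
        # self.w = nn.Parameter(torch.empty(comp, features))
        # nn.init.uniform_(self.w, -1.0 / math.sqrt(features), 1.0 / math.sqrt(features))

    def forward(self, x):
        return x.mm(self.w.t())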
