I initialised Mask.weight as an nn.Parameter, and I'm using this module in a larger network. After training, when I print all the parameters using the named_parameters() iterator, the custom layer weights do not appear.
I've deleted some unnecessary parts of your model and just checked whether Mask.weight is in the parameters, and it seems to work.
Also, I've formatted the code in your post. You can post code enclosed in three backticks.
```python
import torch
import torch.nn as nn


class Mask(nn.Module):
    def __init__(self):
        super(Mask, self).__init__()
        # Wrapping the tensor in nn.Parameter registers it with the module
        self.weight = torch.nn.Parameter(data=torch.Tensor(1, 1, 1, 1), requires_grad=True)
        self.weight.data.uniform_(-1, 1)

    def forward(self, x):
        masked_wt = self.weight.mul(1)
        return masked_wt


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.Mask = Mask()

    def forward(self, x):
        # Call the registered submodule, not the Mask class itself
        x = self.Mask(x)
        return x


model = Model()
for name, param in model.named_parameters():
    print(name, param)
```
Thanks for the help these past few weeks, but I still have a doubt. I built two MobileNet variants, Model A and Model B. Model A is built using Mask for the pointwise convolutions with groups=2 (a dense convolution), and Model B is built using nn.Conv2d with groups=2. I passed a random tensor of shape [1, 3, 32, 32] to Model A and copied the weights of Model A to Model B ([1, 0; 0, 1] --> [1; 1], in this fashion). I get the same output for both models. I loaded the trained weights and it is still giving the same error.
I thought the validation accuracy should be the same for both, but on the CIFAR-10 dataset I am getting 85.9% for Model A and 10% for Model B.
When I disabled Net.eval(), I get the same validation accuracy for both models: 81.3%.
Case II:
I passed a random tensor as input to both models, calling ModelA.eval() and ModelB.eval() after loading the weights from the state_dict. The two models now give different outputs, whereas the outputs of ModelA and ModelB were the same before initialising them with the trained weights.
What is the reason behind this? Can I still say ModelA and ModelB are behaving the same?
Update
Currently removing BatchNorm weights and training the model. Will validate the model and let you know. But I do not understand why BatchNorm should cause this error.
As far as I understand, you are somehow copying weights between modelA and modelB. Was one model trained and the other randomly initialized? BatchNorm layers come with weights and a bias (gamma and beta in the paper) as well as with the running statistics (running_mean and running_var).
If you forget to copy the running stats, this could be the issue.
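To illustrate the point about running stats, here is a minimal sketch (not the original MobileNet code; the layer sizes and the fake "training" loop are assumptions made up for the demo). It copies only weight and bias between two BatchNorm layers and compares their outputs in train and eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn_a = nn.BatchNorm2d(3)
bn_b = nn.BatchNorm2d(3)

# Pretend bn_a was trained: push some batches through it in train mode so
# running_mean / running_var move away from their defaults (0 and 1).
bn_a.train()
for _ in range(10):
    bn_a(torch.randn(8, 3, 4, 4) * 5 + 2)

# Copy only the affine parameters (gamma and beta) and forget the running stats.
with torch.no_grad():
    bn_b.weight.copy_(bn_a.weight)
    bn_b.bias.copy_(bn_a.bias)

x = torch.randn(8, 3, 4, 4)

# Train mode: both layers normalize with the statistics of the current batch.
bn_a.train()
bn_b.train()
same_in_train = torch.allclose(bn_a(x), bn_b(x), atol=1e-6)

# Eval mode: each layer normalizes with its own running statistics.
bn_a.eval()
bn_b.eval()
same_in_eval = torch.allclose(bn_a(x), bn_b(x), atol=1e-6)

print("same output in train mode:", same_in_train)
print("same output in eval mode:", same_in_eval)
```

This would match the symptom described above: the models agree without eval() and disagree with it.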
Does state_dict() save the running statistics? If not, how do I save the running BatchNorm statistics?
When I copy the BatchNorm weights layer-wise using copy_, does it also copy the running BatchNorm statistics? If not, how do I copy the running BatchNorm statistics to Model B?
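As a small sketch for these two questions (the two tiny nn.Sequential models are placeholders, not the actual Model A and Model B): the running statistics are registered as buffers, so they are part of state_dict() but not of named_parameters(), and a copy_ on weight and bias alone does not touch them.

```python
import torch
import torch.nn as nn

model_a = nn.Sequential(nn.Conv2d(3, 4, 1), nn.BatchNorm2d(4))
model_b = nn.Sequential(nn.Conv2d(3, 4, 1), nn.BatchNorm2d(4))

# One forward pass in train mode updates the running stats of model_a.
model_a.train()
model_a(torch.randn(8, 3, 4, 4))

# state_dict() contains parameters AND buffers (running_mean, running_var, ...).
keys = list(model_a.state_dict().keys())
param_names = [name for name, _ in model_a.named_parameters()]
buffer_names = [name for name, _ in model_a.named_buffers()]
print(keys)

# Option 1: if both models have identical keys and shapes, copy everything at once.
model_b.load_state_dict(model_a.state_dict())

# Option 2: layer-wise copy, which has to include the buffers explicitly.
with torch.no_grad():
    model_b[1].weight.copy_(model_a[1].weight)
    model_b[1].bias.copy_(model_a[1].bias)
    model_b[1].running_mean.copy_(model_a[1].running_mean)
    model_b[1].running_var.copy_(model_a[1].running_var)
```

If the two architectures differ (as with the Mask-based and the grouped-conv models), option 2 is probably the one to use for each matching BatchNorm layer.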
Hi ptrblck, in the mask layer self.weight has a shape of (1, 1, 1, 1). For my application, I want the mask to be (Batch, Channel, Height, Width) = (32, 1, 28, 28), but the mask values (the actual mask has shape 1x28x28) should be identical for every sample in the batch (all 32 of them).
Question 1: I want to update the mask values at each iteration using backpropagation, just like mini-batch SGD updates the parameters. I tried using torch.cat() to get 32 identical copies of the mask, but that seems like a clumsy way to do it. Could you tell me how to implement this?
Question 2: How can I guarantee that only the mask is updated during backpropagation?
Thanks.
I think you can just add a batch dimension of 1 into your mask and let broadcasting do the magic.
If you don’t want the gradient to be backpropagated to the input, you might need to detach it with mask * x.detach().
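A rough sketch of what that could look like, assuming the (32, 1, 28, 28) input from the question; the initial value of ones and the SGD learning rate are arbitrary choices for the demo:

```python
import torch
import torch.nn as nn


class Mask(nn.Module):
    def __init__(self, height=28, width=28):
        super().__init__()
        # A batch dimension of 1: the same mask is broadcast over all samples.
        self.weight = nn.Parameter(torch.ones(1, 1, height, width))

    def forward(self, x):
        return x * self.weight


mask = Mask()
# Passing only the mask parameters to the optimizer means only they get updated.
optimizer = torch.optim.SGD(mask.parameters(), lr=0.1)

x = torch.randn(32, 1, 28, 28, requires_grad=True)
out = mask(x)  # broadcast to (32, 1, 28, 28), no torch.cat needed
out.sum().backward()
optimizer.step()

# With detach(), no gradient flows back into the input.
x2 = torch.randn(32, 1, 28, 28, requires_grad=True)
mask(x2.detach()).sum().backward()

print(out.shape, mask.weight.grad.shape, x2.grad)
```

The gradient of the mask is accumulated over the batch dimension, so mask.weight.grad keeps the (1, 1, 28, 28) shape.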
It depends a bit on your use case, but I think the most flexible and general use case would be to derive a custom class from resnet50 and to override the forward method with the mask layer applied to the appropriate layer.
Let me know, if this workflow would work for you or if you plan to apply the mask in another way.
I want to build a custom layer using a Parameter object. The layer applies some matrix multiplications to the input using the Parameter object (see below part of the code). My question is: do I have to initialize the Parameter with values in the constructor, or will it be handled implicitly by PyTorch?
You should either use a factory method (e.g. torch.randn(size), which would create a tensor with values sampled from the normal distribution) or initialize your parameter manually. torch.Tensor will use uninitialized memory, so it will contain arbitrary values and might also contain invalid values (NaN, Inf, etc.).
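Both options could look roughly like this. MatMulLayer and its sizes are hypothetical stand-ins for your layer, and the choice of kaiming_uniform_ is just one example of a manual init:

```python
import math

import torch
import torch.nn as nn


class MatMulLayer(nn.Module):  # hypothetical example layer
    def __init__(self, in_features, out_features):
        super().__init__()
        # Option 1: factory method, values are sampled from a normal distribution.
        self.weight_a = nn.Parameter(torch.randn(in_features, out_features))
        # Option 2: allocate uninitialized memory, then initialize it explicitly.
        self.weight_b = nn.Parameter(torch.empty(in_features, out_features))
        nn.init.kaiming_uniform_(self.weight_b, a=math.sqrt(5))

    def forward(self, x):
        return x @ self.weight_a + x @ self.weight_b


layer = MatMulLayer(4, 2)
y = layer(torch.randn(3, 4))
print(y.shape)
```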