How to make a tensor part of model parameters?

I have a parameter that is learnable, I want the model to update it. Here is how I attached it to the model:

class Dan(nn.Module):
    def __init__(self):
        super(Dan, self).__init__()
        # blah blah blah (other layers omitted)
        self.alpha = t.tensor(0.5, requires_grad=True).cuda()

It is alpha. However, after training, I find its value unchanged. It must not have been passed to the optimizer when I asked for model.parameters(). Why isn't it included in that method, and how can I include it?


Hi John,
alpha needs to be added to the model as a Parameter.

self.alpha = t.nn.Parameter(t.tensor(0.5), requires_grad=True).cuda()

The documentation for this is available here


Hi, thanks for the suggestion. It is not working, however; alpha is still not being trained.

Are you sure you passed this parameter to the optimizer? It would be easier to help if you provided more code, especially the optimizer creation and the forward() function, so we can see where you use self.alpha.

Hi Paulo. Good question. Am I sure that I sent this to the optimizer? Well, not explicitly; I just passed the entire set of model parameters, as is conventional, and I assumed that alpha was one of them, because after all, it was defined in the __init__ method, so it should be part of the model. Can I pass it explicitly to the optimizer? How?

The other good question: how was alpha used in the code? I passed it as a parameter to a function that needs it, which is this:

def modrelu(re, im, alpha):
    # magnitude and phase of the complex input
    abs_ = t.sqrt(re**2 + im**2)
    ang = t.atan2(im, re)
    # shift the magnitude by the learnable alpha and clip at zero
    abs_ = nn.functional.relu(abs_ + alpha)
    return abs_ * t.cos(ang), abs_ * t.sin(ang)

So, inside the forward of my neural net, at some point I call this function and I go like:

blah = modrelu(blah, blah, self.alpha)

Thanks.

Although the tensor was defined in the __init__ method, it won't show up in the internal parameters:

class Dan(nn.Module):
    def __init__(self):
        super(Dan, self).__init__()
        self.alpha = torch.tensor(0.5, requires_grad=True)

model = Dan()
print(list(model.parameters()))
> []

As @Mazhar_Shaikh said, you should register this tensor as an nn.Parameter:

class Dan(nn.Module):
    def __init__(self):
        super(Dan, self).__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5, requires_grad=True))

model = Dan()
print(list(model.parameters()))
> [Parameter containing:
tensor(0.5000, requires_grad=True)]

which will make sure it’ll be passed to the optimizer.

The requires_grad attribute will be set to True by default for nn.Parameters, so you don’t have to set it manually.

The usage of alpha in modrelu looks good and will yield valid gradients for it.
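
For completeness, here is a minimal sketch (with dummy inputs and a dummy loss, just for illustration) showing the registered alpha receiving a gradient through modrelu and being updated by the optimizer:

import torch
import torch.nn as nn

def modrelu(re, im, alpha):
    abs_ = torch.sqrt(re**2 + im**2)
    ang = torch.atan2(im, re)
    abs_ = nn.functional.relu(abs_ + alpha)
    return abs_ * torch.cos(ang), abs_ * torch.sin(ang)

class Dan(nn.Module):
    def __init__(self):
        super(Dan, self).__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # requires_grad=True by default

    def forward(self, re, im):
        return modrelu(re, im, self.alpha)

model = Dan()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

re, im = torch.randn(8), torch.randn(8)   # dummy inputs
out_re, out_im = model(re, im)
loss = (out_re**2 + out_im**2).mean()     # dummy loss
loss.backward()
print(model.alpha.grad)                   # non-zero gradient flowing through modrelu

optimizer.step()
print(model.alpha)                        # alpha has changed after the update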


It worked! Thank you ptrblck


Hi ptrblck! I have the same problem that the parameters are not updating. I am using torch.nn.Parameter as you suggested, but it still does not work. In the last line of __init__, I created a model parameter named combine_weight in order to obtain trainable weights.

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.GRU = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
        self.combine_weight = nn.Parameter(torch.ones(12) / 12, requires_grad=True).to(device)

I checked whether combine_weight exists in model.parameters() and optimizer.param_groups, and the print results show that the parameter does exist.

down_model_optimizer = get_optimizer(config, downstream_model, given_learning_rate=config.PER_learning_rate)
i_list = [down_model_optimizer.param_groups[0]['params']]
print(i_list)

print results >>> [[Parameter containing:
tensor([0.0833, 0.0833, 0.0833, 0.0833, 0.0833, 0.0833, 0.0833, 0.0833, 0.0833,
        0.0833, 0.0833, 0.0833], requires_grad=True)]]

Although the created parameter is in the model and the optimizer, the values stay at 0.0833 and never update.
What other places should I check? What do you think may cause this problem? Thank you so much!

By calling the to() operation on nn.Parameter you could create a non-leaf parameter.
Move the device transfer into the tensor creation:

self.combine_weight = nn.Parameter(torch.ones(12).to(device) / 12, requires_grad=True)

Let me know if that helps.
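
For reference, a small sketch of the difference described above (this assumes torch and torch.nn as nn are imported and a CUDA device is available):

device = torch.device('cuda')

# .to() performs a differentiable copy, so the result is a new, non-leaf tensor
# (and a plain Tensor, not an nn.Parameter anymore)
p_bad = nn.Parameter(torch.ones(12) / 12, requires_grad=True).to(device)
print(p_bad.is_leaf)   # False

# moving the data tensor to the device before wrapping it keeps the parameter a leaf
p_good = nn.Parameter(torch.ones(12).to(device) / 12, requires_grad=True)
print(p_good.is_leaf)  # True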


@ptrblck

class loss(torch.nn.Module):
    def __init__(self):
        super(loss, self).__init__()
        self.w = torch.nn.Parameter(torch.tensor([0.2, 0.5, 0.8]))

I have defined the loss as a Module and put some learnable parameters there, as you can see above.
Will these parameters be updated? I suppose the optimizer only takes care of updating model.parameters().

Yes, the optimizer will update the w parameter if you pass the loss parameters to it (as is done with any other module):

l = loss()
optimizer = optim.SGD(l.parameters(), lr=1.)

How would I have both of them updated? I tried passing them as a list, but I got an error along the lines of "module parameters not accepted". I tried something like this:

optim.SGD([model.parameters(), loss.parameters()])

You would have to pass all parameters as a list e.g. via:

optim.SGD(list(model.parameters()) + list(l.parameters()), lr=1.)
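
If you want different hyperparameters for the two sets, you could also use parameter groups instead (a sketch, assuming model and l are already created; the learning rates are just examples):

optimizer = optim.SGD([
    {'params': model.parameters()},          # uses the default lr below
    {'params': l.parameters(), 'lr': 0.1},   # separate lr for the loss parameters
], lr=0.01)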

Hi,
what is the difference between these two:
[p for p in model.parameters() if p.requires_grad] or model.parameters()
when used with an optimizer? I usually use the second one. Is there any difference when initializing the optimizer with them, or do both work fine?


Both would yield the same result. However, in the second approach the optimizer will internally iterate over all parameters and skip the ones which don't require gradients.
This can add a tiny overhead, as this check is not needed in the first approach.
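
As a small example, both of these end up updating the same set of parameters (a parameter with requires_grad=False never receives a .grad, so the optimizer skips it):

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
model[0].weight.requires_grad_(False)  # freeze one parameter

# explicit filtering
opt1 = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

# pass everything; the frozen weight keeps grad=None and is skipped in step()
opt2 = torch.optim.SGD(model.parameters(), lr=0.1)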


I don't know when and where you are calling mem_update. Note that you should create the trainable nn.Parameters, pass them to the optimizer, and use them later. Currently you are creating the threshold parameter inside mem_update, so if you are calling this method repeatedly, move the parameter creation outside of it and pass the parameter as an argument.

Thank you for your reply. mem_update is called in:

class NeuralNetwork(nn.Module):
    def __init__(self, inplanes, planes, batch_size, stride=1, option='A'):
        super(NeuralNetwork, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = BatchNorm(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = BatchNorm(planes)
        self.drop2 = nn.Dropout(0.2)
        self.planes = planes
        self.stride = stride

    def forward(self, x, c1_mem, c1_spike, c2_mem, c2_spike):
        out = self.bn1(self.conv1(x))
        c1_mem, c1_spike = mem_update(out, c1_mem, c1_spike)
        out = self.bn2(self.conv2(c1_spike))
        out += self.shortcut(x)
        c2_mem, c2_spike = mem_update(out, c2_mem, c2_spike)
        c2_spike = self.drop2(c2_spike)
        return c2_spike, c1_mem, c1_spike, c2_mem, c2_spike

I have moved the threshold parameter outside of mem_update, but this goes against the learnable threshold I want to implement.

Why do you think it goes against the learnable threshold parameter?
I don’t see in the current code snippet where threshold is created, but in your previous code you were recreating the threshold in each forward pass, so no training was performed on this parameter.

Thank you for your reply.
Based on your suggestions, I moved the threshold into the nn.Module, like:

class SpikingBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, image_size, batch_size, stride=1, option='A', init_threshold=1.0):
        super(SpikingBasicBlock, self).__init__()
        self.threshold = nn.Parameter(torch.tensor(init_threshold, dtype=torch.float))
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = BatchNorm(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = BatchNorm(planes)
        self.drop2 = nn.Dropout(0.2)
        self.planes = planes
        self.stride = stride

    def forward(self, x, c1_mem, c1_spike, c2_mem, c2_spike):
        out = self.bn1(self.conv1(x))
        c1_mem, c1_spike = mem_update(out, c1_mem, c1_spike, self.threshold)
        out = self.bn2(self.conv2(c1_spike))
        out += self.shortcut(x)
        c2_mem, c2_spike = mem_update(out, c2_mem, c2_spike, self.threshold)
        c2_spike = self.drop2(c2_spike)
        return c2_spike, c1_mem, c1_spike, c2_mem, c2_spike

Up to now, the threshold does get updated during training, but its changes are minimal. Is it because the gradient of self.threshold is zero almost everywhere?

Thank you again for your support.
Best regards.

Yes, if the gradient is calculated as zeros for self.threshold, it won't be updated (unless the gradient was non-zero before and you are using an optimizer with running estimates, weight decay, etc.).
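
One way to check this is to print the gradient right after the backward call and before optimizer.step() (a minimal sketch; loss stands for your own training loss):

loss.backward()

# inspect the gradient of every registered threshold parameter
for name, param in model.named_parameters():
    if 'threshold' in name:
        print(name, param.grad)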