ReLU the weights at the end of each RNN iteration with autograd computing the gradient


#1

I am currently doing constrained optimization in an RNN, where the weights are constrained to have all entries non-negative. That is, at the end of each iteration, I need to call

self.weight = ReLU(self.weight)

but it will give an error:
TypeError: cannot assign 'torch.cuda.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)

There are two options now:

self.weight = nn.Parameter(ReLU(self.weight)), which makes self.weight.grad None. Or:

self.weight.data = ReLU(self.weight), which makes autograd ignore the gradient of ReLU. Is there any way to make sure the last ReLU call in each iteration has its gradient passed through?
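The behaviour of the two options can be checked with a minimal sketch like the one below. It uses a standalone nn.Linear instead of the RNN from the question, and the names lin1, lin2 and x are made up for illustration. The detach() in option 2 is also an assumption added here to make explicit that nothing is recorded in the graph.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)  # hypothetical input batch

# Option 1: re-wrap the ReLU'd weight in a new nn.Parameter after backward().
lin1 = nn.Linear(3, 3, bias=False)
lin1(x).sum().backward()
old_weight = lin1.weight
old_had_grad = old_weight.grad is not None      # gradient lives on the old object

lin1.weight = nn.Parameter(F.relu(lin1.weight))
new_is_different = lin1.weight is not old_weight  # a fresh leaf tensor
new_grad_is_none = lin1.weight.grad is None       # nothing has been backpropagated into it yet

# Option 2: overwrite .data; the Parameter object is kept, but autograd
# does not record the ReLU.
lin2 = nn.Linear(3, 3, bias=False)
same_weight = lin2.weight
lin2.weight.data = F.relu(lin2.weight.detach())
kept_same_object = lin2.weight is same_weight
no_grad_fn = lin2.weight.grad_fn is None          # still a leaf, ReLU is not in the graph
all_nonneg = bool((lin2.weight >= 0).all())

print(old_had_grad, new_is_different, new_grad_is_none)
print(kept_same_object, no_grad_fn, all_nonneg)
```

In this sketch the None gradient in option 1 comes from the new Parameter being created after backward() has already run on the old one.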

By the way, setting w = ReLU(self.weight) and using w for the subsequent computation does not work, because the gradient of ReLU would be taken at the end of the backward pass rather than at the beginning, which could be drastically different if I do some non-multiplicative operations in between.
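For reference, the w = ReLU(self.weight) variant mentioned above can be sketched as follows. The tensor raw and the toy computation are hypothetical and only show where the ReLU gradient enters: entries whose raw weight is negative receive zero gradient.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw (unconstrained) weights with mixed signs.
raw = torch.tensor([[-1.0, 2.0], [3.0, -4.0]], requires_grad=True)
x = torch.ones(1, 2)

w = F.relu(raw)          # effective non-negative weights
out = (x @ w.t()).sum()  # w is used for the subsequent computation
out.backward()

# d(out)/d(w) is all ones here; ReLU masks it where raw <= 0.
print(raw.grad)  # tensor([[0., 1.], [1., 0.]])
```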


#2

Hello,

I wrote a small snippet, and it seems that the first option you mentioned works for me.

import torch
import torch.nn as nn
import torch.nn.functional as f
class Unit(nn.Module):
    def __init__(self):
        super(Unit, self).__init__()
        self.linear = nn.Linear(3,3,bias=False)

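    def forward(self, input):
        # Re-wrap the ReLU'd weight as a new leaf Parameter on every call;
        # backward() then populates .grad on this new Parameter.
        self.linear.weight = nn.Parameter(f.relu(self.linear.weight), requires_grad=True)
        output = self.linear(input)
        return output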

random_input = torch.randn(3,3)
random_target = torch.randn(3,3)
criterion = nn.MSELoss()

unit = Unit()
loss = criterion(unit(random_input), random_target)
print("linear weight", unit.linear.weight)  # here linear.weight is non-negative
loss.backward()
print(unit.linear.weight.requires_grad)
print(unit.linear.weight.grad) # here linear.weight.grad is not None

#3

Why was weight.grad NoneType for me? Maybe it doesn't work if the forward pass is in a for loop.


#4

It also works fine in a for loop.


#5

Can you please share your code with the for loop? I tried this, and in the end my self.weight.grad is NoneType.


#6

I just ran the snippet above in a for loop as follows:

random_input = torch.randn(3,3)
random_target = torch.randn(3,3)
criterion = nn.MSELoss()

unit = Unit()
for i in range(5):
    loss = criterion(unit(random_input), random_target)
    print("linear weight", unit.linear.weight)
    loss.backward()
    print(unit.linear.weight.requires_grad)
    print(unit.linear.weight.grad)

And linear.weight.grad has a value in each iteration.
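One caveat when an optimizer is added to a loop like this: an optimizer built from unit.parameters() keeps references to the original Parameter objects, so replacing the Parameter inside forward would leave it pointing at a tensor the module no longer uses. A possible alternative, sketched below under that assumption, is to clamp the weight in place under torch.no_grad() after each step. This is a projection step, which is a different technique from differentiating through ReLU. The optimizer choice, learning rate and data here are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(3, 3, bias=False)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)  # hypothetical optimizer settings
criterion = nn.MSELoss()
x = torch.randn(8, 3)
y = torch.randn(8, 3)

param_at_start = lin.weight  # the object the optimizer was built with

for _ in range(5):
    opt.zero_grad()
    loss = criterion(lin(x), y)
    loss.backward()
    opt.step()
    # Project back onto the non-negative orthant, in place and outside
    # the autograd graph, so the Parameter object stays the same.
    with torch.no_grad():
        lin.weight.clamp_(min=0)

same_object = lin.weight is param_at_start
all_nonneg = bool((lin.weight >= 0).all())
print(same_object, all_nonneg, lin.weight.grad is not None)
```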