How to access gradients for activations

michaelklachko · December 21, 2018, 1:23am

Gradients for model parameters can be accessed directly (e.g. self.conv1.weight.grad). What about gradients for activations?

I use ReLU activations, so technically I could use the gradients for the biases. The problem is that I don’t use biases in my network. So, short of computing the entire chain of gradients manually, is there a way to get them from autograd?

Why do I need them? I want to use a learnable threshold for ReLU clipping:

relu1 = torch.where(relu1 > thr, thr, relu1)

where thr is a trainable model parameter. The threshold function is not differentiable, so I want to estimate its gradient from the gradients of the activations. The gradient for thr should be proportional to the sum of gradients for all activations.
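
Before estimating the gradient by hand, it may help to check what autograd already reports for thr in this expression. The sketch below is an assumption-laden toy: act, weights and the loss are hypothetical stand-ins for the real activations and loss, and it relies on torch.where broadcasting a one-element thr against the activation tensor. In this toy case thr.grad comes out as the sum of the activation gradients over the clipped positions only.

```python
import torch

# Hypothetical stand-ins: a one-element threshold and four "activations".
thr = torch.tensor([1.0], requires_grad=True)
act = torch.tensor([0.5, 2.0, 3.0, 0.2], requires_grad=True)

# Same clipping expression as in the post; thr is broadcast against act.
clipped = torch.where(act > thr, thr, act)
clipped.retain_grad()  # keep the gradient of this intermediate tensor

# Toy loss so that d(loss)/d(clipped) is simply `weights`.
weights = torch.tensor([1.0, 2.0, 3.0, 4.0])
loss = (clipped * weights).sum()
loss.backward()

print(clipped.grad)  # gradient of every clipped activation
print(thr.grad)      # sum of clipped.grad where act > thr
print(act.grad)      # clipped.grad where act <= thr, zero elsewhere
```

No gradient flows through the comparison act > thr itself; autograd only routes the incoming gradient to whichever branch was selected.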

Any ideas of how to do it better are welcome!

albanD · December 21, 2018, 8:51pm

Hi,

You can call ‘.retain_grad()’ on any tensor that requires gradients so that the .grad field will be populated by a call to backward.
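
A minimal sketch of that behaviour on a toy graph (the tensors x and h and the loss are made up for illustration): the .grad of an intermediate tensor is None until backward() has run, and it is only kept at all because retain_grad() was called.

```python
import torch

# Leaf tensor: its .grad is populated by backward() by default.
x = torch.ones(3, requires_grad=True)

# Intermediate (non-leaf) tensor: .grad is only kept if retain_grad() is called.
h = x * 2.0
h.retain_grad()

loss = (h ** 2).sum()

grad_before = h.grad   # None, because backward() has not run yet
loss.backward()
grad_after = h.grad    # d(loss)/dh = 2 * h

print(grad_before)
print(grad_after)
```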

michaelklachko · December 22, 2018, 8:04am

I tried retain_grad(), but I’m getting None value for the gradients:

class Net(nn.Module):
	def __init__(self):
		super(Net, self).__init__()

		self.act_max = nn.Parameter(torch.Tensor([0]), requires_grad=True)

		self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
		self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
		self.pool = nn.MaxPool2d(2, 2)
		self.relu = nn.ReLU()
		self.linear = nn.Linear(64 * 5 * 5, 10)

	def forward(self, input):
		conv1 = self.conv1(input)
		pool1 = self.pool(conv1)
		relu1 = self.relu(pool1)

		relu1 = torch.where(relu1 > self.act_max, self.act_max, relu1)
		relu1.retain_grad()
		print(relu1.grad)    #this prints None

		conv2 = self.conv2(relu1)
		pool2 = self.pool(conv2)
		relu2 = self.relu(pool2)
		relu2 = relu2.view(relu2.size(0), -1)
		return self.linear(relu2)

model = Net()
model.apply(utils.weights_init)
nn.init.constant_(model.act_max, 1.0)
model = model.cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

for epoch in range(100):
	model.train()
	for i in range(1000):
		output = model(input)
		loss = nn.CrossEntropyLoss()(output, label)
		optimizer.zero_grad()
		loss.backward()
		optimizer.step()

How should I do it?

I also tried print(torch.autograd.grad(loss, relu1)) after loss.backward(retain_graph=True) and that works, but if I understand this correctly, it repeats the backward pass, so retain_grad() method should be more efficient, right?
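
For comparison, here is a small sketch of the two approaches on a toy graph (the forward helper and the tensors are hypothetical). torch.autograd.grad(loss, h) returns the gradient directly and only walks the graph from the loss back to h, but calling it in addition to loss.backward() does mean a second traversal of that part of the graph. With retain_grad() the value is collected during the regular backward pass.

```python
import torch

def forward(x):
    # Hypothetical toy model: one activation followed by a scalar loss.
    h = torch.relu(x)
    return h, (h ** 2).sum()

torch.manual_seed(0)
x = torch.randn(5, requires_grad=True)

# Option A: ask autograd for d(loss)/dh explicitly.
h1, loss1 = forward(x)
(grad_a,) = torch.autograd.grad(loss1, h1)
x_grad_untouched = x.grad is None  # autograd.grad does not fill .grad fields

# Option B: retain_grad() plus the regular backward pass.
h2, loss2 = forward(x)
h2.retain_grad()
loss2.backward()
grad_b = h2.grad

print(torch.allclose(grad_a, grad_b))
```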

albanD · December 22, 2018, 5:24pm

Hi,

The “.grad” field is only populated when you call .backward(). Just after creation of the Tensor it will always be None.
You will need to save the “relu1” tensor in some way: as an attribute on self, by returning it from forward, or in a global. Then print relu1.grad after calling backward().
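
A minimal sketch of the "save it on self" option. TinyNet is a hypothetical, shrunken stand-in for the Net above (CPU only, random placeholder inputs and labels), not the original model:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical stand-in for the Net in the question
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, bias=False)
        self.relu = nn.ReLU()
        self.linear = nn.Linear(4 * 6 * 6, 10, bias=False)

    def forward(self, x):
        relu1 = self.relu(self.conv1(x))
        relu1.retain_grad()   # ask autograd to keep this non-leaf gradient
        self.relu1 = relu1    # keep a reference to inspect after backward()
        return self.linear(relu1.flatten(1))

torch.manual_seed(0)
model = TinyNet()
inputs = torch.randn(2, 3, 8, 8)   # placeholder batch
labels = torch.tensor([1, 7])      # placeholder labels

loss = nn.CrossEntropyLoss()(model(inputs), labels)
grad_before = model.relu1.grad     # None: backward() has not been called
loss.backward()
grad_after = model.relu1.grad      # populated, same shape as the activation

print(grad_before, grad_after.shape)
```

The stored tensor is replaced on every forward call, so the gradient has to be read after backward() and before the next forward pass.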

michaelklachko · December 22, 2018, 11:52pm

Got it, works now, thanks!

kountaydwivedi · July 29, 2022, 12:15pm

Hello Sir,

I am stuck on the same problem. Could you please explain how you accessed the gradients after calling loss.backward() (or, if possible, share your script)? It would be of great help.

@albanD Sir, I have tried saving the gradients of the ReLU in a variable in the forward function (shown below). When I print it after calling model(), it prints None, as you mentioned. But when I print it after calling loss.backward(), I still get nothing.

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        
#         self.flatten = nn.Flatten()
#         self.classify = nn.Sequential(
#             collections.OrderedDict([
#                 ('linear1', nn.Linear(28*28, 8)),
#                 ('relu1', nn.ReLU()),
#                 ('linear2', nn.Linear(8, 10)),
#             ])
#         )
        
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28*28, 8)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(8, 10)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        flatten = self.flatten(x)
        linear1 = self.linear1(flatten)
        relu = self.relu(linear1)
        relu_grad = relu.retain_grad()
        print('*****',relu_grad)
        linear2 = self.linear2(relu)
        output = self.softmax(linear2)
#         x = self.classify(x)
        return output, relu_grad

Thank you.

albanD · August 5, 2022, 6:49pm

The retain_grad() function doesn’t return anything.
You need to return relu here and check relu.grad after the backward call.
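
A sketch of the Network above with that change applied. The random batch, the labels and the use of nn.NLLLoss are placeholders chosen for illustration, not taken from the original script:

```python
import torch
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28 * 28, 8)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(8, 10)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        relu = self.relu(self.linear1(self.flatten(x)))
        retval = relu.retain_grad()  # only flags the tensor
        assert retval is None        # there is no return value to store
        output = self.softmax(self.linear2(relu))
        return output, relu          # return the tensor itself, not a gradient

model = Network()
x = torch.randn(4, 1, 28, 28)        # placeholder batch
y = torch.tensor([0, 1, 2, 3])       # placeholder labels

output, relu = model(x)
loss = nn.NLLLoss()(output, y)
loss.backward()
print(relu.grad.shape)               # torch.Size([4, 8])
```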

kountaydwivedi · August 12, 2022, 10:30am

Thanks @albanD . I’ll try it out.