How to access gradients for activations

Gradients for model parameters can be accessed directly (e.g. self.conv1.weight.grad). What about gradients for activations?

I use ReLU activations, so technically I could use the gradients for the biases. The problem is that I don’t use biases in my network. So, short of computing the entire chain of gradients manually, is there a way to get them from autograd?

Why do I need them? I want to use a learnable threshold for ReLU clipping:

relu1 = torch.where(relu1 > thr, thr, relu1)

where thr is a trainable model parameter. The threshold function is not differentiable, so I want to estimate its gradient from the gradients of the activations. The gradient for thr should be proportional to the sum of gradients for all activations.

Any ideas of how to do it better are welcome!


You can call .retain_grad() on any tensor that requires gradients, so that its .grad field will be populated by a call to backward().
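A minimal sketch of that suggestion, on a toy tensor rather than the network from the question (the names x and h are only illustrative):

```python
import torch

x = torch.randn(4, requires_grad=True)  # leaf tensor
h = torch.relu(x * 2.0)                 # intermediate (non-leaf) tensor
h.retain_grad()                         # ask autograd to keep h.grad

loss = h.sum()
grad_before = h.grad                    # still None: backward() has not run yet
loss.backward()
grad_after = h.grad                     # now holds d(loss)/dh
```

Without the retain_grad() call, h.grad would stay None even after backward(), because autograd only keeps .grad for leaf tensors by default.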

I tried retain_grad(), but I’m getting None value for the gradients:

class Net(nn.Module):
	def __init__(self):
		super(Net, self).__init__()

		self.act_max = nn.Parameter(torch.Tensor([0]), requires_grad=True)

		self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
		self.conv2 = nn.Conv2d(32, 64, kernel_size=5)
		self.pool = nn.MaxPool2d(2, 2)
		self.relu = nn.ReLU()
		self.linear = nn.Linear(64 * 5 * 5, 10)

	def forward(self, input):
		conv1 = self.conv1(input)
		pool1 = self.pool(conv1)
		relu1 = self.relu(pool1)

		relu1 = torch.where(relu1 > self.act_max, self.act_max, relu1)
		print(relu1.grad)  # this prints None

		conv2 = self.conv2(relu1)
		pool2 = self.pool(conv2)
		relu2 = self.relu(pool2)
		relu2 = relu2.view(relu2.size(0), -1)
		return self.linear(relu2)

model = Net()
nn.init.constant_(model.act_max, 1.0)
model = model.cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)

for epoch in range(100):
	for i in range(1000):
		output = model(input)
		loss = nn.CrossEntropyLoss()(output, label)

How should I do it?

I also tried print(torch.autograd.grad(loss, relu1)) after loss.backward(retain_graph=True), and that works. But if I understand this correctly, it repeats the backward pass, so the retain_grad() method should be more efficient, right?


The .grad field is only populated when you call .backward(). Right after the tensor is created, it will always be None.
You will need to save the relu1 tensor somewhere, for example as an attribute on self, as an extra return value, or in a global, and then print relu1.grad after calling backward().
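Roughly, that could look like the sketch below. TinyNet is a hypothetical, cut-down stand-in for the Net class in the question (one conv layer, no pooling, CPU only, random data), so treat the shapes and names as assumptions:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical, much smaller than the original Net
    def __init__(self):
        super().__init__()
        self.act_max = nn.Parameter(torch.tensor([1.0]))
        self.conv1 = nn.Conv2d(3, 4, kernel_size=3, bias=False)
        self.linear = nn.Linear(4 * 6 * 6, 10)

    def forward(self, x):
        relu1 = torch.relu(self.conv1(x))
        relu1 = torch.where(relu1 > self.act_max, self.act_max, relu1)
        relu1.retain_grad()   # keep the gradient of this activation
        self.relu1 = relu1    # save the tensor so it can be inspected later
        return self.linear(relu1.flatten(1))

torch.manual_seed(0)
model = TinyNet()
x = torch.randn(2, 3, 8, 8)       # random stand-in for a real input batch
label = torch.tensor([1, 7])      # random stand-in for real labels

loss = nn.CrossEntropyLoss()(model(x), label)
loss.backward()

act_grad = model.relu1.grad       # populated only after backward()
```

Here act_grad has the same shape as the saved activation, so it can be summed or otherwise reduced after each backward() call.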

Got it, works now, thanks!