Using hook function to save gradients

chenjus · June 26, 2017, 4:26pm

What’s the difference between register_hook() and register_backward_hook()? Do they do the same thing? How do I save these gradients to a list? Do I need to remove the hook on each iteration of the training loop?

bin_li · June 28, 2017, 6:22am

Hi chen! register_hook() is a method of a Variable instance, while register_backward_hook() is a method of nn.Module. If you want to save gradients, you can append them to a global list. You don’t need to remove the hook as long as you want to keep tracking the gradients.
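A minimal sketch of the two kinds of hook described above, assuming a recent PyTorch release where tensors have replaced Variable and register_full_backward_hook() is the non-deprecated form of register_backward_hook(). The layer, list names, and sizes are made up for illustration. Both hooks are registered once, before the loop, and keep appending on every backward pass until their handles are removed.

```python
import torch
import torch.nn as nn

saved_tensor_grads = []   # filled by the tensor hook
saved_module_grads = []   # filled by the module hook

layer = nn.Linear(3, 2)   # hypothetical layer, just for the demo

# Tensor hook: called with the gradient of the loss w.r.t. that tensor.
# list.append returns None, so the gradient itself is left unchanged.
weight_handle = layer.weight.register_hook(
    lambda grad: saved_tensor_grads.append(grad.clone())
)

# Module hook: called as hook(module, grad_input, grad_output).
def module_hook(module, grad_input, grad_output):
    saved_module_grads.append(grad_output[0].clone())

module_handle = layer.register_full_backward_hook(module_hook)

for _ in range(3):
    x = torch.randn(4, 3, requires_grad=True)
    layer.zero_grad()
    layer(x).sum().backward()   # both hooks fire here, once per iteration

# Remove the hooks only when tracking is no longer wanted.
weight_handle.remove()
module_handle.remove()
```

After the loop, each list holds one entry per iteration: the tensor hook saw gradients shaped like the weight, and the module hook saw gradients shaped like the layer output.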

chenjus · July 2, 2017, 1:54pm

How do I use register_backward_hook() if I define my network in a separate class instead of using nn.Sequential as in this thread: Register_backward_hook on nn.Sequential

I’m defining my network class as follows:

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Feedforward(nn.Module):
	def __init__(self, topology):
		super(Feedforward, self).__init__()
		self.input_layer  = nn.Linear(topology['features'], topology['hidden_dim'])
		self.hidden_layer = nn.Linear(topology['hidden_dim'], topology['hidden_dim'])
		self.output_layer = nn.Linear(topology['hidden_dim'], topology['output_dim'])
		self.num_hidden   = topology['hidden_layers']


	def forward(self, x):
		hidden = self.input_layer(x).clamp(min=0)

		for _ in range(self.num_hidden):
			hidden = self.hidden_layer(hidden).clamp(min=0)
			
		return self.output_layer(hidden)

and I’m using it in my training class like this:

class Train(object):
	def __init__(self, topology):
		self.network    = Feedforward(topology)
		self.grad_queue = []

	def save_gradients(self, module, in_grad, out_grad):
		# Module backward hooks are called as hook(module, grad_input, grad_output)
		self.grad_queue.append(in_grad)

	def train(self):
		dh = DataHandler(self.training['data'])

		losses = []
		valid_acc = []
		loss_fn = torch.nn.MSELoss(size_average=False)
		optimizer = torch.optim.Adam(self.network.parameters(), lr=self.training['lr'])

		for _ in range(self.training['iterations']):
			batch = dh.get_batch(self.training['batch_size'])
			x = Variable(torch.from_numpy(batch[0]), requires_grad=False)
			y = Variable(torch.from_numpy(batch[1]), requires_grad=False)

			optimizer.zero_grad()
			cost_fn = nn.MSELoss()
			cost = cost_fn(self.network(x), y)

			cost.backward()
			# register_backward_hook(save_gradients)
			optimizer.step()

How would I use register_backward_hook() above? My goal is to be able to manipulate specific and arbitrary gradients, save them, and then use those manipulated gradients for updating the parameters.
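One possible way to wire this up, sketched under a few assumptions: TinyNet is a hypothetical stand-in for the Feedforward class, the halving rule is an arbitrary example of "manipulating" a gradient, and register_full_backward_hook() is used as the newer replacement for register_backward_hook(). The hooks are registered once before the loop, not after backward(). A module hook is used here only to record gradients, while a tensor hook that returns a new tensor changes the gradient the optimizer later sees.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical stand-in for Feedforward
    def __init__(self):
        super().__init__()
        self.input_layer = nn.Linear(4, 8)
        self.output_layer = nn.Linear(8, 1)

    def forward(self, x):
        return self.output_layer(self.input_layer(x).clamp(min=0))

torch.manual_seed(0)
net = TinyNet()
grad_queue = []

def save_gradients(module, grad_input, grad_output):
    # Record the gradient w.r.t. this submodule's output.
    grad_queue.append(grad_output[0].detach().clone())

# Hooks work on submodules of a custom class just as they do on nn.Sequential.
hook_handle = net.output_layer.register_full_backward_hook(save_gradients)

# Tensor hook: the returned tensor replaces the gradient stored in .grad.
scale_handle = net.input_layer.weight.register_hook(lambda g: g * 0.5)

optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(16, 4), torch.randn(16, 1)  # made-up data

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()    # both hooks fire during this call
    optimizer.step()   # uses the halved gradient for input_layer.weight
```

Editing p.grad in place between backward() and step() is another option if the manipulation needs to look at several parameters at once.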

I’ve also tried capturing the gradients like this, but I’m not entirely sure if this is the correct way to do it. I’m guessing that using register_backward_hook() is the cleaner and better way to do what I want.

	def get_weights(self):
		obj  = self.network.__dict__['_modules']
		params = {}
		for k, _ in obj.items():
			att = getattr(self.network, k)
			if 'torch.nn.modules' in str(type(att)):
				params[k] = att.weight

		return params


	def capture_grad(self):
		gradients = {}
		params = self.get_weights()
		for p in params:
			gradients[p] = params[p].grad 

		return gradients

Edit:
I compared the way I’m doing it to self.network.parameters(), as suggested in the link in my post below, and the gradients are the same, except that using parameters() gives you extra vectors - not sure what those vectors represent. You also have to assume that the gradients are given in order when using parameters() - even indices are the parameters and odd indices are those extra vectors. I haven’t thoroughly played with PyTorch and haven’t tried more types of experiments with capturing gradients using my code above, but at least with my way, you can index into the dict to access specific layer parameters. Please correct me if I’m wrong. Thanks.
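The extra vectors mentioned above are most likely the bias terms: nn.Linear registers weight and then bias, so parameters() alternates between the two. A short sketch, using a made-up nn.Sequential network, of how named_parameters() gives both the order and a name to index by:

```python
import torch
import torch.nn as nn

# Hypothetical small network, only to show parameter naming and order.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
net(torch.randn(5, 4)).sum().backward()

# Map each parameter name to its gradient.
grads = {name: p.grad for name, p in net.named_parameters()}

for name, g in grads.items():
    print(name, tuple(g.shape))
# 0.weight (8, 4)
# 0.bias (8,)
# 2.weight (2, 8)
# 2.bias (2,)
```

With a custom class, the names follow the attribute names instead of indices, for example input_layer.weight and input_layer.bias.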

chenjus · July 2, 2017, 5:21pm

Update:
So I found this answer Explicitly obtain gradients? today. What’s the difference between using register_backward_hook() and grad = [p.grad for p in list(network.parameters())]?

smth · July 2, 2017, 10:30pm

If you want gradients just for parameters, the 2nd code snippet will work. register_backward_hook will give you intermediate gradients too (not just parameter gradients).
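A rough illustration of that distinction, assuming a recent PyTorch release (register_full_backward_hook() in place of the older register_backward_hook()) and a made-up network: the list comprehension yields one gradient per parameter, while the module hook sees the gradient flowing through an activation, which has a batch dimension and never shows up in parameters().

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # hypothetical
intermediate = []

def keep_grad_output(module, grad_input, grad_output):
    # Gradient of the loss w.r.t. the output of this module.
    intermediate.append(grad_output[0].detach().clone())

# Hook the ReLU to observe the gradient at the hidden activations.
handle = net[1].register_full_backward_hook(keep_grad_output)

x = torch.randn(5, 4)
net(x).sum().backward()
handle.remove()

# Parameter gradients only: two weights and two biases.
param_grads = [p.grad for p in net.parameters()]

print(len(param_grads))        # 4
print(intermediate[0].shape)   # torch.Size([5, 8])
```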

chenjus · July 3, 2017, 3:20am

Ah ok. Sweet thanks!

joekid · March 10, 2020, 1:00pm

Sorry Smth, I have been learning PyTorch for a few months and I have a question related to this topic:

Given a net(), with some convolution layers, I can get the gradients easily:

g1 = net.conv1.weight.grad

For instance, for `conv1 = nn.Conv2d(64, 128, kernel_size=5, padding=2)`, g1 has shape torch.Size([128, 64, 5, 5]).

How can I use these gradients to get a gradient vector for other computations (for instance, computing orthogonal vectors, taking dot products, or normalizing, all in the context of vectors)? Can I assume the kernel_size is the vector dimension?
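One common convention, offered here as a sketch and not the only valid choice, is to flatten the whole gradient tensor into a single vector, so the dimension is 128 * 64 * 5 * 5 = 204800 and not kernel_size. If one vector per output filter is wanted instead, flattening from dimension 1 gives 128 vectors of length 64 * 5 * 5 = 1600. The input tensor and the second vector below are made up for the demo.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv1 = nn.Conv2d(64, 128, kernel_size=5, padding=2)
conv1(torch.randn(2, 64, 8, 8)).sum().backward()   # made-up input and loss

g1 = conv1.weight.grad               # shape [128, 64, 5, 5]

v = g1.flatten()                     # one vector with 204800 entries
unit = v / v.norm()                  # normalized gradient direction

other = torch.randn_like(v)          # hypothetical second vector
# Gram-Schmidt step: remove the component of `other` along `unit`.
orth = other - torch.dot(other, unit) * unit

per_filter = g1.flatten(start_dim=1)  # shape [128, 1600], one row per filter
```

Any result computed on the flat vector can be put back into the weight layout with view_as(g1) if it needs to be written to .grad.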

Thanks in advance.