Using a hook function to save gradients

How do I use register_backward_hook() if I define my network in a separate class instead of using nn.Sequential, as in this thread: Register_backward_hook on nn.Sequential?

I’m defining my network class as follows:

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Feedforward(nn.Module):
	def __init__(self, topology):
		super(Feedforward, self).__init__()
		self.input_layer  = nn.Linear(topology['features'], topology['hidden_dim'])
		self.hidden_layer = nn.Linear(topology['hidden_dim'], topology['hidden_dim'])
		self.output_layer = nn.Linear(topology['hidden_dim'], topology['output_dim'])
		self.num_hidden   = topology['hidden_layers']


	def forward(self, x):
		hidden = self.input_layer(x).clamp(min=0)

		for _ in range(self.num_hidden):
			hidden = self.hidden_layer(hidden).clamp(min=0)
			
		return self.output_layer(hidden)
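
From the linked thread, my best guess is that I would register the hook on each Linear submodule by name instead of indexing into an nn.Sequential - something like this untested sketch (the topology numbers and hook_fn are made up just for illustration):

net = Feedforward({'features': 10, 'hidden_dim': 20, 'output_dim': 1, 'hidden_layers': 2})

def hook_fn(module, grad_input, grad_output):
	# grad_input/grad_output are tuples; entries can be None
	print(module, [g.size() for g in grad_input if g is not None])

net.input_layer.register_backward_hook(hook_fn)
net.hidden_layer.register_backward_hook(hook_fn)
net.output_layer.register_backward_hook(hook_fn)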

and I’m using it in my training class like this:

class Train(object):
	def __init__(self, topology):
		self.network    = Feedforward(topology)
		self.grad_queue = []

	def save_gradients(self, module, grad_input, grad_output):
		# backward-hook signature: hook(module, grad_input, grad_output)
		self.grad_queue.append(grad_input)


	def train(self):
		dh = DataHandler(self.training['data'])

		losses = []
		valid_acc = []
		loss_fn = torch.nn.MSELoss(size_average=False)
		optimizer = torch.optim.Adam(self.network.parameters(), lr=self.training['lr'])

		for _ in range(self.training['iterations']):
			batch = dh.get_batch(self.training['batch_size'])
			x = Variable(torch.from_numpy(batch[0]), requires_grad=False)
			y = Variable(torch.from_numpy(batch[1]), requires_grad=False)

			optimizer.zero_grad()
			cost_fn = nn.MSELoss()
			cost = cost_fn(self.network(x), y)

			cost.backward()
			# register_backward_hook(save_gradients)
			optimizer.step()

How would I use register_backward_hook() above? My goal is to be able to manipulate specific and arbitrary gradients, save them, and then use those manipulated gradients for updating the parameters.
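
To be concrete, this is the kind of flow I'm picturing inside train() (just a sketch - the 0.5 scaling is a placeholder for whatever manipulation I actually end up doing):

# register the hook once, e.g. in __init__, so backward() fills self.grad_queue
self.network.hidden_layer.register_backward_hook(self.save_gradients)

cost.backward()

# manipulate a specific layer's gradient before the update (placeholder manipulation)
self.network.hidden_layer.weight.grad.data.mul_(0.5)

optimizer.step()  # the update now uses the modified .grad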

I’ve also tried capturing the gradients like this, but I’m not entirely sure if this is the correct way to do it. I’m guessing that using register_backward_hook() is the cleaner and better way to do what I want.

	def get_weights(self):
		# collect each layer's weight tensor, keyed by its attribute name
		params = {}
		for name, module in self.network._modules.items():
			if 'torch.nn.modules' in str(type(module)):
				params[name] = module.weight

		return params


	def capture_grad(self):
		# grab the .grad of each weight after backward() has been called
		gradients = {}
		params = self.get_weights()
		for p in params:
			gradients[p] = params[p].grad

		return gradients
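
And this is roughly how I was picturing using those two methods in the training loop, after cost.backward() and before optimizer.step() (again just a sketch, with an arbitrary scaling standing in for the "manipulation"):

grads = self.capture_grad()     # {'input_layer': ..., 'hidden_layer': ..., 'output_layer': ...}
self.grad_queue.append({k: g.data.clone() for k, g in grads.items()})  # save copies for later

# manipulate a specific layer's gradient in place
grads['hidden_layer'].data.mul_(0.5)

optimizer.step()                # the update uses the manipulated gradient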

Edit:
I compared my approach to iterating over self.network.parameters(), as suggested in the link in the post below, and the gradients are the same, except that parameters() gives you extra vectors - I'm not sure what those vectors represent. You also have to assume the gradients come back in a fixed order when using parameters(): even indices are the weight gradients and odd indices are those extra vectors. I haven't played with PyTorch very thoroughly or run many more experiments capturing gradients with the code above, but at least with my dict-based approach you can index in and access a specific layer's parameters.
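
One thing I still plan to try is named_parameters(), just so I can see which gradient belongs to which tensor (untested):

# label each parameter's gradient by name to figure out what the extra vectors are
for name, p in self.network.named_parameters():
	print(name, p.size(), None if p.grad is None else p.grad.size())

Please correct me if I'm wrong. Thanks.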