Hi,
I know that .backward() dynamically calculates the gradients. I wonder how we can obtain each layer's weight gradient during the .backward() computation. Hope someone can help.
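(For reference, here is a minimal sketch of one way I can think of, using Tensor.register_hook on each weight so a callback fires the moment that weight's gradient is computed; the model and the print format are just for illustration.)

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 10))

# a per-tensor hook fires when the gradient for that tensor is computed
for name, param in model.named_parameters():
    if name.endswith('weight'):
        param.register_hook(lambda grad, name=name: print(name, grad.shape))

# prints one line per weight, in backward (last-to-first) order
model(torch.randn(4, 10)).sum().backward()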
-
I know we can monitor the gradients flowing through each layer during .backward() by using .register_backward_hook(), which takes a hook function with the signature hook(module, grad_input, grad_output). I wonder if such a hook is the right way to get each layer's weight gradient. The reason I am asking is that I see no difference in the output when I rename the hook's parameters, e.g. hook(module, grad_in, grad_out) vs. hook(module, input, output), in my own example (presumably because the arguments are passed positionally, so the names don't matter).
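(To make the question concrete, here is a minimal sketch of how I register such a hook; register_full_backward_hook is the non-deprecated variant in recent PyTorch, and the hook body is just for illustration.)

import torch
import torch.nn as nn

def hook(module, grad_input, grad_output):
    # grad_output: gradient w.r.t. the module's output
    # grad_input: gradient w.r.t. the module's input
    # the weight gradient itself is NOT passed in here; it has to be
    # read from module.weight.grad after .backward() finishes
    print(module, grad_output[0].shape, grad_input[0].shape)

layer = nn.Linear(10, 10)
handle = layer.register_full_backward_hook(hook)

inp = torch.randn(4, 10, requires_grad=True)
layer(inp).sum().backward()
handle.remove()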
-
Alternatively, I can do the backward propagation manually, layer by layer (see the code below, adapted from the example in "How to split backward process wrt each layer of neural network?"). I wonder if it is correct to read each layer's weight gradient via self.layers[i].weight.grad during the backward pass (see the last line of the backward() function).
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.layers = nn.ModuleList([
            nn.Linear(10, 10),
            nn.Linear(10, 10),
            nn.Linear(10, 10),
            nn.Linear(10, 10),
        ])

    def forward(self, x):
        self.output = []
        self.input = []
        for layer in self.layers:
            # detach from previous history so each layer has its own graph
            x = x.detach().requires_grad_(True)
            self.input.append(x)
            # compute output
            x = layer(x)
            # add to list of outputs
            self.output.append(x)
        return x

    def backward(self, g):
        for i, output in reversed(list(enumerate(self.output))):
            if i == (len(self.output) - 1):
                # for the last layer, start from the external gradient g
                output.backward(g)
                print(self.input[i].grad.shape)
            else:
                # feed in the gradient that reached this layer's output
                output.backward(self.input[i + 1].grad)
            # the weight gradient of layer i is now available
            print(self.layers[i].weight.grad)

model = Net()
inp = torch.randn(4, 10)
output = model(inp)
gradients = torch.randn(*output.size())
model.backward(gradients)
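(As a sanity check, here is a minimal sketch of how I compare the gradients from the manual pass above with a normal end-to-end autograd backward; it reuses the same layers but skips the per-layer detach so autograd backpropagates through all of them at once.)

import torch

torch.manual_seed(0)
model = Net()
inp = torch.randn(4, 10)
g = torch.randn(4, 10)

# manual layer-by-layer backward (fills .grad on every weight)
out = model(inp)
model.backward(g)
manual_grads = [layer.weight.grad.clone() for layer in model.layers]

# reference: run the same layers end to end without detaching
model.zero_grad()
x = inp
for layer in model.layers:
    x = layer(x)
x.backward(g)

for m, layer in zip(manual_grads, model.layers):
    print(torch.allclose(m, layer.weight.grad))  # expect True for each layer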