I have a quick question regarding register_full_backward_hook. To give a quick explanation of my model: it is effectively a feed-forward network with N inputs and L nn.Linear layers. The output of the network is the sign of its output along with the log-absolute value, which is obtained via torch.linalg.slogdet.
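For concreteness, here is a minimal sketch of such a network. The hidden width, the Tanh activation, and the reshape of the final layer's output into a square matrix (so slogdet can be applied) are my assumptions; the original post does not specify them:

```python
import torch
import torch.nn as nn

class SlogdetNet(nn.Module):
    """Sketch: L Linear layers; the last layer's output is reshaped to an
    N x N matrix per sample so torch.linalg.slogdet can be applied."""

    def __init__(self, n_in, hidden, n_layers):
        super().__init__()
        layers, dim = [], n_in
        for _ in range(n_layers - 1):
            layers += [nn.Linear(dim, hidden), nn.Tanh()]
            dim = hidden
        layers.append(nn.Linear(dim, n_in * n_in))  # one N*N matrix per sample
        self.net = nn.Sequential(*layers)
        self.n_in = n_in

    def forward(self, x):
        mat = self.net(x).view(-1, self.n_in, self.n_in)
        sign, logabs = torch.linalg.slogdet(mat)  # each of shape [B]
        return sign, logabs
```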
Now, my loss function effectively breaks down into 2 parts. The first part calculates a scaling factor (which can be positive or negative) that is detached so it carries no gradient; it only scales each sample of input data. Let's call that scale_factor, which has shape [B], where B is the number of samples in my batch. The second part is the log-absolute value from the network, i.e. the logabsdet output of torch.linalg.slogdet(x), ignoring the sign part. My loss is then the element-wise product of these two parts, mean-reduced. An example of this would be:
```python
input_data = torch.randn(B, N)    # input data is shape [B, N]
scale_factor = loss1(input_data)  # returns Tensor of shape [B]

# net returns torch.linalg.slogdet, so just grab the logabs value
logabs_net = net(input_data)

loss = torch.mean(scale_factor.detach() * logabs_net)

optim.zero_grad()
loss.backward()  # calculate gradients and call backward hooks (all in one go)
optim.step()
```
Within my network, I register a full backward hook (via register_full_backward_hook) on my Linear layers, because I want to precondition my gradients with this information. However, the grad_output I need is slightly different from the one PyTorch returns to me, and I was wondering whether you can attach full backward hooks to a different loss value than the one used to calculate the gradients of your given loss.
As it currently stands, the grad_output Tensor returned via the full backward hook is the gradient of scale_factor.detach() * logabs_net with respect to the output of a given nn.Linear layer, for all input samples.
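For reference, the hook registration I'm describing looks roughly like this. The toy two-layer net and what the hook body does with grad_output are illustrative assumptions, not my actual code:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network (illustrative assumption).
net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

def grad_output_hook(module, grad_input, grad_output):
    # grad_output[0] is d(loss)/d(layer output), shape [B, out_features];
    # stash it on the module so it can be used later for preconditioning.
    module.stashed_grad_output = grad_output[0].detach()

for m in net.modules():
    if isinstance(m, nn.Linear):
        m.register_full_backward_hook(grad_output_hook)

x = torch.randn(6, 4)
net(x).sum().backward()  # hooks fire during this backward pass
```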
My question: is it at all possible to change the hook so that it returns the gradient of logabs_net with respect to the output of a given nn.Linear layer? I've tried taking the value returned by the hook and dividing it by the scale_factor.detach() Tensor. However, that value can be equal to 0, and when it is, my code crashes because I'm dividing by 0, even though the true gradient of logabs_net with respect to the output of a layer is non-zero and finite.
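To illustrate the failure mode with made-up numbers (the shapes and values below are purely an assumption for demonstration):

```python
import torch

grad_out = torch.ones(4, 3)                  # pretend grad_output from the hook, [B, out_features]
scale = torch.tensor([0.5, 0.0, -1.2, 2.0])  # pretend scale_factor; one sample is exactly 0

# Attempt to undo the per-sample scaling: broadcast [B] over [B, out_features].
recovered = grad_out / scale.unsqueeze(1)

# The row whose scale_factor is 0 becomes inf, even though the true
# gradient of logabs_net at that sample is finite.
print(torch.isfinite(recovered).all().item())
```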
Any help on this would be greatly appreciated!