Is it possible to use `register_full_backward_hook` on a different loss than the one used to compute the gradient?

Hi All,

I have a quick question regarding `register_full_backward_hook`. My model is a feed-forward network with N inputs and L `nn.Linear` layers. The network outputs the sign and the log-absolute value of a determinant, both computed via `torch.linalg.slogdet`.

My loss function is split into two parts. The first part computes a scaling factor (which can be positive or negative); it is detached, so it carries no gradient and only scales each input sample. Call it `scale_factor`; it has shape [B], where B is the number of samples in the batch. The second part is the log-absolute value returned by the network, i.e. the output of `torch.linalg.slogdet` without the sign: `torch.linalg.slogdet(x)[1]`. The loss is the element-wise product of these two terms, mean-reduced over the batch. For example,

input_data = torch.randn(B,N) #input data is shape [B,N]

scale_factor = loss1(input_data) #returns Tensor of shape [B]

#net returns torch.linalg.slogdet, so just grab logabs value 
logabs_net = net(input_data)[1] 

loss = torch.mean( scale_factor.detach() * logabs_net )

loss.backward() #calculate gradients and call backward hooks (all in one go)

Within my network, I register a forward pre-hook (`register_forward_pre_hook`) and a full backward hook (`register_full_backward_hook`) on my `nn.Linear` layers, because I want to precondition my gradients using this information. However, the `grad_output` I need is slightly different from the one PyTorch passes to the hook, and I was wondering whether a full backward hook can be evaluated on a different loss than the one used to compute the gradients.

As it currently stands, the `grad_output` tensor passed to the full backward hook is the gradient of `scale_factor.detach() * logabs_net` with respect to the output of a given `nn.Linear` layer, for all input samples.
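A minimal sketch of this behaviour, in case it helps to reproduce it. Everything here is an assumption made for illustration: `ToyNet`, the sizes `B`, `N`, `M`, and the hard-coded `scale` tensor (standing in for `scale_factor.detach()`) are not from my actual model. It compares the `grad_output` seen by the hook on the last `nn.Linear` layer with the analytic gradient of the log-absolute determinant, which for a matrix A is the inverse transpose of A. Note that the mean reduction also contributes a factor of 1/B.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, N, M = 4, 3, 2  # batch size, inputs, matrix side length (illustrative)


class ToyNet(nn.Module):
    """Hypothetical stand-in: two Linear layers feeding torch.linalg.slogdet."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(N, 8)
        self.fc2 = nn.Linear(8, M * M)

    def forward(self, x):
        h = torch.tanh(self.fc1(x))
        mat = self.fc2(h).reshape(-1, M, M)
        return torch.linalg.slogdet(mat)  # (sign, logabsdet), each of shape [B]


net = ToyNet().double()  # float64 keeps the comparison numerically tight
x = torch.randn(B, N, dtype=torch.float64)
# Made-up scale factors; one entry is exactly zero on purpose.
scale = torch.tensor([2.0, 0.0, -1.0, 0.5], dtype=torch.float64)

captured = {}


def hook(module, grad_input, grad_output):
    # grad_output[0] is d(loss)/d(layer output), shape [B, M*M] here
    captured["hooked"] = grad_output[0].detach().clone()


handle = net.fc2.register_full_backward_hook(hook)
_, logabs = net(x)
loss = torch.mean(scale * logabs)
loss.backward()
handle.remove()

# Analytic per-sample gradient of log|det A| w.r.t. A is inverse(A) transposed.
with torch.no_grad():
    mat = net.fc2(torch.tanh(net.fc1(x))).reshape(-1, M, M)
    dlogabs = torch.linalg.inv(mat).transpose(-1, -2).reshape(B, -1)

expected = scale[:, None] / B * dlogabs
print(torch.allclose(captured["hooked"], expected))
print(captured["hooked"][1])  # row for the sample whose scale factor is zero
```

The row belonging to the sample with a zero scale factor is zeroed out in the hooked gradient, so the unscaled gradient cannot be recovered from it afterwards.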

My question: is it possible to change the hook so that it returns the gradient of `logabs_net` with respect to the output of a given `nn.Linear` layer? I've tried dividing the value returned by the hook by `scale_factor.detach()`. However, the scale factor can be exactly 0, and in that case my code crashes because of the division by zero, even though the gradient of `logabs_net` with respect to the layer output is non-zero and finite.

Any help on this would be greatly appreciated!

You can do this by using both loss functions and caching the needed values from each.
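One possible reading of this suggestion, as a rough sketch rather than a definitive recipe: run one backward pass on the unscaled `logabs_net` so the hooks cache its `grad_output`, discard the parameter gradients from that pass, then run the usual backward pass on the real loss. The names `ToyNet`, `cache`, `mode`, and `make_hook`, along with the sizes and the hard-coded `scale_factor`, are hypothetical and only there to make the example self-contained. Summing `logabs_net` (instead of taking the mean) gives per-sample gradients without the 1/B factor, assuming samples do not interact inside the network.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, N, M = 4, 3, 2  # batch size, inputs, matrix side length (illustrative)


class ToyNet(nn.Module):
    """Hypothetical stand-in: two Linear layers feeding torch.linalg.slogdet."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(N, 8)
        self.fc2 = nn.Linear(8, M * M)

    def forward(self, x):
        h = torch.tanh(self.fc1(x))
        return torch.linalg.slogdet(self.fc2(h).reshape(-1, M, M))


net = ToyNet().double()
cache = {"unscaled": {}, "scaled": {}}
mode = {"key": "unscaled"}  # tells the hooks which backward pass is running


def make_hook(name):
    def hook(module, grad_input, grad_output):
        # Store d(current objective)/d(layer output) under the active mode
        cache[mode["key"]][name] = grad_output[0].detach().clone()
    return hook


handles = [
    m.register_full_backward_hook(make_hook(n))
    for n, m in net.named_modules()
    if isinstance(m, nn.Linear)
]

x = torch.randn(B, N, dtype=torch.float64)
# Made-up scale factors; one entry is exactly zero on purpose.
scale_factor = torch.tensor([2.0, 0.0, -1.0, 0.5], dtype=torch.float64)

_, logabs_net = net(x)

# Pass 1: unscaled objective, only to fill the cache with the wanted grad_output.
mode["key"] = "unscaled"
logabs_net.sum().backward(retain_graph=True)
net.zero_grad()  # throw away parameter gradients from pass 1

# Pass 2: the real loss, which produces the parameter gradients to be used.
mode["key"] = "scaled"
loss = torch.mean(scale_factor.detach() * logabs_net)
loss.backward()

for h in handles:
    h.remove()

print(cache["unscaled"]["fc2"][1])  # finite even though scale_factor[1] == 0
print(cache["scaled"]["fc2"][1])
```

The extra backward pass costs roughly one more gradient computation per step, but it avoids dividing by `scale_factor` altogether.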