Hi All,

I have a quick question regarding `register_full_backward_hook`. To give a quick explanation of my model: it is effectively a feed-forward network with `N` inputs and `L` `nn.Linear` layers. The output of my network is the sign of the result along with its log-absolute value, obtained via `torch.linalg.slogdet`.

Now, my loss function effectively breaks down into two parts. The first part calculates a scaling factor (which can be positive or negative) that is detached so it carries no gradient; it only scales each sample of input data. Let’s call it `scale_factor`, which has shape `[B]`, where `B` is the number of samples in my batch. The second part is the log-absolute value of the network output, i.e. the output of `torch.linalg.slogdet` ignoring the sign part, so `torch.linalg.slogdet(x)[1]`. I then define my loss as the element-wise product of these two parts, mean-reduced. For example,

```
input_data = torch.randn(B, N)    # input data has shape [B, N]
scale_factor = loss1(input_data)  # returns a tensor of shape [B]
# net returns torch.linalg.slogdet, so grab just the log-abs value
logabs_net = net(input_data)[1]
loss = torch.mean(scale_factor.detach() * logabs_net)
optim.zero_grad()
loss.backward()  # compute gradients and fire backward hooks
optim.step()
```

Within my network, I register a `forward_pre_hook` and a `full_backward_hook` on my `nn.Linear` layers (because I want to precondition my gradients using this information). However, the `grad_output` I need is slightly different from the one PyTorch hands me, and I was wondering whether you can attach full backward hooks to a different loss value than the one used to calculate the gradients of your given loss.
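For reference, here is a minimal sketch of how I register the hooks (the network, the `captured` dict, and all names are illustrative stand-ins, not my actual code):

```python
import torch
import torch.nn as nn

# Illustrative storage for what the hooks see per Linear layer.
captured = {}

def make_pre_hook(name):
    def pre_hook(module, inputs):
        # inputs is a tuple; inputs[0] is the layer's input activation
        captured[name + "/input"] = inputs[0].detach()
    return pre_hook

def make_backward_hook(name):
    def backward_hook(module, grad_input, grad_output):
        # grad_output[0] is dLoss/d(layer output)
        captured[name + "/grad_output"] = grad_output[0].detach()
    return backward_hook

# Toy stand-in for the feed-forward network.
net = nn.Sequential(nn.Linear(4, 4), nn.Tanh(), nn.Linear(4, 4))
for name, mod in net.named_modules():
    if isinstance(mod, nn.Linear):
        mod.register_forward_pre_hook(make_pre_hook(name))
        mod.register_full_backward_hook(make_backward_hook(name))

x = torch.randn(4, 4)                       # square so slogdet applies
loss = torch.linalg.slogdet(net(x))[1]      # log-abs determinant
loss.backward()                             # hooks fire here
```

After `backward()`, `captured` holds each Linear layer's input activation and `grad_output`, which is the per-layer information I want to use for preconditioning.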

As it currently stands, the `grad_output` tensor returned via the `full_backward_hook` is the gradient of `scale_factor.detach() * logabs_net` with respect to the output of a given `nn.Linear` layer, for **all** input samples.

**My question**: Is it at all possible to change the hook so that it returns the gradient of `logabs_net` with respect to the output of a given `nn.Linear` layer? I’ve tried taking the value returned by the hook and dividing by the `scale_factor.detach()` tensor. However, that tensor can contain zeros, and when it does my code crashes from division by zero, even though the gradient of `logabs_net` with respect to the layer output is itself non-zero and finite.
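One workaround I have been considering, sketched below under simplifying assumptions (a single `nn.Linear` standing in for the full network, per-sample square inputs so `slogdet` gives one value per sample; all names are hypothetical): run `backward` once on the unscaled `logabs` term so the hook captures exactly the gradient I want, then compute the real training gradients in a second pass with `torch.autograd.grad`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, N = 3, 4
net = nn.Linear(N, N)   # stand-in for the real feed-forward network
captured = {}

def hook(module, grad_input, grad_output):
    captured["grad_output"] = grad_output[0].detach()

net.register_full_backward_hook(hook)

x = torch.randn(B, N, N)                      # per-sample [N, N] matrices
logabs = torch.linalg.slogdet(net(x))[1]      # shape [B]
scale_factor = torch.tensor([0.0, 2.0, -1.0]) # note the zero entry

# Pass 1: backprop only the unscaled term, so the hook captures
# d(mean(logabs)) / d(layer output) -- independent of scale_factor.
logabs.mean().backward(retain_graph=True)
grad_logabs = captured["grad_output"]         # the quantity I actually want

# Pass 2: the true training gradients, via a second backward pass.
net.zero_grad()
loss = torch.mean(scale_factor.detach() * logabs)
grads = torch.autograd.grad(loss, net.parameters())
for p, g in zip(net.parameters(), grads):
    p.grad = g
```

Since `scale_factor` is detached and per-sample, the hooked gradient of the scaled loss differs from `grad_logabs` only by a per-sample factor of `scale_factor[b]`, which is exactly why dividing works everywhere except at zeros; the two-pass approach sidesteps the division entirely at the cost of one extra backward.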

Any help on this would be greatly appreciated!