Hi All,

I have a quick question regarding register_full_backward_hook. To give a quick overview: my model is effectively a Feed-Forward Network with N inputs and L nn.Linear layers. The output of the network is the sign of its output together with the log-absolute value, obtained via torch.linalg.slogdet.

My loss function breaks down into two parts. The first part calculates a scaling factor (which can be positive or negative) that is detached so it carries no gradient; it only scales each sample of input data. Let's call it scale_factor, with shape [B], where B is the number of samples in my batch. The second part is the log-absolute value of the network output, i.e. the second element returned by torch.linalg.slogdet, so torch.linalg.slogdet(x)[1]. My loss is then the element-wise product of these two parts, mean-reduced. An example of this would be,
input_data = torch.randn(B, N)  # input data is shape [B, N]
scale_factor = loss1(input_data)  # returns Tensor of shape [B]
# net returns torch.linalg.slogdet, so just grab the logabs value
logabs_net = net(input_data)[1]
loss = torch.mean(scale_factor.detach() * logabs_net)
optim.zero_grad()
loss.backward()  # calculate gradients and call backward hooks (all in one go)
optim.step()
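For concreteness, here is a minimal, self-contained sketch of this setup. The Net architecture, hook names, and the two dictionaries are my own illustrative assumptions, not the actual code from the question:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, N = 4, 3

# Hypothetical stand-in for the Feed-Forward Network described above:
# Linear layers whose final output is reshaped to a square [N, N] matrix
# per sample so torch.linalg.slogdet can be applied.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(N, 16), nn.Tanh(), nn.Linear(16, N * N)
        )

    def forward(self, x):
        mat = self.layers(x).view(-1, N, N)
        return torch.linalg.slogdet(mat)  # (sign, logabsdet), each of shape [B]

net = Net()
inputs, grad_outs = {}, {}

def fwd_pre_hook(mod, inp):
    inputs[mod] = inp[0].detach()             # input to the Linear layer

def full_bwd_hook(mod, grad_input, grad_output):
    grad_outs[mod] = grad_output[0].detach()  # d(loss)/d(layer output)

for m in net.modules():
    if isinstance(m, nn.Linear):
        m.register_forward_pre_hook(fwd_pre_hook)
        m.register_full_backward_hook(full_bwd_hook)

input_data = torch.randn(B, N)
scale_factor = torch.randn(B)                 # stand-in for loss1(input_data)
logabs_net = net(input_data)[1]
loss = torch.mean(scale_factor.detach() * logabs_net)
loss.backward()
# grad_outs[m] now holds the gradient of `loss` w.r.t. each Linear output
```

With this in place, grad_outs maps each Linear layer to the batch of gradients that the rest of the question refers to.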
Within my network, I register a forward pre-hook (register_forward_pre_hook) and a full backward hook (register_full_backward_hook) on my Linear layers, because I want to precondition my gradients via this information. However, the grad_output value I need is slightly different from the one PyTorch returns to me, and I was wondering whether full backward hooks can be made to observe the gradients of a different loss value than the one actually used to calculate your parameter gradients.

As it currently stands, the grad_output Tensor returned via the full backward hook is the gradient of scale_factor.detach() * logabs_net with respect to the output of a given nn.Linear layer, for all input samples.
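To make the relationship explicit, here is a small numerical check (illustrative only; the per-sample function standing in for the network is arbitrary): for loss = mean(scale * logabs), the grad_output seen at any intermediate tensor is, per sample b, (scale_b / B) times the gradient of logabs_b. Dividing grad_output by scale almost recovers the quantity of interest, except exactly where scale_b = 0:

```python
import torch

torch.manual_seed(0)
B = 5
o = torch.randn(B, 3, requires_grad=True)  # stand-in for a layer output
logabs = (o ** 2).sum(dim=1).log()         # any per-sample scalar function
scale = torch.randn(B)
scale[0] = 0.0                             # the problematic zero entry

loss = torch.mean(scale.detach() * logabs)
g_loss = torch.autograd.grad(loss, o, retain_graph=True)[0]
g_logabs = torch.autograd.grad(logabs.sum(), o)[0]

# The relation holds row-wise; row 0 of g_loss is exactly zero, so
# dividing by scale there is 0/0 even though g_logabs[0] is finite.
print(torch.allclose(g_loss, (scale / B).unsqueeze(1) * g_logabs))  # True
```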
My question: is it at all possible to change the hook so that it returns the gradient of logabs_net with respect to the output of a given nn.Linear layer? I've tried taking the value returned by the hook and dividing it by the scale_factor.detach() Tensor. However, that value can be equal to 0, and when it is, my code crashes because I'm dividing by 0, even though the gradient of logabs_net with respect to the layer output is in fact non-zero and finite.
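One possible way around this (a sketch under my own assumptions, not necessarily the intended solution): run a separate backward pass on logabs_net alone, with the hooks active only during that pass, then run the real loss backward with the hooks silenced. The capture flag is a hypothetical on/off switch, and a single Linear layer with an arbitrary per-sample function stands in for the full network here:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, N = 4, 3
lin = nn.Linear(N, N)

grad_outs = {}
capture = {"on": False}  # hypothetical switch to silence the hook

def full_bwd_hook(mod, grad_input, grad_output):
    if capture["on"]:
        grad_outs[mod] = grad_output[0].detach()

lin.register_full_backward_hook(full_bwd_hook)

x = torch.randn(B, N)
scale_factor = torch.randn(B)
scale_factor[0] = 0.0                 # the troublesome zero scale

out = lin(x)
logabs = (out ** 2).sum(dim=1).log()  # stand-in for slogdet(...)[1]

# Pass 1: backward of logabs alone; the hook records d(sum logabs)/d(out),
# which stays finite and non-zero even where scale_factor is 0.
capture["on"] = True
logabs.sum().backward(retain_graph=True)
lin.zero_grad()                       # discard these parameter grads
capture["on"] = False

# Pass 2: the actual training loss; the hook stays silent.
loss = torch.mean(scale_factor.detach() * logabs)
loss.backward()
```

The extra backward pass costs roughly one more traversal of the graph, but it avoids the 0/0 division entirely because the hook never sees the scaled gradients.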
Any help on this would be greatly appreciated!