Backpropagation through non-torch operations in custom layers

Hi,

I am implementing a method that combines torch layers and custom layers built from non-torch operations. I was wondering what happens to the gradients during backpropagation through these particular layers, because the model still trains. Are they just treated as identity layers, with the gradients passed through unchanged to the next layer? Or do the gradients stop flowing?

Thank you for your help.

Could you please post some code? What are those non-torch operations?

Unfortunately, I can’t share the code, but the non-torch operations are forward and back projectors for PET image reconstruction.

Hi @gcrd,

You can manually define this within a torch.autograd.Function, with its .forward method being the function call and its .backward method returning the gradient of the loss w.r.t. all inputs. You'll need to derive by hand an expression for the derivative of your function's output with respect to its inputs, but you can check these expressions with PyTorch's torch.autograd.gradcheck function.
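
For reference, here is a minimal, runnable sketch of that pattern. The forward_project / back_project functions below are just stand-ins (a fixed linear operator and its adjoint) in place of your actual PET projectors, so the example runs on its own and gradcheck can verify the hand-written backward:

```python
import numpy as np
import torch

# Stand-ins for the external, non-torch projectors: a fixed linear
# operator A and its adjoint A.T, so the example is self-contained.
A = np.random.randn(6, 4)

def forward_project(x):
    return A @ x

def back_project(y):
    return A.T @ y

class Projector(torch.autograd.Function):
    @staticmethod
    def forward(ctx, image):
        # call the external routine on a plain array (detach to leave the graph)
        sino = forward_project(image.detach().cpu().numpy())
        return torch.as_tensor(sino, dtype=image.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # hand-derived vector-Jacobian product; for a linear operator
        # this is simply the adjoint applied to the incoming gradient
        grad_image = back_project(grad_output.detach().cpu().numpy())
        return torch.as_tensor(grad_image, dtype=grad_output.dtype)

# verify the hand-written backward against finite differences (double precision)
x = torch.randn(4, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(Projector.apply, (x,)))  # True if consistent
```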

Hi,
Thank you for your answer. Unfortunately, I think there has been a misunderstanding in my question.

I agree that I can define this with torch.autograd.Function, but my main question is: what happens when I do not explicitly do so? How does PyTorch handle the parts written with non-torch objects? My code runs as expected and the trainable parameters are updated, but I would like to understand what is going on during backpropagation.
Thanks again!

Well, it’s a little hard to visualize the issue at hand given there’s no example code.

But if you’re doing something like linear regression of PET scans, I assume your loss function is something like loss = torch.mean((target - predict)**2). If the target is constructed via non-torch ops and predict is built purely from torch ops, then that’s fine, as the gradient of the loss w.r.t. the parameters is basically

d_loss/d_params = torch.mean( -2. * (target - predict) * d_predict/d_params )

So, from what little you’ve shared it might be the case that you’re not actually backpropagating through your non-torch ops at all, which is why your code works.
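
As a small runnable illustration of that point (the 3.0 * params line is just a stand-in for the torch part of a model): the target comes from numpy, so no gradient ever needs to flow through the non-torch ops that built it, and autograd’s result matches the hand-derived expression above:

```python
import numpy as np
import torch

# target built entirely with non-torch ops: no gradient needs to flow through it
target = torch.as_tensor(np.random.rand(10), dtype=torch.float32)

# predict built purely with torch ops on a trainable parameter
params = torch.randn(10, requires_grad=True)
predict = 3.0 * params            # stand-in for the torch part of the model

loss = torch.mean((target - predict) ** 2)
loss.backward()

# matches the analytic gradient -2 * (target - predict) * d_predict/d_params / N
manual = -2.0 * (target - predict.detach()) * 3.0 / target.numel()
print(torch.allclose(params.grad, manual))  # True
```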

Hi, thank you for your answer.
Let’s consider the following network:
Input (image) → Conv layer 1 → Non-torch operators → Conv layer 2 → Output (image)

How are the gradients backpropagated through the non-torch operators? I know they are, because the parameters of both Conv layer 1 and Conv layer 2 are updated.
Thanks.

Hi @gcrd, it’s a little hard to visualize, but perhaps my previous point about not actually backpropagating through those ops explains it? Or, if you have residual/skip connections in your network, the gradient might be bypassing the non-torch ops entirely, which would allow it to flow.
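
To make the first possibility concrete, here is a minimal sketch (not your model, just two conv layers with a numpy round-trip in between) showing that a plain non-torch op breaks the graph, so the first conv layer receives no gradient unless some other path, e.g. a skip connection, bypasses the non-torch ops:

```python
import numpy as np
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 4, 3, padding=1)
conv2 = nn.Conv2d(4, 1, 3, padding=1)

x = torch.randn(1, 1, 8, 8)
h = conv1(x)

# non-torch operator: .detach() is needed to call .numpy(), and it breaks the graph
h_np = np.flip(h.detach().numpy(), axis=-1).copy()
h_back = torch.from_numpy(h_np)

out = conv2(h_back)
out.pow(2).mean().backward()

print(conv2.weight.grad is not None)  # True: the gradient reaches conv2
print(conv1.weight.grad)              # None: nothing flowed back through the numpy op
```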

Hi,
Yes, I also think that these custom non-torch layers are probably skipped and the backpropagation is performed as if they were not in the graph.
Many thanks for your answers!

If you want to visualize the computation graph, you can use the torchviz package; their GitHub repo is here.
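
For example, a minimal sketch (the two-layer model and the input are placeholders, and torchviz also needs the Graphviz binaries installed); any op that never shows up in the rendered graph is not part of backpropagation:

```python
import torch
from torchviz import make_dot

# placeholder model and input, just to have a graph to draw
model = torch.nn.Sequential(torch.nn.Conv2d(1, 4, 3), torch.nn.Conv2d(4, 1, 3))
x = torch.randn(1, 1, 16, 16)
out = model(x)

dot = make_dot(out, params=dict(model.named_parameters()))
dot.render("graph", format="pdf")  # writes graph.pdf
```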