I have a fairly specific problem that I can’t figure out. I have created a custom nn.Module, mod1. It contains a Conv2d and a second nn.Module, mod2, which also contains a Conv2d. In the forward pass, mod1 applies its Conv2d followed by a sigmoid to its input, mod2 applies its own Conv2d followed by a sigmoid to its input, and the two results are added together.
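A minimal sketch of the architecture described above. It is hypothetical: it assumes 3 input channels, 3x3 convolutions with `padding=1` so both branches produce the same shape, and that mod2 receives the same input `x` as mod1. The class names `Mod1`/`Mod2` and all sizes are illustrative, not the poster's code.

```python
import torch
import torch.nn as nn


class Mod2(nn.Module):
    """Inner module: Conv2d followed by a sigmoid (channel count is an assumption)."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.conv(x))


class Mod1(nn.Module):
    """Outer module: its own Conv2d + sigmoid, plus the output of a nested Mod2."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.mod2 = Mod2(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: both branches see the same input x.
        # The final operation is an addition, which matters for module backward hooks.
        return torch.sigmoid(self.conv(x)) + self.mod2(x)


torch.manual_seed(0)
model = Mod1()
x = torch.randn(1, 3, 8, 8)
out = model(x)
print(out.shape)
```

Because the last operation in `Mod1.forward` is `+`, the autograd node attached to `out` is an addition node rather than a sigmoid node.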
I am using a backward hook to inspect grad_in and grad_out of mod1. If I leave out mod2’s output, grad_in is the derivative of the sigmoid times grad_out, as it should be. If I return only mod2’s output, mod1’s grad_in is (the sigmoid derivative from mod2) * grad_out. However, if I add the two values together, which is what I need to do, grad_in just contains two copies of grad_out.
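A sketch that reproduces this observation, assuming the hypothetical `Mod1`/`Mod2` layout (redefined compactly so the snippet runs on its own). It uses the deprecated `nn.Module.register_backward_hook`, which, as I understand it, attaches the hook to the autograd node of the *last* operation in `forward`. Here that is the addition, so `grad_input` holds the gradients with respect to the two summands, each equal to `grad_output`. Recent PyTorch versions may print a deprecation warning for this kind of hook.

```python
import torch
import torch.nn as nn


class Mod2(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.conv(x))


class Mod1(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.mod2 = Mod2()

    def forward(self, x):
        # Last op is "+", so a non-full hook sees the addition node's gradients.
        return torch.sigmoid(self.conv(x)) + self.mod2(x)


captured = {}


def hook(module, grad_input, grad_output):
    # Store what the (deprecated, non-full) module hook reports.
    captured["grad_in"] = grad_input
    captured["grad_out"] = grad_output


torch.manual_seed(0)
mod1 = Mod1()
mod1.register_backward_hook(hook)  # deprecated, non-full hook

x = torch.randn(1, 3, 8, 8)
mod1(x).sum().backward()

g_out = captured["grad_out"][0]
print(len(captured["grad_in"]))  # one gradient per summand of the "+"
print(all(torch.equal(g, g_out) for g in captured["grad_in"]))
```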
The problem also shows up in a much simpler case. If I use only mod1 and add 0.0001 to its output, the same thing happens. With Conv2d -> sigmoid, grad_in is the sigmoid derivative times grad_out. With Conv2d -> sigmoid -> add 0.0001, grad_in = grad_out.
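A sketch comparing the two variants with the same weights and input. `ConvSigmoid` and its `add_const` flag are hypothetical names for illustration. With sigmoid as the last op, the non-full hook's `grad_input[0]` should equal `grad_output[0] * s * (1 - s)`, where `s` is the sigmoid output. With `+ 0.0001` as the last op, the derivative of `y + c` with respect to `y` is 1, so `grad_input[0]` is just `grad_output[0]`.

```python
import torch
import torch.nn as nn


class ConvSigmoid(nn.Module):
    """Hypothetical single-branch module; add_const toggles the trailing '+ 0.0001'."""

    def __init__(self, add_const: bool):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.add_const = add_const

    def forward(self, x):
        s = torch.sigmoid(self.conv(x))
        return s + 0.0001 if self.add_const else s


def hooked_grads(add_const: bool):
    torch.manual_seed(0)  # same weights and input for both runs
    m = ConvSigmoid(add_const)
    seen = {}
    # Deprecated non-full hook: attached to the last autograd node of forward().
    m.register_backward_hook(lambda mod, gin, gout: seen.update(gin=gin, gout=gout))
    x = torch.randn(1, 3, 8, 8)
    m(x).sum().backward()
    s = torch.sigmoid(m.conv(x)).detach()  # recompute the sigmoid output
    return seen["gin"][0], seen["gout"][0], s


# Last op is sigmoid: grad_in[0] = grad_out * s * (1 - s).
gin_plain, gout_plain, s = hooked_grads(add_const=False)
print(torch.allclose(gin_plain, gout_plain * s * (1 - s)))

# Last op is "+ 0.0001": grad_in[0] is just grad_out.
gin_const, gout_const, _ = hooked_grads(add_const=True)
print(torch.equal(gin_const, gout_const))
```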
After further digging, it looks like the gradient that actually reaches the input layer does include the sigmoid derivative, but that value is not what grad_in contains. I was under the impression that when you register a backward hook on a module, grad_in is the final gradient with respect to the module’s input. Does this mean grad_in is just an intermediate gradient and not the final one?
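In recent PyTorch releases (1.8 and later, as far as I know), `nn.Module.register_full_backward_hook` reports gradients with respect to the module's actual inputs rather than the inputs of its last autograd node. A sketch, assuming the same hypothetical conv -> sigmoid -> + 0.0001 module (`ConvSigmoidPlusConst` is an illustrative name). The input must have `requires_grad=True`, otherwise `grad_input[0]` is `None`.

```python
import torch
import torch.nn as nn


class ConvSigmoidPlusConst(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.conv(x)) + 0.0001


captured = {}


def full_hook(module, grad_input, grad_output):
    # grad_input[0] is dL/dx for the module's input x, not for the last op.
    captured["grad_in"] = grad_input[0]
    captured["grad_out"] = grad_output[0]


torch.manual_seed(0)
m = ConvSigmoidPlusConst()
m.register_full_backward_hook(full_hook)

x = torch.randn(1, 3, 8, 8, requires_grad=True)  # needed for grad_input[0]
m(x).sum().backward()

# The hook now sees the same gradient that reaches x, sigmoid derivative included.
print(torch.allclose(captured["grad_in"], x.grad))
print(captured["grad_in"].shape == x.shape)
```

Note that a full and a non-full backward hook cannot be registered on the same module at the same time, so this replaces `register_backward_hook` rather than supplementing it.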
It’s a little hard for me to follow what’s going on, but I’d suggest testing on a small scale (make a simple module, add a backward hook) and seeing what happens.
It’s a simple module already. The one from my second comment is just a module that applies a single Conv2d and then a sigmoid. If you register a backward hook on it, grad_in is correctly the sigmoid derivative times grad_out. If instead you add a constant after applying the sigmoid, grad_out and grad_in are the same in the module’s backward hook. However, the sigmoid derivative does get applied before the gradient reaches the input layer; grad_in just isn’t reported correctly in the backward hook, which seems to look only one step into the module’s multi-step gradient calculation. If you’re a dev, I’d be happy to help get to the bottom of this. Otherwise, thank you for the reply! The actual gradient is correct, so grad_in being wrong in the backward hook function isn’t a problem for me.
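The "one step" intuition matches how the non-full hook appears to be attached: it is registered on the `grad_fn` of the module's output, i.e. the autograd node of the last operation in `forward`. A small sketch that inspects that node and the node feeding into it, assuming the hypothetical conv -> sigmoid -> + 0.0001 module from this thread. Exact node class names (e.g. a trailing `0`) can vary between PyTorch versions.

```python
import torch
import torch.nn as nn


class ConvSigmoidPlusConst(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.conv(x)) + 0.0001


m = ConvSigmoidPlusConst()
out = m(torch.randn(1, 3, 8, 8))

last_node = out.grad_fn  # a non-full module hook attaches here
prev_node = last_node.next_functions[0][0]  # the sigmoid's backward node

last_name = type(last_node).__name__
prev_name = type(prev_node).__name__
print(last_name, "<-", prev_name)
```

The hook only sees the inputs and outputs of `last_node`, which is why the sigmoid derivative, computed one node further back, does not appear in its `grad_input`.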