I have a few questions about what `torch.autograd.grad` really does. When `only_inputs=True`, does this mean that gradients are computed for no other leaves? Or that no gradient flows past the inputs? Similarly, does `only_inputs=False` mean that gradient will flow beyond the inputs, to leaves further back in the network? Or does it mean gradient flows to everything between the inputs and outputs, but not past the inputs?
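To make the distinction concrete, here is a minimal sketch of the behaviour I think I'm observing with the default (`only_inputs=True`) setting, contrasted with `backward()`, which does accumulate into every leaf:

```python
import torch

w = torch.randn(3, requires_grad=True)  # a leaf "further back" in the graph
x = torch.randn(3, requires_grad=True)  # the input we ask for gradients on

out = (w * x).sum()

# Ask only for d(out)/dx. Gradient still flows *through* w's branch of the
# graph in order to reach x, but nothing is accumulated onto w itself.
(gx,) = torch.autograd.grad(out, x, retain_graph=True)
print(gx)      # equals w, since d(out)/dx = w
print(w.grad)  # None -- no gradient was accumulated on the other leaf

# By contrast, backward() accumulates into every leaf's .grad:
out.backward()
print(w.grad)  # now equals x, since d(out)/dw = x
```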
In particular, what happens when using `allow_unused=True` and all of the inputs are "unused"? Can I pass `requires_grad=False` variables as the "unused" inputs, and then use `grad` to compute gradients on leaf nodes from an output that depends on those unused inputs?