Torch.autograd.grad got None gradients for cascaded model

tl;dr: use append() instead of extend().

Long story:

This behavior comes from how extend() works.
extend() iterates over the batch tensor and adds its rows to x_list one by one. Each of those rows is a new non-leaf view, unrelated to the computation graphs involving model1 and model2, so torch.autograd.grad returns None for them.
append(), in contrast, stores the batch variable itself, leaving it untouched.
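
A minimal sketch of the difference (batch, x_list, and y_list are illustrative names, not from the original code):

```python
import torch

batch = torch.randn(4, 3, requires_grad=True)

# extend() iterates over the tensor, storing one row view per element.
x_list = []
x_list.extend(batch)
print(x_list[0].is_leaf)   # False: each row is a non-leaf slice of batch

# append() stores the original leaf tensor unchanged.
y_list = []
y_list.append(batch)
print(y_list[0].is_leaf)   # True: still the same leaf variable
```

Since torch.autograd.grad only populates gradients for tensors that are actually inputs to the graph being differentiated, the non-leaf slices produced by extend() come back with None.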


You can find an explanation about leaf variables from @albanD here:
