Torch.autograd.grad got None gradients for cascaded model

tl;dr: use append() instead of extend().

Long story:

This behavior comes from how extend() works.
extend() iterates over the batch tensor and adds its rows to x_list one by one. Each of those rows is a new non-leaf view, unrelated to the computation graphs involving model1 and model2, so torch.autograd.grad returns None for them.
append(), in contrast, stores the batch variable itself, leaving it untouched.
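
A minimal sketch of the difference (batch, x_list, and y_list are illustrative names, not from the original code):

```python
import torch

batch = torch.randn(4, 3, requires_grad=True)

# extend() iterates over the tensor, storing one row view per element.
x_list = []
x_list.extend(batch)
print(x_list[0].is_leaf)   # False: each row is a non-leaf slice of batch

# append() stores the original leaf tensor unchanged.
y_list = []
y_list.append(batch)
print(y_list[0].is_leaf)   # True: still the same leaf variable
```

Since torch.autograd.grad only populates gradients for tensors that are actually inputs to the graph being differentiated, the non-leaf slices produced by extend() come back with None.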


You can find an explanation about leaf variables from @albanD here:
