So the backward of `y = x.expand(y_size)` is `dl_dx = dl_dy.sum_to_size(x.size())`.
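As a quick check of this identity, here is a minimal sketch. It assumes PyTorch is installed, and the shapes are chosen arbitrarily for illustration. It lets autograd backpropagate through `expand` and compares the result with a hand-computed `sum_to_size`:

```python
import torch

# x has a size-1 dim that expand will broadcast (shapes are illustrative)
x = torch.randn(3, 1, requires_grad=True)
y = x.expand(3, 4)

# an arbitrary upstream gradient dl/dy with y's shape
dl_dy = torch.randn(3, 4)
y.backward(dl_dy)

# summing dl_dy back down to x's shape reproduces what autograd computed
manual = dl_dy.sum_to_size(x.size())
print(x.grad.shape)                     # torch.Size([3, 1])
print(torch.allclose(x.grad, manual))   # True
```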
Just as broadcasting inserts implicit expands, the autograd engine inserts the matching implicit "expand backward" in the form of sum_to_size. If you write the backward manually, you have to do the sum_to_size yourself.
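For example, a hand-written backward for a broadcasting addition might look like the sketch below. `BroadcastAdd` is a hypothetical custom op for illustration only, and the code assumes PyTorch is installed. The incoming gradient has the broadcast shape, so each input's gradient has to be reduced back to that input's shape explicitly:

```python
import torch


class BroadcastAdd(torch.autograd.Function):
    """Hypothetical custom op: a + b with broadcasting and a manual backward."""

    @staticmethod
    def forward(ctx, a, b):
        # remember the input shapes so backward can undo the broadcast
        ctx.a_size = a.size()
        ctx.b_size = b.size()
        return a + b

    @staticmethod
    def backward(ctx, grad_out):
        # d(a + b)/da and d(a + b)/db are identities, but grad_out has the
        # broadcast shape, so we do the "expand backward" ourselves
        grad_a = grad_out.sum_to_size(ctx.a_size)
        grad_b = grad_out.sum_to_size(ctx.b_size)
        return grad_a, grad_b


a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)  # broadcast along the first dim

BroadcastAdd.apply(a, b).sum().backward()
manual_ga, manual_gb = a.grad.clone(), b.grad.clone()

# compare against the gradients autograd produces for the built-in op
a.grad, b.grad = None, None
(a + b).sum().backward()
print(torch.allclose(manual_ga, a.grad), torch.allclose(manual_gb, b.grad))  # True True
```

Without the two `sum_to_size` calls, `grad_b` would have shape `(2, 3)` instead of `(3,)`, and autograd would reject it as a gradient for `b`.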
Best regards
Thomas