Are there conditions where slicing changes gradients/backprop?

We are training a network, and changing
x = [cls_score for cls_score in cls_scores]
to
x = [cls_score[:] for cls_score in cls_scores]
in the loss calculation of the network yields different losses, but only after a few iterations. This is the only change we made. The differences in the loss start out very small and then grow. (Our actual use case is different, but this is the smallest code change that elicited the difference in the loss.)
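To make the change concrete, here is a toy sketch of the two variants (made-up tensor shapes and a dummy loss instead of our actual cls_scores and loss function):

```python
import torch

# Stand-ins for our cls_scores: a few toy tensors that require gradients.
torch.manual_seed(0)
cls_scores = [torch.randn(4, 8, requires_grad=True) for _ in range(3)]

# Variant A: pass the tensors through unchanged.
x = [cls_score for cls_score in cls_scores]
loss_a = sum(t.sum() for t in x)

# Variant B: take a full slice of each tensor first.
x = [cls_score[:] for cls_score in cls_scores]
loss_b = sum(t.sum() for t in x)

print(loss_a.item(), loss_b.item())  # the forward values match here
```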
Our guess is that the gradients differ during backpropagation in the two cases, which would explain why the losses diverge.
We have seen several examples in the forum where slicing did not have a negative effect on backpropagation, but we cannot think of another explanation for this effect.
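For what it's worth, here is a sketch of the kind of direct gradient comparison we have in mind, again on toy tensors rather than our real network:

```python
import torch

torch.manual_seed(0)
base = torch.randn(4, 8)

# Two independent leaf tensors with identical values.
a = base.clone().requires_grad_(True)
b = base.clone().requires_grad_(True)

# The same dummy loss, computed with and without the full slice.
loss_a = (a * 2).sum()
loss_b = (b[:] * 2).sum()

loss_a.backward()
loss_b.backward()

# In this toy case the gradients come out identical; the [:] only adds
# a SliceBackward node to the autograd graph.
print(torch.equal(a.grad, b.grad))
```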
We have also read that there were problems with backprop in torch versions before 1.0.9, but we are using 1.10.2.

Does anyone have an idea or a guess as to why the losses differ? Is it because of different gradients, or is there another explanation for our observation? We are happy about any suggestions and hints!