To my understanding, `tf.stop_gradient()` in TensorFlow treats its input as a constant during backpropagation. Passing `x.detach()` into an `nn` layer likewise prevents gradients from flowing back to `x`, so I believe the behavior is the same.
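A minimal sketch of what I mean (assuming a recent PyTorch; the tensor values are just illustrative):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Gradients flow through the ordinary path:
y = (x * 2).sum()
y.backward()
grad_through = x.grad.clone()  # tensor([2., 2.])

# ...but not through the detached path: z is cut off from the graph,
# analogous to wrapping x in tf.stop_gradient() in TensorFlow.
z = (x.detach() * 2).sum()
print(z.requires_grad)  # False — calling z.backward() would raise an error
```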