The official document says,
“However, .data can be unsafe in some cases. Any changes on x.data wouldn’t be tracked by autograd, and the computed gradients would be incorrect if x is needed in a backward pass. A safer alternative is to use x.detach(), which also returns a Tensor that shares data with requires_grad=False, but will have its in-place changes reported by autograd if x is needed in backward.”
How should I understand the bolded sentence? I get a new tensor from detach(), but is the new tensor still related to x? What are the in-place changes reported to? If it means that in-place changes of the new tensor are reported to x, why does it do this?
Oh… I really don't understand. Can anyone help me? Thanks a lot!
If you do c = a * b and compute the gradient for a given the gradient for c as gc, you get ga = gc * b.
As you can see, the value of b is needed by the backward pass to compute the gradient properly.
The autograd engine needs to make sure that if you change b, it raises an error, because it can't compute the gradient for a correctly anymore.
The thing is that if you change b.data in place, the autograd engine will not know it: it will compute ga = gc * b with no error, even though the content of b has changed, and so the computed gradient will be wrong.
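A minimal sketch of this failure mode (the variable names a, b, c follow the post above; the specific values are just illustrative): modifying b.data in place is invisible to autograd, so backward silently uses the new contents of b and the gradient for a comes out wrong.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.full((3,), 2.0, requires_grad=True)

c = (a * b).sum()   # backward will need b to compute ga = gc * b
b.data.mul_(10)     # in-place change through .data: not tracked by autograd

c.backward()        # no error is raised
print(a.grad)       # tensor([20., 20., 20.]) -- wrong, should be tensor([2., 2., 2.])
```

No error is raised, but the gradient reflects the modified b (20) instead of the b that was actually used in the forward pass (2).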
Hope this helps!
Yes, it helps! So .detach() will get a tensor sharing data with b. A change on the new tensor will also influence b, and the backward pass of the detach() version will raise an error in that case. Do you mean this?
When you call backward, the detach() version will raise an error to notify you that b has been changed somewhere, but the .data version would not. Here is an example.
What about
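A sketch of the comparison described above (an assumed reconstruction, since the original example is not shown here): an in-place change through .detach() bumps b's version counter, so autograd catches the modification at backward time instead of computing a wrong gradient silently.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.full((3,), 2.0, requires_grad=True)
c = (a * b).sum()

b.detach().mul_(10)   # shares data with b, but the version counter is bumped

try:
    c.backward()
    outcome = "no error"
except RuntimeError:
    # autograd detects that a tensor needed for the backward pass
    # was modified by an in-place operation
    outcome = "RuntimeError"
print(outcome)
```

With b.data.mul_(10) instead of b.detach().mul_(10), the same backward call would complete without any error and produce an incorrect gradient.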