When is `tensor.data` fine to leave in old code?

murph213 · April 5, 2020, 1:15pm

I know that the .data attribute of tensors has been deprecated (when Variable got merged into Tensor) and I also know its use is discouraged because it causes problems with gradient computation. However, I still come across old code bases that use it extensively. In initial phases of my short-term projects, I do not want to update a bunch of deprecated uses if they are harmless to the logical flow and effect of the code.

In short

When is it ok to leave tensor.data in old code? Can we understand the OK and dangerous uses?

What I know is bad: Let’s say we are building a computation graph with the ultimate goal of computing the gradient of the loss with respect to the weights. Values written through the .data attribute of any tensor on the path to the loss are not tracked by autograd. This can lead to incorrect derivatives of the loss with respect to the weights.
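
A minimal sketch of this failure mode, assuming a toy two-element weight tensor (the names `w_ok`, `w_bad`, `h` and the factor 10 are purely illustrative, not from any real code base). Scaling through .data changes the loss value, but autograd never sees the scaling:

```python
import torch

# Tracked version: the scaling by 10 is part of the graph.
w_ok = torch.tensor([1.0, 2.0], requires_grad=True)
loss_ok = ((w_ok * 3) * 10).sum()
loss_ok.backward()

# Untracked version: the same scaling is applied through .data.
w_bad = torch.tensor([1.0, 2.0], requires_grad=True)
h = w_bad * 3
h.data.mul_(10)  # changes the values of h, but is invisible to autograd
loss_bad = h.sum()
loss_bad.backward()

print(loss_ok.item(), loss_bad.item())  # the two loss values agree
print(w_ok.grad, w_bad.grad)            # the two gradients do not
```

Both runs produce the same loss, yet the second gradient is missing the factor of 10, and no error or warning is raised.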

What I’m not sure about: If we just use tensor.data on the RIGHT HAND SIDE of a computation that does NOT lead to the desired output (loss) in the computation graph, this should be fine, right? In this case, .data will not track the gradient, and the computation does not lead to a gradient we care about anyway. For example, say I want to compute the norm of all my weight matrices and write them to disk for analysis later on. I could do norm(model.layer.weight.data), right? Same goes for simply tracking the loss: you could in principle do print(loss.data) without causing harm.
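
To make the read-only case concrete, here is a small sketch, assuming a single `torch.nn.Linear` layer as a stand-in for `model.layer` (the layer sizes and the dummy loss are my own choices for illustration). It compares .data with the two alternatives I am aware of for logging:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)  # hypothetical stand-in for model.layer

# Three ways to read the weight norm without building graph history.
norm_via_data = model.weight.data.norm().item()
norm_via_detach = model.weight.detach().norm().item()
with torch.no_grad():
    norm_via_no_grad = model.weight.norm().item()

# For logging a scalar loss, .item() returns a plain Python float.
loss = model(torch.randn(3, 4)).pow(2).mean()
print(norm_via_data, norm_via_detach, norm_via_no_grad, loss.item())

# Nothing above wrote to any tensor, so backward is unaffected.
loss.backward()
```

As long as the result is only read and nothing is modified in place, all three give the same number.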

I know that it would be better to use .detach() here, but would it be dangerous or lead to unintended consequences to use .data here?

What else? Are there other dangers of .data to be aware of?

Thank you

charan_Vjy · April 6, 2020, 4:01am

You seem to have covered the pitfalls of using .data quite well. As a rule of thumb, make sure that tensors needed for gradient computation are not accessed through the .data attribute. The simplest way to check this is to replace .data with .detach() and see if it throws an error. Another way to catch incorrect gradient computation is to comment out the .data call and see if the outputs after a few epochs are the same. The PyTorch 0.4.0 Migration Guide highlights the problem of using .data with an example.
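
A sketch of the swap check, modeled on the kind of example the migration guide gives (the helper name `grad_after_zeroing` and its `use_detach` flag are hypothetical, introduced only for this illustration):

```python
import torch

def grad_after_zeroing(use_detach: bool) -> torch.Tensor:
    """Zero a sigmoid output through .detach() or .data, then backprop."""
    a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    out = a.sigmoid()  # sigmoid's backward reuses its own output
    c = out.detach() if use_detach else out.data
    c.zero_()          # in-place write to the storage shared with out
    out.sum().backward()
    return a.grad

# With .data the backward pass runs, but uses the zeroed output.
print(grad_after_zeroing(use_detach=False))

# With .detach() autograd notices the in-place change and raises.
try:
    grad_after_zeroing(use_detach=True)
except RuntimeError as err:
    print("autograd caught the in-place change:", err)
```

If swapping in .detach() raises an error like this, the original .data call was hiding a modification that affects the gradients.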

Are there other dangers of .data to be aware of?

None that I know of.

albanD · April 6, 2020, 3:32pm

would it be dangerous or lead to unintended consequences to use .data here?

Another major difference between .data and .detach() that you don’t mention here is inplace correctness.
Basically, when we save Tensors because their value is needed to compute the gradients, we have special logic that makes sure they were not modified inplace between the moment they were saved and the moment we try to use the value.
.data will break these checks. So even if you use .data on the right hand side of some op that is not used to compute the loss, if you later modify that Tensor inplace and that Tensor was needed for backward, you can end up with wrong gradients, because the saved value was changed but .data hides the change from the inplace detection code.
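
A rough sketch of the mechanism, assuming the internal `_version` attribute is a fair way to peek at the counter the inplace checks rely on (it is not a public API, and the tensor names here are illustrative):

```python
import torch

x = torch.tensor([0.5, 1.0], requires_grad=True)

# .detach() shares the version counter with the original tensor.
y = x.exp()
y.detach().add_(1.0)
print("after detach().add_:", y._version)

# .data shares storage but not the version counter.
z = x.exp()  # exp's backward reuses its output z
side_copy = z.data          # e.g. grabbed only for logging
print(side_copy.sum().item())
side_copy.add_(1.0)         # later inplace change, unseen by autograd
print("after data.add_:", z._version)

# Backward on z runs without error and uses the modified values.
z.sum().backward()
expected = x.detach().exp()
print(x.grad, expected)     # x.grad is off by the 1.0 added above
```

Calling `y.sum().backward()` instead would raise a RuntimeError, since the change made through .detach() is visible to the checks.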