I know that the `.data` attribute of tensors has been deprecated (when `Variable` was merged into `Tensor`), and I also know its use is discouraged because it can cause problems with gradient computation. However, I still come across old code bases that use it extensively. In the initial phases of short-term projects, I don't want to update a bunch of deprecated uses if they are harmless to the logical flow and effect of the code.
In short: when is it OK to leave `tensor.data` in old code? Can we distinguish the OK uses from the dangerous ones?
What I know is bad: say we are building a computation graph with the ultimate goal of computing the gradient of `loss` with respect to the weights. Assigning values directly through the `.data` attribute of any tensor on the path to `loss` will not be tracked by autograd, which can lead to incorrect derivatives of `loss` w.r.t. the weights.
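A minimal sketch of that failure mode, using `sigmoid` because its backward pass reads the saved *output* of the forward pass (the tensors here are just toy values):

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
out = a.sigmoid()     # autograd saves `out` for sigmoid's backward pass
out.data.mul_(0)      # silently zeroes `out`; autograd is not notified
out.sum().backward()  # backward computes grad * out * (1 - out) on the MODIFIED out
print(a.grad)         # tensor([0., 0.]) -- silently wrong (should be sigmoid'(a))
```

No error is raised anywhere; the wrong gradient just flows into the optimizer step.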
What I'm not sure about: if we only use `tensor.data` on the RIGHT-HAND SIDE of a computation that does NOT feed into the desired output (`loss`) in the computation graph, this should be fine, right? In that case `.data` will not track the gradient, but the computation does not contribute to any gradient we care about anyway. For example, say I want to compute the norm of all my weight matrices and write them to disk for later analysis. I could do `norm(model.layer.weight.data)`, right? The same goes for simply logging the loss: in principle you could do `print(loss.data)` without causing harm.
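A sketch of the read-only pattern above, as I understand it (the `Linear` layer is just a stand-in for `model.layer`):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)

# Read-only use: .data is only read, and the result never feeds back
# into any tensor we later call .backward() on, so the gradients of a
# real training loss are unaffected.
weight_norm = model.weight.data.norm().item()

# The modern equivalent, preferred in new code:
weight_norm_detached = model.weight.detach().norm().item()
assert weight_norm == weight_norm_detached
```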
I know that it would be better to use `.detach()` here, but would using `.data` instead be dangerous or lead to unintended consequences?
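One concrete difference I'm aware of: `.detach()` shares the tensor's version counter with the original, so a later in-place modification is detected at `backward()` time, whereas the same write through `.data` goes unnoticed and silently corrupts the gradient. A minimal sketch of the `.detach()` side:

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
out = a.sigmoid()

# In-place change through .detach(): the detached view shares autograd's
# version counter, so backward() notices that a saved tensor was modified.
out.detach().zero_()
try:
    out.sum().backward()
except RuntimeError as e:
    print("caught:", e)  # "...has been modified by an inplace operation"
```

Doing `out.data.zero_()` instead would run `backward()` without complaint, which is why `.detach()` is the recommended replacement.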
What else? Are there other dangers of `.data` to be aware of?
Thank you