Why do we need no_grad before updating a tensor, but not when updating via .data?

(The Raven Chaser) #1

I’m new to PyTorch. I see some code that always goes through .data to update a tensor, for example the moving average update for the target network in DQN:

target_param.data.copy_(tau*local_param.data + (1.0-tau)*target_param.data)

but I also notice that some code updates tensors directly inside with torch.no_grad(), such as this weight update in a network:

with torch.no_grad():
    w -= w.grad

What’s the difference between these two ways of updating a tensor?
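For context, here is a minimal runnable sketch of both styles side by side. The tensor shapes and the values of tau and the learning rate are made up for illustration; they stand in for the DQN parameters and the network weights from the question.

```python
import torch

tau = 0.1
# Hypothetical stand-ins for the DQN parameters in the question.
local_param = torch.ones(3, requires_grad=True)
target_param = torch.zeros(3, requires_grad=True)

# Style 1: update through .data (autograd is bypassed entirely).
target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)
print(target_param)   # tensor([0.1, 0.1, 0.1], requires_grad=True)

# Style 2: update inside torch.no_grad() (autograd is paused).
w = torch.ones(3, requires_grad=True)
loss = (w * 2).sum()
loss.backward()           # w.grad is now filled with 2s
with torch.no_grad():
    w -= 0.1 * w.grad     # in-place SGD-style step, not recorded by autograd
print(w)                  # tensor([0.8, 0.8, 0.8], requires_grad=True)
```

Both snippets end up mutating the tensor in place without building a graph for the update itself; the difference the replies below discuss is what happens when autograd still needs the old values.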


If you use .data, autograd doesn’t track these changes, and you might end up with wrong gradients, as you are modifying the underlying data behind autograd’s back.
If you want to update a parameter such as a weight, you should wrap the update in with torch.no_grad() instead, as it’s generally not advised to use .data.
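The “wrong gradients” point can be shown concretely. In this sketch (values chosen for illustration), .data is used to overwrite a tensor between the forward and backward pass; autograd reuses the mutated storage, so the gradient is silently computed at the wrong point:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()     # dy/dx should be 2*x = 2, evaluated at x = 1

x.data.fill_(10.0)    # silently mutates the values autograd saved for backward
y.backward()
print(x.grad)         # tensor([20., 20., 20.]) -- wrong: the forward pass ran at x = 1

# The same in-place change without .data raises a RuntimeError instead,
# because autograd refuses in-place ops on a leaf that requires grad:
# x.fill_(10.0)       # RuntimeError
```

So no_grad fails loudly where it matters, while .data can corrupt gradients without any warning.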

(The Raven Chaser) #3

Based on what you said, it seems that .data is the same as the tensor itself. Is it still necessary to use .data at all, now that PyTorch 0.4.0 has deprecated Variable?


I wouldn’t use .data unless it’s really necessary and I know exactly what I’m doing.
Most use cases will work just fine with torch.no_grad().

(The Raven Chaser) #5

Thanks, I just wanted to make sure. Does that mean I could use torch.no_grad() instead in my first example? i.e.

with torch.no_grad():
    target_param = tau*local_param + (1-tau)*target_param
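One subtlety worth noting here (assuming target_param is a parameter that the target network actually holds): plain assignment inside no_grad rebinds the Python name target_param to a brand-new tensor, so the network’s own parameter would be left untouched. An in-place copy_ keeps the original tensor object, as in this sketch:

```python
import torch

tau = 0.1
local_param = torch.ones(3)
target_param = torch.zeros(3, requires_grad=True)

with torch.no_grad():
    # copy_ writes into the same tensor object the network holds,
    # whereas `target_param = ...` would only rebind the local name.
    target_param.copy_(tau * local_param + (1.0 - tau) * target_param)

print(target_param)  # tensor([0.1, 0.1, 0.1], requires_grad=True)
```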