Why do we need no_grad when updating a tensor, but not when updating .data?


(The Raven Chaser) #1

I’m new to PyTorch. I see some code that calls .data to update a tensor indirectly; an example is the moving-average update of the target network in DQN:

target_param.data.copy_(tau*local_param.data + (1.0-tau)*target_param.data)

but I also notice that some code updates tensors directly inside with torch.no_grad(), for example when updating the weights of a network:

with torch.no_grad():
    w -= w.grad

What’s the difference between these two ways of updating a tensor?


#2

If you use .data, autograd doesn’t track these changes, and you might end up with wrong gradients, as you are modifying the underlying data behind autograd’s back.
If you want to update a parameter such as a weight, you should use the with torch.no_grad() context manager instead, as it’s generally not advised to use .data.
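For illustration, here’s a minimal sketch of the failure mode (toy values, nothing DQN-specific). Mutating a tensor through .data does not bump its version counter, so autograd silently computes gradients from the mutated values; the same in-place change under torch.no_grad() does bump the counter, and backward() fails loudly instead:

import torch

x = torch.ones(3, requires_grad=True)
y = x * x                 # autograd saves x to compute dy/dx = 2*x later
x.data.add_(1.0)          # mutates x behind autograd's back, no error raised
y.sum().backward()
print(x.grad)             # tensor([4., 4., 4.]) -- computed from the mutated x;
                          # the gradient at the x that produced y is [2., 2., 2.]

x = torch.ones(3, requires_grad=True)
y = x * x
with torch.no_grad():
    x.add_(1.0)           # same mutation, but the version counter is bumped
y.sum().backward()        # raises RuntimeError: a variable needed for gradient
                          # computation has been modified by an inplace operation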


(The Raven Chaser) #3

Based on what you said, it seems that .data is basically the same tensor. I want to know whether it is still necessary to use .data at all, since PyTorch 0.4.0 merged Variable into Tensor and deprecated it?


#4

I wouldn’t use .data unless it’s really necessary and I know exactly what I’m doing.
Most use cases will work just fine with torch.no_grad().
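For instance, a bare-bones gradient-descent step (toy data and learning rate, just to show the pattern) needs nothing beyond torch.no_grad():

import torch

w = torch.randn(2, requires_grad=True)           # toy parameter
x, target = torch.randn(5, 2), torch.randn(5)    # made-up data

loss = ((x @ w - target) ** 2).mean()
loss.backward()                                  # populates w.grad

with torch.no_grad():     # keep the update itself out of the autograd graph
    w -= 0.1 * w.grad     # in-place update on a leaf is allowed under no_grad
w.grad.zero_()            # clear the gradient before the next iteration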


(The Raven Chaser) #5

Thanks, I just want to make sure: does that mean I could use torch.no_grad() instead in my first example? I.e.:

with torch.no_grad():
    target_param.copy_(tau*local_param + (1.0-tau)*target_param)  # in-place copy, so the parameter tensor itself is updated