var_a = var_b just makes the python variable var_a refer to the same tensor as the python variable var_b. Whatever var_a referred to before is discarded (and freed if nothing else references it).
var_a.data.copy_(var_b.data) will copy the content of the tensor var_b into the tensor contained in the python variable var_a (note the trailing underscore in copy_: it is an in-place operation).
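A minimal sketch of the difference between the two (tensor values are just for illustration):

```python
import torch

var_a = torch.zeros(3)
var_b = torch.ones(3)

# Plain assignment only rebinds the name: both names now point at one tensor.
var_a = var_b
assert var_a is var_b

# copy_() instead writes var_b's values into var_a's own storage.
var_a = torch.zeros(3)
var_a.copy_(var_b)
assert var_a is not var_b          # still two distinct tensors
assert torch.equal(var_a, var_b)   # with equal contents
```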
What happens is that var_b + 1 creates a new tensor containing the result, and this new tensor is then bound to the python variable var_a.
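To make that concrete, a small check (sizes are arbitrary) that the addition allocates fresh storage and leaves var_b untouched:

```python
import torch

var_b = torch.ones(3)
var_a = torch.zeros(3)
old_ptr = var_a.data_ptr()

var_a = var_b + 1  # the addition allocates a new result tensor; var_a is rebound to it

assert var_a.data_ptr() != old_ptr           # var_a no longer points at its old storage
assert var_a.data_ptr() != var_b.data_ptr()  # and does not share storage with var_b
assert torch.equal(var_b, torch.ones(3))     # var_b itself was left untouched
```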
The use of .data here bypasses the autograd engine: the copy is not recorded, so the gradients computed for var_a and var_b may be wrong, because some of the operations you perform on them are never tracked.
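A small illustration of how a write through .data goes unnoticed by autograd (the variable names are made up for the example):

```python
import torch

var_a = torch.zeros(3, requires_grad=True)
var_b = torch.ones(3, requires_grad=True)

hidden = var_a.clone()
hidden.data.copy_(var_b.data)  # silent write: autograd never records this copy
hidden.sum().backward()

# The gradient flows back to var_a through the clone, even though the
# values actually came from var_b; var_b gets no gradient at all.
assert torch.equal(var_a.grad, torch.ones(3))
assert var_b.grad is None
```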
If you were doing var_a.copy_(var_b), then this is a differentiable operation: the gradient for the original values in var_a is just 0 everywhere (as the output is independent of them), and the gradients for the elements of var_b are 1.
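This can be verified directly (the clone is only there to avoid modifying a leaf tensor in place):

```python
import torch

var_a = torch.zeros(3, requires_grad=True)
var_b = torch.ones(3, requires_grad=True)

out = var_a.clone()   # non-leaf copy of var_a that we are allowed to modify in place
out.copy_(var_b)      # recorded by autograd: a differentiable in-place copy
out.sum().backward()

# The overwritten values of var_a contribute nothing to the output,
# while every element of var_b passes through with gradient 1.
assert torch.equal(var_a.grad, torch.zeros(3))
assert torch.equal(var_b.grad, torch.ones(3))
```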
copy_() changes the elements in place, which is useful when that is exactly what you want:
import torch
batch, n_chan, dim = 8, 4, 16  # example sizes
big_tensor = torch.rand(batch, n_chan + 1, dim)
# I want to fill the last channel with the mean of the others (as an example):
last_channel = big_tensor.select(1, -1)  # a view of the last channel (shares storage with big_tensor)
last_channel.copy_(big_tensor.narrow(1, 0, n_chan).mean(1))  # overwrite the last channel in place
# Now you can use big_tensor, where the last channel has been changed.
out = my_net(big_tensor)