Tensor view not consistent between GPU and CPU

I created a minimal working example of the bug/issue I'm seeing; it can be run directly.

import torch
device = "cuda:0"

class Test(torch.nn.Module):
    def __init__(self):
        super().__init__()
        w = torch.randn(20, 20)
        self.x = torch.nn.Parameter(w)
        self.y = torch.nn.Parameter(w.view(-1))  # flattened view of the same storage

test = Test().to(device)
assert torch.allclose(test.x.flatten(), test.y), "This works"
test.x.data *= 2.
# assert torch.allclose(test.x.flatten(), test.y), "This does not"  # fails: on the GPU x and y no longer share memory

test2 = Test()
assert torch.allclose(test2.x.flatten(), test2.y), "This works"
test2.x.data *= 2.
test2.to(device)
assert torch.allclose(test2.x.flatten(), test2.y), "This also works"

Why is that the case? It seems that when the network is on the CPU, the memory is indeed shared, but once the data is on the GPU, updating the base weight no longer updates its view. Why is that, and how can I solve it?

When the data is moved from the source device to the target device (from CPU to GPU, or vice versa), each parameter gets its own newly allocated memory on the target device, even if the parameters share the same memory on the source device.

For example:

test = Test()  # create in CPU
print(test.x.data_ptr(), test.y.data_ptr()) # prints 1763172664192 1763172664192

test.cuda() # move to GPU
print(test.x.data_ptr(), test.y.data_ptr()) # prints 30150754304 30150756352
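
The same behavior can be reproduced with plain tensors (a quick sketch, not from the original post):

w = torch.randn(20, 20)
v = w.view(-1)
print(w.data_ptr() == v.data_ptr())                 # True: the view shares storage on the CPU
print(w.cuda().data_ptr() == v.cuda().data_ptr())   # False: each .cuda() call allocates its own copy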

I am not sure if it’s intended behavior or a bug.
@ptrblck do you know if it’s the intended behavior?

The usage of the .data attribute is deprecated and should not be used.
Based on your code you are trying to manipulate a leaf variable inplace, which isn’t supported, so could you explain your use case a bit more?
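
For reference, an in-place update on a parameter would usually be done inside a no_grad block instead of going through .data, e.g. something along these lines (reusing the test module from above):

with torch.no_grad():
    test.x.mul_(2.)  # in-place scaling without recording the op in autograd

Note that this still won't keep test.y in sync once the two parameters live in separate allocations on the GPU.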

My use-case is the following: I re-implemented the Conv2d layer so that it also has a Linear layer. The linear layer implements the same conv operation, just as a matrix-vector multiplication (MVM). Why I do that is not important here, but in the end the two always share the same parameters.
The code I posted here is just to reproduce the issue quickly. In practice, it happens when I load a state_dict. The state_dict does not contain that linear layer (I want to hide it from the user) since the information is already in the conv weights.
So, when I initialize the model on a GPU, the parameters no longer share the same underlying data. As a result, when I load the state_dict, only the conv weights, and not the linear ones, get updated correctly. This is not a problem when the model is on the CPU.

Thanks for the description. Based on the use case, I think the cleanest way to implement it would be to either register the parameter explicitly and use the functional API for the F.conv2d and F.linear operations or to initialize the nn.Conv2d module and use its weight in F.linear.
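
A rough sketch of that idea (module name, shapes, and the as_linear flag are just placeholders): register a single weight and feed it to both functional calls, so the flattened view is created on the fly instead of being stored as a second parameter.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvWithLinearView(nn.Module):
    def __init__(self, in_channels=3, out_channels=8, kernel_size=3):
        super().__init__()
        # single registered parameter shared by both code paths
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size, kernel_size))

    def forward(self, x, as_linear=False):
        if as_linear:
            # flatten the kernel on the fly; the view always follows self.weight,
            # regardless of which device the module lives on
            return F.linear(x, self.weight.view(self.weight.size(0), -1))
        return F.conv2d(x, self.weight)

Since only self.weight is registered, .to(device) and load_state_dict only ever touch one tensor, so the "linear" path cannot get out of sync.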

Unfortunately that's not an option, since the linear part is not always equivalent to an nn.Linear module. It is a custom class that inherits from another class, which in turn inherits from torch.nn.Module. In train and eval mode your solution would work fine, but I have another mode where it no longer does.

Is there something like an "update hook"? I.e., every time the data tensor of the conv parameter gets updated, I would also update the data of the linear parameter.
But this created too much overhead during training, since I only need the linear layer to be correct in inference mode.
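
If the linear weights only need to be correct at inference time, one possible compromise (just a sketch with made-up names, not a built-in hook) is to refresh them when switching to eval mode, e.g. by overriding train():

import torch
import torch.nn as nn

class ConvPlusLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        # hypothetical flattened copy used by the custom linear path
        self.linear_weight = nn.Parameter(self.conv.weight.detach().clone().view(8, -1))

    def train(self, mode: bool = True):
        super().train(mode)
        if not mode:
            # entering eval(): refresh the linear weights from the conv weights once
            with torch.no_grad():
                self.linear_weight.copy_(self.conv.weight.view(8, -1))
        return self

That way nothing extra runs per training step; the copy only happens on model.eval().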