How to move all tensors to cuda?

I am fairly new to PyTorch and to training on GPU. When I define a model (a network) myself, I can explicitly move every tensor I define in the model to cuda. However, if I want to use a model defined by someone else, for example one cloned from another person’s GitHub repo, I cannot modify the model.
In this case, if I just move the network to cuda, it doesn’t work. How can I move all the tensors they define in the model to cuda? :thinking:

PyTorch handles this recursively.

An nn.Module instance (i.e. a neural network or a layer) is usually composed of other nn.Module instances.
Once you have defined the top-level one (let’s call it model), you can move it and all of its sub-modules (layers) by calling model = model.cuda()
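To illustrate the recursion, here is a minimal sketch. The Block and Net classes are made up for the example, and the code falls back to the cpu when no GPU is available, so it is only an illustration of the idea, not anyone's actual model.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """A small sub-module (hypothetical, for illustration only)."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


class Net(nn.Module):
    """Top-level module composed of other nn.Module instances."""

    def __init__(self):
        super().__init__()
        self.block1 = Block()
        self.block2 = Block()
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        return self.head(self.block2(self.block1(x)))


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# One call on the top-level module; the sub-modules are visited recursively.
model = Net().to(device)

# Every registered parameter, however deeply nested, now reports the same device.
param_devices = {name: p.device.type for name, p in model.named_parameters()}
print(param_devices)
```

Only parameters and buffers that are registered on some nn.Module in the tree are reached this way.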

An nn.Module has two main parts:
the __init__ method, in which you define the layers (which are themselves nn.Modules, as mentioned above), and forward, which defines how those layers are used.

As a matter of good practice, code shouldn’t hardcode the device (cuda or cpu) in either __init__ or forward.
That way, once you have a model, you can choose whether to run it on the cpu or on cuda.

For the former you don’t have to do anything (tensors are created on the cpu by default). For the latter you just need to move the model and the inputs to cuda:

model = model.cuda()
for batch in dataloader:
   output = model(batch.cuda())
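A device-agnostic variant of the loop above might look like the following sketch. The model, the random dataset and the batch size are placeholders chosen for the example; the point is only that the device is picked once and both the model and every batch are moved to it.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Pick the device once; fall back to the cpu when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 3).to(device)              # placeholder model
dataset = TensorDataset(torch.randn(10, 8))     # placeholder data
dataloader = DataLoader(dataset, batch_size=5)

outputs = []
with torch.no_grad():
    for (batch,) in dataloader:
        # The inputs must live on the same device as the model.
        outputs.append(model(batch.to(device)))
```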

Hi, thanks for the reply!

I tried that and the same error still occurs:
“expected device cpu but got device cuda:0”.

I think the problem is that when they define some tensors in the main model or its sub-modules, they do not specify the device. I wonder whether there is a way to move all those tensors to cuda.
I ran into the same problem once before, and what I did then was to put all tensors defined in the model and its sub-modules on the same device as the input. However, this time the model is larger and I would rather not modify their code. (I am trying to import their model directly this time.)

I read some posts and think the problem may be that those tensors are not defined as parameters or buffers. For example, maybe they are only created to do some computation in the forward function. Does that make sense to you? And is there a way to solve this problem?

The best way I can find is to use register_buffer. However, it seems that it has to be used when you define the model.

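Hmm, in theory, when you assign an nn.Parameter or a sub-module as an attribute of an nn.Module, it is registered automatically. A plain tensor attribute is only registered if you call register_buffer for it.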
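The difference between registered and unregistered tensors can be checked without a GPU. In this sketch the Stats class and its attribute names are invented for the example, and .double() is used as a stand-in for .cuda() because both go through the same internal conversion of registered parameters and buffers.

```python
import torch
import torch.nn as nn


class Stats(nn.Module):
    """Hypothetical module holding three kinds of tensors."""

    def __init__(self):
        super().__init__()
        self.register_buffer("running_mean", torch.zeros(3))  # registered buffer
        self.scale = nn.Parameter(torch.ones(3))               # registered parameter
        self.offset = torch.zeros(3)                           # plain attribute, not registered


m = Stats()
buffer_names = [name for name, _ in m.named_buffers()]
param_names = [name for name, _ in m.named_parameters()]

# Converting the module only touches registered parameters and buffers.
m = m.double()
dtypes = (m.running_mean.dtype, m.scale.dtype, m.offset.dtype)
print(buffer_names, param_names, dtypes)
```

The plain attribute keeps its original dtype, and in the same way it would stay on the cpu after model.cuda().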

If you use a debugger (for example in PyCharm, Spyder or ipdb) to stop execution at the exception, you will be able to see which tensor is on the cpu.
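As a complement to the debugger, a small helper can list where each tensor attached to a model lives. tensor_devices below is a hypothetical helper written for this thread, not a PyTorch function, and the Leaky class is only a toy model to try it on.

```python
import torch
import torch.nn as nn


def tensor_devices(model):
    """Map each parameter, buffer and plain tensor attribute to its device type."""
    report = {}
    for name, param in model.named_parameters():
        report["param:" + name] = param.device.type
    for name, buf in model.named_buffers():
        report["buffer:" + name] = buf.device.type
    # Plain tensor attributes sit directly in each module's __dict__.
    for mod_name, mod in model.named_modules():
        for attr, value in vars(mod).items():
            if isinstance(value, torch.Tensor):
                key = mod_name + "." + attr if mod_name else attr
                report["plain:" + key] = value.device.type
    return report


class Leaky(nn.Module):
    """Toy model with one unregistered tensor attribute."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.register_buffer("mask", torch.ones(2))
        self.bias_table = torch.zeros(2)  # plain attribute, ignored by .cuda()


report = tensor_devices(Leaky())
for key, dev in sorted(report.items()):
    print(key, dev)
```

Entries prefixed with plain: are the ones that model.cuda() will not move. Tensors created inside forward do not show up here at all, since they only exist while forward runs.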

BTW, are you sure it’s not about the input tensors?
That error can be thrown either because the model is on cuda but the input is not, or the other way around.
Without seeing the code it’s difficult to give more specific help.

The only case in which calling model.cuda() would not move tensors to the gpu is when a parameter isn’t properly registered as an nn.Parameter or a buffer.

This happens, for example, if you create an ordinary list/tuple/dictionary of tensors/layers. The recursive traversal only explores nn.Modules.

That is why PyTorch provides its own list-like and dict-like containers (nn.ModuleList, nn.ModuleDict, nn.ParameterList and nn.ParameterDict).
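The effect of a plain Python list can be seen by counting the parameters PyTorch knows about. The two classes in this sketch are invented for the comparison.

```python
import torch.nn as nn


class PlainList(nn.Module):
    """Layers kept in an ordinary list: invisible to .cuda() and .parameters()."""

    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4) for _ in range(3)]


class Registered(nn.Module):
    """Same layers kept in nn.ModuleList: properly registered."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])


n_plain = len(list(PlainList().parameters()))
n_registered = len(list(Registered().parameters()))
print(n_plain, n_registered)
```

With the plain list the module reports no parameters, so neither model.cuda() nor an optimizer built from model.parameters() would ever see those layers.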

Hi, I debugged it and found where the problem occurs. I have copied a small part of the code to explain the situation. In the function _permute, they define a tensor using

logabsdet = torch.zeros(batch_size)

After I modify the code to

logabsdet = torch.zeros(batch_size, device = inputs.device)

the code works. Is there any way to solve this problem without modifying the model? In my real use case the model is much larger, and it would not be very convenient to modify every such line this way. :pensive:

# Transform and is_positive_int come from the original repository.
class Permutation(Transform):
    """Permutes inputs on a given dimension using a given permutation."""

    def __init__(self, permutation, dim=1):
        if permutation.ndimension() != 1:
            raise ValueError("Permutation must be a 1D tensor.")
        if not is_positive_int(dim):
            raise ValueError("dim must be a positive integer.")

        super().__init__()  # required before register_buffer can be called
        self._dim = dim
        self.register_buffer("_permutation", permutation)

    @property
    def _inverse_permutation(self):
        return torch.argsort(self._permutation)

    @staticmethod
    def _permute(inputs, permutation, dim):
        if dim >= inputs.ndimension():
            raise ValueError("No dimension {} in inputs.".format(dim))
        if inputs.shape[dim] != len(permutation):
            raise ValueError(
                "Dimension {} in inputs must be of size {}.".format(
                    dim, len(permutation)
                )
            )
        batch_size = inputs.shape[0]
        outputs = torch.index_select(inputs, dim, permutation)
        # No device argument: this tensor is created on the cpu even if inputs is on cuda.
        logabsdet = torch.zeros(batch_size)
        return outputs, logabsdet

    def forward(self, inputs, context=None):
        return self._permute(inputs, self._permutation, self._dim)

    def inverse(self, inputs, context=None):
        return self._permute(inputs, self._inverse_permutation, self._dim)

Hmm… I did find one way to do this: setting the default tensor type to cuda. But this doesn’t seem like a clean solution…
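For reference, a scoped version of that idea could look like the sketch below. It assumes PyTorch 2.0 or later, where torch.device can be used as a context manager (torch.set_default_device is the global counterpart). third_party_zeros stands in for library code you cannot edit, and call_on_device is a hypothetical helper, not part of PyTorch.

```python
import torch


def third_party_zeros(batch_size):
    # Stand-in for library code that creates a tensor without a device argument.
    return torch.zeros(batch_size)


def call_on_device(device, fn, *args, **kwargs):
    """Run fn with `device` as the default device for newly created tensors."""
    # Assumes PyTorch >= 2.0, where torch.device works as a context manager.
    with torch.device(device):
        return fn(*args, **kwargs)


device = "cuda" if torch.cuda.is_available() else "cpu"
logabsdet = call_on_device(device, third_party_zeros, 4)
print(logabsdet.device)
```

Outside the with block the default device is unchanged, which keeps the side effects smaller than a global default. It still only affects tensors created without an explicit device, so code that hardcodes the cpu would not be covered.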

Hmm, I’m afraid there isn’t.

Then again, I doubt that properly written code would run into issues like that. I imagine the original authors also used a gpu, so the code should handle gpu placement somehow.

Anyway, if you plan to use that code, refactoring it to work on cpu/gpu/multi-gpu is not a waste of time.

In the end, you are creating a new tensor there (which is the only case in which you have to choose the device).
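When a new tensor has to be created inside forward, one common way to avoid choosing the device by hand is to derive it from an existing tensor. The function below is a simplified stand-in for the _permute method in this thread, not the original authors' code; it uses Tensor.new_zeros, which creates the result with the same device and dtype as inputs.

```python
import torch


def permute_with_logabsdet(inputs, permutation, dim=1):
    """Simplified stand-in for _permute that follows the device of `inputs`."""
    outputs = torch.index_select(inputs, dim, permutation)
    # new_zeros inherits device and dtype from inputs.
    logabsdet = inputs.new_zeros(inputs.shape[0])
    return outputs, logabsdet


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inputs = torch.arange(6.0, device=device).reshape(2, 3)
permutation = torch.tensor([2, 0, 1], device=device)

outputs, logabsdet = permute_with_logabsdet(inputs, permutation)
print(outputs.device, logabsdet.device)
```

torch.zeros(batch_size, device=inputs.device) as used earlier in the thread gives the same placement; new_zeros additionally matches the dtype.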


Hi, I had also assumed the code would already be adapted to gpu allocation. Maybe I am missing something. Anyway, the code is working now. In the future, modifying their code myself or asking the authors directly might be a good idea.
Thanks for your help!! :smile: