Questions on GPU/CPU tensor transfer

Using libtorch with GPU/CUDA support, on Windows 10.

I have several questions on this topic. I feel like the answers should be in the documentation, but I could not find them in either the libtorch or PyTorch docs; if I missed them, please point me to the right place.

1- How can I create a Module on the GPU directly? I know that you can transfer it after creation on the CPU, but I would like to avoid this overhead.

2- Does tensor A = B.clone() create A on the GPU if B was on the GPU?

3- How do I cast a c10::Device into a CUDA device?

4- How do I copy the contents of tensor A into the storage pointed to by tensor B (created with "from_blob", or "normally", with eye for instance)? A and B have the same sizes. Is std::copy the best way?

I will answer some of your questions.

  1. Every module allows you to pass a device argument. For example:
model = nn.Linear(10, 5, device='cuda:0')
  2. Yes, the clone will happen on whichever device the source tensor is located.

  3. No idea.

  4. Not exactly clear on what you mean here by "storage pointed to by tensor B". You can obtain the device of any tensor via tensor.device. For example:

import torch

A = torch.rand((2,2), device='cuda:0')
B = A.clone().detach()  # already on A's device
print(A)
print(B)

PyTorch also has some useful functions for when you wish to make a tensor of the same size as A, filled with ones or zeros, with the same dtype and device.

A = torch.rand((2,2), device='cuda:0')
B = torch.zeros_like(A)
C = torch.ones_like(A)
D = torch.rand_like(A)

Thank you for your reply.

I am using libtorch, not PyTorch: so C++, not Python, even though they share the same backend (and documentation?!).

1- Here is the relevant documentation for the C++ equivalent of nn.Linear: Class LinearImpl — PyTorch main documentation. And here is the doc of the construction-parameters struct: Struct LinearOptions — PyTorch main documentation. Unless I am missing the obvious, there is no way to do it as Python does, i.e. no equivalent of the device = 'cuda:0' argument you suggested.

2- Thank you, I expect it to work the same way in libtorch then. If someone could confirm …

3- Can't find anything on that. =(

4-

> Not exactly clear on what you mean here. "storage pointed to by tensor B".

I don't think one can access raw storage in Python without external modules (?). In libtorch (C++), the underlying storage of a tensor is accessed with something like .data_ptr<float>(). I want to make sure I never reallocate a when copying b with tensor a = b.clone(), given that a and b have the same sizes, and especially when a and b are on the GPU.