I have several questions on this topic. I feel like the answers should be in the docs (both libtorch and pytorch) but I could not find them; if I missed them, please point me to the right place.
1- How can I create a Module directly on the GPU? I know you can create it on the CPU and transfer it afterwards, but I would like to avoid that overhead. (See the sketch after this list.)
2- Does tensor A = B.clone() create A on the GPU if B was on the GPU?
3- How do I cast a c10::Device into a CUDA device?
4- How do I copy the contents of tensor A into the storage pointed to by tensor B (created with “from_blob”, or “normally”, with eye for instance)? A and B have the same sizes. Is std::copy the best way?
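For 1-, as far as I can tell the built-in libtorch modules do not take a device at construction; the usual pattern is to construct on CPU and move all parameters and buffers at once with ->to(device). Plain tensors, on the other hand, can be allocated on the GPU directly via torch::TensorOptions. A minimal sketch (the module choice and sizes are made up):

```cpp
#include <torch/torch.h>

int main() {
  torch::Device device(torch::kCUDA, 0);

  // Built-in modules are constructed on CPU; ->to() then moves all
  // parameters and buffers to the target device in one call.
  torch::nn::Linear linear(128, 64);
  linear->to(device);

  // Plain tensors can skip the CPU step entirely: TensorOptions
  // lets you allocate directly on the GPU.
  torch::Tensor w = torch::randn({64, 128},
                                 torch::TensorOptions().device(device));
}
```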
2- Thank you, I expect it to work the same way in libtorch then. If someone could confirm …
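For what it's worth, a minimal check in libtorch (assuming a CUDA build) suggests clone() does keep the source device:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor b = torch::eye(3, torch::TensorOptions().device(torch::kCUDA));
  torch::Tensor a = b.clone();       // new storage, same device and dtype
  std::cout << a.device() << "\n";   // prints "cuda:0"
}
```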
3- Can't find anything on that. =(
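From what I can tell, there is nothing to cast to: a c10::Device is already a (type, index) pair, so a device whose type is CUDA is the “CUDA device”. You can check the type and hand the index to raw CUDA APIs. A sketch (use_device is a made-up helper):

```cpp
#include <c10/core/Device.h>
#include <cuda_runtime.h>

void use_device(const c10::Device& dev) {
  // c10::Device bundles a DeviceType and a DeviceIndex; there is no
  // separate CUDA-device class to cast into.
  if (dev.is_cuda()) {
    c10::DeviceIndex idx = dev.index();   // -1 means "current device"
    cudaSetDevice(idx == -1 ? 0 : idx);   // pass the index to raw CUDA calls
  }
}
```

If the goal is just to make subsequent CUDA work run on that device, c10::cuda::CUDAGuard (from c10/cuda/CUDAGuard.h) also accepts a c10::Device directly.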
4-
Not exactly clear what you mean here: “storage pointed to by tensor B”.
I don't think one can access raw storage in Python without external modules (?). In libtorch (C++), the underlying storage of a tensor is accessed with something like .data_ptr<float>(). What I want is to make sure I am never reallocating a when copying b with tensor a = b.clone(), given that a and b have the same sizes, and especially when a and b are on the GPU.
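If that is the goal, then (as far as I understand) a = b.clone() is not the right tool: clone() always allocates fresh storage for its result. The in-place copy a.copy_(b) writes b's values into a's existing storage without reallocating, and it handles the GPU case (including cross-device copies). std::copy over .data_ptr<float>() would only be valid for contiguous CPU tensors. A sketch:

```cpp
#include <torch/torch.h>

int main() {
  auto opts = torch::TensorOptions().device(torch::kCUDA);
  torch::Tensor a = torch::eye(3, opts);        // destination; storage we keep
  torch::Tensor b = torch::randn({3, 3}, opts); // source, same sizes

  // In-place: reuses a's storage, no reallocation. Issues a
  // device-to-device copy under the hood since both live on GPU.
  a.copy_(b);

  // By contrast, this would rebind `a` to freshly allocated storage:
  // a = b.clone();
}
```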