I cannot find a way to put the model on the CUDA device and keep it there. I cannot send a tensor that is already on CUDA through the model without getting the "found at least two devices, cpu and cuda" error. The only thing that works is starting with both the tensor and the model on the CPU and moving them to CUDA at call time. But I cannot keep moving tensors that are already on CUDA back to the CPU just to put them on CUDA again; there is no point in even using CUDA at that point, right?
The full, reproducible example is below, but the lines in question are quite simple, as shown here.
I have a tensor on CUDA and I want to send it through the model. This causes an error:
auto the_tensor = torch::rand({42, 427}).to(device);
std::cout << net.forward(the_tensor).to(device);
terminate called after throwing an instance of 'c10::Error'
what(): Expected all tensors to be on the same device, but found at least two devices, cpu and
cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
If I do NOT put the tensor on CUDA first, I can run the tensor and the model on CUDA like so:
auto the_tensor = torch::rand({42, 427});
std::cout << net.forward(the_tensor).to(device);
I can also send the tensor back to the CPU, and this also does NOT create an error. But I have a large script with a lot of tensors that will already be on the CUDA device, and I DO NOT want to be sending tensors from CUDA back to the CPU and then back to the CUDA device. This is why I call it a bug. How do I put the model on the CUDA device and keep it there, other than tacking .to(device) onto the model's output every time it is called with net.forward(tensor)?
auto the_tensor = torch::rand({42, 427}).to(device);
std::cout << net.forward(the_tensor.to(torch::kCPU)).to(device);
I have tried permanently putting the model on the device, but nothing I try works:
net.to(device);
net->to(device);
Critic_Net().to(device);
I've tried many variations like the ones above to put the model on the CUDA device and keep it there, but nothing works except moving things at call time with net.forward(the_tensor).to(device);
The full, reproducible example:
#include <torch/torch.h>

using namespace torch::indexing;

torch::Device device(torch::kCUDA);

struct Critic_Net : torch::nn::Module {
    torch::Tensor next_state_batch__sampled_action;

public:
    Critic_Net() {
        lin1 = torch::nn::Linear(427, 42);
        lin2 = torch::nn::Linear(42, 286);
        lin3 = torch::nn::Linear(286, 1);
    }
    torch::Tensor forward(torch::Tensor next_state_batch__sampled_action) {
        auto h = next_state_batch__sampled_action;
        h = torch::relu(lin1->forward(h));
        h = torch::tanh(lin2->forward(h));
        h = lin3->forward(h);
        return torch::nan_to_num(h);
    }
    torch::nn::Linear lin1{nullptr}, lin2{nullptr}, lin3{nullptr};
};

auto net = Critic_Net();

int main() {
    net.to(device);
    auto the_tensor = torch::rand({42, 427}).to(device);
    std::cout << net.forward(the_tensor).to(device);
}
Versions
I’m on Ubuntu 22.04
My CUDA version is 11.7
Using Libtorch 1.12.1+cu116