First, I want to know my module's location (device) in libtorch, e.g. cpu or cuda:0.
Is there an API for this in libtorch?
torch::jit::Module model = torch::jit::load(model_path);
model.??? // <- ??? is the API that prints the model's location (device)
Second, can I copy the module directly from gpu:0 to gpu:1 over NVLink?
torch::Device device = torch::Device(torch::kCUDA, 0);
torch::jit::Module model_a = torch::jit::load(model_path);
model_a.to(device);
torch::jit::Module model_b = ???; // <- ??? is the API that copies the model directly over NVLink, without going through host DRAM
A model itself does not belong to any specific device as its parameters and buffers can be located on different devices. Assuming you’ve moved the parameters and buffers to a single device, you could grab e.g. the first parameter and check its device attribute instead.
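A minimal sketch of that approach, assuming the model has already been moved to a single device (the `"model.pt"` path is a placeholder for your own TorchScript file):

```cpp
#include <torch/script.h>
#include <iostream>

int main() {
  torch::jit::Module model = torch::jit::load("model.pt");
  model.to(torch::Device(torch::kCUDA, 0));

  // The module has no single device attribute; inspect the first
  // parameter instead. After the to() call above, all parameters
  // should report the same device.
  for (const auto& p : model.parameters()) {
    std::cout << p.device() << std::endl;  // prints e.g. "cuda:0"
    break;
  }
}
```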
NVLink will be automatically used when detected and you won’t be able to enable or disable it from your SW stack.
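For the second question, one way to sketch the copy (assuming `torch::jit::Module::clone()` deep-copies the module, and that `"model.pt"` is a placeholder path):

```cpp
#include <torch/script.h>

int main() {
  torch::jit::Module model_a = torch::jit::load("model.pt");
  model_a.to(torch::Device(torch::kCUDA, 0));

  // clone() produces a deep copy; to() then moves the copy's parameters
  // and buffers to the second GPU. If peer access between the two GPUs
  // is available (CUDA enables it automatically on NVLink-capable
  // topologies), these are device-to-device transfers and do not
  // stage through host DRAM -- there is no separate API to force this.
  torch::jit::Module model_b = model_a.clone();
  model_b.to(torch::Device(torch::kCUDA, 1));
}
```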