Force libtorch to use CUDA context

I’m trying to integrate libtorch to load a model into my application. My application does a lot of CUDA work before I load the model with libtorch, so the CUDA context has already been created.

For some reason, even though the CUDA context has already been created and the calling thread already has a valid context, when I call

torch::jit::script::Module module = torch::jit::load("test.pt");
module.to(at::kCUDA);

a new context is created by libtorch. The new context is not even pushed onto the context stack; it overwrites the current context. I know this because if I call cuCtxPopCurrent after module.to(at::kCUDA), the current context is null.

This causes a lot of problems because I can no longer interact with the memory I have already allocated in my own context.
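For illustration, here is a minimal sketch of the kind of check described above. It assumes the CUDA driver API is available, device 0 is used, and a TorchScript file test.pt exists; it is not a full reproduction of my application.

#include <cuda.h>
#include <torch/script.h>
#include <cstdio>

int main() {
    // Application side: create and bind our own context, as the rest of the app does.
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext appCtx;
    cuCtxCreate(&appCtx, 0, dev);

    // Load the model and move it to the GPU through libtorch.
    torch::jit::script::Module module = torch::jit::load("test.pt");
    module.to(at::kCUDA);

    // Check which context the calling thread now sees.
    CUcontext current = nullptr;
    cuCtxGetCurrent(&current);
    std::printf("app ctx: %p, current ctx after libtorch: %p\n",
                (void*)appCtx, (void*)current);

    // Popping the current context shows whether anything remains underneath it,
    // i.e. whether the original context was replaced rather than pushed over.
    CUcontext popped = nullptr;
    cuCtxPopCurrent(&popped);
    cuCtxGetCurrent(&current);
    std::printf("popped ctx: %p, remaining current ctx: %p\n",
                (void*)popped, (void*)current);
    return 0;
}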

Is there any way I can initialize libtorch so that it uses my existing CUDA context?

Thanks

Is there a recommended behavior here, @ngimel?

According to the CUDA best practices guide:

A CUDA context is analogous to a CPU process. All resources and actions performed within the driver API are encapsulated inside a CUDA context, and the system automatically cleans up these resources when the context is destroyed. Besides objects such as modules and texture or surface references, each context has its own distinct address space. As a result, CUdeviceptr values from different contexts reference different memory locations.

And

While multiple contexts (and their associated resources such as global memory allocations) can be allocated concurrently on a given GPU, only one of these contexts can execute work at any given moment on that GPU; contexts sharing the same GPU are time-sliced. Creating additional contexts incurs memory overhead for per-context data and time overhead for context switching. Furthermore, the need for context switching can reduce utilization when work from several contexts could otherwise execute concurrently (see also Concurrent Kernel Execution). Therefore, it is best to avoid multiple contexts per GPU within the same CUDA application.
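For reference, here is a hedged sketch of one way an application could avoid creating a second context at all. It assumes (not confirmed in this thread) that libtorch, through the CUDA runtime API, attaches to the device's primary context, so the application retains and binds that primary context instead of creating its own with cuCtxCreate.

#include <cuda.h>
#include <torch/script.h>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Retain the device's primary context and make it current for this thread.
    // Assumption: this is the same context the CUDA runtime (and therefore
    // libtorch) will use, so application allocations and libtorch share it.
    CUcontext primary;
    cuDevicePrimaryCtxRetain(&primary, dev);
    cuCtxSetCurrent(primary);

    // Application allocation in the shared (primary) context.
    CUdeviceptr buf;
    cuMemAlloc(&buf, 1024);

    // libtorch work should now happen in the same context.
    torch::jit::script::Module module = torch::jit::load("test.pt");
    module.to(at::kCUDA);

    cuMemFree(buf);
    cuDevicePrimaryCtxRelease(dev);
    return 0;
}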

Hi,

It might be the JIT that does not handle the context as it should. Could you please file a GitHub issue and put the jit label on it? Thanks!

Done:

This is my first time filing a libtorch GitHub issue, so please review whether I did it correctly. I will provide a code example shortly.

Cheers


Thanks for taking the time to file the issue. Could you link it here in case someone else stumbles upon the same problem and wants to track its status?

Opened issue
