Sharing CUDA tensors among multiple threads

By default, the CUDA Runtime API uses the same (primary) context across all threads of the same process. However, it seems like PyTorch creates a unique context per thread. How do I disable this so that I can use one context for all threads of the same process and share the same CUDA global memory addresses?
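
For reference, here is a minimal sketch of what I am trying to do (assuming a single GPU, a CUDA-enabled PyTorch build, and that the driver library is loadable as `libcuda.so.1` via ctypes): a tensor is allocated in the main thread and used from several worker threads, and each thread prints the CUDA context it sees along with the tensor's device pointer.

```python
import ctypes
import threading
import torch

# Sketch only: assumes one GPU, CUDA-enabled PyTorch, and that the CUDA
# driver library can be loaded as libcuda.so.1 (Linux).
libcuda = ctypes.CDLL("libcuda.so.1")

def current_context():
    # cuCtxGetCurrent reports the CUDA context bound to the calling thread.
    ctx = ctypes.c_void_p()
    libcuda.cuCtxGetCurrent(ctypes.byref(ctx))
    return ctx.value or 0

def worker(shared, idx):
    # Every thread touches the same GPU allocation created in the main thread.
    shared.add_(1)
    print(f"thread {idx}: ctx={current_context():#x} "
          f"data_ptr={shared.data_ptr():#x}")

shared = torch.zeros(4, device="cuda")
print(f"main:     ctx={current_context():#x} data_ptr={shared.data_ptr():#x}")

threads = [threading.Thread(target=worker, args=(shared, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

torch.cuda.synchronize()
print(shared)  # expect [4., 4., 4., 4.] if all threads shared the allocation
```

If the threads really do share one context, I would expect the same `ctx` and `data_ptr` values printed from every thread.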