Custom CUDA operator only works correctly on cuda:0

shzygmyx · May 1, 2022, 2:34am

I’m learning how to write a custom CUDA operator by following your tutorial at GitHub - pytorch/extension-cpp: C++ extensions in PyTorch

However, I found that the operator only outputs correct results on “cuda:0”; on other devices such as “cuda:1” it outputs wrong results (an all-zeros tensor).

Is there any way to make the CUDA operator work correctly on “cuda:1”?

eqy · May 1, 2022, 7:05am

That sounds surprising; could you share some more details about the setup, e.g., are cuda:0 and cuda:1 identical devices? What happens if cuda:1 is used as cuda:0, e.g., with CUDA_VISIBLE_DEVICES=1?

shzygmyx · May 1, 2022, 11:13am

Interestingly, the two methods of selecting the CUDA device lead to different results:
method 1): set CUDA_VISIBLE_DEVICES=1 before running the code x=torch.LongTensor([1,2,3]) ...
method 2): do not set the environment variable, and run x=x.to("cuda:1") in the code

Method 1 runs perfectly, while method 2 fails.

ptrblck · May 1, 2022, 5:02pm

It sounds as if your custom extension might be missing the deviceGuard usage?

shzygmyx · May 2, 2022, 2:01pm

Thanks for your suggestion! Could you please provide more details about how to add the deviceGuard? I did not find anything related to it in the official tutorial, so I think it would be very helpful to point this out in a new version of the tutorial.

ptrblck · May 2, 2022, 2:45pm

In your custom extension you would have to add:

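#include <c10/cuda/CUDAGuard.h>

// Place this at the top of the C++ function that launches your kernel.
// local_tensor is one of the tensors passed to your operator.
const at::cuda::OptionalCUDAGuard device_guard(device_of(local_tensor));
/*
your code
*/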
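
To illustrate why the guard helps: a raw kernel launch runs on the current device, which is device 0 unless something changes it. x.to("cuda:1") moves the tensor but leaves the current device untouched, whereas CUDA_VISIBLE_DEVICES=1 makes the second GPU appear as device 0, which would explain why method 1 works. The sketch below is a toy model only, with no CUDA or PyTorch calls; ToyDeviceGuard, toy_launch, g_current_device and the two run_* helpers are hypothetical names made up for this illustration, not PyTorch's implementation.

```cpp
#include <cassert>

// Toy stand-in for the CUDA runtime's "current device" (assumption: a fresh
// process starts with device 0 selected, as CUDA does).
static int g_current_device = 0;

// Hypothetical RAII guard: switches the current device while the object is
// alive and restores the previous device when it goes out of scope.
class ToyDeviceGuard {
 public:
  explicit ToyDeviceGuard(int device) : previous_(g_current_device) {
    g_current_device = device;
  }
  ~ToyDeviceGuard() { g_current_device = previous_; }
  ToyDeviceGuard(const ToyDeviceGuard&) = delete;
  ToyDeviceGuard& operator=(const ToyDeviceGuard&) = delete;

 private:
  int previous_;
};

// Stand-in for a raw kernel launch: it runs on whatever device is current
// and reports that device index.
int toy_launch() { return g_current_device; }

// Without a guard, the tensor's device is ignored by the launch.
int run_without_guard(int tensor_device) {
  (void)tensor_device;
  return toy_launch();
}

// With a guard, the launch happens on the tensor's device, and the previous
// device is restored when the function returns.
int run_with_guard(int tensor_device) {
  const ToyDeviceGuard guard(tensor_device);
  return toy_launch();
}
```

In this model, run_without_guard(1) launches on device 0 even though the tensor lives on device 1, while run_with_guard(1) launches on device 1 and leaves the current device at 0 afterwards.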

shzygmyx · May 2, 2022, 3:06pm

It works! Thank you so much!