Multi-GPU with custom CUDA layer works with 1080s but not Titan GPUs

I wrote a custom layer backend in CUDA and have been successfully training it with DataParallel on a machine with 4 x GeForce GTX 1080 Ti GPUs. However, I switched over to a machine with 1 x Titan X (Pascal) and 2 x GeForce GTX Titan X GPUs, and my custom layer no longer works. The standard PyTorch library layers work just fine on both machines, though.

The CUDA error code I’m getting is 48, which isn’t listed in the NVIDIA docs… A Google search turns up this GitHub issue, which doesn’t seem relevant because Titan X GPUs are still supported by PyTorch, and this forum question, which also doesn’t seem relevant for the same reason.
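In case it helps, this is roughly how I’m dumping the error (a minimal sketch, not my actual layer code): the runtime API can translate the raw code into a name and description that are easier to search for.

```cuda
// Minimal sketch: print a CUDA error's name and description rather than
// just the raw integer, so it can be looked up in the docs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaError_t err = cudaGetLastError();  // or the status returned by any runtime call
    std::printf("CUDA error %d: %s - %s\n", static_cast<int>(err),
                cudaGetErrorName(err), cudaGetErrorString(err));
    return 0;
}
```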

I am using the same conda environment on both machines, so I know the Python environments are identical. Both machines are also running CUDA 9.0. It seems the only real difference is the GPU types.
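Since the GPUs are the only difference I can find, I also dumped each device’s compute capability with a quick standalone check (again just a sketch using the runtime API, not part of the layer itself):

```cuda
// Sketch: list every visible GPU with its compute capability, to compare
// the two machines directly.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("GPU %d: %s (compute capability %d.%d)\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```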

It’s not a GPU memory issue. A batch size of 3 fits easily on a single GPU, so spreading it across 3 GPUs certainly shouldn’t be a problem.

Is there something I’m missing in my CUDA code? There’s too much to post here, but I’m not using any libraries other than cuBLAS and PyTorch’s ATen. In all cases, I’m using PyTorch 1.0.0.

It turns out that atomicAdd() on doubles requires compute capability 6.0 or higher, and the GeForce GTX Titan X cards are only 5.2. That was the issue.
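For anyone who runs into the same thing: native atomicAdd() on double only exists on devices of compute capability 6.0 and above, and the CUDA C Programming Guide documents a workaround that emulates it with atomicCAS() on older architectures. Roughly (adapted from the guide, guarded so it doesn’t clash with the built-in on sm_60+):

```cuda
// Double-precision atomicAdd() fallback for compute capability < 6.0,
// adapted from the CUDA C Programming Guide.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
// sm_60+ (and the host pass) already have the native overload; define nothing.
#else
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Reinterpret the double's bits so atomicCAS (integer-only) can be used.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Integer comparison avoids an infinite loop on NaN (since NaN != NaN).
    } while (assumed != old);
    return __longlong_as_double(old);
}
#endif
```

The other option would have been to keep the accumulation in float, since single-precision atomicAdd() is supported on these older architectures.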