CUDA extension builds but won't import (or run, depending on the situation)

I posted about this in the relevant GitHub repo (this post is largely copied from https://github.com/erikwijmans/Pointnet2_PyTorch/issues/93) and was hoping I could get some help here:

I can build the extension with python setup.py build_ext --inplace:

running build_ext
building 'pointnet2._ext' extension
creating build/lib.linux-x86_64-3.7/pointnet2
g++ -pthread -shared -Wl,-z,relro -g -L/usr/local/cuda/lib64 -L/usr/lib64 -lcudart -lpython3.7m -o build/lib.linux-x86_64-3.7/pointnet2/_ext.cpython-37m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.7/pointnet2/_ext.cpython-37m-x86_64-linux-gnu.so -> pointnet2
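
For reference, the build goes through PyTorch's cpp_extension machinery; here is a minimal sketch of the kind of setup.py involved (the names and source list are illustrative, not the repo's actual file):

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="pointnet2",
    ext_modules=[
        CUDAExtension(
            # The extension name must match the import: pointnet2._ext
            name="pointnet2._ext",
            # Illustrative source list; the repo compiles the C++ bindings
            # plus the .cu kernels under pointnet2/_ext-src/src/
            sources=[
                "pointnet2/_ext-src/src/bindings.cpp",
                "pointnet2/_ext-src/src/sampling.cpp",
                "pointnet2/_ext-src/src/sampling_gpu.cu",
            ],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)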

But when I try to import it, I run into problems:

python -c "import pointnet2._ext"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: dynamic module does not define module export function (PyInit__ext)
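
One way to check whether the built shared object actually exports the module init function the importer is looking for (a sketch; the .so name is the one copied above):

import ctypes

import torch  # load libtorch first so the extension's dependencies can resolve

# Open the extension as a plain shared library and probe for the CPython
# module export function named in the ImportError.
lib = ctypes.CDLL("pointnet2/_ext.cpython-37m-x86_64-linux-gnu.so")
print(hasattr(lib, "PyInit__ext"))  # False confirms the symbol is missing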

CUDA version: 10.2
NVIDIA driver version: 440.33.01
GPU: NVIDIA Tesla T4 (compute capability 7.5)
PyTorch version: 1.4
OS: Amazon Linux 2

I’m able to get this working on a similar setup using Ubuntu and a GTX 1080; it only started failing when I tried to run it on EC2. Things I’ve tried:

  • building on the EC2 instance
  • building a working Docker image on one machine (it does not work on the EC2 instance)
  • ensuring any Python 2 interpreters are unreachable
  • renaming the CUDA extension to _ext and matching the package name

If I try to run the code from a Docker image that works on my local machine, it seems to import successfully, but I get this:

CUDA kernel failed : no kernel image is available for execution on the device
void furthest_point_sampling_kernel_wrapper(int, int, int, const float*, float*, int*) at L:228 in pointnet2/_ext-src/src/sampling_gpu.cu

Any ideas?

I’m not sure how to solve the first error.
However, the “no kernel image” error is usually raised if you’ve built your extension for a specific compute architecture (e.g. 7.0 for Volta) while you are now trying to run it on a different one.
You could use TORCH_CUDA_ARCH_LIST="6.0 7.0" python setup.py build_ext to build for the specified compute capabilities.
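
To see which value belongs in TORCH_CUDA_ARCH_LIST, you can query the device from PyTorch (a quick sketch):

import torch

# Compute capability of the first visible GPU as a (major, minor) tuple,
# e.g. (7, 5) for a Tesla T4 -> TORCH_CUDA_ARCH_LIST="7.5"
print(torch.cuda.get_device_capability(0))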

Did you have any luck resolving this issue? I have exactly the same problem: the model runs fine in my Docker image locally, but if I try to run it in a Google Compute Engine VM with a Tesla P4 I get

CUDA kernel failed : no kernel image is available for execution on the device

I tried specifying compute capabilities with

TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0" python setup.py install

But I still get the same error.
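
For what it’s worth, the Tesla P4 is compute capability 6.1, so "6.1" in that list should match it. A sanity check worth running inside the VM (a sketch; get_arch_list may not be available in PyTorch 1.4):

import torch

# CUDA toolkit version this PyTorch binary was built against
print(torch.version.cuda)

# Compute capability of the GPU in the VM; a Tesla P4 should report (6, 1)
print(torch.cuda.get_device_capability(0))

# Architectures the PyTorch binary itself was compiled for (the extension's
# own arch list is set separately via TORCH_CUDA_ARCH_LIST at build time)
print(torch.cuda.get_arch_list())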