Custom operator segfault with CUDA 10.2 and PyTorch 1.5

I have just installed CUDA 10.2 and PyTorch using pip. I am on Ubuntu 18.04 and using Python 3.6.9.

I have built a series of custom operators using these instructions: https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html#building-the-custom-operator

The only change I have made to the CMake file is changing the line:

target_compile_features(dilate PRIVATE cxx_range_for)

To:

target_compile_features(dilate PRIVATE cxx_std_14)

The latest version of PyTorch uses the C++14 standard library, so I had to make this change to get it to compile.

The custom operators compile, but they segfault when I try to use them in the code I am running. It segfaults both with the matching libtorch library downloaded from the PyTorch website (https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.5.0.zip) and when I compile against the libtorch libraries in the PyTorch package installation folder (segmentation fault when loading the custom operator).

I was also having this issue with CUDA 10.1 on this installation of Ubuntu, so I am not sure what the problem is. I had this working with CUDA 10.1 on an old installation of Ubuntu (with PyTorch 1.4), but unfortunately I have lost access to that install and have not been able to recreate the environment. I have also been unable to get this to work using nvidia-docker.

If anyone has any suggestions for the best way to set up a new Ubuntu environment so that the custom operators work, that would also be much appreciated.

Ah, I finally figured out the issue. It had nothing to do with the version of CUDA or Ubuntu. I was getting a segfault because I was passing in a CUDA tensor and then trying to access its memory through a CPU OpenCV Mat. Converting the tensor to CPU before passing it to OpenCV fixed the issue.
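
For anyone who hits the same thing, here is a minimal sketch of the kind of fix, assuming the operator wraps the tensor's data pointer in a cv::Mat and works on a 2-D single-channel float tensor. The operator name (dilate_op) and the shapes are just illustrative, not the tutorial's exact code:

#include <torch/script.h>
#include <opencv2/opencv.hpp>

// Hypothetical operator body showing the fix: move the tensor to the CPU
// (and make it contiguous) before handing its data pointer to OpenCV.
torch::Tensor dilate_op(torch::Tensor input) {
  // A cv::Mat cannot read GPU memory, so copy the tensor to the host first.
  torch::Tensor cpu_input = input.to(torch::kCPU).contiguous();

  // Wrap the CPU buffer in a Mat (assuming a 2-D single-channel float tensor).
  cv::Mat image(static_cast<int>(cpu_input.size(0)),
                static_cast<int>(cpu_input.size(1)),
                CV_32FC1,
                cpu_input.data_ptr<float>());

  // Run the OpenCV operation on the host copy.
  cv::Mat dilated;
  cv::dilate(image, dilated, /*kernel=*/cv::Mat());

  // Copy the result into a tensor that owns its own memory before returning.
  return torch::from_blob(dilated.data,
                          {dilated.rows, dilated.cols},
                          torch::kFloat32)
      .clone();
}

Equivalently, the tensor can be moved to the CPU on the Python side (e.g. with tensor.cpu()) before it is passed to the custom op.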