Sharing a solution for cudaCheckError() failed : invalid device function

chengjuzhou · January 8, 2020, 6:26am

I tested example.py from https://github.com/McDo/PSROIAlign-Multi-Batch-PyTorch and found that it always ended with cudaCheckError() failed : invalid device function. The error message comes from line 124 of PSROIPool_cuda.cu. After several tries, I found the cause: I had installed a new GPU of a different model before creating a new Anaconda env for PyTorch. The details are as follows:

1. With two 1080 Ti GPUs in my workstation, I created the Anaconda env test_1.
2. I replaced one 1080 Ti with a Titan V, so the workstation now has one 1080 Ti and one Titan V, and then created the Anaconda env test_2.
3. Running the code in test_2 fails with cudaCheckError() failed : invalid device function, while running it in test_1 gives no error.

Guess: PyTorch may do something GPU-specific at install time, and this may go wrong when the workstation has two different GPU models.
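
One possible mechanism, sketched under assumptions: the CUDA runtime documentation describes cudaErrorInvalidDeviceFunction as a device function that does not exist or was not compiled for the proper device architecture. A GTX 1080 Ti has compute capability 6.1 and a Titan V has 7.0. Binary code (SASS) built for sm_XY only runs on devices X.Z with Z >= Y, while embedded PTX can be JIT-compiled by the driver for the same or a newer architecture. The sketch below models those rules; the function names are hypothetical, and it assumes the extension was compiled for only one of the two cards.

```python
from typing import Iterable, Tuple

# (major, minor) compute capability: GTX 1080 Ti = (6, 1), Titan V = (7, 0)
Capability = Tuple[int, int]


def sass_runs_on(target: Capability, device: Capability) -> bool:
    """Binary code built for sm_XY runs only on devices X.Z with Z >= Y."""
    return device[0] == target[0] and device[1] >= target[1]


def ptx_runs_on(target: Capability, device: Capability) -> bool:
    """PTX built for compute_XY can be JIT-compiled for any device of capability >= X.Y."""
    return device >= target


def kernel_available(sass: Iterable[Capability], ptx: Iterable[Capability],
                     device: Capability) -> bool:
    """False means no embedded image fits `device`, so a kernel launch on it fails."""
    return (any(sass_runs_on(t, device) for t in sass)
            or any(ptx_runs_on(t, device) for t in ptx))


if __name__ == "__main__":
    # Hypothetical build: SASS for the Titan V only, no PTX embedded.
    for name, cap in [("GTX 1080 Ti", (6, 1)), ("Titan V", (7, 0))]:
        print(name, kernel_available([(7, 0)], [], cap))
```

For example, kernel_available([(7, 0)], [], (6, 1)) returns False: an extension built only for the Titan V contains no image the 1080 Ti can run, which would explain why the error only appears in the env created after the GPU swap.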

albanD · January 8, 2020, 3:17pm

Thanks for sharing.

Yes, different GPUs require different low-level code. Compiling for all of them makes installation slow and the binary very large, so if we know in advance which GPU the code is going to run on, we compile only for that GPU. So indeed, you will need to recompile when you add a different GPU.
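
To cover a mixed machine, the build needs code for every architecture present. A minimal sketch, assuming the extension is built either through torch.utils.cpp_extension (which reads the TORCH_CUDA_ARCH_LIST environment variable) or with hand-written nvcc -gencode flags; the helper names arch_list and gencode_flags are hypothetical:

```python
from typing import Iterable, List, Tuple

# (major, minor) compute capability, e.g. (6, 1) for a GTX 1080 Ti, (7, 0) for a Titan V
Capability = Tuple[int, int]


def arch_list(capabilities: Iterable[Capability]) -> str:
    """Deduplicated, sorted value for TORCH_CUDA_ARCH_LIST, e.g. '6.1;7.0'."""
    return ";".join(f"{major}.{minor}" for major, minor in sorted(set(capabilities)))


def gencode_flags(capabilities: Iterable[Capability], ptx_for_newest: bool = True) -> List[str]:
    """nvcc -gencode flags: SASS for every architecture, plus PTX for the newest one."""
    caps = sorted(set(capabilities))
    flags = [f"-gencode=arch=compute_{a}{b},code=sm_{a}{b}" for a, b in caps]
    if ptx_for_newest and caps:
        # PTX lets the driver JIT-compile for GPUs newer than any listed here.
        a, b = caps[-1]
        flags.append(f"-gencode=arch=compute_{a}{b},code=compute_{a}{b}")
    return flags


if __name__ == "__main__":
    installed = [(6, 1), (7, 0)]  # one GTX 1080 Ti and one Titan V
    print(arch_list(installed))  # 6.1;7.0
    for flag in gencode_flags(installed):
        print(flag)
```

With cpp_extension, something like TORCH_CUDA_ARCH_LIST="6.1;7.0" python setup.py install should build for both cards; it is safest to delete the old build directory first so previously compiled objects are not reused. On the target machine, the capabilities can be read with torch.cuda.get_device_capability(i) for each i in range(torch.cuda.device_count()).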

0e5ec01b9c14c98dcd37 · January 9, 2020, 10:51am

The code is based on an older version of Facebook's ROIAlign. You may need to change

const int threads = 1024;

to match your GPU's specs, and then recompile it.
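
A sketch of deriving the launch configuration from the device limit instead of hard-coding it, assuming a 1-D kernel with one thread per element that skips out-of-range indices. launch_config and its parameters are hypothetical; in CUDA C++ the limit is available as the maxThreadsPerBlock field of the cudaDeviceProp struct filled in by cudaGetDeviceProperties. For reference, both the GTX 1080 Ti and the Titan V allow up to 1024 threads per block.

```python
from typing import Tuple


def launch_config(num_elements: int, max_threads_per_block: int,
                  preferred_threads: int = 1024) -> Tuple[int, int]:
    """Return (blocks, threads_per_block) for a 1-D kernel with one thread per element."""
    if num_elements < 0 or max_threads_per_block <= 0 or preferred_threads <= 0:
        raise ValueError("num_elements must be >= 0 and thread counts must be > 0")
    # Never request more threads per block than the device allows.
    threads = min(preferred_threads, max_threads_per_block)
    # Ceiling division so every element gets a thread; the kernel must skip
    # indices >= num_elements in the last, partially filled block.
    blocks = (num_elements + threads - 1) // threads
    # blocks == 0 means there is nothing to do: CUDA rejects a zero-sized grid,
    # so the caller should skip the launch in that case.
    return blocks, threads


if __name__ == "__main__":
    print(launch_config(10_000, 1024))  # (10, 1024)
    print(launch_config(10_000, 512))   # (20, 512)
```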