Error on CUDA Programming

shim94kr · June 5, 2022, 10:57am

Hello. I’m modifying a CUDA code of deformable convolution to work with circular padding, and the main change is to introduce modulo operation on the translated coordinates by offsets. I tried two versions that do the same thing, but the problem is that one raises an error, but the other works well. I want to ask you the difference or the reason for the error as I think they are the same in functions.

The base code I used is GitHub - CHONSPQX/modulated-deform-conv: deformable convolution 2D 3D DeformableConvolution DeformConv Modulated Pytorch CUDA. You can test the code by simply replacing the trilinear interpolation part in src/deformable_conv3d.cu.

The central part that I changed is the lines 25-27, which involve modulo operation.
Line 27 does not raise an error, and I think this covers the discrepancy between version 1 and version 2.

7 template
8 device scalar_t deform_conv3d_im2col_trilinear(
9 const scalar_t *bottom_data, const int data_width,const int data_length,
10 const int height, const int width, const int length,scalar_t h, scalar_t w,scalar_t l)
11 {
12
13 int h_low = floor(h);
14 int w_low = floor(w);
15 int l_low = floor(l);
17 int w_high = w_low + 1;
18 int l_high = l_low + 1;
19
20 scalar_t lh = h - h_low;//dh
21 scalar_t lw = w - w_low;//dw
22 scalar_t ll = l - l_low;//dl
23 scalar_t hh = 1 - lh, hw = 1 - lw, hl = 1 - ll; //1-dh 1-dw 1-dl
24
25 h_low = ((h_low % height) + height) % height;
26 h_high = ((h_high % height) + height) % height;
27 assert(h_low >= 0 && h_high <= height - 1);

The code of 28-49 lines is the version1 that raises an error of CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
28 /*
29 scalar_t v1 = 0;
30 scalar_t v5 = 0;
31 if (w_low >= 0 && l_low >= 0)
32 v1 = bottom_data[h_low * data_widthdata_length + w_lowdata_length+ l_low];
33 v5 = bottom_data[h_high * data_widthdata_length + w_lowdata_length+ l_low];
34 scalar_t v2 = 0;
35 scalar_t v6 = 0;
36 if (w_low >=0 && l_high<= length -1)
37 v2 = bottom_data[h_low * data_widthdata_length + w_lowdata_length+ l_high];
38 v6 = bottom_data[h_high * data_widthdata_length + w_lowdata_length+ l_high];
39 scalar_t v3 = 0;
40 scalar_t v7 = 0;
41 if (w_high <= width - 1 && l_low >= 0)
42 v3 = bottom_data[h_low * data_widthdata_length + w_highdata_length+ l_low];
43 v7 = bottom_data[h_high * data_widthdata_length + w_highdata_length+ l_low];
44 scalar_t v4 = 0;
45 scalar_t v8 = 0;
46 if (w_high <= width - 1 && l_high<= length -1)
47 v4 = bottom_data[h_low * data_widthdata_length + w_highdata_length+ l_high];
48 v8 = bottom_data[h_high * data_widthdata_length + w_highdata_length+ l_high];
49 */

The code of 50-76 lines is version2 which works without error.
50 /*
51 scalar_t v1 = 0;
52 if (h_low >= 0 && w_low >= 0 && l_low >= 0)
53 v1 = bottom_data[h_low * data_widthdata_length + w_lowdata_length+ l_low];
54 scalar_t v2 = 0;
55 if (h_low >= 0 && w_low >=0 && l_high<= length -1)
56 v2 = bottom_data[h_low * data_widthdata_length + w_lowdata_length+ l_high];
57 scalar_t v3 = 0;
58 if (h_low >= 0 && w_high <= width - 1 && l_low >= 0)
59 v3 = bottom_data[h_low * data_widthdata_length + w_highdata_length+ l_low];
60 scalar_t v4 = 0;
61 if (h_low >= 0 && w_high <= width - 1 && l_high<= length -1)
62 v4 = bottom_data[h_low * data_widthdata_length + w_highdata_length+ l_high];
63
64 scalar_t v5 = 0;
65 if (h_high <= height -1 && w_low >= 0 && l_low >= 0)
66 v5 = bottom_data[h_high * data_widthdata_length + w_lowdata_length+ l_low];
67 scalar_t v6 = 0;
68 if (h_high <= height -1 && w_low >= 0 && l_high<= length -1)
69 v6 = bottom_data[h_high * data_widthdata_length + w_lowdata_length+ l_high];
70 scalar_t v7 = 0;
71 if (h_high <= height -1 && w_high <= width - 1 && l_low >= 0)
72 v7 = bottom_data[h_high * data_widthdata_length + w_highdata_length+ l_low];
73 scalar_t v8 = 0;
74 if (h_high <= height -1 && w_high <= width - 1 && l_high<= length -1)
75 v8 = bottom_data[h_high * data_widthdata_length + w_highdata_length+ l_high];
76 */

The rest part performs the interpolation.
77 scalar_t w1 = hh * hw *hl, w2 = hh hw ll, w3 = hh * lwhl, w4 = hh * lw ll;
78 scalar_t w5 = lh * hw *hl, w6 = lh hw ll, w7 = lh * lwhl, w8 = lh * lw ll;
79
80 scalar_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4+w5 * v5 + w6 * v6 + w7 * v7 + w8 * v8);
81 return val;
82 }

Any comment or help on this would be appreciated!

ptrblck · June 5, 2022, 6:39pm

Assuming you want to debug a potential indexing error, run the code via compute-sanitizer and see if it would report invalid reads or writes. Additionally cuda-gdb (with the memcheck option set to on) might be useful as it could help narrowing down where the issue is.

shim94kr · June 5, 2022, 10:01pm

I couldn’t install the compute-sanitizer even though the manual said it is installed by default with nvcc.
I tried sudo apt-get install cuda-sanitizer-api-10-2 as I installed cuda-gdb with apt-get before.
Can I get a way to install it?

For running memcheck in cuda-gdb, I got a warp illegal address error, but I couldn’t identify where it comes from. It was hard to debug further as the cuda-gdb doesn’t step one-by-one through the cuda kernels as gdb does.

But, I found the error is raised when the offset is decimal (when the offset is an integer is okay).
Yet, it is weird because a target point’s result is aggregated through interpolating the results of multiple integer points, and there’s nothing to do with the decimal points.

Thank you for answering!

ptrblck · June 5, 2022, 11:21pm

If you are using an older CUDA toolkit you could use cuda-memcheck instead of compute-sanitizer.
In any case, cuda-gdb already points to te warp illegal address in the deform_conv3d_im2col... kernel so you would need to check the indexing logic there. Adding asserts might also be helpful in you have trouble isolating the issue.

shim94kr · June 6, 2022, 2:48am

I added the asserts to version1 and made it the same code as version2.
What I did was to change if statement to the assert one.

However, running with version 1 raises the same error, but version 2 doesn’t.
I think they are the same code, so further trouble-shooting might be turned out to be a waste of time.
Would you mind confirming this in that they are the same code?
Thank you for your time and efforts.

Version1

   scalar_t v1 = 0;
   if (w_low >= 0 && l_low >= 0)
     assert (h_low>=0);
     v1 = bottom_data[h_low * data_width*data_length + w_low*data_length+ l_low];
   scalar_t v2 = 0;
   if (w_low >=0 && l_high<= length -1)
     assert (h_low>=0);
     v2 = bottom_data[h_low * data_width*data_length + w_low*data_length+ l_high];
   scalar_t v3 = 0;
   if (w_high <= width - 1 && l_low  >= 0)
     assert (h_low>=0);
     v3 = bottom_data[h_low * data_width*data_length + w_high*data_length+ l_low];
   scalar_t v4 = 0;
   if (w_high <= width - 1 && l_high<= length -1)
     assert (h_low>=0);
     v4 = bottom_data[h_low * data_width*data_length + w_high*data_length+ l_high];
   scalar_t v5 = 0;
   if (w_low >= 0 && l_low >= 0)
     assert (h_high<=height - 1);
     v5 = bottom_data[h_high * data_width*data_length + w_low*data_length+ l_low];
   scalar_t v6 = 0;
   if (w_low >=0 && l_high<= length -1)
     assert (h_high<=height - 1);
     v6 = bottom_data[h_high * data_width*data_length + w_low*data_length+ l_high];
   scalar_t v7 = 0;
   if (w_high <= width - 1 && l_low  >= 0)
     assert (h_high<=height - 1);
     v7 = bottom_data[h_high * data_width*data_length + w_high*data_length+ l_low];
   scalar_t v8 = 0;
   if (w_high <= width - 1 && l_high<= length -1)
     assert (h_high<=height - 1);
     v8 = bottom_data[h_high * data_width*data_length + w_high*data_length+ l_high];

Version2

   scalar_t v1 = 0;
   if (h_low >= 0 && w_low >= 0 && l_low >= 0)
     v1 = bottom_data[h_low * data_width*data_length + w_low*data_length+ l_low];
   scalar_t v2 = 0;
   if (h_low >= 0 && w_low >=0 && l_high<= length -1)
     v2 = bottom_data[h_low * data_width*data_length + w_low*data_length+ l_high];
   scalar_t v3 = 0;
   if (h_low >= 0 && w_high <= width - 1 && l_low  >= 0)
     v3 = bottom_data[h_low * data_width*data_length + w_high*data_length+ l_low];
   scalar_t v4 = 0;
   if (h_low >= 0 && w_high <= width - 1 && l_high<= length -1)
     v4 = bottom_data[h_low * data_width*data_length + w_high*data_length+ l_high];

   scalar_t v5 = 0;
   if (h_high <= height -1 && w_low >= 0 && l_low >= 0)
     v5 = bottom_data[h_high * data_width*data_length + w_low*data_length+ l_low];
   scalar_t v6 = 0;
   if (h_high <= height -1 && w_low >= 0  && l_high<= length -1)
     v6 = bottom_data[h_high * data_width*data_length + w_low*data_length+ l_high];
   scalar_t v7 = 0;
   if (h_high <= height -1 && w_high <= width - 1 && l_low >= 0)
     v7 = bottom_data[h_high * data_width*data_length + w_high*data_length+ l_low];
   scalar_t v8 = 0;
   if (h_high <= height -1 && w_high <= width - 1 && l_high<= length -1)
     v8 = bottom_data[h_high * data_width*data_length + w_high*data_length+ l_high];

shim94kr · June 6, 2022, 3:54am

Just in a case, you can reproduce the problem of my case with following steps.

Step1: clone the following projects
git clone GitHub - shim94kr/modulated_deformable_conv

Step2: create virtual env & install pytorch
conda create -n reproduce python=3.8
conda activate reproduce
pip install -r requirements.txt

Step3: install deformable conv
python setup.py install

Step4: execute the test code
python my_test.py

The code is set to version1 and will raise the error.
If you switch version1 to version2 by commenting out version1 in src/deformable_conv3d.cu Line#28-59, the error will be disappeared.