The code in this gist runs some indexing and simple arithmetic on exactly the same inputs 20,000 times.
It is supposed to find the indices of the first and last elements in groups of consecutive elements.
When run on the CPU, it always returns the same, correct result.
However, when run on CUDA, it returns a wrong result 4 to 20 times out of 20,000 iterations.
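For reference, here is a minimal sketch of the task itself. It assumes that a "group of consecutive elements" means a run of equal values in a 1-D tensor, which may not match the gist exactly. It also targets a recent PyTorch release rather than 0.4, because it relies on the `torch.bool` dtype (in 0.4, comparisons returned `uint8` masks). Both masks are built out of place, so no tensor is written while it is being read. The function name `run_boundaries` is hypothetical.

```python
import torch

def run_boundaries(x: torch.Tensor):
    """Return boolean masks marking the first and last element of each run
    of consecutive equal values in a 1-D tensor (hypothetical helper).

    Assumption: a "group of consecutive elements" is a run of equal values.
    """
    if x.numel() == 0:
        empty = torch.zeros(0, dtype=torch.bool, device=x.device)
        return empty, empty
    change = x[1:] != x[:-1]              # True where a new run starts at i+1
    edge = torch.ones(1, dtype=torch.bool, device=x.device)
    is_first = torch.cat([edge, change])  # element 0 always starts a run
    is_last = torch.cat([change, edge])   # final element always ends a run
    return is_first, is_last

x = torch.tensor([5, 5, 2, 2, 2, 7, 5, 5])
is_first, is_last = run_boundaries(x)
first_idx = torch.nonzero(is_first).flatten()
last_idx = torch.nonzero(is_last).flatten()
print(first_idx.tolist(), last_idx.tolist())  # [0, 2, 5, 6] [1, 4, 5, 7]
```

Under this interpretation, a correct result always has as many run starts as run ends.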
I pared the code down as much as I could. Here are a few observations:
- The error is data-dependent: if I remove any more data than I already have, the error no longer occurs.
- The cumsum() in line 27 seems to be part of the problem. Commenting it out stops the error from occurring, even though it should not affect the result in any way, since its result is never used.
- By keeping copies of the correct results of isFirst and isLast in cFi and cLa, I can see that the error is in isLast.
- By subtracting the correct result from the incorrect one (lines 39 onward), I can see that there is a single mismatch, at position 767.
- The results were obtained with PyTorch 0.4 and a Pascal GPU with CUDA 9.0.
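The compare-against-a-known-good-copy approach described in these observations can be wrapped in a small harness. The following is a sketch, not the gist's code: `count_mismatches` and `is_last_mask` are hypothetical names, and the reference is computed once on the CPU. Each run is compared element-wise against that reference, and the indices that ever disagree are collected, much like the position 767 found above. The usage example falls back to the CPU when no GPU is available.

```python
import torch

def count_mismatches(fn, x, reference, iterations=20000):
    """Call fn(x) repeatedly and compare each result against `reference`.

    Returns (n_correct, n_wrong, positions), where positions is the set of
    indices that differed in at least one run. `fn` is a hypothetical
    stand-in for the computation under test and must return a 1-D tensor.
    """
    n_correct, n_wrong = 0, 0
    positions = set()
    ref = reference.to(x.device)
    for _ in range(iterations):
        out = fn(x)
        diff = torch.nonzero(out != ref).flatten()  # indices that disagree
        if diff.numel() == 0:
            n_correct += 1
        else:
            n_wrong += 1
            positions.update(diff.tolist())
    return n_correct, n_wrong, positions

def is_last_mask(t):
    # Hypothetical example computation: True at the last element of each run.
    edge = torch.ones(1, dtype=torch.bool, device=t.device)
    return torch.cat([t[1:] != t[:-1], edge])

x_cpu = torch.tensor([1, 1, 3, 3, 3, 4])
reference = is_last_mask(x_cpu)  # known-good result computed on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
result = count_mismatches(is_last_mask, x_cpu.to(device), reference, iterations=100)
print(device, result[:2], sorted(result[2]))
```

Calling `.tolist()` on a CUDA tensor forces a synchronization on every iteration. That slows the loop down, but it guarantees that each comparison sees a finished result.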
Could this be related to my moving elements into overlapping regions of the same tensor (lines 20 and 31)?
Any other ideas?
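A plausible suspect, though not a confirmed diagnosis, is an in-place copy whose source and destination are overlapping views of the same storage, such as `x[1:] = x[:-1]`. A GPU kernel processes elements in parallel, so some reads may see values that other threads have already overwritten. The sketch below shows two ways to avoid the overlap: cloning the source before an in-place copy, or building the result out of place. Both function names are hypothetical and are not taken from the gist.

```python
import torch

def shift_right_(x: torch.Tensor, fill: int = 0) -> torch.Tensor:
    """Shift a 1-D tensor right by one position, in place (hypothetical helper).

    x[1:] and x[:-1] share all but one element of storage. Copying directly
    between them (x[1:] = x[:-1]) reads and writes overlapping memory;
    cloning the source first gives the copy a private buffer to read from.
    """
    src = x[:-1].clone()  # independent copy: no shared memory with x[1:]
    x[1:] = src
    x[0] = fill
    return x

def shift_right(x: torch.Tensor, fill: int = 0) -> torch.Tensor:
    """Out-of-place alternative: builds a new tensor and never writes to x."""
    head = torch.full((1,), fill, dtype=x.dtype, device=x.device)
    return torch.cat([head, x[:-1]])

x = torch.arange(1, 7)
print(shift_right(x).tolist())   # [0, 1, 2, 3, 4, 5]
print(shift_right_(x).tolist())  # [0, 1, 2, 3, 4, 5]
```

The out-of-place version costs one extra allocation. In return, it removes any question of aliasing, which makes it a cheap way to test whether the overlapping copies are responsible.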
A typical result is shown below.
The first row shows that the CPU version returned the correct result 20000/20000 times.
The last row shows that the GPU version returned an incorrect result 6 out of 20000 times.
```
cpu (20000, 0)
countOnes isFirst=1127 isLast=1128
No mismatches in isFirst tensor(0, device='cuda:0') tensor(0, device='cuda:0') tensor(0, device='cuda:0')
pos of mismatch in isLast tensor(767, device='cuda:0') tensor(0, device='cuda:0') tensor(1, device='cuda:0')
countOnes isFirst=1127 isLast=1128
No mismatches in isFirst tensor(0, device='cuda:0') tensor(0, device='cuda:0') tensor(0, device='cuda:0')
pos of mismatch in isLast tensor(767, device='cuda:0') tensor(0, device='cuda:0') tensor(1, device='cuda:0')
countOnes isFirst=1127 isLast=1128
No mismatches in isFirst tensor(0, device='cuda:0') tensor(0, device='cuda:0') tensor(0, device='cuda:0')
pos of mismatch in isLast tensor(767, device='cuda:0') tensor(0, device='cuda:0') tensor(1, device='cuda:0')
countOnes isFirst=1127 isLast=1128
No mismatches in isFirst tensor(0, device='cuda:0') tensor(0, device='cuda:0') tensor(0, device='cuda:0')
pos of mismatch in isLast tensor(767, device='cuda:0') tensor(0, device='cuda:0') tensor(1, device='cuda:0')
countOnes isFirst=1127 isLast=1128
No mismatches in isFirst tensor(0, device='cuda:0') tensor(0, device='cuda:0') tensor(0, device='cuda:0')
pos of mismatch in isLast tensor(767, device='cuda:0') tensor(0, device='cuda:0') tensor(1, device='cuda:0')
countOnes isFirst=1127 isLast=1128
No mismatches in isFirst tensor(0, device='cuda:0') tensor(0, device='cuda:0') tensor(0, device='cuda:0')
pos of mismatch in isLast tensor(767, device='cuda:0') tensor(0, device='cuda:0') tensor(1, device='cuda:0')
cuda (19994, 6)
```