Torch.linspace() precision issue in CUDA

I just tracked down an issue in my project that looks like a bug. I’m shuffling a (large) tensor by building an index tensor with torch.linspace(), randomizing it with randperm(), then using the shuffled indices to index_select() the original tensor. index_select() requires a torch.long index tensor, so I set dtype=torch.long in the linspace call.
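A minimal sketch of that shuffle, with a placeholder tensor size:

```python
import torch

# Placeholder data; in the real project the tensor is much larger.
data = torch.randn(4096, 8)

# Sequential indices 0..N-1 (index_select requires torch.long)
idx = torch.linspace(0, data.size(0) - 1, data.size(0),
                     dtype=torch.long, device=data.device)

# Shuffle the indices, then gather the rows in that order
perm = idx[torch.randperm(idx.numel(), device=data.device)]
shuffled = data.index_select(0, perm)
```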

On the CPU this works fine, but I’m seeing problems when I run it with device="cuda:0". It looks like the values are computed in float32 and then cast to long, which gives wrong results for large indices.

>>> torch.linspace(12713984, 16908287, 4194304, dtype=torch.long, device='cpu')[-4:]
tensor([16908284, 16908285, 16908286, 16908287])
>>> torch.linspace(12713984, 16908287, 4194304, dtype=torch.long, device='cuda:0')[-4:]
tensor([16908284, 16908286, 16908288, 16908288], device='cuda:0')

This is happening in PyTorch 1.6.0, with Python 3.8.5. It’s on Windows (Geforce driver 456.38) and Linux (450.66, and the 430 version I used before that). I’ve been playing with this project for months and haven’t seen this until recently. It might be due to a change in 1.6.0, which I only updated to in the last few weeks.

I’m just going to change it to call linspace with dtype=torch.float64, then cast to long, which seems to work fine. But I thought I should post something about it.
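For reference, the workaround with the sizes from the repro above; float64 represents every integer up to 2**53 exactly, so the cast back to long is safe at these magnitudes:

```python
import torch

start, end, steps = 12713984, 16908287, 4194304

# Compute in float64 (exact for integers up to 2**53), then cast to long
idx = torch.linspace(start, end, steps, dtype=torch.float64).long()
```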

Don’t use linspace here; use arange instead. linspace uses floating point under the hood, and that isn’t a good match for large-ish integers.
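For example (note that arange’s end is exclusive, so it’s one past the last index):

```python
import torch

# arange works in integer arithmetic, so there is no rounding on any device
idx = torch.arange(12713984, 16908288, dtype=torch.long)
print(idx[-4:])  # tensor([16908284, 16908285, 16908286, 16908287])
```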


