Here is the code snippet:
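import torch

a = torch.randn(100, 20)             # 100 random vectors of dimension 20
b = torch.pdist(a)                   # pairwise Euclidean distances computed on the CPU
c = torch.pdist(a.cuda()).cpu()      # same computation on the GPU, copied back to the CPU
print(torch.sum(torch.abs(b - c)))   # tensor(0.0007)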
The summed absolute difference between the GPU and CPU results seems quite large. What causes it?
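For scale: the 0.0007 is a sum over every output of `torch.pdist`, which for a 100×20 input has 100·99/2 = 4950 entries. That works out to roughly 1.4e-7 per distance on average. The sketch below is CPU-only, so it runs without a GPU. It measures how far float32 `pdist` deviates from a float64 reference computed from the same inputs. The fixed seed and the float64 comparison are assumptions added for illustration and are not part of the original snippet. The printed per-distance and relative errors can be compared against float32 machine epsilon and against the 0.0007 reported above.

```python
import torch

torch.manual_seed(0)  # fixed seed only so that runs are repeatable
a = torch.randn(100, 20)

# float32 pairwise distances, computed on the CPU
d32 = torch.pdist(a)
# Higher-precision reference: the same float32 inputs, upcast exactly to float64
d64 = torch.pdist(a.double())

# pdist returns one distance per unordered pair of rows: 100 * 99 / 2 = 4950 values
n_pairs = d32.numel()

abs_err = (d32.double() - d64).abs()
total_abs_err = abs_err.sum().item()          # same kind of summed figure as in the question
mean_abs_err = total_abs_err / n_pairs        # average error per distance
max_rel_err = (abs_err / d64).max().item()    # worst error relative to the distance itself

print(n_pairs)
print(f"summed |error|:            {total_abs_err:.2e}")
print(f"mean |error| per distance: {mean_abs_err:.2e}")
print(f"max relative error:        {max_rel_err:.2e}")
print(f"float32 machine epsilon:   {torch.finfo(torch.float32).eps:.2e}")
```

If the relative error printed here is a small multiple of machine epsilon, then a summed discrepancy of similar size between two float32 implementations (CPU and GPU) is consistent with ordinary rounding rather than a bug.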