Function torch.max() returns inconsistent indices between CPU and GPU

I encountered inconsistent torch.max() behaviour between CPU and GPU, which can be reproduced by:

import torch
x = torch.empty(2, 10, 10)
x[0, :, :] = 1
x[1, :, :] = 2
x[:, 3:7, 3:7] = 0
value, idx = torch.max(x, 0)
print(idx)

(0 ,.,.) =
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 0 1 1 1
1 1 1 0 0 0 0 1 1 1
1 1 1 0 0 0 0 1 1 1
1 1 1 0 0 0 0 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
[torch.LongTensor of size 1x10x10]

while on the GPU:

value, idx = torch.max(x.cuda(), 0)
print(idx)

(0 ,.,.) =
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
[torch.cuda.LongTensor of size 1x10x10 (GPU 0)]

Shouldn't the CPU and GPU outputs be consistent?

This is an ambiguous case: both results are correct. The CPU and GPU each return a valid set of indices, but they may not agree with each other when breaking ties.
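To see why both answers are valid, note that every index the operation returns still points at a genuine maximum. A minimal CPU-only sketch of that check, reusing the repro tensor from above:

```python
import torch

# Reproduce the ambiguous case: along dim 0, the central 4x4 block
# holds the value 0 in BOTH slices, so index 0 and index 1 are
# equally correct there.
x = torch.zeros(2, 10, 10)
x[0] = 1
x[1] = 2
x[:, 3:7, 3:7] = 0

values, idx = torch.max(x, dim=0)

# Whichever index was chosen at a tie, gathering with the returned
# indices must recover exactly the max values.
gathered = torch.gather(x, 0, idx.unsqueeze(0)).squeeze(0)
assert torch.equal(gathered, values)
```

The values tensor is identical on every backend; only the index chosen among tied positions can differ.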

You will see the same behavior when breaking ties in min, sort, topk, and similar reduction ops.
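For sort specifically, newer PyTorch versions (1.9 and later, to my knowledge) expose a `stable=True` flag that pins down the tie order, which gives reproducible indices across devices at some cost:

```python
import torch

t = torch.tensor([2.0, 1.0, 2.0, 1.0])

# stable=True guarantees that equal elements keep their original
# relative order, so the returned indices are deterministic.
vals, idx = torch.sort(t, stable=True)
print(idx)  # tensor([1, 3, 0, 2])
```

No such flag exists for max/min themselves; for those, ties remain backend-dependent.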

The reason it is hard to make the CPU and GPU consistent is that enforcing a fixed tie-breaking order on the GPU would require serializing parts of the parallel reduction, which would cause a large hit in GPU performance.

That makes sense. Thanks for the clarification.