How to get the index of the last max using argmax()

Hi, I know that argmax can return different indices on CPU and GPU when the maximum value is tied, as follows:

import torch
a=torch.tensor([[1,1,1,1,0],[1,1,0,0,0]])
ac=torch.tensor([[1,1,1,1,0],[1,1,0,0,0]]).cuda()
a.argmax(dim=1)
=>tensor([3, 1])
ac.argmax(dim=1)
=>tensor([0, 0], device='cuda:0')

But how can I get the index of the last max on the GPU? I need [3, 1] instead of [0, 0], as in the example above.
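One possible way to do this without depending on how argmax breaks ties is sketched below. The helper name last_argmax is hypothetical (it is not part of PyTorch). It assumes the input has no NaN values. The idea is to mark every position that equals the row maximum and then take the largest such position, so the result should be the same on any device.

```python
import torch

def last_argmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Hypothetical helper: index of the LAST maximal value along `dim`."""
    # Maximum along `dim`, kept broadcastable against x.
    max_vals = x.max(dim=dim, keepdim=True).values
    # Position indices 0..n-1 laid out along `dim`, on the same device as x.
    n = x.size(dim)
    shape = [1] * x.dim()
    shape[dim] = n
    idx = torch.arange(n, device=x.device).view(shape)
    # Keep the position where x equals the max, otherwise use -1.
    masked = torch.where(x == max_vals, idx, torch.full_like(idx, -1))
    # The largest surviving position is the last occurrence of the max.
    return masked.max(dim=dim).values

a = torch.tensor([[1, 1, 1, 1, 0], [1, 1, 0, 0, 0]])
print(last_argmax(a, dim=1))  # tensor([3, 1])
```

The same call should work on a CUDA tensor, e.g. last_argmax(a.cuda(), dim=1), since every intermediate tensor is created on x.device.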