Understanding training

Yes, you are exactly right. torch.max returns the values together with their corresponding indices, while torch.argmax only returns the indices of the max values.
As you can see in the tutorial, the values are not used in _, preds = torch.max(outputs, 1), so it would be equivalent to preds = torch.argmax(outputs, 1).
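
You can verify the equivalence with a quick check (a minimal sketch; outputs is assumed here to be a random logits tensor, not the one from the tutorial):

import torch

outputs = torch.randn(4, 10)  # e.g. a batch of 4 samples with 10 class logits

# torch.max with a dim argument returns (values, indices)
values, preds_max = torch.max(outputs, 1)

# torch.argmax returns only the indices
preds_argmax = torch.argmax(outputs, 1)

# both approaches yield the same predictions
print(torch.equal(preds_max, preds_argmax))
> True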

That’s also a great question! You are generally right and should be careful about detaching tensors. However, since the indices returned by torch.argmax are not differentiable, no computation graph will be stored for them. If you are using torch.max, note that the values will be attached to the computation graph, while the indices won’t be. You can check it via:

import torch

output = torch.randn(10, 10, requires_grad=True)

val, idx = torch.max(output, dim=1)

# a valid grad_fn shows that this tensor is attached to the computation graph
print(val.grad_fn)
> <MaxBackward0 object at 0x7fe084afbbb0> 

print(idx.grad_fn)
> None

idx = torch.argmax(output, dim=1)
print(idx.grad_fn)
> None
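
So if you later use the values from torch.max only for e.g. logging or metric computation, detaching applies to the values, not the indices (a minimal sketch, continuing the snippet above):

# detach the values to cut them from the computation graph
val_detached = val.detach()
print(val_detached.grad_fn)
> None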