Really low accuracy of 0%

SaraMoosbauer · May 19, 2022, 8:29am

I’ve tried training a simple MNIST classifier on mps. When I run the exact same code on the cpu I get an accuracy of 98%, but on mps I get 0.00%.
Does anyone have an idea why that is?

albanD · May 19, 2022, 1:44pm

That is surprising for sure.
Do you have a small code sample to reproduce this issue by any chance?

pannous · May 20, 2022, 8:09am

0% is very strange since you should get at least 10% for random guesses.
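To illustrate the 10% figure: a guess drawn uniformly from 10 classes matches the label with probability 1/10, whatever the label distribution is. The snippet below is only a sketch with made-up labels and guesses (no MNIST data, CPU only); `num_classes`, `num_samples`, and the seed are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

num_classes = 10
num_samples = 100_000

# Hypothetical labels and guesses, both drawn uniformly at random on the CPU.
labels = torch.randint(0, num_classes, (num_samples,))
guesses = torch.randint(0, num_classes, (num_samples,))

# Count matches with an integer sum, then divide in plain Python.
chance_accuracy = (guesses == labels).sum().item() / num_samples
print(f"chance accuracy: {chance_accuracy:.3f}")
```

An accuracy that sits far below this level, such as exactly 0%, points at the way the metric is computed rather than at a badly trained model.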

pannous · May 20, 2022, 9:42am

I reproduced your issue here:

https://gist.github.com/pannous/0ab32760e8b7750efe980aebbba2eeb6

mnist_trivial.py

#!/usr/local/bin/python3
import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms

device = 'mps'

# parameters
learning_rate = 0.01

(The gist preview is truncated here; the full script is at the link above.)

The bug seems to happen in:
accuracy = correct_prediction.float().mean()

since the underlying data clearly has a mean different from 0.0:
correct_predictions tensor([ True, True, True, ..., True, False, True], device='mps:0')

pannous · May 20, 2022, 9:45am

Related to the other .float() bug which @albanD identified yesterday and created a ticket for:

github.com/pytorch/pytorch

Conversion from int to float dtype is not working on MPS device

opened 02:08PM - 19 May 22 UTC

albanD

Labels: high priority, triage review, triaged, module: mps

Simple repro:

```python
import torch
print(torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]]).type(torch.float32))
print(torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], device="mps").type(torch.float32))
```

This returns:

```
tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], device='mps:0')
```

cc @ezyang @gchanan @zou3519

In the meantime, you can work around it by counting the correct predictions instead of casting to float:
accuracy = float(correct_prediction.sum())/len(correct_prediction)
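For anyone who wants the workaround as a reusable helper, here is a minimal sketch. The function name `accuracy_from_mask` and the four-element example tensor are made up for illustration; the device selection falls back to the CPU when MPS is not available, so the snippet also runs on other machines.

```python
import torch


def accuracy_from_mask(correct: torch.Tensor) -> float:
    """Fraction of True entries, without casting the tensor to float on the device."""
    # .sum() on a bool tensor yields an integer count; .item() moves it into Python,
    # so the division happens on plain Python numbers.
    return correct.sum().item() / correct.numel()


# Use MPS when this PyTorch build exposes it and it is available, else use the CPU.
mps_backend = getattr(torch.backends, "mps", None)
device = "mps" if mps_backend is not None and mps_backend.is_available() else "cpu"

# Hypothetical mask: three of four predictions are correct.
correct_prediction = torch.tensor([True, True, False, True], device=device)
accuracy = accuracy_from_mask(correct_prediction)
print(accuracy)
```

The point of the design is that no dtype conversion runs on the device: only an integer reduction does, and the final division is done in Python.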

SaraMoosbauer · May 20, 2022, 10:00am

Thank you so much! I also found that the bug is in the evaluate function, since the training loss is going down.

albanD · May 20, 2022, 2:18pm

We've sent a fix for that. It should be included in tomorrow's nightly build.

KMT62 · September 12, 2023, 10:42pm

A similar issue is found when executing the sample code here: Quickstart — PyTorch Tutorials 2.0.1+cu117 documentation

Specifically in function test(), line:

correct += (pred.argmax(1) == y).type(torch.float).sum().item()

When device = 'mps', it always results in 10% accuracy. When device = 'cpu', the accuracy is as expected.
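One way to narrow this down might be to count the correct predictions twice for the same batch: once with the tutorial's expression on the device, and once on CPU copies with an integer sum. This is only a diagnostic sketch and does not establish the cause; the helper name `count_correct_both_ways` and the toy `pred` and `y` tensors are made up for illustration.

```python
from typing import Tuple

import torch


def count_correct_both_ways(pred: torch.Tensor, y: torch.Tensor) -> Tuple[float, int]:
    # Path 1: the tutorial's expression, evaluated on whatever device pred lives on.
    on_device = (pred.argmax(1) == y).type(torch.float).sum().item()
    # Path 2: the same comparison on CPU copies, counted as integers (no float cast).
    on_cpu = (pred.cpu().argmax(1) == y.cpu()).sum().item()
    return on_device, on_cpu


# Hypothetical logits for 3 samples and 4 classes, with their labels.
pred = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                     [1.5, 0.2, 0.1, 0.0],
                     [0.0, 0.1, 0.2, 3.0]])
y = torch.tensor([1, 0, 2])

device_count, cpu_count = count_correct_both_ways(pred, y)
print(device_count, cpu_count)
```

If the two counts disagree for tensors that live on 'mps', the evaluation expression is the suspect. If they agree and the accuracy is still 10%, the difference more likely comes from the model outputs themselves.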