MPS is not working even with nightly conda and pip3 versions

Hi all,

I am trying to run yolov5 on MPS (the Apple GPU). Has anyone solved the MPS/GPU issues on the MacBook Pro?
I am still struggling with this, even though MPS appears to be set up correctly, as per the checks below.

**Confirmation of MPS installed:**
import torch
x = torch.rand(5, 3)
print(x)

tensor([[0.6720, 0.5475, 0.5518],
[0.3920, 0.6301, 0.0521],
[0.6370, 0.7802, 0.6066],
[0.6236, 0.6443, 0.9573],
[0.6718, 0.1459, 0.3276]])

**Confirmation of MPS installed:**
>>> import torch
>>> import torchvision
>>> torch.backends.mps.is_available()
True
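
Note that `torch.rand(5, 3)` in the first check creates the tensor on the CPU, so it only shows that PyTorch imports. A minimal sketch of a check that actually places a tensor on the MPS device could look like the following. `pick_device` is a hypothetical helper name, not part of PyTorch or yolov5, and the snippet assumes a PyTorch build that exposes `torch.backends.mps` (1.12 or later); it falls back to `"cpu"` otherwise.

```python
def pick_device(prefer_mps=True):
    """Return "mps" if PyTorch reports a usable MPS backend, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    # Older builds have no torch.backends.mps at all, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    if prefer_mps and mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print("selected device:", device)

if device == "mps":
    import torch
    # Allocate directly on the GPU and confirm where the tensor lives.
    x = torch.rand(5, 3, device="mps")
    print(x.device)
```

If this prints `mps` for the tensor device, the backend itself is working and the failure is more likely a missing operator than a broken install.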

**Another Confirmation of MPS installed:**
Python Platform: macOS-12.5.1-arm64-i386-64bit
Tensor Flow Version: 2.9.2
Keras Version: 2.9.0

Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:33)
[Clang 13.0.1 ]
Pandas 1.4.3
Scikit-Learn 1.1.2
GPU is available

1. WHAT AM I RUNNING?
I am running macOS Monterey 12.5.1 (21G83) on a 2021 MBP M1, 32 cores, 64 GB.

2. WHAT I DID, AS PER THE INSTRUCTIONS FOR INITIALISATION:
I installed the packages from the requirements file:
pip3 install -r requirements.txt

Then I ran the training script:

python3 train.py --weights yolov5x.pt --data coco128.yaml --img 640 --batch 16 --epochs 3 --device mps 

3. THIS WORKS WITHOUT --device mps on the MacBook Pro!
But what is the point of this new Mac if MPS cannot be used?

4. I ALSO TRIED:

PYTORCH_ENABLE_MPS_FALLBACK=1 python3 train.py --weights yolov5x.pt --data coco128.yaml --img 640 --batch 16 --epochs 3 --device mps

But this just suppresses the error and falls back to the CPU, because the configuration does not see the onboard MPS/GPU device. This is based on my research, the blogs, and responses from the author @glenn-jocher.
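
For anyone who wants to launch the same run from Python rather than the shell, here is a rough sketch. The error message quoted at the end of this post describes the variable as a CPU fallback for the unsupported op only. As far as I can tell it is read when torch is loaded, so it needs to be in the environment before the process imports torch; treat that as an assumption to verify. `mps_fallback_env` and `train_command` are hypothetical helper names, and the arguments are copied from the command above.

```python
import os
import sys


def mps_fallback_env(base=None):
    """Copy an environment mapping and enable the MPS CPU fallback in it."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


def train_command(device="mps"):
    """Build the argument list for the yolov5 training run from this post."""
    return [
        sys.executable, "train.py",
        "--weights", "yolov5x.pt",
        "--data", "coco128.yaml",
        "--img", "640",
        "--batch", "16",
        "--epochs", "3",
        "--device", device,
    ]


env = mps_fallback_env()
cmd = train_command()
print(" ".join(cmd))
# To launch from inside the yolov5 checkout:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Passing the arguments as a list avoids shell quoting problems, such as a stray trailing character ending up in the `--device` value.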

I have tried most of the workarounds from the blogs, but so far they have only led to more errors and more frustration.

5. Can someone please point me to a deployed example that actually works on Mac MPS/GPU?

I am not sure what my way forward is at this stage, as I have wiped my Mac, deleted environments and packages, and restarted the entire process on numerous occasions. Unfortunately, nothing seems to speed up the processing.

6. WHY AM I TRYING TO DO THIS?
Simply to speed up training and inference, as it is a pain running this on CPU.

7. The last training run two weeks ago (yolov5m.pt, 30 epochs, batch 32) took 6 days to complete! I would like to experiment with other models, but this is a huge drag.

Please assist. I welcome any input that helps me get past this challenge.
Thanks in advance for your efforts and for acknowledging my digital presence.

PACKAGES INSTALLED:

Package Version
---
absl-py 1.2.0
appnope 0.1.3
asttokens 2.0.8
backcall 0.2.0
beautifulsoup4 4.11.1
cachetools 5.2.0
certifi 2022.6.15
charset-normalizer 2.1.1
coremltools 5.2.0
cycler 0.11.0
decorator 5.1.1
distro 1.7.0
executing 0.10.0
filelock 3.8.0
fonttools 4.37.1
gdown 4.5.1
google-auth 2.11.0
google-auth-oauthlib 0.4.6
grpcio 1.47.0
idna 3.3
ipython 8.4.0
jarowinkler 1.2.1
jedi 0.18.1
Jinja2 3.1.2
kiwisolver 1.4.4
Markdown 3.4.1
MarkupSafe 2.1.1
matplotlib 3.5.3
matplotlib-inline 0.1.6
mlhub 3.11.2
model-maker 0.0.5
mpmath 1.2.1
numpy 1.23.2
oauthlib 3.2.0
opencv-python 4.6.0.66
packaging 21.3
pandas 1.4.3
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.2.0
pip 22.2.2
prompt-toolkit 3.0.30
protobuf 3.19.4
psutil 5.9.1
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
Pygments 2.13.0
pyparsing 3.0.9
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2022.2.1
PyYAML 6.0
rapidfuzz 2.6.0
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
scipy 1.9.0
seaborn 0.11.2
setuptools 65.3.0
six 1.16.0
soupsieve 2.3.2.post1
stack-data 0.4.0
sympy 1.11
tensorboard 2.10.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
thop 0.1.1.post2207130030
torch 1.13.0.dev20220826
torchaudio 0.13.0.dev20220826
torchvision 0.14.0.dev20220826
tqdm 4.64.0
traitlets 5.3.0
typing_extensions 4.3.0
urllib3 1.26.12
wcwidth 0.2.5
Werkzeug 2.2.2
wheel 0.37.1
yamlordereddictloader 0.4.0

ERRORS FOR: python3 train.py --device mps

AMP: checks failed ❌, disabling Automatic Mixed Precision. See https://github.com/ultralytics/yolov5/issues/7908

NotImplementedError: The operator 'aten::remainder.Tensor_out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
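
To reproduce the failing operator outside of train.py, a small probe along these lines may help. It is only a sketch, under the assumption that calling `torch.remainder` on two tensors reaches the `aten::remainder.Tensor_out` kernel named in the error. `probe_remainder` is a hypothetical helper, not part of PyTorch or yolov5.

```python
def probe_remainder(device="mps"):
    """Try a tensor-tensor remainder on `device` and report what happened."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    try:
        a = torch.arange(6, device=device)
        b = torch.full((6,), 4, device=device)
        torch.remainder(a, b)
    except NotImplementedError:
        # The same exception type as in the traceback above.
        return "unsupported"
    except Exception as exc:
        # For example, the device itself is not available in this build.
        return "failed: " + type(exc).__name__
    return "ok"


print("cpu:", probe_remainder("cpu"))
print("mps:", probe_remainder("mps"))
```

If the MPS line reports `unsupported` on the installed nightly, the op is still missing in that build, and the issue linked in the error message is the place to check its status.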