Hi
I am training a network on M1 mackbook pro. I installed the compatible torch version for mps device.
I have a 3 channel image with shapes from 2000 x 2000 and above. At first i resized them to 200x200, the model trained fine. As soon as i changed the shapes to 500x500 the model training keeps giving out the Error: command buffer exited with error status.
error at each epoch.
What may be the cause of this error ?
2022-08-21 13:27:45.597158 Epoch 1 Training Loss 44.01341743328992
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Caused GPU Hang Error (00000003:kIOGPUCommandBufferCallbackErrorHang)
<AGXG13XFamilyCommandBuffer: 0x175ba5a50>
label = <none>
device = <AGXG13XDevice: 0x144fd3e00>
name = Apple M1 Pro
commandQueue = <AGXG13XFamilyCommandQueue: 0x144fd5600>
label = <none>
device = <AGXG13XDevice: 0x144fd3e00>
name = Apple M1 Pro
retainedReferences = 1```