'MPS' resource exhaustion generating -Inf and NaN values?

Running the NanoGPT transformer model on the ‘MPS’ backend on macOS 13.4 with an AMD Radeon Pro 5700 XT,
I started getting -Inf and NaN values after several thousand training iterations.
When I switched the backend from ‘MPS’ to ‘CPU’, there were no -Inf or NaN values.
When I decreased block_size and batch_size, the problem stopped.
With the original, larger block_size and batch_size, saving the tensor in the forward method also stopped the issue (a change in timing?).

The -Inf values first appeared in the LayerNorm call:

F.layer_norm(input, self.weight.shape, self.weight, self.bias, 1e-5)
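
For reference, this is roughly how the first bad module can be located. It is a minimal sketch and not the actual NanoGPT code: add_nonfinite_hooks, the Log module, and the toy model are hypothetical names made up for illustration, and the example runs on the CPU. Each forward hook checks its module's output with torch.isfinite and records the module name when an Inf or NaN shows up.

```python
import torch
import torch.nn as nn


def add_nonfinite_hooks(model: nn.Module, report: list) -> list:
    """Attach forward hooks that record the names of modules whose
    output contains Inf or NaN. Returns the hook handles for removal."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only plain tensor outputs are checked in this sketch.
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                report.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is ""
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles


class Log(nn.Module):
    """Hypothetical module used only to force a -Inf: log(0) = -inf."""
    def forward(self, x):
        return torch.log(x)


# Toy stand-in for the real model (assumption: not the NanoGPT architecture).
toy = nn.Sequential(nn.Identity(), Log(), nn.LayerNorm(4))

report = []
handles = add_nonfinite_hooks(toy, report)
toy(torch.zeros(2, 4))
for h in handles:
    h.remove()

# The first entry is the earliest module that produced a non-finite output.
print(report)
```

One caveat: on ‘MPS’ the isfinite check has to read a result back from the device, so the hooks themselves may change the timing, much like saving the tensor did.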

Does this sound like an ‘MPS’ resource issue?
Is there a way to display resource usage from inside the forward method?
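
The closest thing I am aware of is the memory counters in torch.mps. The following is a minimal sketch, assuming a PyTorch version that provides torch.mps.current_allocated_memory() and torch.mps.driver_allocated_memory() (2.0 or later, as far as I know); mps_memory_report is a hypothetical helper name. It only covers memory, so it may not reveal other kinds of resource pressure.

```python
import torch


def mps_memory_report(tag: str) -> str:
    """Return a one-line summary of MPS memory use, or a note if MPS is absent."""
    # Assumption: torch.mps and its memory counters exist in this PyTorch build.
    if not (hasattr(torch, "mps") and torch.backends.mps.is_available()):
        return f"{tag}: MPS not available"
    allocated = torch.mps.current_allocated_memory() / 2**20  # bytes held by tensors
    driver = torch.mps.driver_allocated_memory() / 2**20      # bytes held by the Metal driver
    return f"{tag}: tensors {allocated:.1f} MiB, driver {driver:.1f} MiB"


# Could be called at the start and end of forward() to watch for growth.
print(mps_memory_report("before forward"))
```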