Transformer architecture in PyTorch using Mac M1 GPU via MPS

For anyone wondering: there are apparently problems with GRU on the MPS backend. See this link
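If you want to check this on your own machine, here is a minimal sketch (not taken from the linked report) that runs the same GRU on the CPU and on MPS and compares the outputs. The helper names `pick_device` and `gru_matches_cpu`, the layer sizes, and the tolerance are my own assumptions for illustration; I have not confirmed which PyTorch versions are affected.

```python
import copy

import torch
import torch.nn as nn


def pick_device() -> torch.device:
    """Return the MPS device when PyTorch reports it as usable, else CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def gru_matches_cpu(device: torch.device, atol: float = 1e-4) -> bool:
    """Run one GRU on CPU and a copy on `device`, then compare the outputs.

    Sizes and tolerance are arbitrary illustrative choices (assumptions).
    """
    torch.manual_seed(0)
    gru_cpu = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
    x = torch.randn(4, 10, 8)  # (batch, sequence, features)

    # Copy the module so both runs use identical weights.
    gru_dev = copy.deepcopy(gru_cpu).to(device)

    with torch.no_grad():
        ref_out, _ = gru_cpu(x)            # reference result on CPU
        dev_out, _ = gru_dev(x.to(device))  # result on the device under test

    return torch.allclose(ref_out, dev_out.cpu(), atol=atol)


if __name__ == "__main__":
    device = pick_device()
    print(f"device: {device}, GRU matches CPU: {gru_matches_cpu(device)}")
```

If the comparison fails on `mps`, one possible workaround is to keep the GRU layer on the CPU and move only the rest of the model to MPS, at the cost of extra host/device copies.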
