Bizarre InternalTorchDynamoError with locally and formerly working code

To launch on multiple devices you would use torchrun --nproc_per_node=8 ..., which starts one process per device; each process's local rank then corresponds to the --local-rank argument inside your script, as described here.
In your approach you are launching your script with torchrun only and are not using --local-rank at all, so again I'm unsure how this could ever have worked.
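As a rough sketch of what picking up the local rank inside the script could look like: torchrun exports it to each process as the LOCAL_RANK environment variable, while the older torch.distributed.launch helper passes it as a --local-rank (or --local_rank) argument. The helper name resolve_local_rank and the fallback to 0 are my own assumptions for illustration, not something from your script.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank from the CLI argument if given, else from LOCAL_RANK.

    Hypothetical helper: the name and the default of 0 are assumptions.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Accept both spellings, since older launchers used the underscore form.
    parser.add_argument("--local-rank", "--local_rank", dest="local_rank",
                        type=int, default=None)
    # Ignore any other arguments the training script may define.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun sets LOCAL_RANK for every process it starts.
    return int(environ.get("LOCAL_RANK", 0))


if __name__ == "__main__":
    local_rank = resolve_local_rank()
    print(f"local rank: {local_rank}")
```

Launched via torchrun --nproc_per_node=8 script.py, each of the 8 processes should then see a different local rank, which you would typically use to select the device.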
Alternatively, you can also use a multiprocessing approach inside your script which will spawn the processes there as described in this tutorial.
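For the multiprocessing route, a minimal sketch could look like the following. It assumes a CPU-only run with the gloo backend and two processes on one machine; the worker and find_free_port names are hypothetical, and the all_reduce call is only there to show that the processes can talk to each other. Start it with plain python, not torchrun, since the script spawns its own processes.

```python
import os
import socket

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def find_free_port():
    # Ask the OS for an unused TCP port to use for the rendezvous.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def worker(rank, world_size, port):
    # mp.spawn passes the process index as the first argument.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = str(port)
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Each process contributes its rank; all_reduce sums across processes.
    t = torch.tensor([float(rank)])
    dist.all_reduce(t)
    print(f"rank {rank}: sum of ranks = {t.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # assumption: two processes on a single machine
    mp.spawn(worker, args=(world_size, find_free_port()),
             nprocs=world_size, join=True)
```

On a multi-GPU machine you would most likely switch the backend to nccl and move the tensors to the device matching the rank, but that depends on your setup.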