Thanks a lot for digging into this! Did you also use 4 GPUs? Could you rerun with, say, 10 epochs? This behavior is somewhat random for me and does not always trigger after the first epoch.

Update: I can reproduce the issue using 100 epochs and get the following traceback:
File "tmp.py", line 64, in <module>
trainer.fit(model, dm)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 444, in fit
results = self.accelerator_backend.train()
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_accelerator.py", line 106, in train
results = self.train_or_test()
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in train_or_test
results = self.trainer.train()
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 493, in train
self.train_loop.run_training_epoch()
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 589, in run_training_epoch
self.trainer.run_evaluation(test_mode=False)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 609, in run_evaluation
eval_loop_results = self.evaluation_loop.log_epoch_metrics(deprecated_eval_results, epoch_logs, test_mode)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 210, in log_epoch_metrics
eval_loop_results = self.trainer.logger_connector.on_evaluation_epoch_end(
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector.py", line 113, in on_evaluation_epoch_end
self._log_on_evaluation_epoch_end_metrics(epoch_logs)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector.py", line 181, in _log_on_evaluation_epoch_end_metrics
reduced_epoch_metrics = dl_metrics[0].__class__.reduce_on_epoch_end(dl_metrics)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/core/step_result.py", line 464, in reduce_on_epoch_end
recursive_stack(result)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/core/step_result.py", line 603, in recursive_stack
result[k] = collate_tensors(v)
File "/opt/conda/envs/tmp/lib/python3.8/site-packages/pytorch_lightning/core/step_result.py", line 625, in collate_tensors
return torch.stack(items)
RuntimeError: All input tensors must be on the same device. Received cuda:1 and cuda:3
I'm not familiar enough with PyTorch Lightning and would suggest creating an issue with this code snippet in their repository.
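For context: the last frame is a plain `torch.stack` call over metric tensors that apparently still live on different GPUs, which PyTorch rejects. Below is a minimal sketch (my own illustration, independent of Lightning, assuming a machine with at least two GPUs) of that lower-level failure and the usual fix of moving everything to a common device first:

```python
import torch

if torch.cuda.device_count() >= 2:
    # Two scalar tensors placed on different GPUs.
    a = torch.tensor(1.0, device="cuda:0")
    b = torch.tensor(2.0, device="cuda:1")

    try:
        # Raises: "All input tensors must be on the same device. ..."
        torch.stack([a, b])
    except RuntimeError as e:
        print(e)

    # Moving both tensors to one device before stacking avoids the error.
    stacked = torch.stack([a.to("cuda:0"), b.to("cuda:0")])
    print(stacked)
```

Whether moving the logged validation metric to a fixed device works around the problem inside Lightning's epoch-end reduction is something I have not verified.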