PyTorch Forums
Training performance degrades with DistributedDataParallel
distributed
Sergii_Makarevych
(Sergii Makarevych)
July 19, 2020, 5:56pm
The source code is pretty straightforward.
Continuing training from a checkpoint returns high loss values, while the loss is reasonable with `.eval()`.
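One common cause of high loss after resuming a DistributedDataParallel run (and this is only an assumption, since the full code isn't shown here) is a state-dict key mismatch: DDP wraps the model so every parameter key gains a `module.` prefix, and loading such a checkpoint into an unwrapped model (or vice versa) can silently mismatch weights depending on `strict` handling. A minimal sketch of normalizing the keys before loading, using a plain dict in place of a real PyTorch `state_dict`:

```python
def strip_ddp_prefix(state_dict):
    """Remove the 'module.' prefix that DistributedDataParallel
    prepends to every parameter name when saving the wrapped model."""
    prefix = "module."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Hypothetical checkpoint keys as saved from a DDP-wrapped model.
ckpt = {
    "module.conv1.weight": "w0",
    "module.bn1.running_mean": "rm",
    "fc.bias": "b",  # key saved without the wrapper prefix
}
print(strip_ddp_prefix(ckpt))
```

The `.eval()` symptom could also point at BatchNorm: in eval mode the saved running statistics are used, while in training mode batch statistics are recomputed from the (often small) per-GPU batch, so `torch.nn.SyncBatchNorm.convert_sync_batchnorm` is worth trying in a real DDP setup.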