After a few dozen epochs, training fails with an error. I am training my model on 4 GPUs with a batch size of 20. No error occurs when training on 1 GPU.
What could be going wrong here?