How to deal with excessive memory usage in PyTorch?

I can’t speak to the CPU portion of this, but to get this working on a GPU you could halve the batch size you are currently using, which should bring memory usage down to around 3 GB and fit on your GPU. You could also reduce the image size. A smaller batch size doesn’t have to hurt the model’s ability to learn: with gradient accumulation you can keep the same effective batch size while holding fewer samples in memory at once.
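As a minimal sketch of gradient accumulation (using a hypothetical toy linear model and random data just for illustration): run several small micro-batches, scale each loss down by the number of accumulation steps, and only call `optimizer.step()` after accumulating gradients over all of them.

```python
import torch
import torch.nn as nn

# Toy model and optimizer -- stand-ins for your real training setup.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = micro-batch size * accum_steps

optimizer.zero_grad()
for step in range(8):
    # Micro-batch of 8 samples instead of, say, one big batch of 32.
    inputs = torch.randn(8, 10)
    targets = torch.randint(0, 2, (8,))

    loss = loss_fn(model(inputs), targets)
    # Divide by accum_steps so the accumulated gradient matches the
    # average gradient of one large batch.
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one update per accum_steps micro-batches
        optimizer.zero_grad()  # reset gradients for the next accumulation
```

Because `backward()` adds into `.grad` by default, the gradients from the four micro-batches sum up exactly as if they came from one batch four times the size; only the per-step activation memory shrinks.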