How to reduce the memory requirement of a GPU PyTorch training process? (finally solved by using multiple GPUs)

You can split the model across multiple GPUs (model parallelism), e.g. when the model is so big that it cannot fit on a single GPU.
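A minimal sketch of this idea: each half of the network is placed on a different device, and the intermediate activation is moved between them in `forward`. The layer sizes and the two-GPU layout are assumptions for illustration; the sketch falls back to CPU when fewer than two GPUs are available so it stays runnable.

```python
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    """Toy model-parallel network: part1 on one device, part2 on another."""

    def __init__(self):
        super().__init__()
        # Assumed device layout: use cuda:0/cuda:1 if available, else CPU.
        if torch.cuda.device_count() >= 2:
            self.dev0, self.dev1 = "cuda:0", "cuda:1"
        else:
            self.dev0 = self.dev1 = "cpu"
        # First half of the network lives on dev0, second half on dev1,
        # so each GPU only needs to hold roughly half the parameters.
        self.part1 = nn.Sequential(nn.Linear(1024, 512), nn.ReLU()).to(self.dev0)
        self.part2 = nn.Linear(512, 10).to(self.dev1)

    def forward(self, x):
        # Move the intermediate activation from dev0 to dev1 by hand.
        x = self.part1(x.to(self.dev0))
        return self.part2(x.to(self.dev1))

model = SplitModel()
out = model(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 10])
```

During training, the optimizer can be built over `model.parameters()` as usual; gradients flow back across the device boundary automatically through the `.to()` calls.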

Thank you very much!