Hello PyTorch Community,
I’m looking to optimize memory usage when loading PyTorch models onto CUDA. Typically, we load the checkpoint into CPU memory first and then move the model to CUDA, which means both the deserialized checkpoint and the model's own parameters occupy host RAM at the same time during loading.
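For concreteness, this is the pattern I mean (the checkpoint path and the resnet50 architecture are just placeholders for my actual model):

```python
import torch
import torchvision.models as models  # stand-in architecture for illustration

model = models.resnet50()  # parameters are allocated in CPU memory
state_dict = torch.load("checkpoint.pth", map_location="cpu")  # checkpoint also read into CPU memory
model.load_state_dict(state_dict)
model = model.to("cuda")  # only now do the weights land on the GPU
```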
Is there a method to load the model directly onto CUDA, bypassing CPU memory? Alternatively, what are the best practices to minimize memory usage when loading a model onto CUDA?
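One variant I've come across is passing `map_location="cuda"` to `torch.load`, but I'm not sure whether this actually avoids staging the checkpoint through host memory during deserialization, or whether it just moves the tensors afterward (same placeholders as above):

```python
import torch
import torchvision.models as models  # stand-in architecture for illustration

model = models.resnet50().to("cuda")
# Does this skip the intermediate CPU copy, or does torch.load still
# deserialize into host buffers before transferring to the GPU?
state_dict = torch.load("checkpoint.pth", map_location="cuda")
model.load_state_dict(state_dict)
```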
Thank you for your insights and guidance.