Load models directly into CUDA?

Hello PyTorch Community,

I’m looking to optimize memory usage when loading PyTorch models onto CUDA. Typically, we deserialize the model into CPU memory first and then move it to CUDA, which means a full copy of the weights temporarily sits in host RAM during loading.

Is there a method to load the model directly onto CUDA, bypassing CPU memory? Alternatively, what are the best practices to minimize memory usage when loading a model onto CUDA?

Thank you for your insights and guidance.

We are working on an RFC to use GPUDirect Storage, which would allow tensors to bypass the host and be loaded directly onto the device. I’m unsure where this RFC is currently stuck, but the last time I checked @mikaylagawarecki was working on the serialization part.
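In the meantime, here is a minimal sketch of what already reduces host-memory pressure today (the checkpoint path and the model definition below are placeholders; `mmap=True` and `assign=True` require PyTorch 2.1 or newer):

```python
import torch
import torch.nn as nn

# Placeholder architecture; substitute your own model class.
def build_model():
    return nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# map_location sends each tensor to the GPU as it is deserialized, so the
# whole state_dict does not have to remain resident on the CPU afterwards.
# mmap=True (PyTorch >= 2.1) memory-maps the checkpoint file instead of
# reading it fully into host RAM up front.
state_dict = torch.load("checkpoint.pt", map_location="cuda", mmap=True)

# Build the module on the meta device so no parameter memory is allocated,
# then attach the loaded CUDA tensors in place with assign=True.
with torch.device("meta"):
    model = build_model()
model.load_state_dict(state_dict, assign=True)
```

This still stages the bytes through the host while reading the file, so it is not a true bypass of CPU memory the way GPUDirect Storage would be, but it avoids holding two full copies of the weights at once.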


@ptrblck Thanks a lot for the information. Glad to see someone is actively working on this!