Store model parameters on gpus

Is there a library that helps store a model's parameters in the GPU's global memory (or in host memory) instead of on disk?
I have read about torch.save, but it seems to save the model parameters to disk.

Best
Max

The state_dict you would store to disk via torch.save is already in host or GPU RAM (it has to live somewhere before you can serialize it), so could you explain your use case a bit more, please?
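To illustrate this point: torch.save accepts any file-like object, so a state_dict can be serialized into a host-RAM buffer without ever touching the disk. A minimal sketch, using a small nn.Linear as a stand-in for the actual model:

```python
import io

import torch
import torch.nn as nn

# A tiny stand-in model; substitute your own network here.
model = nn.Linear(4, 2)

# torch.save writes to any file-like object, so the checkpoint can
# live in host RAM (an io.BytesIO buffer) instead of on disk.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# Rewind the buffer and load the parameters back from memory.
buffer.seek(0)
state_dict = torch.load(buffer)

# A freshly constructed model of the same architecture can be
# initialized from the in-memory checkpoint.
model2 = nn.Linear(4, 2)
model2.load_state_dict(state_dict)
```

Note that the buffer only lives as long as the Python process; it is not a substitute for persistent storage.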

Hi ptrblck,
Sorry for the unclear explanation; I hope I can make it clear this time.

My aim:
After finishing the training of a model, for example a Resnet512, could I save the updated parameters somewhere in global memory?

  1. If we can save the parameters somewhere in global memory, can we load them to set up a new Resnet512?
  2. If we can save the parameters somewhere in global memory, is there any way to keep them ‘alive’, even after the training has finished and the training process has stopped?

Best
Max

  1. The parameters are already stored in the global memory of your GPU, assuming you are training the model on the GPU. If you want to create a copy, you could use copy.deepcopy to do so.

  2. No, you cannot use the GPU’s global memory for serialization, and the memory will be released after your Python process exits.
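Point 1 can be sketched as follows. This is a minimal example using a small nn.Linear as a stand-in for the actual model; copy.deepcopy clones each parameter tensor on its current device, so a GPU model's copied state_dict also stays in GPU global memory:

```python
import copy

import torch
import torch.nn as nn

# Use the GPU when one is available; the same code works on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the trained model.
model = nn.Linear(4, 2).to(device)

# deepcopy clones every parameter tensor on its current device, so
# for a CUDA model the copy also lives in GPU global memory.
state_copy = copy.deepcopy(model.state_dict())

# ... further training would mutate model's parameters here ...

# A new model of the same architecture can be set up directly from
# the in-memory copy, without any round trip through the disk.
fresh = nn.Linear(4, 2).to(device)
fresh.load_state_dict(state_copy)
```

As point 2 says, these tensors only exist for the lifetime of the process; persisting them beyond that requires serializing to disk (or some other external store).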

Thank you for the explanation. That helps a lot. :slight_smile: