Suggestions requested for saving and loading model snapshots/checkpoints

I have access to a GPU cluster that provides 20 minute slots.
I would like to run the training procedure for as many mini-batches (not epochs) as possible in those 20 minutes and then save the state. In the next 20-minute slot I would like to resume training from the previous state. Can you please provide me with some pointers or an example of how to do this? I am a beginner in PyTorch; apologies if this has been discussed elsewhere.

You can look at our examples repository for how to save/load models; the ImageNet example may serve the purpose:
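As a complement to the ImageNet example, here is a minimal sketch of time-limited, resumable training. It is not an official recipe: the checkpoint filename, the tiny model, the random data, the 19-minute safety margin, and the small demo iteration cap are all illustrative choices. The key idea is to save the model state dict, the optimizer state dict, and the iteration counter together, and reload them at the start of the next slot.

```python
import os
import time

import torch
import torch.nn as nn

CHECKPOINT_PATH = "checkpoint.pt"   # hypothetical path on shared storage
TIME_LIMIT_SECONDS = 19 * 60        # stop a bit before the 20-minute slot ends
DEMO_MAX_ITERS = 5                  # demo cap so this sketch finishes quickly

# Toy model and optimizer; substitute your real ones.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Resume from the previous slot's checkpoint if one exists.
start_iter = 0
if os.path.exists(CHECKPOINT_PATH):
    ckpt = torch.load(CHECKPOINT_PATH)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_iter = ckpt["iteration"]

start_time = time.time()
iteration = start_iter
while time.time() - start_time < TIME_LIMIT_SECONDS:
    # One mini-batch of random data stands in for your DataLoader.
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    iteration += 1

    if iteration - start_iter >= DEMO_MAX_ITERS:
        break  # demo only; in the real run the time check ends the loop

# Save everything needed to resume in the next slot.
torch.save(
    {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "iteration": iteration,
    },
    CHECKPOINT_PATH,
)
```

Saving the optimizer state matters for optimizers with internal buffers (e.g. momentum in SGD, moment estimates in Adam); restoring only the model weights would silently reset those. Saving periodically inside the loop, not just at the end, also guards against the slot being killed mid-batch.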

Thanks for the pointer, @smth, and the prompt reply. I will check it out.