How to train multiple models on a single big GPU


I have to run multiple realizations of the same experiment. I want to use the recent and fast NVIDIA A100 GPUs, which have 40 GB of memory. Each realization of my experiment uses a PyTorch model that is small, i.e. it typically trains fine with 4 GB of GPU memory. I would therefore like to know: is it possible to load several of these models onto one A100 and train them in parallel on that same A100? Should I do this with the regular multiprocessing package, or should I use other tools?


You could simply launch multiple Python scripts; their kernels will be scheduled on the device whenever resources are free, as explained here.
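As a minimal sketch of the in-process alternative, you can also spawn the runs from a single script with `torch.multiprocessing`, where each child process builds its own copy of the model on the same device. The model, data, and hyperparameters below are hypothetical stand-ins; note that `spawn` (not `fork`) is required when child processes use CUDA:

```python
import torch
import torch.multiprocessing as mp


def train_one(run_id, results):
    # Each process gets its own model instance and optimizer on the shared GPU
    # (falls back to CPU when no GPU is available).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    torch.manual_seed(run_id)  # different seed per realization
    model = torch.nn.Linear(10, 1).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    # Dummy regression data standing in for the real experiment.
    x = torch.randn(64, 10, device=device)
    y = torch.randn(64, 1, device=device)
    for _ in range(20):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    results[run_id] = loss.item()  # report final loss back to the parent


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # CUDA requires "spawn"
    manager = mp.Manager()
    results = manager.dict()
    procs = [mp.Process(target=train_one, args=(i, results)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(dict(results))
```

This keeps all runs under one parent process, which is convenient for collecting results, but launching independent scripts as suggested above works just as well; either way the GPU interleaves the kernels from the different processes.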