I want to train my model on more than one GPU, but when I run it in parallel I find that only gpu:0 actually does any work; the other GPUs allocate a little memory and their utilization stays at almost 0. What’s more, the total memory used across the multiple GPUs is much less than a single GPU needs at the same batch_size… please help me!
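For reference, a minimal sketch of the kind of setup described above, using `torch.nn.DataParallel` (assuming that is the parallel wrapper in use here; `ToyNet` is just a hypothetical stand-in for the real model). Note that `DataParallel` scatters each batch across the visible GPUs and gathers the outputs back on `device_ids[0]`, which is one reason gpu:0 typically shows more memory and activity than the rest:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the actual network.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

model = ToyNet()

# Wrap in DataParallel only when more than one GPU is visible.
# By default DataParallel uses all visible devices and gathers
# results on the first one, so gpu:0 does extra work.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
    model = model.cuda()

batch = torch.randn(16, 8)  # batch of 16 samples, 8 features each
if torch.cuda.is_available():
    batch = batch.cuda()
out = model(batch)  # one output row per input row
```

If the per-GPU work is tiny (small model, small batch), the scatter/gather overhead can dominate and the other GPUs will look nearly idle even when the wrapper is set up correctly.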
It’s possible that your model isn’t GPU-bound, so you’re not seeing much activity on the GPUs. Do you find the same thing happens on 0.3?
I’m hitting the same thing: only 1 GPU does any work when I run my code.