Is there a bug with data parallelism in version 0.4?

I want to train my model on more than one GPU, but when I use data parallelism, I find that only gpu:0 does any work; the other GPUs allocate only a little memory and their utilization is almost 0. What's more, the total memory used across all the GPUs is much less than a single GPU uses at the same batch_size. Please help me!
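For reference, a minimal sketch of the usual `nn.DataParallel` setup, assuming that is the wrapper being used here; the model, sizes, and data below are placeholders, not the actual training code:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; stands in for the real network.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

# DataParallel scatters each input batch along dim 0, runs a model
# replica on every visible GPU, and gathers the outputs on device 0.
model = nn.DataParallel(model).cuda()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch; a real run would iterate over a DataLoader.
# The batch size must be at least the number of GPUs, or some
# replicas receive no data at all.
inputs = torch.randn(256, 512).cuda()
targets = torch.randint(0, 10, (256,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```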

It’s possible that your model isn’t GPU-bound, so you’re not seeing a lot of activity on the GPUs. Do you find the same thing happens on 0.3?
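One way to check is to time the data-loading and GPU-compute portions of the training loop separately. A rough sketch, assuming a standard loop over a DataLoader named `loader` (all names here are placeholders for your own objects):

```python
import time
import torch

# If load_t dominates step_t, the GPUs are starved by the input
# pipeline and utilization will stay near zero no matter how many
# devices DataParallel has available.
load_t, step_t = 0.0, 0.0
end = time.time()
for inputs, targets in loader:
    load_t += time.time() - end

    start = time.time()
    optimizer.zero_grad()
    loss = criterion(model(inputs.cuda()), targets.cuda())
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()  # wait for async GPU work before timing
    step_t += time.time() - start

    end = time.time()

print('data loading: %.1fs, gpu compute: %.1fs' % (load_t, step_t))
```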

I’m seeing the same thing: only 1 GPU does any work with my code.