I want to train deep ensembles of my network, and I have a question about how to run several training runs in parallel.
With a batch size of 16, my network takes about 3 minutes per epoch and uses about 2 GB of GPU memory.
Since there is plenty of GPU memory left, I expected that launching 4 training processes would still take about 3 minutes per epoch, but instead it took 12 minutes.
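In case it helps, here is roughly how I am launching the runs. This is a minimal sketch, not my real code: the model, dummy data, and `train_member` function are placeholders standing in for my actual network and training loop.

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def train_member(member_id: int, epochs: int = 1) -> None:
    # Each process trains its own ensemble member on the same GPU.
    device = torch.device("cuda:0")
    # Placeholder model; my real network is larger.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    # Dummy batch standing in for my real DataLoader (batch size 16).
    x = torch.randn(16, 128, device=device)
    y = torch.randint(0, 10, (16,), device=device)
    for _ in range(epochs):
        for _ in range(100):  # stand-in for iterating over the dataset
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    print(f"ensemble member {member_id} finished")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when using CUDA in subprocesses
    procs = [mp.Process(target=train_member, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```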
Clearly I am doing something wrong, or I don't understand how parallelism works on a GPU.
Could anyone help me with this?