I trained a simple network on the MNIST dataset with both multi-GPU and single-GPU setups. The multi-GPU version took 61 seconds, while the single-GPU version took only 18 seconds. Could this be due to data transfer between the different GPUs?
I would suggest trying something larger in terms of both model and data.
MNIST data is really tiny and your model is likely very small, so you are probably running into more overhead than advantage when using multiple GPUs.
I played around a little myself, but I didn’t actually see much of an advantage until I moved to large-scale datasets like ImageNet or Pascal VOC, where the models are larger as well.
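To illustrate the intuition, here is a rough back-of-envelope model of the trade-off (the numbers are purely hypothetical, not measurements): each multi-GPU step pays a roughly fixed synchronization/transfer cost, so splitting the compute only helps once the per-step compute time dwarfs that overhead.

```python
def step_time(compute_s, overhead_s, n_gpus):
    """Idealized per-step time: compute splits evenly across GPUs,
    but every multi-GPU step pays a fixed communication overhead."""
    return compute_s / n_gpus + overhead_s

# Tiny MNIST-scale model (hypothetical 2 ms compute, 10 ms sync cost):
# overhead dominates, so 2 GPUs are slower than 1.
print(step_time(0.002, 0.010, 2))  # 0.011 s per step on 2 GPUs
print(step_time(0.002, 0.000, 1))  # 0.002 s per step on 1 GPU

# Large ImageNet-scale model (hypothetical 500 ms compute, same sync cost):
# overhead is amortized, so 2 GPUs give close to a 2x speedup.
print(step_time(0.500, 0.010, 2))  # 0.260 s per step on 2 GPUs
print(step_time(0.500, 0.000, 1))  # 0.500 s per step on 1 GPU
```

This is a simplification (real overhead also scales with model size, since gradients must be synchronized), but it captures why a tiny MNIST model can run several times slower on multiple GPUs.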
Thanks for your reply; I’ll try that.