I’m struggling with a multi-GPU DataParallel setup: I’m only getting a ~1.7x speedup from 3 GPUs. Any blind suggestions on how to improve that would be appreciated. One thing I’ve noticed is that PyTorch uses roughly 10x more PCIe bandwidth than parallelized Keras+TF on a similar model, and I have very little idea why.
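For context, here’s a minimal sketch of the standard nn.DataParallel pattern I’m following (the model, sizes, and learning rate below are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder model; my real one is of comparable size to the Keras+TF version.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
)

if torch.cuda.device_count() > 1:
    # Replicate the model on GPUs 0-2; each batch is split along dim 0.
    model = nn.DataParallel(model, device_ids=[0, 1, 2])
model.cuda()

criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch standing in for my real data loader.
inputs = torch.randn(96, 1024).cuda()
targets = torch.randint(0, 10, (96,)).cuda()

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```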
Anyway, I’m looking at the http://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html tutorial and I see model.gpu() and similar .gpu() calls written there. It looks like a mistake by someone who wanted to write .cuda().
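For reference, the corresponding calls in the released API are .cuda(), e.g. (sketch using a stand-in model, not the tutorial’s own class):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the tutorial's model

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # wrap for multi-GPU
if torch.cuda.is_available():
    model.cuda()  # the actual API is .cuda(), not .gpu()
```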