Is it possible to split a GPU in half and apply DataParallel? It seems like my model doesn’t use the full GPU capacity, and I’ve read that increasing the batch size (which would give the GPU more work) changes the learning dynamics and could make training worse.
To my knowledge, the effect would be the same as increasing the batch size. DataParallel copies the model onto each GPU, splits the batch across the copies, and merges the results. In the backward pass, the gradients from all the copies are combined and applied in a single update, so every copy is updated at once. So would there be a difference? There shouldn't be.
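To make that concrete, here is a small pure-Python sketch (not PyTorch, and not how DataParallel is implemented internally). It assumes a toy one-parameter model `y_hat = w * x` with a mean-squared-error loss, and the function names are made up for illustration. It shows that combining gradients computed on two halves of a batch gives the same gradient as processing the whole batch at once:

```python
# Toy model: y_hat = w * x, loss = mean squared error over the batch.
# All names here are illustrative; this is not the DataParallel implementation.

def grad_sum(w, xs, ys):
    """Sum over samples of d/dw (w*x - y)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def full_batch_grad(w, xs, ys):
    """Gradient of the mean loss computed on the whole batch at once."""
    return grad_sum(w, xs, ys) / len(xs)

def sharded_grad(w, xs, ys, num_shards):
    """Split the batch into shards (one per 'device'), then combine."""
    shard_size = (len(xs) + num_shards - 1) // num_shards
    total = 0.0
    for start in range(0, len(xs), shard_size):
        end = start + shard_size
        total += grad_sum(w, xs[start:end], ys[start:end])
    # Normalise by the full batch size, as a mean loss would.
    return total / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

g_full = full_batch_grad(w, xs, ys)
g_split = sharded_grad(w, xs, ys, num_shards=2)
print(g_full, g_split)
```

Both values come out equal, which is why splitting the work across copies behaves like one larger batch rather than like two independent small-batch runs.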
So how do you use your GPU more efficiently? Run multiple experiments at once!
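One way to do that, as a rough sketch: start several training processes pinned to the same GPU. The script name `train.py` and the `--lr` flag below are hypothetical placeholders for whatever your own training script accepts; `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for choosing which GPU a process sees. Keep in mind that the processes share the GPU's memory, so all of them have to fit at the same time.

```python
import os
import subprocess
import sys

def build_jobs(learning_rates, gpu_id=0, script="train.py"):
    """Build one (command, environment) pair per experiment.

    `script` and the `--lr` flag are hypothetical; adapt them to your setup.
    """
    jobs = []
    for lr in learning_rates:
        # Every job sees only the chosen GPU, so they all share it.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        cmd = [sys.executable, script, "--lr", str(lr)]
        jobs.append((cmd, env))
    return jobs

def launch(jobs):
    """Start all jobs concurrently and wait for them; returns exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [p.wait() for p in procs]

jobs = build_jobs([0.1, 0.01])
# launch(jobs)  # uncomment once a real training script is in place
```

Each experiment keeps its own batch size, so the training behaviour of an individual run is unchanged while the GPU gets more work overall.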