Each EC2 p3.2xlarge instance has 8 vCPUs and 1 GPU (a V100). I'm allowed a maximum of 16 vCPUs on AWS at any given time, so I've been doing distributed training over 2 GPUs. However, I've just noticed that each p2.xlarge instance has 4 vCPUs and 1 GPU (a K80), so within the same limit I could run 4 of them, i.e. train over 4 GPUs. Would this make training faster?
What factors should I take into consideration? Four instances do take more time to set up than two, but cost is not an issue. I'm doing mixed-precision distributed training with apex.
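For context, my training loop follows roughly the standard apex amp + DistributedDataParallel recipe, one process per GPU launched with `torch.distributed.launch`. Here's a minimal sketch of what I mean (`MyModel` and `loader` are placeholders for my actual model and data pipeline):

```python
import argparse
import torch
import torch.distributed as dist
from apex import amp
from apex.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # set by torch.distributed.launch
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl", init_method="env://")

model = MyModel().cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# O1: mixed precision with dynamic loss scaling
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
model = DDP(model)  # apex's DDP all-reduces gradients across the 2 (or 4) GPUs

for inputs, targets in loader:  # loader is a placeholder; uses a DistributedSampler
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```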