Four p2.xlarge vs two p3.2xlarge

Each EC2 p3.2xlarge instance has 8 vCPUs and 1 GPU. I’m allowed a maximum of 16 CPUs on AWS at any given time, so I’ve been doing distributed training over 2 GPUs. However, I’ve just noticed that each p2.xlarge instance has 4 vCPUs and 1 GPU, so I could run 4 of them and train across 4 GPUs. Would this make training faster?

What factors should be taken into consideration? 4 instances do take more time to set up than 2. Cost is not an issue. I’m doing mixed-precision distributed training with Apex.

Thanks

This is possible. Hope Figure 9 in this paper can offer some insight: https://arxiv.org/pdf/2006.15704.pdf

What factors should be taken into consideration?

If the GPUs are the same, network bandwidth is one of the dominating factors in training speed.
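To see why bandwidth matters, here is a back-of-envelope sketch of the per-iteration gradient-sync cost under a ring allreduce. All concrete numbers (model size, fp16 gradient width, link bandwidth) are illustrative assumptions, not measured values for these instance types; check the AWS network specs for your actual instances.

```python
# Rough per-iteration gradient sync estimate for ring allreduce.
# All numeric inputs below are hypothetical assumptions for illustration.

def ring_allreduce_seconds(model_bytes: float, num_nodes: int,
                           bandwidth_bytes_per_s: float) -> float:
    """A ring allreduce sends ~2*(N-1)/N * model_bytes per node."""
    if num_nodes < 2:
        return 0.0  # nothing to synchronize with a single node
    traffic = 2 * (num_nodes - 1) / num_nodes * model_bytes
    return traffic / bandwidth_bytes_per_s

# Assumed: ~100M-parameter model with fp16 gradients (2 bytes/param).
model_bytes = 100e6 * 2
# Assumed per-node network bandwidth of 1.25 GB/s (i.e. ~10 Gbps).
for nodes in (2, 4):
    t = ring_allreduce_seconds(model_bytes, nodes, 1.25e9)
    print(f"{nodes} nodes: ~{t * 1000:.0f} ms of gradient sync per iteration")
```

The point of the sketch: going from 2 to 4 nodes raises the per-node traffic factor from 1x to 1.5x the gradient size, so if the p2 instances also have a slower link, the extra GPUs can easily be eaten by communication time.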

4 instances do take more time to set up than 2.

This is one-time setup overhead rather than per-iteration overhead, right? If so, it should be fine.
