Need for distributed training

I have a few systems in my lab, each with a Quadro K2000 GPU, but I can only run ImageNet training on my model (MobileNet) with a batch size of up to 32; increasing the batch size gives me an out-of-memory error, while I am supposed to use a batch size of 256 to reach convergence.
Therefore, I am thinking of using PyTorch's distributed training to combine all the systems and their GPUs for the job.
Please suggest: does distributed training serve this purpose? Can I claim that the results of distributed training are as good as those obtained with a more capable GPU on a single node?
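For what it's worth, the arithmetic seems to work out: with data-parallel training (e.g. PyTorch's `DistributedDataParallel`) each of N processes sees its own mini-batch, so the effective batch size is N times the per-GPU batch; alternatively, gradient accumulation on one GPU can reach the same effective batch. A minimal sketch of the calculation, using the numbers from my setup (the helper name is my own):

```python
def workers_needed(target_batch, per_gpu_batch):
    """Number of GPUs (with DDP) or gradient-accumulation steps
    (on a single GPU) needed so that per_gpu_batch * workers
    equals the target effective batch size."""
    if target_batch % per_gpu_batch != 0:
        raise ValueError("target batch must be a multiple of per-GPU batch")
    return target_batch // per_gpu_batch

per_gpu_batch = 32   # largest batch that fits on one Quadro K2000
target_batch = 256   # batch size reported as needed for convergence

print(workers_needed(target_batch, per_gpu_batch))  # -> 8
```

So either 8 GPUs running in parallel, or accumulating gradients over 8 steps of batch 32 before each optimizer update, would give an effective batch of 256.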