Independent optimizations on multiple GPUs


I have a classic use case: generating white-box adversarial attacks. For each input image from the test set (using a dataloader and a pre-trained model), I generate an adversarial example by running a standard training-style optimization loop with the Adam optimizer.

I have a machine with multiple GPUs, so I want to parallelize the optimization across different images, since they’re independent.
Essentially, these can be viewed as completely separate processes, except that they all use the same trained model (different copies of it, if necessary), and there has to be a “main” process that, in every iteration, loads <num_gpus> samples from the dataloader and sends each one to a different GPU.

What’s the cleanest way to implement this parallelization?


One approach would be to keep a copy of the model on each GPU, store the copies in e.g. a list, push each data chunk to the matching device, and run the optimization with each model:

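# modelA, modelB, ... are copies of the same pre-trained model, one per GPU
models = [modelA.to('cuda:0'), modelB.to('cuda:1'), ...]
dataA = dataA.to('cuda:0')
dataB = dataB.to('cuda:1')
data = [dataA, dataB, ...]

# each forward pass is queued on its own GPU; the loop does not wait for it to finish
for model, x in zip(models, data):
    output = model(x)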

As long as the workload is not tiny and the CPU can schedule the kernel launches fast enough to run ahead of the GPUs, you should see overlapping computation, because CUDA kernels are launched asynchronously.
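
To make this concrete, here is a minimal sketch of how the per-GPU optimization could look when each image gets its own perturbation and its own Adam optimizer. Everything in it is an assumption for illustration: the tiny `nn.Sequential` stands in for your pre-trained model, the random tensors stand in for <num_gpus> samples from your dataloader, the loss is a plain untargeted cross-entropy, and `pick_devices` / `attack_batch` are hypothetical helper names. It falls back to a single CPU device if no GPU is visible, and it leaves out any clamping or norm constraint on the perturbation.

```python
import copy
import torch
import torch.nn as nn

def pick_devices():
    """Use every visible GPU; fall back to one CPU device so the sketch still runs."""
    n = torch.cuda.device_count()
    return [torch.device(f"cuda:{i}") for i in range(n)] or [torch.device("cpu")]

def attack_batch(models, devices, images, labels, steps=10, lr=0.01):
    """Optimize one perturbation per image, one image per device."""
    loss_fn = nn.CrossEntropyLoss()
    xs, ys, deltas, opts = [], [], [], []
    for dev, img, lab in zip(devices, images, labels):
        xs.append(img.unsqueeze(0).to(dev))
        ys.append(lab.view(1).to(dev))
        # The perturbation is the only tensor being optimized.
        delta = torch.zeros_like(xs[-1], requires_grad=True)
        deltas.append(delta)
        opts.append(torch.optim.Adam([delta], lr=lr))
    for _ in range(steps):
        # Interleave the steps: work for the next device is queued
        # without explicitly waiting for the previous one to finish.
        for model, x, y, delta, opt in zip(models, xs, ys, deltas, opts):
            opt.zero_grad()
            loss = -loss_fn(model(x + delta), y)  # untargeted: maximize the loss
            loss.backward()
            opt.step()
    # Moving the results to the CPU synchronizes with each device.
    return [(x + d).detach().squeeze(0).cpu() for x, d in zip(xs, deltas)]

devices = pick_devices()
# Hypothetical stand-in for the pre-trained model.
base = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
# One frozen copy per device; only the perturbations receive gradients.
models = [copy.deepcopy(base).to(d).eval().requires_grad_(False) for d in devices]

# Stand-in for loading len(devices) samples from the dataloader.
images = torch.rand(len(devices), 3, 8, 8)
labels = torch.randint(0, 10, (len(devices),))
adv = attack_batch(models, devices, images, labels)
```

In a real run you would call something like `attack_batch` once per dataloader batch, with the batch size set to the number of GPUs. Avoid calls such as `loss.item()` or `print(loss)` inside the inner loop, since they force a synchronization and would serialize the devices.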