Parallelizing projected gradient descent attack

I’m trying to parallelize a projected gradient descent (PGD) attack on a single node.
The model’s parameters and buffers do not change; only the input images do.
So there is no parameter/gradient synchronization overhead between the instances of the model running on different GPUs.
This should allow for a significant speedup, since parameters and gradients never need to be moved between devices once each GPU has its copy of the model.
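For reference, the per-image work that each GPU would run looks roughly like the following (a minimal sketch of one L-inf PGD step; pgd_step, eps, alpha and the cross-entropy loss are illustrative choices, not taken from my actual code):

import torch
import torch.nn.functional as F

def pgd_step(model, x_adv, x_orig, y, eps, alpha):
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    # Only the gradient w.r.t. the input is needed; the parameters stay fixed.
    grad, = torch.autograd.grad(loss, x_adv)
    with torch.no_grad():
        x_adv = x_adv + alpha * grad.sign()                  # ascend the loss
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                        # keep a valid image range
    return x_adv.detach()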

What is the best way to implement this so that I can take full advantage of not having to sync the model instances?

If I were to use DataParallel, is there a way to tell it not to sync the model after each forward/backward iteration? Or is it smart enough not to sync the replicas if the parameters haven’t changed?

Thanks in advance

If you do not want to sync gradients between the model replicas and you are using DistributedDataParallel, you can try this:

ddp_model = DistributedDataParallel(model)
with ddp_model.no_sync():
    # gradient all-reduce between replicas is skipped inside this context
    for images, labels in loader:
        ...  # run the PGD forward/backward steps on this GPU's shard
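For completeness, a minimal sketch of how this could be wired up with one process per GPU is below. build_model() and make_loader() are placeholder names for your own model factory and per-rank data sharding, and the rendezvous address is just an example:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel

def worker(rank, world_size):
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = build_model().cuda(rank).eval()        # placeholder factory
    ddp_model = DistributedDataParallel(model, device_ids=[rank])
    with ddp_model.no_sync():                      # skip the gradient all-reduce entirely
        for images, labels in make_loader(rank):   # placeholder per-rank loader
            ...                                    # PGD iterations on this shard
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)

Since the parameters never change during the attack, no_sync() just ensures that the backward pass does not trigger the usual gradient all-reduce; everything else stays local to each GPU.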