I’m trying to parallelize a projected gradient descent (PGD) attack across GPUs on a single node.
The model's parameters and buffers do not change, only the input images do. So there is no parameter/gradient synchronization needed between instances of the model running on different GPUs, which should allow for a substantial speedup, since parameters and gradients never have to be moved between devices.
What is the best way to implement this, so that I can take full advantage of not needing to sync the model instances?
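
To make it concrete, here's a rough sketch of the manual approach I have in mind (untested; `model`, `images`, and `labels` stand in for my actual network and batch, and I'm assuming inputs in [0, 1]):

```python
import copy
import torch
import torch.nn.functional as F
from concurrent.futures import ThreadPoolExecutor

# One independent, frozen replica per GPU; only the inputs need gradients.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
replicas = []
for d in devices:
    r = copy.deepcopy(model).to(d).eval()
    for p in r.parameters():
        p.requires_grad_(False)
    replicas.append(r)

def pgd(replica, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-inf PGD: ascend the loss w.r.t. the input, then project
    back into the eps-ball and the valid pixel range."""
    dev = next(replica.parameters()).device
    x, y = x.to(dev), y.to(dev)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(replica(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

# One Python thread per GPU so the PGD loops run concurrently;
# nothing is ever synchronized between the replicas.
x_chunks = images.chunk(len(devices))
y_chunks = labels.chunk(len(devices))
with ThreadPoolExecutor(len(devices)) as pool:
    adv_chunks = list(pool.map(pgd, replicas, x_chunks, y_chunks))
```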
If I were to use DataParallel, is there a way to tell it not to sync the model after each forward/backward iteration? Or is it smart enough not to sync replicas that haven't changed?
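
For reference, this is roughly what the DataParallel version would look like (again just a sketch, with the same placeholder names as above):

```python
import torch
import torch.nn.functional as F

dp_model = torch.nn.DataParallel(model.cuda().eval())

x_adv = images.cuda().clone().detach().requires_grad_(True)
loss = F.cross_entropy(dp_model(x_adv), labels.cuda())
loss.backward()  # only x_adv.grad is needed; the parameters never change

# Question: does each forward here still re-replicate/broadcast the
# parameters to every GPU, even though they are identical across iterations?
```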
Thanks in advance