I’m trying to parallelize a projected gradient descent (PGD) attack across GPUs on a single node.
The model's parameters and buffers do not change, only the input images do. So there is no parameter/gradient synchronization needed between instances of the model running on different GPUs, which should allow for a substantial speedup, since parameters and gradients never have to be moved between devices.
What is the best way to implement this, so that I can take full advantage of not needing to sync the model instances?
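
To make it concrete, here's a rough sketch of the manual approach I have in mind (untested; `model`, `images`, and `labels` stand in for my actual network and batch, and I'm assuming inputs in [0, 1]):

```python
import copy
import torch
import torch.nn.functional as F
from concurrent.futures import ThreadPoolExecutor

# One independent, frozen replica per GPU; only the inputs need gradients.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
replicas = []
for d in devices:
    r = copy.deepcopy(model).to(d).eval()
    for p in r.parameters():
        p.requires_grad_(False)
    replicas.append(r)

def pgd(replica, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-inf PGD: ascend the loss w.r.t. the input, then project
    back into the eps-ball and the valid pixel range."""
    dev = next(replica.parameters()).device
    x, y = x.to(dev), y.to(dev)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(replica(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

# One Python thread per GPU so the PGD loops run concurrently;
# nothing is ever synchronized between the replicas.
x_chunks = images.chunk(len(devices))
y_chunks = labels.chunk(len(devices))
with ThreadPoolExecutor(len(devices)) as pool:
    adv_chunks = list(pool.map(pgd, replicas, x_chunks, y_chunks))
```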
If I were to use DataParallel, is there a way to tell it not to sync the model after each forward/backward iteration? Or is it smart enough not to sync replicas that haven't changed?
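
For reference, this is roughly what the DataParallel version would look like (again just a sketch, with the same placeholder names as above):

```python
import torch
import torch.nn.functional as F

dp_model = torch.nn.DataParallel(model.cuda().eval())

x_adv = images.cuda().clone().detach().requires_grad_(True)
loss = F.cross_entropy(dp_model(x_adv), labels.cuda())
loss.backward()  # only x_adv.grad is needed; the parameters never change

# Question: does each forward here still re-replicate/broadcast the
# parameters to every GPU, even though they are identical across iterations?
```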
Thanks in advance