Questions about using shared models

fedetask · December 3, 2020, 1:49pm

I have some questions about using the torch.multiprocessing module. Let’s say I have a torch.nn.Module called model and I call model.share_memory() on it.

1. What happens if two threads call forward(), i.e. model(input), at the same time? Is it safe, or should I use a Lock to make sure the model is not accessed by multiple threads at once?
2. Similarly, what happens if two or more threads each have an optimizer working on model.parameters() and they call optimizer.step() at the same time?

I ask because I often see optimizer.step() called on shared models without any locking (e.g. in RL implementations of A3C or ACER), and I wonder whether that is safe.

pritamdamania87 · December 4, 2020, 11:02pm

For the first question, it really depends on the implementation of your forward function. Typically a forward function doesn’t modify any state, so it is safe to call forward from two different threads. However, if your forward function does modify some state, there might be a race when two threads call forward at the same time.
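To make the distinction concrete, here is a minimal sketch of both cases. StatelessNet, CountingNet and run_concurrent_forward are hypothetical names made up for this illustration. The first module only reads its parameters in forward, so concurrent calls should all return the same result; the second does a read-modify-write on a Python attribute in forward, which is the kind of state mutation that can race.

```python
import threading

import torch
import torch.nn as nn


class StatelessNet(nn.Module):
    """forward only reads parameters, so concurrent calls do not interfere."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


class CountingNet(StatelessNet):
    """forward mutates state: the kind of module that may need a lock."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def forward(self, x):
        self.calls += 1  # read-modify-write, not guaranteed to be atomic
        return self.fc(x)


def run_concurrent_forward(model, x, n_threads=4):
    """Call model(x) from several threads and collect every output."""
    results = [None] * n_threads

    def worker(i):
        with torch.no_grad():
            results[i] = model(x)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


torch.manual_seed(0)
model = StatelessNet()
model.share_memory()
x = torch.randn(3, 4)

with torch.no_grad():
    expected = model(x)

outputs = run_concurrent_forward(model, x)
all_match = all(torch.allclose(out, expected) for out in outputs)
```

A more common real-world case than a call counter is a BatchNorm layer in training mode, which updates its running statistics on every forward call. Note also that a plain Python attribute such as calls is not placed in shared memory by share_memory(), so separate processes would each see their own copy.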

For the second question, the optimizers would step on each other, and as you mention, without any lock mechanism there might be some inconsistency in the parameters. However, many frameworks still do this without locks because they are relying on HOGWILD!, a paper which showed that training can still converge without strict locking around parameter updates, as long as those updates are sparse. You can refer to the paper for more details on why and how this works.

The PyTorch Hogwild example does something similar: Multiprocessing best practices — PyTorch 2.1 documentation.
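A minimal sketch of that pattern, written from memory rather than copied from the docs: make_data, train and run_hogwild are hypothetical helper names, and the toy regression task (y = 3x + 1) is an assumption chosen only to keep the example small. Each worker builds its own optimizer over the shared parameters and calls optimizer.step() with no lock.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


def make_data(seed):
    """Toy regression data (assumed task): y = 3x + 1."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.randn(64, 1, generator=gen)
    y = 3.0 * x + 1.0
    return x, y


def train(model, seed, steps=100):
    """Worker loop: private optimizer and data, shared parameters."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
    x, y = make_data(seed)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        # No lock here: the update is written straight into shared memory
        # and may interleave with updates from the other workers.
        optimizer.step()


def run_hogwild(num_processes=2):
    model = nn.Linear(1, 1)
    model.share_memory()  # move parameters into shared memory before starting workers
    workers = [
        mp.Process(target=train, args=(model, rank))
        for rank in range(num_processes)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return model


if __name__ == "__main__":
    trained = run_hogwild()
    print(trained.weight.item(), trained.bias.item())
```

As far as I understand, only the parameters live in shared memory here; each process allocates its own .grad tensors during backward, so the only unsynchronized writes are the in-place parameter updates inside optimizer.step(). If you need strict consistency, you could pass an mp.Lock to the workers and hold it around optimizer.step(), at the cost of serializing the updates.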