nn.Parameter in DDP

Hi,

I am optimizing (training) an nn.Module and one standalone nn.Parameter together.
DDP only accepts an nn.Module as its argument.
If I pass the nn.Module to DDP, the optimization works well on multiple GPUs, but the nn.Parameter value is not updated after the optimization finishes. Does it require any postprocessing to get updated?
How can I make the nn.Parameter value get updated?
Do I need to collect the parameter value from each GPU and average them?
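
The only workaround I can think of is to register the extra parameter on a module, so that DDP treats it like any other module parameter. A minimal sketch of what I mean, with hypothetical names (not my actual model):

import torch
import torch.nn as nn

class NetWithExtra(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)
        # assigning an nn.Parameter as an attribute registers it on the module,
        # so it shows up in parameters() and is visible to DDP and the optimizer
        self.extra = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        # using self.extra inside forward lets its gradient flow through the
        # DDP backward pass, where gradients get averaged across GPUs
        return self.linear(x) + self.extra

net = NetWithExtra()
print([name for name, _ in net.named_parameters()])  # ['linear.weight', 'linear.bias', 'extra']

Inside each worker the module would then be wrapped as usual, e.g. DDP(net.to(local_rank), device_ids=[local_rank]). Is that the intended way, or is there something simpler?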

Thank you!

I created an nn.Module subclass to hold the nn.Parameter and wrapped it in DDP, which solves the problem.

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.optim import Adam

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self._params = nn.ParameterList()

    def append(self, p):
        self._params.append(p)

    def forward(self, x):  # dummy forward
        return x

def train(local_rank, world_size, model, par):
    # DDP needs an initialized process group in every subprocess
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")  # any free port
    dist.init_process_group("nccl", rank=local_rank, world_size=world_size)

    model = model.to(local_rank)  # also moves par, since it is stored in the ParameterList
    optimizer = Adam(list(model.parameters()))  # … other Adam arguments omitted here
    model = DDP(model, device_ids=[local_rank])

    for epoch in range(10):
        optimizer.zero_grad()
        loss = (par - 2.0) ** 2
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    model = MyModel()
    p = nn.Parameter(torch.tensor(1.0), requires_grad=True)
    model.append(p)
    mp.spawn(train, args=(world_size, model, p), nprocs=world_size)

In the above code, p is an nn.Parameter instance that is created before multiprocessing starts, and it is used in the loss calculation. Is p's gradient different between the subprocesses during that loss calculation? If so, how is that possible? p is a single instance created in the if __name__ == "__main__" block.
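
To be concrete, this is the kind of check I mean, inside the training loop of the code above (just printing the gradient per rank):

    for epoch in range(10):
        optimizer.zero_grad()
        loss = (par - 2.0) ** 2
        loss.backward()
        # each spawned process prints its own view of the gradient
        print(f"rank {local_rank}: par.grad = {par.grad.item()}")
        optimizer.step()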

Thank you!

I have confirmed that its gradient is different for each subprocess.
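
If they really are separate per-process copies, I assume they would have to be averaged explicitly whenever the loss is not computed through the DDP-wrapped forward. A sketch of what I mean, reusing world_size, par and optimizer from the code above and assuming the process group is already initialized (this averaging is what DDP normally does for the parameters it tracks):

import torch.distributed as dist

loss = (par - 2.0) ** 2
loss.backward()
# sum the per-rank gradients and divide by the number of ranks
dist.all_reduce(par.grad, op=dist.ReduceOp.SUM)
par.grad /= world_size
optimizer.step()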