I have a parameter that is initialized during the first forward pass:

```python
def forward(self, x):
    if not self.initialized:
        # data-dependent init: use the mean of the first batch
        self.a = nn.Parameter(torch.mean(x))
        self.initialized = True
    # (rest of forward omitted)
```
Note that x is the input data.

The problem is: when using DDP, each device gets a different batch, so the parameter a is initialized to a different value on each device. I don't think this can be solved by setting a random seed; it's more of a synchronization problem.
How can I make sure it's initialized to the same value on all devices?
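One thing I was considering: broadcast the initial value from rank 0 before creating the parameter, so every rank starts from the same number. Here is a rough sketch of what I mean (the module name and the `x * self.a` return are just placeholders, and it assumes the default process group is already initialized and that x lives on the device the backend expects). Not sure whether this is the right way to do it:

```python
import torch
import torch.distributed as dist
import torch.nn as nn

class LazyInitModule(nn.Module):  # hypothetical name, just for illustration
    def __init__(self):
        super().__init__()
        self.initialized = False

    def forward(self, x):
        if not self.initialized:
            init_val = torch.mean(x).detach()
            if dist.is_available() and dist.is_initialized():
                # every rank overwrites its local value with rank 0's,
                # so self.a starts out identical on all devices
                dist.broadcast(init_val, src=0)
            self.a = nn.Parameter(init_val)
            self.initialized = True
        return x * self.a  # placeholder for whatever forward actually computes
```

Would this work, or is there a more standard way? For example, would it be better to all_reduce the mean instead of broadcasting, so the init uses the mean over the global batch rather than just rank 0's batch?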