I am currently running two versions of the same model (one with DDP, one on a single GPU) and am trying to verify that the model parameters stay the same between the two. To do so, I load the same checkpoint at the first epoch in both runs and train for several iterations. Right after loading the checkpoint, the initial parameters match (DDP is running two processes in this example):
DDP Model Output
Rank: 0
model.state_dict(): OrderedDict([('module.module.input_networks.0.0.net.net.0.weight', tensor([[ 0.3889, 0.1429, 0.1801, 0.2079],
[ 0.4926, -0.3095, -0.1581, -0.1851],
[ 0.2123, -0.4082, -0.3036, 0.1350],
[ 0.3666, 0.2588, 0.3510, -0.4564],
Rank: 1
model.state_dict(): OrderedDict([('module.module.input_networks.0.0.net.net.0.weight', tensor([[ 0.3889, 0.1429, 0.1801, 0.2079],
[ 0.4926, -0.3095, -0.1581, -0.1851],
[ 0.2123, -0.4082, -0.3036, 0.1350],
[ 0.3666, 0.2588, 0.3510, -0.4564],
Single GPU Model Output
model.state_dict(): OrderedDict([('input_networks.0.0.net.net.0.weight', tensor([[ 0.3889, 0.1429, 0.1801, 0.2079],
[ 0.4926, -0.3095, -0.1581, -0.1851],
[ 0.2123, -0.4082, -0.3036, 0.1350],
[ 0.3666, 0.2588, 0.3510, -0.4564],
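For reference, here is roughly how I build and restore the model in both scripts. This is a minimal sketch, not my exact code: the nn.Sequential stand-in, the LOCAL_RANK check, and the "ckpt.pt" path are all placeholders.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# LOCAL_RANK is set by torchrun; if it is absent, this is the single-GPU run.
use_ddp = "LOCAL_RANK" in os.environ
rank = int(os.environ.get("LOCAL_RANK", 0))

if use_ddp:
    dist.init_process_group("nccl")

# Stand-in for my actual architecture (placeholder).
model = nn.Sequential(nn.Linear(4, 4)).to(rank)

# Load the shared checkpoint before wrapping, so the same state_dict keys
# work in both runs; "ckpt.pt" is a placeholder path.
model.load_state_dict(torch.load("ckpt.pt", map_location=f"cuda:{rank}"))

if use_ddp:
    # DDP prepends "module." to every state_dict key, which is where the
    # extra prefixes in the DDP dumps above come from.
    model = DDP(model, device_ids=[rank])

print(f"Rank: {rank}")
print("model.state_dict():", model.state_dict())
```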
However, after running both for one iteration, the parameters of the DDP run and the single-GPU run differ (by 0.001 to 0.002), even though the two DDP ranks still match each other exactly, and the outputs look like this:
DDP Model Output
Rank: 0
model.state_dict(): OrderedDict([('module.module.input_networks.0.0.net.net.0.weight', tensor([[ 0.3886, 0.1431, 0.1805, 0.2080],
[ 0.4927, -0.3096, -0.1583, -0.1849],
[ 0.2124, -0.4080, -0.3040, 0.1352],
[ 0.3663, 0.2588, 0.3514, -0.4563],
Rank: 1
model.state_dict(): OrderedDict([('module.module.input_networks.0.0.net.net.0.weight', tensor([[ 0.3886, 0.1431, 0.1805, 0.2080],
[ 0.4927, -0.3096, -0.1583, -0.1849],
[ 0.2124, -0.4080, -0.3040, 0.1352],
[ 0.3663, 0.2588, 0.3514, -0.4563],
Single GPU Model Output
model.state_dict(): OrderedDict([('input_networks.0.0.net.net.0.weight', tensor([[ 0.3888, 0.1430, 0.1802, 0.2078],
[ 0.4925, -0.3094, -0.1582, -0.1850],
[ 0.2124, -0.4081, -0.3037, 0.1351],
[ 0.3665, 0.2587, 0.3511, -0.4565],
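Rather than eyeballing the printed tensors, I also compare the two state_dicts numerically after stripping the wrapper prefix; roughly like this (the saved file names are placeholders):

```python
import torch

def strip_prefix(state_dict, prefix="module."):
    # Remove the (possibly repeated) DDP wrapper prefix so the keys line
    # up with the single-GPU state_dict.
    out = {}
    for key, value in state_dict.items():
        while key.startswith(prefix):
            key = key[len(prefix):]
        out[key] = value
    return out

def max_abs_diff(state_a, state_b):
    # Largest elementwise difference across all shared parameters.
    return max(
        (state_a[k].float() - state_b[k].float()).abs().max().item()
        for k in state_a
    )

# Placeholder paths for the state_dicts saved after one iteration.
ddp_state = strip_prefix(torch.load("ddp_iter1.pt", map_location="cpu"))
single_state = torch.load("single_iter1.pt", map_location="cpu")

print(f"max abs diff: {max_abs_diff(ddp_state, single_state):.6f}")
# For me this reports roughly 0.001 to 0.002 after a single iteration.
```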
Does anyone have experience with why this might be happening? My model does not contain any dropout layers, so both versions should be updating the same way.
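For reference, the training step in both runs is a standard loop; a minimal sketch with placeholder names (the model, optimizer, criterion, and batch variables all stand in for my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder model, optimizer, and data; my real setup differs.
model = nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
inputs, targets = torch.randn(8, 4), torch.randn(8, 4)

# One training iteration, identical in the single-GPU and DDP scripts.
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()   # with DDP, gradients are averaged across ranks during backward()
optimizer.step()
```

Since DDP averages gradients across ranks during backward(), I understand why ranks 0 and 1 stay identical to each other after the step; what I don't understand is why that result drifts from the single-GPU run.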