Example code is shown as follows:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class layer(nn.Module):
    def __init__(self):
        super(layer, self).__init__()
        self.fc = nn.Linear(10, 10)
        self.cnt = 0

    def forward(self, x):
        x = self.fc(x)
        self.cnt += 10
        return x

model = nn.DataParallel(layer(), device_ids=[0, 1]).cuda()
x = Variable(torch.Tensor(10, 10)).cuda()
out = model(x)
print(model.module.cnt)  # output: 0
```
The counter `model.module.cnt` cannot be modified inside `forward()` when the module is wrapped in `nn.DataParallel` (the multi-GPU case); it always stays zero. Is there a simple workaround, e.g., performing the addition on the CPU?
The behavior does not seem to conform to the official docs:

> Arbitrary positional and keyword inputs are allowed to be passed into DataParallel EXCEPT Tensors. All variables will be scattered on dim specified (default 0). Primitive types will be broadcasted, but all other types will be a shallow copy and can be corrupted if written to in the model's forward pass.

[http://pytorch.org/docs/master/nn.html#torch.nn.DataParallel]
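Since `DataParallel` runs `forward()` on throwaway replicas of the module, attribute writes are discarded, but return values are gathered back to the caller. One possible workaround (a sketch, not from the original post; `CountingLayer` is a hypothetical name) is to return the per-call increment from `forward()` as a tensor and accumulate it outside the model:

```python
import torch
import torch.nn as nn

class CountingLayer(nn.Module):
    """Returns its per-call increment instead of mutating an attribute."""
    def __init__(self):
        super(CountingLayer, self).__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        x = self.fc(x)
        # A 1-element tensor per replica: DataParallel gathers outputs
        # along dim 0, so summing after the call yields the true total
        # across all GPUs, whereas writes to self.cnt would be lost.
        cnt = x.new_full((1,), 10.0)
        return x, cnt

model = CountingLayer()
# With GPUs one would wrap it, e.g.:
# model = nn.DataParallel(model, device_ids=[0, 1]).cuda()
total = 0.0
out, cnt = model(torch.randn(4, 10))
total += cnt.sum().item()
print(total)
```

The counter lives on the caller's side, so it works identically with or without `DataParallel`.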