Example code is shown as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
class layer(nn.Module):
    def __init__(self):
        super(layer, self).__init__()
        self.fc = nn.Linear(10, 10)
        self.cnt = 0

    def forward(self, x):
        x = self.fc(x)
        self.cnt += 10
        return x
model = nn.DataParallel(layer(), device_ids=[0, 1]).cuda()
x = Variable(torch.Tensor(10, 10)).cuda()
out = model(x)
print(model.module.cnt)  # output: 0
The counter model.module.cnt cannot be modified inside forward() when the module is wrapped in nn.DataParallel (i.e., in the multi-GPU case); it always stays at zero. Is there any simple workaround, e.g., running the addition on the CPU?
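One workaround that sidesteps the replica issue entirely is to keep the counter outside the module and update it on the caller side, since nothing inside forward() survives the per-replica copies. A minimal sketch (the caller-side cnt variable and the commented-out DataParallel line are illustrative assumptions, not from the original post):

```python
import torch
import torch.nn as nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        # no mutable state touched here, so replica copies are harmless
        return self.fc(x)

model = Layer()
# multi-GPU case: model = nn.DataParallel(Layer(), device_ids=[0, 1]).cuda()

cnt = 0  # plain Python state kept on the caller side, never replicated
x = torch.randn(10, 10)
out = model(x)
cnt += 10  # update happens outside forward(), so it is not discarded
print(cnt)  # 10
```

This keeps forward() pure with respect to Python attributes, which is the safest pattern under DataParallel.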
The issue does not seem to conform to the official docs:
Arbitrary positional and keyword inputs are allowed to be passed into DataParallel EXCEPT Tensors. All variables will be scattered on dim specified (default 0). Primitive types will be broadcasted, but all other types will be a shallow copy and can be corrupted if written to in the model’s forward pass. [torch.nn — PyTorch master documentation]