Example code is shown as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
class layer(nn.Module):
    def __init__(self):
        super(layer, self).__init__()
        self.fc = nn.Linear(10, 10)
        self.cnt = 0

    def forward(self, x):
        x = self.fc(x)
        self.cnt += 10
        return x
model = nn.DataParallel(layer(), device_ids=[0, 1]).cuda()
x = Variable(torch.Tensor(10, 10)).cuda()
out = model(x)
print(model.module.cnt)  # output: 0
The counter model.module.cnt cannot be modified inside forward() when the module is wrapped in nn.DataParallel (i.e., in the multi-GPU case); it always stays at zero. Is there any simple workaround, e.g., running the addition on the CPU?
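One workaround that sidesteps the replica issue entirely is to keep the counter outside the module and update it on the caller side, since nothing inside forward() survives the per-replica copies. A minimal sketch (the caller-side cnt variable and the commented-out DataParallel line are illustrative assumptions, not from the original post):

```python
import torch
import torch.nn as nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        # no mutable state touched here, so replica copies are harmless
        return self.fc(x)

model = Layer()
# multi-GPU case: model = nn.DataParallel(Layer(), device_ids=[0, 1]).cuda()

cnt = 0  # plain Python state kept on the caller side, never replicated
x = torch.randn(10, 10)
out = model(x)
cnt += 10  # update happens outside forward(), so it is not discarded
print(cnt)  # 10
```

This keeps forward() pure with respect to Python attributes, which is the safest pattern under DataParallel.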
The issue does not seem to conform to the official docs:
Arbitrary positional and keyword inputs are allowed to be passed into DataParallel EXCEPT Tensors. All variables will be scattered on dim specified (default 0). Primitive types will be broadcasted, but all other types will be a shallow copy and can be corrupted if written to in the model’s forward pass. [torch.nn — PyTorch master documentation]