Issues with nn.Module attribute on multiple GPUs

Hello everybody,

I am having an issue executing a custom module on multiple GPUs.
Here is a minimal equivalent of the code I am trying to debug:

import torch
import torch.nn as nn
from torch.autograd import Variable

class fooModule(nn.Module):
    def __init__(self):
        super(fooModule, self).__init__()
        # Plain Python attribute used as a one-shot flag
        self.first = True

    def forward(self, input):
        if self.first:
            print('First time in fooModule')
            self.first = False
        else:
            print('NOT first time in fooModule')
        # Return something so DataParallel has an output to gather
        return input

When I execute the following:

net = fooModule()
x = Variable(torch.Tensor(1))
for i in range(200):
    net(x)

the output is:

First time in fooModule
NOT first time in fooModule
NOT first time in fooModule
NOT first time in fooModule
NOT first time in fooModule
NOT first time in fooModule
NOT first time in fooModule
NOT first time in fooModule
...

If instead I try with:

net = fooModule()
net = torch.nn.DataParallel(net).cuda()
x = Variable(torch.Tensor(1)).cuda()
for i in range(200):
    net(x)

the output is:

First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
First time in fooModule
...

I am not a PyTorch expert, so I wonder whether there is something wrong in my implementation that I am not seeing.
What I expected is exactly the same output as on the CPU: the flag should flip after the first call, but under DataParallel it looks as if self.first is reset to True on every forward pass.
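
One workaround I can imagine (untested, and it rests on the assumption that DataParallel's replication step only shallow-copies the module's __dict__, so a mutable container would be shared by reference across all replicas) is to keep the flag in a dict instead of a plain attribute:

import torch
import torch.nn as nn

class fooModule(nn.Module):
    def __init__(self):
        super(fooModule, self).__init__()
        # Assumption: if replicas share this dict by reference, a mutation
        # made on any GPU replica is also visible to the original module.
        self.state = {'first': True}

    def forward(self, input):
        # Note: replicas run in parallel threads, so the very first forward
        # may still print 'First time' once per GPU before the flag flips.
        if self.state['first']:
            print('First time in fooModule')
            self.state['first'] = False
        else:
            print('NOT first time in fooModule')
        return input

Is something like this the intended way to keep mutable Python state across replicas, or is there a cleaner mechanism?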

Thank you in advance. This is the best deep learning framework I have ever used, especially for its documentation and the community's willingness to help PyTorch users.

Jak94