NoneType attribute when using DataParallel

amitmore17 · December 26, 2017, 1:32pm

I have a model net with an attribute self.x, which I can access as net.x.
When I wrap the model in DataParallel with a single device,
net = nn.DataParallel(net, device_ids=[0])
I can still access the attribute as net.module.x.
However, when I use two devices,
net = nn.DataParallel(net, device_ids=[0,1])
net.module.x returns a NoneType object.

The following code reproduces it.

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as torchF
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1, stride=1)
        self.conv2 = nn.Conv2d(3, 3, kernel_size=3, padding=1, stride=1)
        self.x = None  # placeholder, overwritten during forward

    def forward(self, x):
        x = self.conv1(x)
        self.x = x  # keep the intermediate activation for later inspection
        x = self.conv2(x)
        return x


net = Net()
#net = nn.DataParallel(net, device_ids=[0]) #this line does not produce the error
net = nn.DataParallel(net, device_ids=[0,1]) #this line produces the error
net.cuda()
x = Variable(torch.cuda.FloatTensor(np.random.rand(3,3,7,7)))
out_x=net(x)
out = torchF.l1_loss(out_x,Variable(torch.cuda.FloatTensor(-np.random.rand(3,3,7,7))))
net.zero_grad()
out.backward()
print(type(net.module.x))   
print(net.module.x.size())

amitmore17 · December 26, 2017, 1:37pm

I am new to this forum, so I am unable to write my code with proper indentations.

richard · December 27, 2017, 12:47am

What exactly is your error message?

ngimel · December 27, 2017, 12:58am

This is expected, since you assign net.x only in forward. When you wrap your model in DataParallel with more than one device (your failing line, device_ids=[0,1]), DataParallel operates on model replicas, and changes made to the replicas during forward are not visible outside the forward/backward calls.
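
The replica behaviour can be illustrated without any GPUs. The following is only an analogy, assuming each replica behaves like a shallow copy of the original object; TinyModel is a hypothetical stand-in for the model, not a PyTorch class.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for an nn.Module; no PyTorch required."""

    def __init__(self):
        self.x = None

    def forward(self, value):
        # Side effect: stored on whichever object runs forward.
        self.x = value * 2
        return self.x


model = TinyModel()

# Assumption: each replica behaves like a shallow copy of the original.
replicas = [copy.copy(model) for _ in range(2)]
outputs = [replica.forward(v) for replica, v in zip(replicas, [1, 2])]

print(outputs)        # values returned by the replicas
print(replicas[0].x)  # the attribute was set on the replica
print(model.x)        # the original object was never modified
```

Only the returned values make it back to the caller; the attribute written inside forward stays on the temporary copy.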

amitmore17 · December 27, 2017, 5:50am

Case 1:
When I run the code with the "net = nn.DataParallel(net, device_ids=[0])" line enabled, the print statements give:

<class 'torch.autograd.variable.Variable'>
(3L, 3L, 7L, 7L)

Case 2:
When I run the code with the "net = nn.DataParallel(net, device_ids=[0,1])" line enabled, I get:

<type 'NoneType'>
Traceback (most recent call last):
File "test.py", line 30, in
print(net.module.x.size())
AttributeError: 'NoneType' object has no attribute 'size'

In summary, self.x should be of type autograd.variable.Variable.
In Case 2, however, I get a NoneType object instead of a valid Variable.

amitmore17 · December 27, 2017, 5:53am

@ngimel : Thanks, I get it now.
Is there any way around this? I am training a model on multiple GPUs and need to access some model attributes that are assigned during the forward pass, for debugging purposes.

ngimel · December 27, 2017, 7:40pm

Try modifying your forward to also return the attribute you are interested in, e.g.:

y = self.conv1(x)
x = self.conv2(y)
return x,y

and inspect the values returned by your net. Alternatively, use distributed training (torch.distributed) instead of DataParallel.
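
A minimal sketch of the first suggestion, written against a recent PyTorch API rather than the Variable-based one used above. The batch size of 4 and the fallback to a single device when fewer than two GPUs are present are assumptions made so the snippet runs anywhere.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1, stride=1)
        self.conv2 = nn.Conv2d(3, 3, kernel_size=3, padding=1, stride=1)

    def forward(self, x):
        y = self.conv1(x)    # intermediate activation we want to inspect
        out = self.conv2(y)
        return out, y        # return it instead of storing it on self


net = Net()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    # Only wrap when at least two GPUs are actually present.
    net = nn.DataParallel(net, device_ids=[0, 1])
net.to(device)

x = torch.rand(4, 3, 7, 7, device=device)
out, y = net(x)  # with DataParallel, both outputs are gathered along dim 0
print(out.shape, y.shape)
```

Because y is part of the return value, it is gathered from all replicas together with the main output, so it covers the whole batch rather than the slice seen by one device.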