I have a model net with an attribute self.x.
I can access this attribute as net.x.
When using DataParallel with a single device, as
net = nn.DataParallel(net, device_ids=)
I can access the attribute as net.module.x.
However, when I use
net = nn.DataParallel(net, device_ids=[0,1])
net.module.x returns None.
The following code reproduces it:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as torchF
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1, stride=1)
        self.conv2 = nn.Conv2d(3, 3, kernel_size=3, padding=1, stride=1)
        self.x = None  # intermediate activation, assigned in forward()

    def forward(self, x):
        x = self.conv1(x)
        self.x = x  # stored on whichever module instance runs forward()
        x = self.conv2(x)
        return x

net = Net().cuda()  # DataParallel expects parameters on device_ids[0]
#net = nn.DataParallel(net, device_ids=) #this line does not produce the error
net = nn.DataParallel(net, device_ids=[0,1]) #this line produces the error
x = Variable(torch.cuda.FloatTensor(np.random.rand(3,3,7,7)))
out = net(x)
out_x = net.module.x
print(out_x.size())
out = torchF.l1_loss(out_x, Variable(torch.cuda.FloatTensor(-np.random.rand(3,3,7,7))))
What exactly is your error message?
This is expected, because you assign net.x only in forward. If you wrap your model in DataParallel with more than one device (your failing line, device_ids=[0,1]), DataParallel runs forward on model replicas, and changes made to those replicas during forward are not visible outside the forward/backward calls.
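To make the replica behaviour concrete, here is a toy imitation of what DataParallel does on each call, written in plain Python so it runs without GPUs. ToyNet and toy_data_parallel are hypothetical names, lists stand in for tensors, and copy.deepcopy stands in for the real replication onto each device; it is a simplification, not PyTorch's implementation. As far as I know, the real DataParallel also calls the wrapped module directly when only one device is given, which would explain why the single-device line works.

```python
import copy


class ToyNet:
    """Hypothetical stand-in for Net: forward() stores an intermediate on self."""

    def __init__(self):
        self.x = None

    def forward(self, batch):
        hidden = [v + 1 for v in batch]  # stand-in for self.conv1
        self.x = hidden  # set on whichever instance runs forward()
        return [v * 2 for v in hidden]  # stand-in for self.conv2


def toy_data_parallel(module, batch, n_devices):
    """Toy imitation of DataParallel: replicate, scatter, run, gather."""
    if n_devices == 1:
        # Single device: forward runs on the wrapped module itself
        return module.forward(batch)
    size = -(-len(batch) // n_devices)  # ceiling division: chunk size per device
    chunks = [batch[i:i + size] for i in range(0, len(batch), size)]
    replicas = [copy.deepcopy(module) for _ in chunks]  # fresh copies on every call
    outputs = [r.forward(c) for r, c in zip(replicas, chunks)]
    return [v for out in outputs for v in out]  # concatenate along the batch


single = ToyNet()
out1 = toy_data_parallel(single, [1, 2, 3], n_devices=1)
print(single.x)  # [2, 3, 4]: the attribute was set on the original object

multi = ToyNet()
out2 = toy_data_parallel(multi, [1, 2, 3], n_devices=2)
print(multi.x)  # None: only the discarded replicas saw the assignment
```

Both calls return the same output; only the attribute on the original object differs.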
When I run the code with the “net = nn.DataParallel(net, device_ids=)” line enabled, I get the following output from the print statement:
(3L, 3L, 7L, 7L)
When I run the code with the “net = nn.DataParallel(net, device_ids=[0,1])” line enabled, I get:
Traceback (most recent call last):
  File "test.py", line 30, in
AttributeError: 'NoneType' object has no attribute 'size'
In summary, with a single device self.x is an autograd.variable.Variable.
But with two devices, net.module.x is None instead of a valid Variable.
@ngimel: Thanks, I get it now.
Is there any way around this? I am training a model on multiple GPUs and need to access some model attributes assigned during the forward pass for debugging purposes.
Try modifying your forward to also return the attribute you are interested in, e.g.:
def forward(self, x):
    y = self.conv1(x)
    x = self.conv2(y)
    return x, y  # also return the intermediate activation
and inspect the values returned by your net. Alternatively, use distributed training (DistributedDataParallel) instead of DataParallel.
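A toy sketch of why returning the value works: DataParallel gathers whatever forward returns, and as far as I know its gather step handles a tuple element by element, so the intermediate comes back for the full batch. In the original script the call would become out, out_x = net(x). The code below is plain Python with hypothetical names (ToyNet, toy_data_parallel), lists instead of tensors and copy.deepcopy instead of real device replication; it illustrates the idea and is not PyTorch's implementation.

```python
import copy


class ToyNet:
    """Hypothetical stand-in for Net whose forward() returns the intermediate."""

    def forward(self, batch):
        y = [v + 1 for v in batch]  # stand-in for self.conv1
        x = [v * 2 for v in y]  # stand-in for self.conv2
        return x, y  # return the intermediate instead of storing it on self


def toy_data_parallel(module, batch, n_devices):
    """Toy imitation of DataParallel that gathers tuple outputs element-wise."""
    size = -(-len(batch) // n_devices)  # ceiling division: chunk size per device
    chunks = [batch[i:i + size] for i in range(0, len(batch), size)]
    replicas = [copy.deepcopy(module) for _ in chunks]  # throwaway copies
    outputs = [r.forward(c) for r, c in zip(replicas, chunks)]
    # Concatenate the k-th element of every replica's output tuple
    return tuple(
        [v for out in outputs for v in out[k]] for k in range(len(outputs[0]))
    )


net = ToyNet()
out, out_x = toy_data_parallel(net, [1, 2, 3], n_devices=2)
print(out)    # [4, 6, 8]
print(out_x)  # [2, 3, 4]: conv1 stand-in output for the whole batch
```

Because the intermediate travels through the return value rather than an attribute, it no longer matters that the replicas are thrown away after each call.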