import torch.nn as nn

class Node(object):
    def __init__(self, name, layer, op_type=None):
        self.name = name
        self.layer = layer          # reference to the nn.Module for this node
        self.op_type = op_type
        self.output_trace = None    # cached output of this node during forward
        self.prev_list = []         # predecessor node names
        self.next_list = []         # successor node names


class Graph(nn.Module):
    def __init__(self, layer_dict):
        super().__init__()
        self.layer_dict = layer_dict  # an nn.ModuleDict holding the actual layers
        # topology: one Node per layer (Graph.__get_op_type is defined elsewhere)
        self.node_dict = {
            name: Node(name, layer, Graph.__get_op_type(name))
            for name, layer in self.layer_dict.items()
        }
I define my model class in the code snippet above. self.layer_dict is an nn.ModuleDict, and I define the topological structure of the network in self.node_dict. However, when I wrapped my model with nn.DataParallel, I ran into this error:

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
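For context, this is roughly how I build and wrap the model (a minimal sketch: the layer_dict contents and the input shape here are placeholders, not my real network):

import torch
import torch.nn as nn

# placeholder layers standing in for my real network
layer_dict = nn.ModuleDict({
    "conv1": nn.Conv2d(3, 16, 3, padding=1),
    "conv2": nn.Conv2d(16, 32, 3, padding=1),
})

model = nn.DataParallel(Graph(layer_dict)).cuda()  # replicated across all visible GPUs
out = model(torch.randn(8, 3, 32, 32).cuda())      # the RuntimeError above is raised from a replica on a device other than 0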
I came across this GitHub thread https://github.com/pytorch/pytorch/issues/8637 where the problem described seems very similar to mine. It seems that because the attribute self.node_dict is not a tensor (or a registered submodule), it doesn't get properly broadcast to all the GPUs, so the broadcast copies of self.node_dict all point to the same layers that live on GPU 0.
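A quick way to see this (a sketch of a check, not something from my real code) is to compare, at the top of forward, the device of each layer as reached through self.node_dict with the device of the same layer as reached through the replicated self.layer_dict:

# (sketch) at the top of Graph.forward, before any layer call
for name, node in self.node_dict.items():
    dev_via_node = next(node.layer.parameters()).device                    # reference held by the shared node_dict
    dev_via_moduledict = next(self.layer_dict[name].parameters()).device   # layer inside the replicated ModuleDict
    print(name, "| input:", x.device,
          "| node_dict layer:", dev_via_node,
          "| layer_dict layer:", dev_via_moduledict)
# on replicas other than cuda:0, dev_via_node stays cuda:0 while dev_via_moduledict matches x.device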
I've tried to rectify this by matching the device ids in the self.forward method like this:
def forward(self, x):
    # process nodes in topological order (self.topological_sort is defined elsewhere)
    process_stack = self.topological_sort(self.node_dict)
    while process_stack:
        nodename = process_stack.pop()
        current_node = self.node_dict[nodename]

        # Matching the device ids HERE!!!!
        data_device = x.device
        current_node.layer = current_node.layer.to(data_device)

        # (the "Before OP" / "After OP" debug prints in the log below sit around this call)
        if not current_node.prev_list:
            # entry node: feed the network input directly
            x = current_node.layer(x)
        else:
            # out_trace_sum: accumulated output_trace of the prev_list nodes (computation omitted from this snippet)
            x = current_node.layer(out_trace_sum)
        current_node.output_trace = x
Unfortunately it doesn't work, most probably due to the async nature of the implementation: the replica threads all see the same node_dict, so the .to(data_device) calls from different threads race with each other. The devices logged around the layer call look like this:
Before OP, data: cuda:0, layer: cuda:0
After OP, data: cuda:0, layer: cuda:0
Before OP, data: cuda:1, layer: cuda:0
Before OP, data: cuda:2, layer: cuda:0
Before OP, data: cuda:3, layer: cuda:0
After OP, data: cuda:2, layer: cuda:2
After OP, data: cuda:1, layer: cuda:2
After OP, data: cuda:3, layer: cuda:3
I’m out of ideas…