Suppose I have a model of the form l1 → l2 → l3 …, and I want l1 to know that l2 is the layer following it, and l2 to know l3, and so on. If I set l1.following_layer = l2 after defining l2 (or the whole model), the model becomes cumbersome because l2 is registered as a child of l1. Moreover, when I reset parameters during initialization, l2's weights may be reset as well, which is not desired. I only want l1 to hold a reference to l2.
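A minimal sketch of the problem (the layer names here are illustrative, not from any real model): assigning a module as a plain attribute registers it as a child, so its parameters are duplicated in the parent's parameter list.

```python
import torch.nn as nn

l1 = nn.Linear(4, 4)
l2 = nn.Linear(4, 4)
l1.following_layer = l2  # nn.Module.__setattr__ registers l2 as a child of l1

# l1 now "owns" l2's parameters too: 2 weights + 2 biases instead of 1 + 1.
print(len(list(l1.parameters())))                      # 4
print('following_layer' in dict(l1.named_children()))  # True
```

This is also why l1.apply(init_fn) or a reset_parameters loop would touch l2's weights.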
Thanks @ptrblck. I found that storing the following layer in a plain dict solves this, i.e., l1.following_layer['following'] = l2. I need this because I need to use l2's weights inside l1.
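A quick sketch of why the dict trick works (again with illustrative layers): a plain dict is neither a Module nor a Parameter, so nn.Module's attribute registration skips it, and l2 does not become a child of l1 even though its weight stays reachable.

```python
import torch.nn as nn

l1 = nn.Linear(4, 4)
l2 = nn.Linear(4, 4)
l1.following_layer = {'following': l2}  # plain dict: bypasses child registration

print(len(list(l1.parameters())))  # 2 -- only l1's own weight and bias
print(len(list(l1.children())))    # 0 -- l2 is not a registered child
w = l1.following_layer['following'].weight  # l2's weight is still accessible
```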
@ptrblck It seems that this approach does not work when I distribute the model across multiple GPUs, because the parameters referenced through the dict end up on a different device than the replicated module.
For example, suppose my module is something like this:
```python
class A(nn.Module):
    def __init__(self, m):
        super(A, self).__init__()
        self.weight = nn.Parameter(torch.ones(1))  # added so the snippet runs
        self.l_d = {'m': m}  # plain dict: m is not registered as a child

    def forward(self, x):
        m = self.l_d['m']
        if m is not None:
            res = m.weight * self.weight * x
        else:
            res = self.weight * x
        return res


class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.a = A(None)
        self.b = A(self.a)
```
If I use m = M(); m = nn.DataParallel(m), it raises an error saying that self.l_d['m'].weight is not on the same device as self.weight, because the module held in the plain dict is not replicated across devices.
If I instead use self.l_d = nn.ModuleDict() in the __init__ of class A, nn.DataParallel raises RecursionError: maximum recursion depth exceeded.
BTW the plain dictionary works fine on a single device.
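One possible workaround, sketched below (this is an assumption on my part, not an official fix): since the dict-held module is never replicated by nn.DataParallel, its weight stays on the default device, so moving that weight onto the input's device inside forward should avoid the mismatch.

```python
import torch
import torch.nn as nn

class A(nn.Module):
    def __init__(self, m):
        super(A, self).__init__()
        self.weight = nn.Parameter(torch.ones(1))
        self.l_d = {'m': m}  # plain dict: m is not registered as a child

    def forward(self, x):
        m = self.l_d['m']
        if m is not None:
            # m is not replicated, so its weight may live on another device;
            # .to(x.device) is a no-op on a single device and a copy otherwise
            res = m.weight.to(x.device) * self.weight * x
        else:
            res = self.weight * x
        return res

a = A(None)
b = A(a)
out = b(torch.ones(3))  # both weights are 1, so the input passes through
```

The extra .to() copy happens on every forward call, so this trades a small overhead for correctness under DataParallel.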