Suppose I have a model of the form l1 → l2 → l3 …, and I want l1 to know that l2 is the layer following it, and l2 to know l3, and so on. If I set l1.following_layer = l2 after defining l2 (or the whole model), the model becomes cumbersome because l2 is registered as a child of l1. Moreover, when I reset parameters during initialization, l2's weights may be reset as well, which is not desired. I only want l1 to hold a reference to l2.
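A minimal sketch of the problem (the layer names here are illustrative, not from any real model): assigning a module as a plain attribute registers it as a child, so its parameters are duplicated in the parent's parameter list.

```python
import torch.nn as nn

l1 = nn.Linear(4, 4)
l2 = nn.Linear(4, 4)
l1.following_layer = l2  # nn.Module.__setattr__ registers l2 as a child of l1

# l1 now "owns" l2's parameters too: 2 weights + 2 biases instead of 1 + 1.
print(len(list(l1.parameters())))                      # 4
print('following_layer' in dict(l1.named_children()))  # True
```

This is also why l1.apply(init_fn) or a reset_parameters loop would touch l2's weights.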
Thanks @ptrblck. I found that storing the following layer in a plain dict solves this, i.e., l1.following_layer['following'] = l2. I need this because I need to use l2's weights inside l1.
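A quick sketch of why the dict trick works (again with illustrative layers): a plain dict is neither a Module nor a Parameter, so nn.Module's attribute registration skips it, and l2 does not become a child of l1 even though its weight stays reachable.

```python
import torch.nn as nn

l1 = nn.Linear(4, 4)
l2 = nn.Linear(4, 4)
l1.following_layer = {'following': l2}  # plain dict: bypasses child registration

print(len(list(l1.parameters())))  # 2 -- only l1's own weight and bias
print(len(list(l1.children())))    # 0 -- l2 is not a registered child
w = l1.following_layer['following'].weight  # l2's weight is still accessible
```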
@ptrblck It seems that this approach does not work when I distribute the model across multiple GPUs, because the parameters referenced through the dict end up on a different device than the replicated module.
For example, suppose my module is something like this:
```python
class A(nn.Module):
    def __init__(self, m):
        super(A, self).__init__()
        self.weight = nn.Parameter(torch.ones(1))  # added so the snippet runs
        self.l_d = {'m': m}  # plain dict: m is not registered as a child

    def forward(self, x):
        m = self.l_d['m']
        if m is not None:
            res = m.weight * self.weight * x
        else:
            res = self.weight * x
        return res


class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.a = A(None)
        self.b = A(self.a)
```
If I use m = M(); m = nn.DataParallel(m), it raises an error saying that self.l_d['m'].weight is not on the same device as self.weight, because the module held in the plain dict is not replicated across devices.
If I instead use self.l_d = nn.ModuleDict() in the __init__ of class A, nn.DataParallel raises RecursionError: maximum recursion depth exceeded.
BTW the plain dictionary works fine on a single device.
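One possible workaround, sketched below (this is an assumption on my part, not an official fix): since the dict-held module is never replicated by nn.DataParallel, its weight stays on the default device, so moving that weight onto the input's device inside forward should avoid the mismatch.

```python
import torch
import torch.nn as nn

class A(nn.Module):
    def __init__(self, m):
        super(A, self).__init__()
        self.weight = nn.Parameter(torch.ones(1))
        self.l_d = {'m': m}  # plain dict: m is not registered as a child

    def forward(self, x):
        m = self.l_d['m']
        if m is not None:
            # m is not replicated, so its weight may live on another device;
            # .to(x.device) is a no-op on a single device and a copy otherwise
            res = m.weight.to(x.device) * self.weight * x
        else:
            res = self.weight * x
        return res

a = A(None)
b = A(a)
out = b(torch.ones(3))  # both weights are 1, so the input passes through
```

The extra .to() copy happens on every forward call, so this trades a small overhead for correctness under DataParallel.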