Is it a silent error to assign layers to self and also use them in an nn.Sequential assigned to self?


Sorry if this has been answered before, it’s a bit hard to search for.

If I write a class where I instantiate torch.nn modules inside __init__ and make them attributes of self, then use those same layers in a Sequential model, is that a silent error?

Like this, I mean:

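class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.conv2 = nn.Conv2d(64, 64, 3)
        # `seq` is a placeholder name for the Sequential attribute
        self.seq = nn.Sequential(self.conv1, self.conv2)

    def forward(self, x):
        return self.seq(x)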
Would parameters get updated twice somehow, or something like that?

I realized I had done this without thinking about it when torchinfo showed me double the number of parameters for a layer – it counted once for the self.conv attribute and again for the parameters from the same module instance inside the Sequential.

Guessing the answer is “no” since this could result in some unpleasant surprises but just want to be sure.


edit: this looks close to what I’m asking

Not sure if that post is saying that what I’ve done above is a problem though.

It shouldn’t be a problem, since the nn.Sequential container will share the same parameters with the original layers.
However, TorchScript (and maybe other backends) might not accept models with shared parameters and could raise an error as described in the linked post.
The torchinfo output seems to be misleading, since you are not increasing the number of parameters:

Layer (type:depth-idx)                   Param #
MyModule                                 --
├─Conv2d: 1-1                            1,792
├─Conv2d: 1-2                            36,928
├─Sequential: 1-3                        38,720
│    └─Conv2d: 2-1                       (recursive)
│    └─Conv2d: 2-2                       (recursive)
Total params: 77,440
Trainable params: 77,440
Non-trainable params: 0
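The two totals can be cross-checked without torchinfo. This is a minimal sketch, assuming the Sequential is stored under a placeholder attribute name `seq`: `model.parameters()` skips tensors it has already yielded, while summing over each child module separately reproduces the doubled figure.

```python
import torch.nn as nn


class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.conv2 = nn.Conv2d(64, 64, 3)
        # `seq` is a placeholder attribute name (assumption)
        self.seq = nn.Sequential(self.conv1, self.conv2)

    def forward(self, x):
        return self.seq(x)


model = MyModule()

# parameters() deduplicates shared tensors, so each one is counted once
unique_params = sum(p.numel() for p in model.parameters())

# Summing per child visits conv1 and conv2 twice: directly and via seq
naive_params = sum(
    p.numel()
    for child in model.children()
    for p in child.parameters()
)

# The Sequential holds the very same module object, not a copy
same_object = model.seq[0] is model.conv1

print(unique_params, naive_params, same_object)
# 38720 77440 True
```

The first number is what an optimizer built from `model.parameters()` would actually see.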

You can also verify this, e.g. by manipulating a parameter in-place:

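import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.conv2 = nn.Conv2d(64, 64, 3)
        # `seq` is a placeholder name for the Sequential attribute
        self.seq = nn.Sequential(self.conv1, self.conv2)

    def forward(self, x):
        return self.seq(x)

model = MyModule()
with torch.no_grad():
    model.conv1.weight[0, 0].fill_(1.)

# Read the weight through the attribute ...
model.conv1.weight[0, 0]
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.]], grad_fn=<SelectBackward0>)

# ... and through the Sequential: same values, same tensor
model.seq[0].weight[0, 0]
# tensor([[1., 1., 1.],
#         [1., 1., 1.],
#         [1., 1., 1.]], grad_fn=<SelectBackward0>)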
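On the "updated twice" worry, here is a small sketch under the same assumption (placeholder attribute name `seq`, plain SGD without momentum or weight decay). It checks that the optimizer holds each shared tensor once and that a single step moves the weight by exactly one `lr * grad`.

```python
import torch
import torch.nn as nn


class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.conv2 = nn.Conv2d(64, 64, 3)
        # `seq` is a placeholder attribute name (assumption)
        self.seq = nn.Sequential(self.conv1, self.conv2)

    def forward(self, x):
        return self.seq(x)


torch.manual_seed(0)
model = MyModule()
lr = 0.1
opt = torch.optim.SGD(model.parameters(), lr=lr)

# Two conv layers -> 2 weights + 2 biases, each registered once
n_opt_params = len(opt.param_groups[0]["params"])

# One forward/backward pass on a dummy input
x = torch.randn(1, 3, 8, 8)
model(x).sum().backward()

before = model.conv1.weight.detach().clone()
grad = model.conv1.weight.grad.detach().clone()
opt.step()

# A single SGD update is w <- w - lr * grad; a double update would not match
updated_once = torch.allclose(model.conv1.weight, before - lr * grad)

print(n_opt_params, updated_once)
```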

Got it, good to know this could be an issue for TorchScript down the road. I rewrote to avoid that.
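For anyone landing here later, this is a minimal sketch of one way such a rewrite could look, not necessarily the exact one I used: build the layers directly inside the Sequential so each module is registered once, and index into it when a single layer is needed. The attribute name `seq` is again just a placeholder.

```python
import torch
import torch.nn as nn


class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers live only inside the Sequential, so nothing is registered twice
        self.seq = nn.Sequential(
            nn.Conv2d(3, 64, 3),
            nn.Conv2d(64, 64, 3),
        )

    def forward(self, x):
        return self.seq(x)


model = MyModule()

# Individual layers are still reachable by index
first_conv = model.seq[0]

# Each parameter now appears under exactly one name
keys = list(model.state_dict().keys())
print(keys)
# ['seq.0.weight', 'seq.0.bias', 'seq.1.weight', 'seq.1.bias']

out = model(torch.randn(1, 3, 8, 8))
print(tuple(out.shape))
# (1, 64, 4, 4)
```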

Agreed the torchinfo.summary output is a bit misleading, although I don’t envy them the job of trying to parse eager graphs.

Thank you @ptrblck, you’re the undisputed forum king! Much appreciated
