Undocumented nn.Sequential add_module behavior

When adding attributes to an nn.Sequential object (because you want to access them later), attributes that are themselves nn.Module instances are automatically added to the sequence. This can be convenient, but it’s easy to miss.

Correct:

In [1]: import collections

In [2]: import torch

In [4]: class Net(torch.nn.Sequential):
   ...:     def __init__(self, activation):
   ...:         sequence = collections.OrderedDict(activation=activation)
   ...:         super().__init__(sequence)
   ...:

In [5]: Net(activation=torch.nn.Tanh())(torch.ones(2, 2))
Out[5]:

 0.7616  0.7616
 0.7616  0.7616
[torch.FloatTensor of size 2x2]

Also correct, but undocumented:

In [4]: class Net(torch.nn.Sequential):
   ...:     def __init__(self, activation):
   ...:         super().__init__()
   ...:         self.activation = activation
   ...:

In [5]: Net(activation=torch.nn.Tanh())(torch.ones(2, 2))
Out[5]:

 0.7616  0.7616
 0.7616  0.7616
[torch.FloatTensor of size 2x2]

Incorrect output, but forward will run without errors:

In [17]: class Net(torch.nn.Sequential):
    ...:     def __init__(self, activation):
    ...:         super().__init__(activation)
    ...:         self.activation = activation
    ...:

In [18]: Net(activation=torch.nn.Tanh())(torch.ones(2, 2))
Out[18]:

 0.6420  0.6420
 0.6420  0.6420
[torch.FloatTensor of size 2x2]

This behavior is not found in the documentation. Is it intentional? If so, should it be documented?

PyTorch version 0.3.0.post4.

In your last example you are adding the Tanh twice.
Passing it to the __init__ of the parent class already adds it to the modules.
See the __init__ definition in the source code.
The second assignment, self.activation = activation, adds another Tanh to the model.
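
Roughly, the relevant part of Sequential.__init__ looks like this (paraphrased, not a verbatim copy of the source):

from collections import OrderedDict
from torch.nn import Module

class Sequential(Module):
    def __init__(self, *args):
        super(Sequential, self).__init__()
        if len(args) == 1 and isinstance(args[0], OrderedDict):
            # named modules: registered under the given keys
            for key, module in args[0].items():
                self.add_module(key, module)
        else:
            # positional modules: registered under keys '0', '1', ...
            for idx, module in enumerate(args):
                self.add_module(str(idx), module)

On top of that, nn.Module.__setattr__ registers any attribute that is itself a Module in self._modules, which is where the second Tanh entry comes from.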

Thanks ptrblck, I understand why it happens after inspecting the output and the module list; I was just surprised by the behavior at first: it’s not documented that adding Module attributes to an nn.Sequential instance automatically adds the modules to the forward call.

Obvious in hindsight, since the attribute is added to self._modules, but easy to miss. Hence the question: should this be documented? To add to that, is the second way of initializing a Sequential recommended? Also see this thread.
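
For reference, the duplicate registration is easy to see directly (a quick sketch, using the same Net as the third example above):

import torch

class Net(torch.nn.Sequential):
    def __init__(self, activation):
        super().__init__(activation)   # registered under key '0'
        self.activation = activation   # registered again under key 'activation'

net = Net(activation=torch.nn.Tanh())
print(list(net._modules.keys()))  # ['0', 'activation'] -> Tanh runs twice in forward

This also explains the numbers above: tanh(tanh(1)) ≈ 0.6420.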

Yeah, you are right. This point is easy to miss.
However, it will probably be easier to create a new Module instead of deriving from Sequential directly.
What is your use case where you need to derive from Sequential?
Maybe it would be worth creating a small example for exactly this use case.
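
Something like this, for instance (a minimal sketch of the Module-based version of your activation example):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, activation):
        super().__init__()
        self.activation = activation  # registered exactly once

    def forward(self, x):
        return self.activation(x)  # applied exactly once, explicitly

print(Net(activation=nn.Tanh())(torch.ones(2, 2)))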

It’s certainly possible to create a Module instead; I’m dynamically creating the module contents, so I figured a Sequential would be a nice fit. As an artificial example (it makes no sense to have activation=None here…):

class LstmOutput(nn.Sequential):
    def __init__(self, lstm, activation=None):
        sequence = [lstm]
        if activation is not None:
            sequence.append(activation)
        super().__init__(*sequence)  # registers lstm under key '0'

        self.lstm = lstm  # registers the same lstm again, under key 'lstm'

    def zero_state(self):
        self.lstm.zero_state()

Of course, there are many ways to get around this: use a collections.OrderedDict so that the lstm attribute is added with the correct attribute name, use the apply function to zero_state the lstm, etc. Still, unless you inspect the network architecture, it’s easy to miss that the order of operations is lstm -> activation -> lstm.
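
For completeness, here is a sketch of the OrderedDict variant (zero_state is still a made-up method):

import collections
import torch.nn as nn

class LstmOutput(nn.Sequential):
    def __init__(self, lstm, activation=None):
        sequence = collections.OrderedDict(lstm=lstm)
        if activation is not None:
            sequence['activation'] = activation
        super().__init__(sequence)
        # No extra assignment needed: nn.Module.__getattr__ resolves
        # self.lstm from self._modules['lstm'], so nothing is registered twice.

    def zero_state(self):
        self.lstm.zero_state()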