When adding attributes to an nn.Sequential object (because you want to access them later), attributes that are themselves nn.Module instances are automatically appended to the sequence. This can be convenient, but it’s easy to miss.
Correct:
In [4]: class Net(torch.nn.Sequential):
   ...:     def __init__(self, activation):
   ...:         sequence = collections.OrderedDict(activation=activation)
   ...:         super().__init__(sequence)
   ...:

In [5]: Net(activation=torch.nn.Tanh())(torch.ones(2, 2))
Out[5]:
 0.7616  0.7616
 0.7616  0.7616
[torch.FloatTensor of size 2x2]
Also correct, but undocumented:
In [4]: class Net(torch.nn.Sequential):
   ...:     def __init__(self, activation):
   ...:         super().__init__()
   ...:         self.activation = activation
   ...:

In [5]: Net(activation=torch.nn.Tanh())(torch.ones(2, 2))
Out[5]:
 0.7616  0.7616
 0.7616  0.7616
[torch.FloatTensor of size 2x2]
Incorrect output, but forward will run without errors:
In [17]: class Net(torch.nn.Sequential):
    ...:     def __init__(self, activation):
    ...:         super().__init__(activation)
    ...:         self.activation = activation
    ...:

In [18]: Net(activation=torch.nn.Tanh())(torch.ones(2, 2))
Out[18]:
 0.6420  0.6420
 0.6420  0.6420
[torch.FloatTensor of size 2x2]
This behavior is not found in the documentation. Is it intentional? If so, should it be documented?
In your last example you are adding the Tanh twice.
Passing it to the __init__ of the parent class will already add it to the Modules.
See the __init__ definition in the source code.
The second assignment, self.activation = activation, will add another Tanh to the model.
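The double registration is easy to see by listing the child modules of the failing example above (a minimal, self-contained repro of that snippet):

```python
import torch

# Passing the activation to Sequential.__init__ registers it once (as "0"),
# and the attribute assignment registers the same object again (as
# "activation"), so Sequential's forward applies Tanh twice.
class Net(torch.nn.Sequential):
    def __init__(self, activation):
        super().__init__(activation)   # registered under the key "0"
        self.activation = activation   # registered again under "activation"

net = Net(activation=torch.nn.Tanh())
print(list(net._modules.keys()))  # ['0', 'activation']
out = net(torch.ones(2, 2))       # tanh(tanh(1)) = 0.6420, as in Out[18]
```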
Thanks ptrblck, I understand why it happens after inspecting the output and the module list; I was just surprised by the behavior at first: it’s not documented that adding Module attributes to an nn.Sequential instance automatically adds those modules to the forward call.
Obvious in hindsight, since the attribute ends up in self._modules, but easy to miss. Hence the question: should this be documented? And to add to that, is the second way of initializing a Sequential recommended? Also see this thread.
Yeah, you are right. This point is easy to miss.
However, probably it’ll be easier to create a new Module instead of deriving from Sequential directly.
What is your use case where you need to derive from Sequential?
Maybe it’ll be worth it to create a small example for exactly this use case.
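A minimal sketch of that suggestion (the names Net and self.body are mine, not from the thread): wrap the dynamically built Sequential in a plain nn.Module, so that extra attribute assignments only register submodules and no longer change what forward runs.

```python
import torch

# Keep the dynamic part as a single Sequential child of a plain Module.
class Net(torch.nn.Module):
    def __init__(self, activation):
        super().__init__()
        self.body = torch.nn.Sequential(activation)
        self.activation = activation  # registered, but forward() ignores it

    def forward(self, x):
        return self.body(x)

out = Net(torch.nn.Tanh())(torch.ones(2, 2))  # tanh applied exactly once
```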
It’s certainly possible to create a Module instead; I’m dynamically creating the module contents, so I figured a Sequential would be a nice fit. As an artificial example (it makes no sense to have activation=None here…):
class LstmOutput(nn.Sequential):
    def __init__(self, lstm, activation=None):
        sequence = [lstm]
        if activation is not None:
            sequence.append(activation)
        super().__init__(*sequence)
        self.lstm = lstm

    def zero_state(self):
        self.lstm.zero_state()
Of course, there are many ways to get around this: use a collections.OrderedDict so that the lstm is registered under the correct attribute name, use the apply function to call zero_state on the lstm, etc. Still, unless you inspect the network architecture, it’s easy to miss that the order of operations is lstm -> activation -> lstm.
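A sketch of the OrderedDict workaround, with nn.Linear standing in for the LSTM so the example stays self-contained: naming the module in the dict makes Sequential register it under that key, so self.lstm works without a second, duplicating assignment.

```python
import collections
import torch

# Name the modules up front; Sequential registers them under those keys,
# so the child is reachable as an attribute with no extra assignment.
class LstmOutput(torch.nn.Sequential):
    def __init__(self, lstm, activation=None):
        sequence = collections.OrderedDict(lstm=lstm)
        if activation is not None:
            sequence['activation'] = activation
        super().__init__(sequence)

net = LstmOutput(torch.nn.Linear(2, 2), activation=torch.nn.Tanh())
print(net.lstm)  # the Linear, registered exactly once, under "lstm"
```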