How many ConvLSTM cells are instantiated?

Hello. I have defined the following ConvLSTM class. The code is borrowed from GitHub - sladewinter/ConvLSTM, and I’ve skipped some boilerplate initializations.

class ConvLSTM(nn.Module):

    def __init__(self, in_channels, out_channels,
                 kernel_size, padding, lstm_activation, frame_size):

        super(ConvLSTM, self).__init__()

        self.out_channels = out_channels

        # We will unroll this over time steps
        self.convLSTMcell = ConvLSTMCell(in_channels, out_channels, kernel_size,
            padding, lstm_activation,  frame_size)


    def forward(self, X):
        # Get the dimensions
        batch_size, _, seq_len, height, width = X.size()
        ...
        # Unroll over time steps
        for time_step in range(seq_len):

            H, C = self.convLSTMcell(X[:,:,time_step], H, C)    
        
            output[:,:,time_step] = H
     
        return output, H, C

If I view the summary of this model using torchinfo, it shows trainable parameters only for the first ConvLSTMCell and ‘(recursive)’ for the rest. However, if I count the model's trainable parameters using

print(sum(p.numel() for p in t.parameters() if p.requires_grad))

it reports a different number of trainable parameters.

How many trainable ConvLSTM cells are instantiated when the layer is written in this form? Is it a single cell whose output is obtained recursively, or seq_len cells, each with its own trainable parameters?

I see only a single ConvLSTMCell being initialized. What does model.named_parameters() return for your model?
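To illustrate what a single initialized cell means in practice, here is a minimal sketch. `TinyCell` and `TinyRecurrent` are hypothetical stand-ins (not the code from the linked repository) that only mimic the structure above: the cell is created once in `__init__` and called once per time step in `forward`. Under those assumptions, the trainable parameter count should not depend on `seq_len`.

```python
import torch
import torch.nn as nn


class TinyCell(nn.Module):
    """Hypothetical stand-in for ConvLSTMCell: one conv that updates H."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + out_channels, out_channels,
                              kernel_size=3, padding=1)

    def forward(self, x, h):
        # Concatenate input and hidden state along the channel dimension
        return torch.tanh(self.conv(torch.cat([x, h], dim=1)))


class TinyRecurrent(nn.Module):
    """Hypothetical wrapper that unrolls one cell over the time dimension."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.out_channels = out_channels
        self.cell = TinyCell(in_channels, out_channels)  # created exactly once

    def forward(self, X):
        batch_size, _, seq_len, height, width = X.size()
        H = torch.zeros(batch_size, self.out_channels, height, width)
        for time_step in range(seq_len):
            # The same module, and therefore the same weights, at every step
            H = self.cell(X[:, :, time_step], H)
        return H


def count_trainable(module):
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


model = TinyRecurrent(in_channels=2, out_channels=4)
n_params = count_trainable(model)

# Running sequences of different lengths does not create new parameters
for seq_len in (1, 5, 20):
    model(torch.randn(1, 2, seq_len, 8, 8))
    assert count_trainable(model) == n_params

param_names = [name for name, _ in model.named_parameters()]
print(n_params, param_names)
```

The loop only calls the existing cell again, so the list from `named_parameters()` contains one set of weights regardless of how many time steps are processed.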

Outputs:

>>> print(model)
ConvLSTM(
  (convLSTMcell): ConvLSTMCell(
    (conv): Conv2d(26, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
)

>>> for name, param in t.named_parameters():
...     print(name)

convLSTMcell.W_ci
convLSTMcell.W_co
convLSTMcell.W_cf
convLSTMcell.conv.weight
convLSTMcell.conv.bias

It seems that only 1 ConvLSTM cell is being initialized. Is this the usual way to create ConvLSTM layers? I thought that I would have to initialize as many ConvLSTM cells as there are inputs to the network, rather than initializing only one and reusing it.

Are you aware of any official implementation of a ConvLSTM layer I can look at?

Thank you for taking the time to answer!
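
On the question of whether reusing one cell is usual: recurrent layers in general apply the same weights at every time step, so a single cell unrolled in a loop is the expected pattern. As a rough point of comparison (using the built-in `nn.LSTM` rather than a ConvLSTM, and with sizes picked arbitrarily for illustration), the sketch below checks that the parameter count of a recurrent layer is fixed at construction and that the same layer accepts sequences of different lengths.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes, not taken from the model in this thread
lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)

# Parameters are fixed when the layer is constructed
n_params = sum(p.numel() for p in lstm.parameters() if p.requires_grad)

output_shapes = []
for seq_len in (2, 50):
    # Same layer, same weights, different number of time steps
    out, (h, c) = lstm(torch.randn(4, seq_len, 3))
    output_shapes.append(tuple(out.shape))

n_params_after = sum(p.numel() for p in lstm.parameters() if p.requires_grad)
print(n_params, n_params_after, output_shapes)
```

If each time step had its own cell, the layer could not handle a sequence length it was not built for, and the parameter count would grow with `seq_len`.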