Does not wrapping a list of layers in nn.ModuleList still train the layers in backprop?

They might be receiving gradients, since they are part of the computation graph created dynamically during the forward() call. But the parameters won't be updated, because the optimizer isn't acting on those gradients: layers kept in a plain Python list aren't registered as submodules, so their parameters never show up in model.parameters() and are never handed to the optimizer.
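A minimal sketch of what I mean (the class names and layer sizes are made up for illustration): parameters held in a plain Python list don't appear in model.parameters(), while the same layers wrapped in nn.ModuleList do, even though gradients still reach the unregistered layers during backward().

```python
import torch
import torch.nn as nn

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: layers are NOT registered as submodules.
        self.layers = [nn.Linear(4, 4) for _ in range(2)]

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: layers ARE registered as submodules.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(2)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

plain = WithPlainList()
wrapped = WithModuleList()

print(len(list(plain.parameters())))    # 0 -> an optimizer built from this sees nothing
print(len(list(wrapped.parameters())))  # 4 (weight + bias for each of the 2 layers)

# Gradients still flow to the unregistered layers during backward()...
plain(torch.randn(1, 4)).sum().backward()
print(plain.layers[0].weight.grad is not None)  # True

# ...but only the registered parameters can be passed to the optimizer this way.
opt = torch.optim.SGD(wrapped.parameters(), lr=0.1)
```

In fact, trying `torch.optim.SGD(plain.parameters(), lr=0.1)` raises a ValueError about an empty parameter list, which is often the first symptom people notice.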

Is there any specific behavior you see that's not consistent with this? It would be good to know.