Question about ModuleList

If I do something like self.layers = nn.ModuleList([nn.Linear(3,3)] * 4), will each of those 4 linear layers have the same reference? Will gradient updates be the same for all 4 layers?

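Yes. When you write [nn.Linear(3,3)] * 4, Python creates one nn.Linear(3,3) object, and the list holds four references to that single object rather than four distinct nn.Linear instances. Wrapping the list in nn.ModuleList does not copy anything, so all four entries share one weight and one bias. Because there is only one set of parameters, there is also only one gradient (the sum of the contributions from every place the layer is used in the forward pass) and one update, so the "four layers" always stay identical. You can check the sharing with Python's id() function, which returns an identifier that is unique among objects alive at the same time. Running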

import torch.nn as nn

linear_layer = nn.Linear(3, 3)
layers_list = [linear_layer] * 4

print(id(layers_list[0]))
print(id(layers_list[1]))
print(id(layers_list[2]))
print(id(layers_list[3]))

prints the same id four times.
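If what you actually want is four independent layers, the usual approach is to construct a new nn.Linear on each iteration, for example with a list comprehension. The snippet below is a rough sketch comparing the two constructions; the variable names (shared, independent, and so on) are just mine for illustration, and it assumes torch is installed.

```python
import torch
import torch.nn as nn

# One Linear object referenced four times (the construction from the question).
shared = nn.ModuleList([nn.Linear(3, 3)] * 4)

# Four separate Linear objects, each with its own weight and bias.
independent = nn.ModuleList([nn.Linear(3, 3) for _ in range(4)])

# Every entry of `shared` is the very same module object.
shared_is_same = all(layer is shared[0] for layer in shared)

# Every entry of `independent` is a different object.
independent_is_distinct = len({id(layer) for layer in independent}) == 4

# parameters() skips duplicates, so the shared version exposes only
# one weight and one bias, while the independent version exposes four of each.
n_shared_params = len(list(shared.parameters()))
n_independent_params = len(list(independent.parameters()))

# Run a forward/backward pass through the shared stack. The single weight
# tensor accumulates the gradient contributions from all four applications.
x = torch.randn(2, 3)
out = x
for layer in shared:
    out = layer(out)
out.sum().backward()

print(shared_is_same, independent_is_distinct)
print(n_shared_params, n_independent_params)
print(shared[0].weight.grad is shared[3].weight.grad)
```

With the shared version an optimizer only ever sees one weight and one bias, which is fine if you want weight tying on purpose, but probably not what you meant if you were expecting a 4-layer network with separate parameters.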
Hope that helps.