I have several layers (nn.Modules) in a ModuleList. I want to drop layers at a user-specified rate, just as dropout does for units. Could you share an efficient implementation (a snippet)?
Do you mean something like this?
model = nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 10), nn.ReLU()])
model
ModuleList(
(0): Linear(in_features=10, out_features=10, bias=True)
(1): Linear(in_features=10, out_features=10, bias=True)
(2): ReLU()
)
x = torch.randint(0, len(model), (2,)); x
tensor([2, 0])
for i in x:
    model[i] = nn.Identity()
model
ModuleList(
(0): Identity()
(1): Linear(in_features=10, out_features=10, bias=True)
(2): Identity()
)
All the layers in the ModuleList are identical; you can think of them as
nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 10), nn.Linear(10, 10), nn.Linear(10, 10)])
I want to drop some of the layers at random, as specified by a drop percentage.
Would this work?
import torch
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.module_list = nn.ModuleList([nn.Linear(2, 2, bias=False) for _ in range(5)])

    def forward(self, input, num_layers_to_drop):
        # x holds the indices of the layers to drop; randperm (unlike randint)
        # never repeats an index, so exactly num_layers_to_drop layers are skipped
        x = torch.randperm(len(self.module_list))[:num_layers_to_drop]
        print(x)
        out = input
        for index, layer in enumerate(self.module_list):
            if index not in x:
                out = layer(out)  # pass the input only through layers whose index is not in x
        return out
net = Model()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
input = torch.randn(2, 2)
optimizer.zero_grad()
loss = net(input, 2).sum()
loss.backward()
Before optimizer.step():
list(net.module_list.parameters())
[Parameter containing:
tensor([[ 0.2603, -0.4476],
[ 0.2753, -0.3421]], requires_grad=True), Parameter containing:
tensor([[ 0.1724, -0.2733],
[-0.0699, 0.0609]], requires_grad=True), Parameter containing:
tensor([[ 0.7007, 0.4430],
[ 0.1900, -0.0225]], requires_grad=True), Parameter containing:
tensor([[0.1543, 0.5798],
[0.2771, 0.6099]], requires_grad=True), Parameter containing:
tensor([[ 0.3549, -0.3747],
[ 0.1592, -0.2938]], requires_grad=True)]
Now we do the update. Suppose x is tensor([2, 0]):
optimizer.step()
list(net.module_list.parameters())
[Parameter containing:
tensor([[ 0.2603, -0.4476],
[ 0.2753, -0.3421]], requires_grad=True), Parameter containing:
tensor([[ 0.1715, -0.2729],
[-0.0708, 0.0613]], requires_grad=True), Parameter containing:
tensor([[ 0.7007, 0.4430],
[ 0.1900, -0.0225]], requires_grad=True), Parameter containing:
tensor([[0.1556, 0.5793],
[0.2754, 0.6105]], requires_grad=True), Parameter containing:
tensor([[ 0.3549, -0.3745],
[ 0.1591, -0.2937]], requires_grad=True)]
The layers at indices 0 and 2 do not get updated, because they were skipped in the forward pass and so received no gradient.
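One way to confirm this without comparing printed weights is to look at each layer's .grad after backward(). This is only a minimal sketch: the `dropped` set is a hypothetical fixed choice standing in for the random x, so the result is reproducible.

```python
import torch
from torch import nn

torch.manual_seed(0)

layers = nn.ModuleList([nn.Linear(2, 2, bias=False) for _ in range(5)])
dropped = {0, 2}  # hypothetical fixed choice, standing in for the random x

out = torch.randn(2, 2)
for index, layer in enumerate(layers):
    if index not in dropped:
        out = layer(out)
out.sum().backward()

# Layers skipped in the forward pass never enter the autograd graph,
# so their weight.grad is still None after backward().
no_grad = [i for i, layer in enumerate(layers) if layer.weight.grad is None]
print(no_grad)  # [0, 2]
```

The optimizer skips parameters whose .grad is None, which is why those weights stay exactly the same after optimizer.step().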
Thanks, this will do for now.
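If you later want to pass a drop rate instead of a fixed number of layers, something along these lines might work. It is a sketch under a few assumptions: `LayerDropStack` and `drop_rate` are names made up for this example (not a PyTorch API), each layer is dropped independently with probability drop_rate, dropping happens only in training mode, and every layer keeps the input shape (as with your identical Linear layers) so that skipping one is valid.

```python
import torch
from torch import nn


class LayerDropStack(nn.Module):  # hypothetical name, not a PyTorch class
    """Applies layers in order; in training, skips each with probability drop_rate."""

    def __init__(self, layers, drop_rate=0.0):
        super().__init__()
        if not 0.0 <= drop_rate < 1.0:
            raise ValueError("drop_rate must be in [0, 1)")
        self.layers = nn.ModuleList(layers)
        self.drop_rate = drop_rate

    def forward(self, x):
        if self.training and self.drop_rate > 0.0:
            # One independent random draw per layer: True means "keep".
            keep = torch.rand(len(self.layers)) >= self.drop_rate
        else:
            # In eval mode every layer is applied.
            keep = torch.ones(len(self.layers), dtype=torch.bool)
        for layer, k in zip(self.layers, keep.tolist()):
            if k:
                x = layer(x)
        return x


net = LayerDropStack([nn.Linear(10, 10) for _ in range(4)], drop_rate=0.5)
x = torch.randn(3, 10)

net.train()
y_train = net(x)  # a random subset of the layers is applied

net.eval()
y_eval = net(x)   # all four layers are applied
```

Because the draw is per layer, the number of dropped layers varies from call to call and only averages out to drop_rate; if you need an exact count, the randperm approach above is the better fit.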