Drop layers randomly in a ModuleList

I have several layers (nn.Module instances) in an nn.ModuleList. I want to drop layers randomly, just like dropout does, at a rate specified by the user. Could you share an efficient implementation (snippet)?

Do you mean something like this?

import torch
import torch.nn as nn

model = nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 10), nn.ReLU()])
print(model)
# ModuleList(
#   (0): Linear(in_features=10, out_features=10, bias=True)
#   (1): Linear(in_features=10, out_features=10, bias=True)
#   (2): ReLU()
# )

x = torch.randint(0, len(model), (2,))  # indices of the layers to drop
print(x)
# tensor([2, 0])

for i in x:
  model[i] = nn.Identity()  # replace the chosen layers with no-op modules
print(model)
# ModuleList(
#   (0): Identity()
#   (1): Linear(in_features=10, out_features=10, bias=True)
#   (2): Identity()
# )
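One caveat: torch.randint samples with replacement, so the two indices can coincide and you may end up replacing fewer layers than intended. If you need exactly k distinct layers dropped, torch.randperm avoids repeats (a small variation on the snippet above):

k = 2
drop_idx = torch.randperm(len(model))[:k]  # k distinct indices, no repeats
for i in drop_idx:
  model[i] = nn.Identity()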

All the layers in the ModuleList are identical; you can think of them as

nn.ModuleList([nn.Linear(10, 10), nn.Linear(10, 10), nn.Linear(10, 10), nn.Linear(10, 10)])

I want to drop some of the layers randomly, as specified by a drop percentage.

Would this work?

import torch
import torch.nn as nn

class Model(nn.Module):
  def __init__(self):
    super().__init__()
    self.module_list = nn.ModuleList([nn.Linear(2, 2, bias=False) for _ in range(5)])

  def forward(self, input, num_layers_to_drop):
    # x holds the indices of the layers we want to drop
    # (note: randint samples with replacement, so an index can repeat)
    x = torch.randint(0, len(self.module_list), (num_layers_to_drop,))
    print(x)
    out = input
    for index, layer in enumerate(self.module_list):
      if index not in x:
        out = layer(out)  # pass the activations only through layers not in x
    return out

net = Model()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
input = torch.randn(2, 2)
optimizer.zero_grad()
loss = net(input, 2).sum()  # drop 2 layers this forward pass
loss.backward()

Before optimizer.step():

list(net.module_list.parameters())
[Parameter containing:
 tensor([[ 0.2603, -0.4476],
         [ 0.2753, -0.3421]], requires_grad=True), Parameter containing:
 tensor([[ 0.1724, -0.2733],
         [-0.0699,  0.0609]], requires_grad=True), Parameter containing:
 tensor([[ 0.7007,  0.4430],
         [ 0.1900, -0.0225]], requires_grad=True), Parameter containing:
 tensor([[0.1543, 0.5798],
         [0.2771, 0.6099]], requires_grad=True), Parameter containing:
 tensor([[ 0.3549, -0.3747],
         [ 0.1592, -0.2938]], requires_grad=True)]

We do our update; suppose x is tensor([2, 0]):

optimizer.step()
list(net.module_list.parameters())
[Parameter containing:
 tensor([[ 0.2603, -0.4476],
         [ 0.2753, -0.3421]], requires_grad=True), Parameter containing:
 tensor([[ 0.1715, -0.2729],
         [-0.0708,  0.0613]], requires_grad=True), Parameter containing:
 tensor([[ 0.7007,  0.4430],
         [ 0.1900, -0.0225]], requires_grad=True), Parameter containing:
 tensor([[0.1556, 0.5793],
         [0.2754, 0.6105]], requires_grad=True), Parameter containing:
 tensor([[ 0.3549, -0.3745],
         [ 0.1591, -0.2937]], requires_grad=True)]

The layers at indices 0 and 2 do not get updated, since they were skipped in the forward pass and therefore received no gradient.
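If you want to specify a drop rate instead of a fixed number of layers (as in the original question), here is a minimal sketch of one way to do it. The class name LayerDropModel and the parameter p_drop are my own; the idea is simply to skip each layer independently with probability p_drop, and only while the module is in training mode:

import torch
import torch.nn as nn

class LayerDropModel(nn.Module):
  def __init__(self, p_drop=0.2):
    super().__init__()
    self.p_drop = p_drop
    self.module_list = nn.ModuleList([nn.Linear(2, 2, bias=False) for _ in range(5)])

  def forward(self, x):
    out = x
    for layer in self.module_list:
      # during training, skip each layer independently with probability p_drop;
      # in eval mode (model.eval()), every layer is always applied
      if self.training and torch.rand(1).item() < self.p_drop:
        continue
      out = layer(out)
    return out

Skipped layers receive no gradient for that step, just as in the example above.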

Thanks, this will do for now.