Reusing nn.Dropout/nn.ReLU/nn.MaxPool in forward

From my understanding it's OK to reuse nn.ReLU and nn.MaxPool2d in forward because they don't have trainable parameters. I'm not entirely sure why this works, though, so perhaps someone can clarify it for me. Let's say I have

import torch
import torch.nn as nn

class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NN, self).__init__()
        self.relu = nn.ReLU()  # single ReLU instance, reused in forward
        self.fc1 = nn.Linear(input_size, 50)
        self.fc2 = nn.Linear(50, 25)
        self.fc3 = nn.Linear(25, num_classes)

    def forward(self, x):
        # the same self.relu module is applied after fc1 and fc2
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

How does PyTorch create the graph? I'm thinking that if I reuse the same nn.ReLU, there could be a problem for PyTorch in remembering which gradients to set to 0/1. From my understanding it's also not a good idea to reuse dropout; why is that the case?
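For context, here is a quick sanity check (just a sketch on a toy tensor, not the model above) comparing a single shared nn.ReLU against two separate instances; the gradients come out identical:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)

# one shared ReLU instance used twice
shared = nn.ReLU()
out_shared = shared(shared(x) - 0.5)
out_shared.sum().backward()
grad_shared = x.grad.clone()

# two separate ReLU instances
x.grad = None
relu1, relu2 = nn.ReLU(), nn.ReLU()
out_separate = relu2(relu1(x) - 0.5)
out_separate.sum().backward()

print(torch.equal(grad_shared, x.grad))  # True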


Could someone help me out with this?

You can reuse parameter-less modules, as only the computation will be tracked.
A theoretical example would be reusing a (hypothetical) nn.Add(a, b) module instead of writing a + b.
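To make "only the computation is tracked" concrete, here is a minimal sketch (the exact grad_fn class names can vary between PyTorch versions): each call through the same stateless module records its own node in the autograd graph.

import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.randn(2, 2, requires_grad=True)

a = relu(x)        # first call: records its own backward node
b = relu(a * 2.0)  # second call: records a separate backward node

print(a.grad_fn)               # e.g. a ReluBackward object
print(b.grad_fn)               # a different ReluBackward object
print(a.grad_fn is b.grad_fn)  # False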

That makes sense, but if we have dropout with the same drop rate, can we reuse that as well?

Yes, that should also work.
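As a small sketch (on a toy tensor, nothing specific to your model): each forward call through the same nn.Dropout instance samples a fresh mask, so the two call sites don't drop the same elements, and in eval() mode the module is a no-op at every call site.

import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

# each call samples a fresh mask (surviving values are scaled by 1/(1-p))
print(drop(x))  # some elements zeroed
print(drop(x))  # usually a different set of elements zeroed

drop.eval()     # in eval mode dropout is a no-op at every call site
print(drop(x))  # all ones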

That’s good to know. Do you know where I can read more about how PyTorch does this backprop, or why it works this way?

Thanks a lot, I appreciate your help!

A general rule might be that all operations without parameters (as in nn.Parameter) are fine to reuse, since Autograd will only track the operation.

Of course you can also reuse modules with parameters, but this could be seen as “weight sharing”.
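For example (a minimal sketch, not tied to the model above): applying one nn.Linear twice means both applications use the same weight, and its .grad accumulates contributions from both call sites.

import torch
import torch.nn as nn

shared_fc = nn.Linear(10, 10)  # one weight matrix
x = torch.randn(1, 10)

out = shared_fc(shared_fc(x))  # the same weight is applied twice
out.sum().backward()

# one gradient tensor, accumulating contributions from both call sites
print(shared_fc.weight.grad.shape)  # torch.Size([10, 10])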