From my understanding it's OK to reuse nn.ReLU and nn.MaxPool2d in forward() because they don't have trainable parameters. I'm not entirely sure why this works, so perhaps someone can clarify it for me. Let's say I have:
import torch.nn as nn

class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NN, self).__init__()
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(input_size, 50)
        self.fc2 = nn.Linear(50, 25)
        self.fc3 = nn.Linear(25, num_classes)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x
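For reference, here is a quick check I'd run on the model above (the sizes input_size=8 and num_classes=3 are just made up): since the shared nn.ReLU has no learnable state, named_parameters() only lists the three Linear layers.

model = NN(input_size=8, num_classes=3)
# Only the Linear layers register parameters; the reused nn.ReLU adds none.
for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# fc1.weight (50, 8)   fc1.bias (50,)
# fc2.weight (25, 50)  fc2.bias (25,)
# fc3.weight (3, 25)   fc3.bias (3,)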
How does PyTorch create the graph? I'm thinking that if I reuse the same nn.ReLU instance, PyTorch might have trouble remembering which gradients to set to 0/1 at each place the module was applied. I've also read that it's not a good idea to reuse a dropout module; why is that the case?
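To make the question concrete, here is a minimal comparison I put together (the layer sizes are arbitrary): it applies one shared nn.ReLU instance at two places and checks the resulting gradients against a version that calls torch.nn.functional.relu at each spot, which is clearly a fresh operation every time.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
fc1, fc2, fc3 = nn.Linear(4, 5), nn.Linear(5, 5), nn.Linear(5, 2)
relu = nn.ReLU()                      # one instance, used twice below
x = torch.randn(3, 4)

# Version A: reuse the same nn.ReLU module after fc1 and fc2.
fc3(relu(fc2(relu(fc1(x))))).sum().backward()
grad_shared = fc1.weight.grad.clone()

# Version B: stateless functional calls, one per activation.
for layer in (fc1, fc2, fc3):
    layer.zero_grad()
fc3(F.relu(fc2(F.relu(fc1(x))))).sum().backward()
grad_functional = fc1.weight.grad

print(torch.allclose(grad_shared, grad_functional))  # True in this toy case

In this small example the two versions agree, which is part of why I assume reusing the module is fine, but I'd still like to understand how autograd records the two applications of the same module.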