Hi, I would like to ask: if I define some layers in the __init__ function of a class, but never call them in the forward function, does that affect anything?
For example, take the code below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.hidden_layer_1 = nn.Linear(hidden_size, hidden_size)
        self.hidden_layer_2 = nn.Linear(hidden_size, hidden_size)  # not used / not called in the forward pass
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = F.relu(self.input_layer(x))
        out = F.relu(self.hidden_layer_1(out))
        out = torch.sigmoid(self.output_layer(out))
        return out
The best way to evaluate that is to try it, but here are my thoughts:
Since the layer has been instantiated, and probably sent to the GPU, it will consume some memory, as its parameters are created and must be kept somewhere. However, no gradients will be computed for it, as it’s not used in the forward method. Also, if you pass the whole model.parameters() to the optimizer, it might consume a bit more memory again? Not sure about this one.
In such cases, I would pass an additional argument to the model that defines whether the layers have to be created or not, to save a bit of memory. The argument itself won’t be kept around if it’s only used during initialization.
In addition, I suggest something like:

class Net(nn.Module):
    def __init__(self, use_linear):
        super().__init__()
        self.use_linear = use_linear
        self.conv_layer = nn.Conv2d(3, 3, 1)
        self.linear_layer = nn.Linear(12, 8)

    def forward(self, x):
        x = self.conv_layer(x)
        # the layer has to be called, not just returned
        # (assumes the last dimension of x is 12)
        return self.linear_layer(x) if self.use_linear else x
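The points above about memory and gradients are easy to check: an unused layer still shows up in model.parameters(), but never receives a gradient. A minimal sketch (the toy model here is just an illustration, not the code from the question):

```python
import torch
import torch.nn as nn

class TwoLayer(nn.Module):
    # toy model: second_layer is defined but never called in forward
    def __init__(self):
        super().__init__()
        self.first_layer = nn.Linear(4, 4)
        self.second_layer = nn.Linear(4, 4)  # unused

    def forward(self, x):
        return self.first_layer(x)

model = TwoLayer()
# both layers contribute to parameters(): 2 weights + 2 biases
print(sum(p.numel() for p in model.parameters()))  # 40

model(torch.randn(2, 4)).sum().backward()
# gradients exist only for the layer actually used in forward
print(model.first_layer.weight.grad is not None)   # True
print(model.second_layer.weight.grad is None)      # True
```

So the unused layer costs parameter memory (and optimizer-state memory if its parameters are registered with the optimizer), but it plays no role in the backward pass.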
Hi @alex.veuthey and thanks for your answer.
I did already. I had to repeat training from the beginning, but I wanted to give it a try. Comparing the results of the first epoch with and without these extra parameters, I find that it does not affect them. However, if I save the optimizer using torch.save and then load it back again using torch.load, I get an error. Most probably because the optimizer has the parameters saved in its dictionary but isn’t updating them. Not sure about this.
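One likely way this error can arise (just a guess at the setup, not a confirmed reproduction of the post above): the optimizer state dict stores one entry per registered parameter, so if the model is later rebuilt without the unused layer, load_state_dict fails because the saved parameter groups no longer match:

```python
import torch
import torch.nn as nn

# model with an extra layer that is never used in forward
model_a = nn.ModuleDict({
    "used": nn.Linear(4, 4),
    "unused": nn.Linear(4, 4),
})
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
state = opt_a.state_dict()  # references 4 parameter tensors

# model rebuilt without the unused layer
model_b = nn.ModuleDict({"used": nn.Linear(4, 4)})
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)

try:
    opt_b.load_state_dict(state)  # 2 parameters here vs 4 in the saved state
except ValueError as e:
    print("load failed:", e)
```

If the model definition is identical on both sides (unused layer included), loading the optimizer state should work, since the group sizes then match.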
Hi, I recently ran into an error where I created an nn.Linear layer but didn’t use it in forward.
It turns out that model.parameters() yields the parameters of this layer as well. So when I tried to update them with

    with torch.no_grad():
        for p in model.parameters():
            p -= p.grad * 0.001

I got the error

    TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'

I resolved this by commenting out the unused layer in __init__.
Since my loss did not depend on the unused layer, its gradients were None.
This is for pytorch_1.16_cpu
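An alternative to commenting the layer out is to skip parameters whose .grad is still None in the manual update loop. A sketch under the same setup (toy model names are illustrative):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(3, 1)
        self.unused = nn.Linear(3, 3)  # never called in forward

    def forward(self, x):
        return self.used(x)

model = Net()
model(torch.randn(5, 3)).sum().backward()

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:  # the unused layer's grad stays None; skip it
            p -= p.grad * 0.001
```

This way the unused layer can stay in __init__ (e.g. for checkpoint compatibility) without breaking the update step.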