I have a neural net in which I define several head modules in __init__(). However, I only use one of them in forward(), based on a condition. Still, for all of these modules the weights and biases are initialized and requires_grad is True, which is expected. I do not actually need the gradients of the modules I never use. Example code:
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, head_type):
        super().__init__()  # required before registering submodules
        self.head_type = head_type
        # model_dim and classifier_dim are defined elsewhere
        self.linear_head = nn.Linear(in_features=model_dim, out_features=classifier_dim)
        self.lstm = nn.LSTM(model_dim, model_dim, batch_first=True)
        self.multilinear_head = nn.Sequential(
            nn.Linear(in_features=model_dim, out_features=256),
            nn.Dropout(0.2),
            nn.Linear(in_features=256, out_features=classifier_dim),
            nn.Dropout(0.2),
        )

    def forward(self, x):
        if self.head_type == 'linear':
            out = self.linear_head(x)
        else:
            out = self.multilinear_head(x)
        return out
Is this alright with respect to the space these additional tensors take? I guess that since the forward pass is never performed for the unused modules, they only take up the space of their parameters and never build up activations or gradients. Any work-around for this would be helpful. Thank you.
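For reference, one work-around I have considered (a minimal sketch, assuming the head choice is fixed once at construction time, and using the same model_dim / classifier_dim as above) is to only build the head that will actually be used, so the parameters of the other head are never allocated at all. I am not sure whether this is the idiomatic way to handle it:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self, head_type):
        super().__init__()
        self.head_type = head_type
        self.lstm = nn.LSTM(model_dim, model_dim, batch_first=True)
        # Instantiate only the head that will be used, so no parameters
        # (and no optimizer state) exist for the unused alternative.
        if head_type == 'linear':
            self.head = nn.Linear(model_dim, classifier_dim)
        else:
            self.head = nn.Sequential(
                nn.Linear(model_dim, 256),
                nn.Dropout(0.2),
                nn.Linear(256, classifier_dim),
                nn.Dropout(0.2),
            )

    def forward(self, x):
        return self.head(x)

But this only works if the head type never changes after construction, which is why I am asking whether keeping all heads around is acceptable memory-wise.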