I have a model ‘m’, which is a torch.nn.ModuleList of 30 fully connected layers. I need to pass this model ‘m’ to the forward() method of another model ‘K’ in every training iteration.
Pseudocode for the above description:
    import torch.nn as nn

    class M(nn.Module):
        def __init__(self):
            super().__init__()
            # 30 fully connected layers, registered through nn.ModuleList
            self.nw_list = nn.ModuleList([nn.Linear(256, 256) for _ in range(30)])

        def forward(self, x):
            # apply every layer to the same input and sum the outputs
            return sum(nw(x) for nw in self.nw_list)

    class K(nn.Module):
        def forward(self, model):
            ...  # K uses model (an instance of M) here

    mm = M()
    kk = K()
    # kk(mm) is then called in every training iteration
Question: Is it wise to pass the model as a function parameter? Is there a better way to do it?
Usually you would register M inside K, if it’s a submodule of K.
Your workflow should work, but you would have to make sure that both models are on the same device etc.
Registering M as a submodule would e.g. push it to the same device as K by using a single .to() call on K.
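A minimal sketch of the submodule approach, assuming M can be handed to K once at construction time rather than in every forward() call. The constructor argument m and the attribute name self.m are hypothetical and not from the original post:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.nw_list = nn.ModuleList([nn.Linear(256, 256) for _ in range(30)])

    def forward(self, x):
        return sum(nw(x) for nw in self.nw_list)

class K(nn.Module):
    def __init__(self, m):
        super().__init__()
        # Assigning a module to an attribute registers it as a submodule of K.
        self.m = m

    def forward(self, x):
        return self.m(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
kk = K(M()).to(device)  # one call moves K and the registered M together

x = torch.randn(4, 256, device=device)
out = kk(x)
```

With this layout, kk.parameters() and kk.state_dict() also include M's weights, so a single optimizer or checkpoint covers both.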
Thanks @ptrblck for the response. What if I have only one device (i.e. one GPU)? Then there should be no need to move anything, since by default both K and M will be on the same device.
One more related question:
I have a network M which predicts the parameters (weights and biases) for a network K. So only network M’s parameters are learned, whereas network K’s parameters are predicted from network M’s output. What is an efficient way to load the predicted parameters into network K? (Note: network K’s parameters are not learnable.)
Currently, I do it like this. Below is the forward method of network K. Basically, I do a manual torch.matmul(). Is there a more efficient way to just load the parameters and make a prediction without learning those parameters?
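    def forward(self, input, params=None):
        # params holds the predicted (non-learnable) weight and an optional bias
        bias = params.get('bias', None)
        weight = params['weight']
        # swap the last two dims of weight, keeping any leading batch dims
        output = input.matmul(weight.transpose(-1, -2))
        if bias is not None:
            output = output + bias.unsqueeze(-2)
        return output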
By default the models would be on the CPU, so you would still have to make sure to push them to the GPU independently.
I think your approach of using the functional API for these fixed parameters looks good and I probably wouldn’t try to use modules and reload the parameters. In case you still want to do so, you could e.g. create a state_dict and load it in each iteration or directly assign the new parameters to the old ones.
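A minimal sketch of the functional approach, assuming a single unbatched 2D weight so that torch.nn.functional.linear can stand in for the manual matmul (it does not accept weights with leading batch dimensions). The generator net_m, the conditioning vector z and all sizes are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class K(nn.Module):
    # K owns no nn.Parameter; it only applies the tensors it is given.
    def forward(self, x, params):
        return F.linear(x, params["weight"], params.get("bias"))

# Hypothetical generator: maps a vector z to K's flattened weight and bias.
in_f, out_f, z_dim = 8, 4, 16
net_m = nn.Linear(z_dim, out_f * in_f + out_f)

z = torch.randn(z_dim)
flat = net_m(z)
params = {
    "weight": flat[: out_f * in_f].view(out_f, in_f),
    "bias": flat[out_f * in_f:],
}

kk = K()
y = kk(torch.randn(5, in_f), params)
y.sum().backward()  # gradients reach net_m through the predicted tensors
```

As far as I understand, load_state_dict copies values into the existing parameters without tracking gradients, so if M has to be trained through K's output, passing the predicted tensors directly as above keeps them in the autograd graph.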
Another question. Let’s say I have two models (M and K) that both subclass nn.Module, and network M predicts the values of the parameters for network K. I only want to train network M. In other words, I just need to pass network M’s parameters to the optimizer and take gradients with respect to them. But when I instantiate both M and K before training, I see requires_grad=True for the parameters of both networks, and both sets of parameters are initialized. That makes sense for network M, because it will be trained, but I don’t understand why K.parameters() also has values and requires_grad=True.
How can I use network K only for inference, using the parameter values estimated by network M?
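On these last two points: layers such as nn.Linear create and initialize their parameters at construction time with requires_grad=True by default, regardless of whether an optimizer ever sees them, which would explain what is observed for K. Below is a minimal sketch of one way to use K for inference only, assuming PyTorch 2.0 or later for torch.func.functional_call. The names net_m, net_k, z and all sizes are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
from torch.func import functional_call  # assumes PyTorch >= 2.0

in_f, out_f, z_dim = 8, 4, 16
net_k = nn.Linear(in_f, out_f)                   # stand-in for network K
net_m = nn.Linear(z_dim, out_f * in_f + out_f)   # hypothetical predictor M

# Freeze K's own (placeholder) parameters; they are never trained.
net_k.requires_grad_(False)

# Only M's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(net_m.parameters(), lr=0.1)

z = torch.randn(z_dim)
x = torch.randn(5, in_f)
target = torch.randn(5, out_f)

flat = net_m(z)
predicted = {
    "weight": flat[: out_f * in_f].view(out_f, in_f),
    "bias": flat[out_f * in_f:],
}

before = net_m.weight.detach().clone()

# Run K's forward with the predicted tensors in place of its own parameters.
out = functional_call(net_k, predicted, (x,))
loss = ((out - target) ** 2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Here K's stored parameters only serve as shape placeholders: the loss depends on them in no way, so they receive no gradient, while the update goes entirely to net_m.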