For two models, how many optimizers do I need?

newtopytorch · April 23, 2021, 1:30am

I currently have the following data:

f_map, inputs, s_bias = ml_dataset.dataset_for_s_bias()

where f_map is a tensor of matrices, inputs is a tensor of floats, and s_bias is a tensor of floats. The first two tensors, f_map and inputs, are the inputs to my ML regression algorithm, and s_bias is the expected output. There are two kinds of inputs because f_map is processed by a CNN, which turns each matrix in f_map into a float. That float is concatenated with the inputs tensor, and the resulting tensor is fed into an MLP to get a prediction for s_bias.

With this in mind, my model looks like this:

def load_model(lr, n_filters, filter_sizes, spp_dim, input_size, hidden_size, output_size):
    cnn_model = CNN(n_filters, filter_sizes, spp_dim)
    mlp_model = MultiLayerPerceptron(input_size, hidden_size, output_size)
    loss_fnc = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(mlp_model.parameters(), lr=lr)
    return cnn_model, mlp_model, optimizer, loss_fnc

I didn’t know which model to use for my optimizer, so I just put the MLP model. However, I’m not sure if this is correct. Do I need a second optimizer for my CNN model? Or does the optimizer only need the final prediction, which comes out of the MLP model?

ptrblck · April 24, 2021, 12:50am

The optimizer would need the references to all parameters it should update. In the usual case you would either create two separate optimizers (one for each model) or pass the parameters of both models to a single optimizer.
In your current setup you would only train mlp_model; cnn_model would keep its random initialization and never be updated, since the optimizer does not hold its parameters.
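As a rough sketch of the two options: the small nn.Sequential modules, the tensor shapes, and the learning rate below are placeholders chosen only to make the example runnable, not your actual CNN and MultiLayerPerceptron classes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for the CNN and MLP from the question (hypothetical sizes).
cnn_model = nn.Sequential(nn.Flatten(), nn.Linear(16, 1))  # each 4x4 matrix -> one float
mlp_model = nn.Sequential(nn.Linear(1 + 3, 8), nn.ReLU(), nn.Linear(8, 1))

# Option 1: a single optimizer that holds the parameters of both models.
optimizer = torch.optim.SGD(
    list(cnn_model.parameters()) + list(mlp_model.parameters()), lr=0.01
)

# Option 2: one optimizer per model. Both would need zero_grad() and step()
# in every iteration.
# cnn_optimizer = torch.optim.SGD(cnn_model.parameters(), lr=0.01)
# mlp_optimizer = torch.optim.SGD(mlp_model.parameters(), lr=0.01)

loss_fnc = nn.MSELoss()

# Dummy data with made-up shapes: 5 samples, 4x4 matrices, 3 extra float features.
f_map = torch.randn(5, 4, 4)
inputs = torch.randn(5, 3)
s_bias = torch.randn(5, 1)

# Snapshot the CNN weights so we can check that the step actually changes them.
cnn_before = [p.detach().clone() for p in cnn_model.parameters()]

optimizer.zero_grad()
cnn_out = cnn_model(f_map)                              # shape (5, 1)
pred = mlp_model(torch.cat([cnn_out, inputs], dim=1))   # shape (5, 1)
loss = loss_fnc(pred, s_bias)
loss.backward()   # gradients flow through the MLP back into the CNN
optimizer.step()  # updates both models, since the optimizer holds both parameter sets

cnn_changed = any(
    not torch.equal(before, after)
    for before, after in zip(cnn_before, cnn_model.parameters())
)
print("CNN parameters updated:", cnn_changed)
```

Note that backward() computes gradients for the CNN parameters in either case, because the CNN output is part of the graph that produces the loss. What the optimizer decides is which of those parameters are actually updated in step().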