Hi, I am trying to use the C++ frontend to train two neural networks together with one optimizer, but my loss value does not change throughout training. Can someone please help me by taking a look at the following code? I suspect the problem is the way I am passing the parameters to the optimizer. I have defined an NN class with a ModuleList that, given a list of layer sizes, creates a vanilla neural network with tanh activations (making sure to call register_module on each linear module), sets requires_grad to true, and applies the given dtype and device to all parameters. My training function has the following contents.
    // Collect the parameters of both networks, plus the extra bias, into one list.
    std::vector<torch::Tensor> param_list, net1_params, net2_params;
    net1_params = network1->parameters();
    param_list = net1_params;
    net2_params = network2->parameters();
    param_list.insert(param_list.end(), net2_params.begin(), net2_params.end());
    param_list.push_back(bias_param);

    // A single Adam optimizer over the combined parameter list.
    torch::optim::AdamOptions options(initial_lr);
    torch::optim::Adam optimizer(param_list, options);

    auto dummy_y = torch::tensor({4.});
    auto x = torch::linspace(1, 100, 500).reshape({1, -1});
    auto y = torch::tensor({{3.}}).reshape({-1, 1});

    for (int epoch = 0; epoch < epochs; ++epoch) {
        // The closure recomputes the loss and its gradients on every step.
        optimizer.step([&]() {
            optimizer.zero_grad();
            auto net1_output = network1->forward(x); // outputs a (1,50) vector
            auto net2_output = network2->forward(y); // outputs a (1,50) vector
            auto total_loss = torch::mse_loss(torch::matmul(net1_output, torch::transpose(net2_output, 1, 0)), dummy_y);
            total_loss.backward();
            if (epoch % 100 == 0) {
                std::cout << "Loss for epoch " << epoch << ": " << total_loss.item() << "\n";
            }
            return total_loss;
        });
    }
The loss output I see with this is:
Loss for epoch 0: 19.8229
Loss for epoch 100: 19.8229
Loss for epoch 200: 19.8229
Loss for epoch 300: 19.8229
Loss for epoch 400: 19.8229
Loss for epoch 500: 19.8229
Loss for epoch 600: 19.8229
Loss for epoch 700: 19.8229
Loss for epoch 800: 19.8229
Loss for epoch 900: 19.8229
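To isolate the step I am unsure about, here is a minimal sketch of how I merge the parameter lists, written so that it compiles without libtorch. The names Param, concat_params, and build_param_list are hypothetical and only for illustration (they are not part of the PyTorch API), and int stands in for torch::Tensor, so this only checks the bookkeeping of the merge and says nothing about gradients.

```cpp
#include <cassert>
#include <vector>

// Assumption: int is a stand-in for torch::Tensor so this compiles without libtorch.
using Param = int;

// Hypothetical helper: returns a's elements followed by b's elements.
template <typename T>
std::vector<T> concat_params(std::vector<T> a, const std::vector<T>& b) {
    a.insert(a.end(), b.begin(), b.end());
    return a;
}

// Hypothetical helper mirroring the merge in my training function:
// network1's parameters, then network2's, then the extra bias.
std::vector<Param> build_param_list(const std::vector<Param>& net1_params,
                                    const std::vector<Param>& net2_params,
                                    Param bias_param) {
    std::vector<Param> param_list = concat_params(net1_params, net2_params);
    param_list.push_back(bias_param);
    // Sanity check: nothing was dropped while merging.
    assert(param_list.size() == net1_params.size() + net2_params.size() + 1);
    return param_list;
}
```

In the real code the element type would be torch::Tensor, and the merged vector is what I pass to the torch::optim::Adam constructor. As far as I can tell the merged list has the expected number of entries, so I am not sure the merge itself is the culprit.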
Could someone please point me to the proper way of passing parameters from multiple neural networks to a single optimizer in the PyTorch C++ frontend?