Yes, I have finally made it work thanks to God.
The problem was caused by using two optimizers, though I don't know exactly why. The reason I used two in the first place was to update two sets of parameters separately, and it turns out that can be done with just one optimizer. Here is what I did:
First, I created two loss functions:
criterion = Criterion()
decoder_criterion = AnotherCriterion()
Then, I created one optimizer that covers all of the model's parameters:
opt = Optim(model.parameters())
Then, you can get the two losses separately like so:
model_loss = criterion(model_output, model_target)
decoder_loss = decoder_criterion(decoder_output, decoder_target)
Finally, you can perform backpropagation over the two loss functions at the same time like so:
loss = model_loss + decoder_loss
loss.backward()
opt.step()
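To put the pieces together, here is a minimal sketch of what one training step could look like. Everything concrete in it is an assumption for illustration: I'm using nn.CrossEntropyLoss and nn.MSELoss as stand-ins for Criterion and AnotherCriterion, torch.optim.Adam as the optimizer, and a made-up toy model with a decoder head; swap in your own modules and data.

import torch
import torch.nn as nn

# Hypothetical toy model with a separate decoder head, just for illustration.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(10, 8)
        self.head = nn.Linear(8, 4)      # produces "model_output"
        self.decoder = nn.Linear(8, 10)  # produces "decoder_output"

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return self.head(h), self.decoder(h)

model = ToyModel()

# Two loss functions (stand-ins for Criterion / AnotherCriterion).
criterion = nn.CrossEntropyLoss()
decoder_criterion = nn.MSELoss()

# One optimizer over ALL parameters, including the decoder's.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch: 16 samples, 10 features, class targets and reconstruction targets.
x = torch.randn(16, 10)
model_target = torch.randint(0, 4, (16,))
decoder_target = x.clone()

# One training step.
opt.zero_grad()
model_output, decoder_output = model(x)
model_loss = criterion(model_output, model_target)
decoder_loss = decoder_criterion(decoder_output, decoder_target)
loss = model_loss + decoder_loss   # combine the two losses
loss.backward()                    # gradients for all parameters, from both losses
opt.step()                         # single update for the whole model

Note that I call opt.zero_grad() at the start of each step, so gradients left over from the previous batch don't accumulate into the new update.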
This is how I fixed my problem. The following are a few more details in case you're interested:
- loss.backward(): all it does is compute the gradient of the loss with respect to every parameter that was used in computing the loss and has requires_grad=True. Given a parameter x, this method stores its gradient w.r.t. the loss in the x.grad attribute.
- opt.step(): all it does is update these parameters using their gradients.
- When you use (loss1 + loss2).backward(): this performs backward() over loss1 and loss2 separately and accumulates the gradients, so you don't have to worry about combining them yourself (see the small check after this list).
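Here is a small sanity check of that last point. The setup is entirely made up (a single nn.Linear layer and two arbitrary losses, not the original model): it compares the gradients from backward() on the summed loss against the gradients accumulated from two separate backward() calls.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)          # toy parameter set, just for the check
x = torch.randn(5, 3)

# Two arbitrary losses computed from the same parameters.
def losses():
    out = layer(x)
    return out.mean(), (out ** 2).mean()

# Case 1: backward on the summed loss.
layer.zero_grad()
l1, l2 = losses()
(l1 + l2).backward()
summed_grad = layer.weight.grad.clone()

# Case 2: backward on each loss separately; gradients accumulate in .grad.
layer.zero_grad()
l1, l2 = losses()
l1.backward(retain_graph=True)   # keep the graph alive for the second backward
l2.backward()
separate_grad = layer.weight.grad.clone()

print(torch.allclose(summed_grad, separate_grad))  # should print True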
This is what I've learned so far. Please don't hesitate to correct me if I'm wrong.