Hello everyone,
I have a model in which I use one loss function for the first epoch, and a second loss function from the second epoch onwards. In pseudocode:
for batch in batches:
    model.zero_grad()
    output = model(batch)
    if epoch == 0:
        loss_function = loss1
    else:
        loss_function = loss2
    loss = loss_function(output)
    loss.backward()
    optimiser.step()
Is there any problem with this? Should I change something to connect both losses? I thought that, because backward() computes the gradients for all tensors with requires_grad=True and zero_grad() clears them at every step, no connection is needed. Am I right?
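To make the pattern concrete, here is a minimal runnable sketch of the epoch-based loss switch. The model, shapes, and second loss are placeholders (a linear layer and MultiMarginLoss standing in for the real network and CTC), chosen only so the snippet is self-contained:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real network and data (hypothetical shapes).
torch.manual_seed(0)
model = nn.Linear(10, 4)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(32, 10)
targets = torch.randint(0, 4, (32,))

loss1 = nn.CrossEntropyLoss()    # used for epoch 0
loss2 = nn.MultiMarginLoss()     # stand-in for the second loss (CTC in the real problem)

losses = []
for epoch in range(3):
    # Pick the loss for this epoch. No special "connection" between the two
    # losses is needed: each backward() differentiates only the graph built
    # in this iteration, and zero_grad() clears the previous gradients.
    loss_function = loss1 if epoch == 0 else loss2
    optimiser.zero_grad()
    output = model(inputs)
    loss = loss_function(output, targets)
    loss.backward()
    optimiser.step()
    losses.append(loss.item())
```

As long as both losses take the same model output (and the appropriate targets), swapping them between epochs is just a Python-level choice of which function to call.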
The reason is that I have a model that classifies sequences of characters in a speech recognition problem, so I am using the CTC loss. However, training gets stuck in a local minimum around loss = 1.50 (the model predicts all blanks). I wanted to use another loss during the first n epochs (e.g. cross entropy) to reach a loss value below 1.50, and then switch to CTC and continue training.
Unfortunately, when I do this, I reach a value of 1.38 with the first loss (cross entropy), and then, when I switch to CTC, the loss drastically increases to about 8, and training falls into the same minimum afterwards. So I assume I am doing something wrong.
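For reference, this is the input format nn.CTCLoss expects; one thing worth double-checking when the CTC value looks surprisingly large is that the inputs are log-probabilities (e.g. via log_softmax) and that the blank index matches the one the model uses. All shapes and the blank index here are made-up toy values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, C = 50, 4, 20   # input length, batch size, num classes (blank at index 0)
S = 10                # target sequence length

# CTC expects log-probabilities of shape (T, N, C).
log_probs = torch.randn(T, N, C).log_softmax(dim=2).requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # labels exclude blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

Note also that cross entropy and CTC are on different scales (CTC sums a negative log-likelihood over alignments), so a jump in the raw loss value right after switching does not by itself mean the switch is wired up incorrectly.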
Thank you very much!