Hello everyone,

I have a model in which I use one loss function for the first epoch and a second loss from the second epoch onwards. In pseudocode:

```
for batch in batches:
    model.zero_grad()
    output = model(batch)
    # first epoch: loss1, every epoch afterwards: loss2
    if epoch == 0:
        loss_function = loss1
    else:
        loss_function = loss2
    loss = loss_function(output)
    loss.backward()
    optimiser.step()
```

Is there any problem with this? Should I change something to connect both losses? I thought that, because `backward()` computes the gradients for all tensors with `requires_grad=True` and `zero_grad()` clears them before each step, no connection is needed. Am I right?
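For reference, here is a runnable toy version of what I mean (the model, data, and the two losses are made up just to illustrate the switch, not my actual setup):

```
import torch
import torch.nn as nn

model = nn.Linear(4, 3)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(2, 4)
target = torch.zeros(2, 3)

for epoch in range(2):
    # swap the loss between epochs
    loss_function = nn.MSELoss() if epoch == 0 else nn.L1Loss()
    optimiser.zero_grad()
    output = model(x)
    # backward() builds a fresh graph from this forward pass each iteration,
    # so changing the loss between epochs needs no extra wiring
    loss = loss_function(output, target)
    loss.backward()
    optimiser.step()
```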

The reason is that I have a model that classifies sequences of characters in a speech recognition problem, so I am using the CTC loss. However, I get stuck in a local minimum around loss = 1.50 (the model predicts all blanks). I wanted to use another loss during the first *n* epochs (e.g. cross entropy) to reach a loss value below 1.50, and then switch to CTC and continue training.
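In case it helps, this is roughly what I had in mind for the switch. The shapes, the warm-up length, and especially the per-frame targets for the cross-entropy phase are placeholders rather than my real data (the cross-entropy warm-up assumes some frame-level alignment is available):

```
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder sizes: T time steps, N batch, C classes (blank index = 0)
T, N, C = 50, 4, 30
epoch, n_warmup_epochs = 0, 5                        # example values only

logits = torch.randn(T, N, C, requires_grad=True)    # stand-in for the model output (time-major)
ctc_loss = nn.CTCLoss(blank=0)
ce_loss = nn.CrossEntropyLoss()

targets = torch.randint(1, C, (N, 12))               # label sequences (no blank labels)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)
frame_targets = torch.randint(0, C, (T, N))          # per-frame labels (placeholder alignment)

if epoch < n_warmup_epochs:
    # Cross entropy expects (N, C, T) logits and (N, T) per-frame class targets
    loss = ce_loss(logits.permute(1, 2, 0), frame_targets.permute(1, 0))
else:
    # CTC expects time-major log-probabilities plus input/target lengths
    loss = ctc_loss(F.log_softmax(logits, dim=-1), targets, input_lengths, target_lengths)
loss.backward()
```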

Unfortunately, when I do this, I reach a value of 1.38 with the first loss (cross entropy), and then, when I switch to CTC, the loss drastically increases to 8 and I fall into the same minimum afterwards. So I assume I am doing something wrong.

Thank you very much!