Two criteria result in "one of the variables needed for gradient computation has been modified by an inplace operation"

I want to use multi-task learning with CTC and teacher forcing.

    import torch
    import torch.nn.functional as F
    from torch.autograd import Variable

    log_probs_ctc, log_probs_decoder = model(inputs.float(), targets_one_hot, device)
    # the CTC loss expects (T, N, C), so move the time dimension to the front
    output_ctc_ = Variable(log_probs_ctc.transpose(1, 2).transpose(0, 1), requires_grad=True)
    loss = ctc_criterion(output_ctc_, targets_ctc, input_lengths.type(torch.int32), target_lengths)
    # cross_entropy expects (N, C), so flatten the batch and time dimensions
    log_probs_decoder_ = Variable(log_probs_decoder.transpose(1, 2), requires_grad=True)
    loss += F.cross_entropy(log_probs_decoder_.contiguous().view(-1, log_probs_decoder_.shape[2]),
                            targets.type(torch.long).view(-1),
                            ignore_index=model.num_classes - 1)

When I comment out either of the criteria, everything works. When I put them together, it fails with the error above. To simplify things I kept only the CTC part of the network and replaced log_probs_decoder with torch.zeros_like(log_probs_ctc) (roughly as in the sketch below); it still fails. I assume the issue has to do with the two criteria competing somehow.
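Roughly, the simplified repro looked like this (the zero tensor just stands in for the decoder branch):

    # simplified repro: decoder branch stubbed out, CTC branch unchanged
    log_probs_ctc, _ = model(inputs.float(), targets_one_hot, device)
    log_probs_decoder = torch.zeros_like(log_probs_ctc)  # dummy decoder output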

Thanks in advance for any help.

I figured it out, but for everyone else I'll leave this up here: the two criteria I used had different reduction methods. When I changed them both to 'mean', everything was fine; a sketch of the fix is below.
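A minimal sketch of what the fix looks like, reusing the tensors from the snippet above (my original post didn't show how ctc_criterion was constructed, so the CTCLoss arguments here are just placeholders):

    # constructing both criteria with the same reduction resolves the error
    ctc_criterion = torch.nn.CTCLoss(reduction='mean')  # other arguments left at defaults (placeholder)
    loss = ctc_criterion(output_ctc_, targets_ctc,
                         input_lengths.type(torch.int32), target_lengths)
    loss += F.cross_entropy(log_probs_decoder_.contiguous().view(-1, log_probs_decoder_.shape[2]),
                            targets.type(torch.long).view(-1),
                            ignore_index=model.num_classes - 1,
                            reduction='mean')  # match the CTC reduction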