Custom loss function: loss is not improving

Hi,

I have a model with 14 inputs and 2 outputs (each output chosen from 4096 labels). I used a classification model with CrossEntropyLoss(), which gave me a good loss. With this in mind, I wanted to customize the loss function: in the new loss I want to include additional variables that affect the loss. But when I run the model, the loss just fluctuates and does not improve at all. I guess there is a problem in the backward pass, but I couldn't solve it. Could you please help?

import numpy as np
import torch
from torch.autograd import Variable
from torch.utils import data

# my_data, my_model and codeword are defined elsewhere in the script.
batch_size = 200
epoch_size = 20
n_input = 14
n_hidden = 1024
n_output1 = 4096
n_output2 = 4096
p_dropout = 0
Learning_rate = 0.01
my_dataloader = data.DataLoader(my_data, batch_size=batch_size, shuffle=False, num_workers=0)
epoches = np.arange(epoch_size)

# Model (8 fully-connected layers).
model = my_model()
model.cpu()
optimizer = torch.optim.Adam(model.parameters(), lr=Learning_rate, weight_decay=1e-4)
for epoch in epoches:
    for k, (channelR, channelI, alpha, data, UR, UI, BR, BI, XR, XI, target1, target2) in enumerate(my_dataloader):


        # Rebuild the complex-valued arrays from their real/imaginary parts.
        channel = channelR.numpy() + 1j * channelI.numpy()
        B = BR.numpy() + 1j * BI.numpy()
        X = XR.numpy() + 1j * XI.numpy()

        data = Variable(data, requires_grad=False)
        target1 = Variable(target1.long(), requires_grad=False)
        target2 = Variable(target2.long(), requires_grad=False)



        # Set gradient to 0.
        optimizer.zero_grad()

        # Feed forward.
        output1, output2 = model(data)

        # Get the predicted labels from the maximum value (argmax).
        _, predicted1 = torch.max(output1, 1)
        _, predicted2 = torch.max(output2, 1)

        # Accumulators for the target-based and prediction-based terms over the batch.
        R_Opt = torch.zeros(1)
        R_Pre = torch.zeros(1)



        # Per-sample terms, computed with numpy from the target and predicted codewords.
        for i in range(batch_size):
            n_opt1 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 0:6]),
                                             np.array([codeword[target1[i].numpy()[0]],
                                                       codeword[target2[i].numpy()[0]]]).transpose()), X[i, 0:2])),2)


            d_opt1 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 0:6]),
                                             np.array([codeword[target1[i].numpy()[0]],
                                                       codeword[target2[i].numpy()[0]]]).transpose()), X[i, 2:4])),2)

            n_opt2 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 6:12]),
                                             np.array([codeword[target1[i].numpy()[0]],
                                                       codeword[target2[i].numpy()[0]]]).transpose()), X[i, 2:4])),2)

            d_opt2 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 6:12]),
                                             np.array([codeword[target1[i].numpy()[0]],
                                                       codeword[target2[i].numpy()[0]]]).transpose()), X[i, 0:2])),2)



            R_opt = torch.as_tensor((n_opt1 / d_opt1) + (n_opt2 / d_opt2))
            R_opt = Variable(R_opt.data, requires_grad=True)
            R_Opt = R_Opt + R_opt

            n_pred1 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 0:6]),
                                                              np.array([codeword[predicted1[[i]].numpy()[0]],
                                                                        codeword[predicted2[[i]].numpy()[0]]]).transpose()),
                                                    X[i, 0:2])), 2)

            d_pred1 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 0:6]),
                                                              np.array([codeword[predicted1[[i]].numpy()[0]],
                                                                        codeword[predicted2[[i]].numpy()[0]]]).transpose()),
                                                    X[i, 2:4])), 2)

            n_pred2 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 6:12]),
                                                              np.array([codeword[predicted1[[i]].numpy()[0]],
                                                                        codeword[predicted2[[i]].numpy()[0]]]).transpose()),
                                                    X[i, 2:4])), 2)

            d_pred2 = np.power(np.absolute(np.matmul(np.matmul(np.conj(channel[i, 6:12]),
                                                              np.array([codeword[predicted1[[i]].numpy()[0]],
                                                                        codeword[predicted2[[i]].numpy()[0]]]).transpose()),
                                                    X[i, 0:2])), 2)

            R_pre = torch.as_tensor((n_pred1 / d_pred1) + (n_pred2 / d_pred2))
            R_pre = Variable(R_pre.data, requires_grad=True)
            R_Pre = R_Pre + R_pre



        R_OPT = R_Opt / batch_size
        R_PRED = R_Pre / batch_size
        loss = R_OPT - R_PRED

        # Gradient calculation.
        loss.backward()

        # Model weight modification based on the optimizer.
        optimizer.step()

        # Print the loss every 100 iterations.
        if k % 100 == 0:
            print('Loss {:.4f} at iter {:d}'.format(loss.item(), k))

There might be a few issues with your loss calculation:

  • You are detaching the outputs from the computation graph by passing them to argmax. The softmax function might be an alternative (see the sketch after this list).
  • To quote @KFrank: if you are using non-PyTorch methods, e.g. numpy methods, you will also break the computation graph and would need to write the backward function manually. However, if you can swap all numpy methods for their PyTorch counterparts, Autograd should create the backward pass automatically.
  • Rewrapping a tensor, as in R_opt = Variable(R_opt.data, requires_grad=True), will also break the graph. Also, the usage of the .data attribute is not recommended, as Autograd cannot track these changes, which might result in a wrong gradient calculation. Variables are deprecated since PyTorch 0.4, so you can use tensors in newer versions.
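Here is a minimal sketch of how your rate computation could be written in a differentiable way. It makes a few assumptions: a PyTorch version with complex autograd support (1.8+), codeword stored as a complex tensor with length 6 (so the shapes match your channel[i, 0:6] slicing), and a softmax-weighted average of the codewords instead of the hard argmax, so gradients can flow back to output1 and output2. The toy sizes, the random placeholder tensors, and the soft_rate helper are made up for the example; plug in your real model outputs and data instead.

import torch
import torch.nn.functional as F

# Toy sizes, for illustration only -- replace with your real values.
batch_size = 4
n_labels = 16          # stands in for your 4096 labels
code_dim = 6           # codeword length implied by channel[i, 0:6]

# Random placeholders standing in for codeword, channel and X (complex-valued).
codeword = torch.randn(n_labels, code_dim, dtype=torch.cfloat)
channel = torch.randn(batch_size, 12, dtype=torch.cfloat)
X = torch.randn(batch_size, 4, dtype=torch.cfloat)

# Placeholders for the two logit heads of the model; requires_grad lets us
# check below that gradients actually reach them.
output1 = torch.randn(batch_size, n_labels, requires_grad=True)
output2 = torch.randn(batch_size, n_labels, requires_grad=True)

def soft_rate(logits1, logits2, channel, X):
    # Softmax-weighted average of the codewords instead of a hard argmax,
    # so the result stays connected to the computation graph.
    w1 = F.softmax(logits1, dim=1).to(torch.cfloat)   # (batch, n_labels)
    w2 = F.softmax(logits2, dim=1).to(torch.cfloat)
    c1 = w1 @ codeword                                # (batch, code_dim)
    c2 = w2 @ codeword

    # Stack the two (soft) codewords as columns: (batch, code_dim, 2).
    C = torch.stack((c1, c2), dim=2)

    h1 = channel[:, 0:6].conj().unsqueeze(1)          # (batch, 1, 6)
    h2 = channel[:, 6:12].conj().unsqueeze(1)

    # (batch, 1, 6) @ (batch, 6, 2) @ (batch, 2, 1) -> one scalar per sample.
    n1 = torch.bmm(torch.bmm(h1, C), X[:, 0:2].unsqueeze(2)).abs().squeeze() ** 2
    d1 = torch.bmm(torch.bmm(h1, C), X[:, 2:4].unsqueeze(2)).abs().squeeze() ** 2
    n2 = torch.bmm(torch.bmm(h2, C), X[:, 2:4].unsqueeze(2)).abs().squeeze() ** 2
    d2 = torch.bmm(torch.bmm(h2, C), X[:, 0:2].unsqueeze(2)).abs().squeeze() ** 2

    return (n1 / d1 + n2 / d2).mean()

R_pred = soft_rate(output1, output2, channel, X)
loss = -R_pred                    # maximize the predicted term
loss.backward()
print(output1.grad.abs().sum())   # non-zero, so gradients reach the model outputs

Note that R_Opt is computed from the targets only, so it does not depend on the model parameters and contributes no gradient; it only shifts the loss value, which is why the sketch simply maximizes the predicted term.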


Thank you so much, I will try it. Many thanks, ptrblck.