Higher order gradient

I am trying to implement a network trained with a higher-order gradient (i.e., taking gradients of the gradient norm):

for epoch in range(100):
    for i, (train_x, train_y) in enumerate(train_loader):

        train_x = train_x.to(device)
        train_y = train_y.to(device)

        mypredict = my_model(train_x)
        loss = myloss(mypredict, train_y)

        # first-order gradients; create_graph=True keeps the graph so
        # we can differentiate through them
        grad_params = torch.autograd.grad(loss, my_model.parameters(), create_graph=True)
        grad_norm = 0
        for grad in grad_params:
            grad_norm += grad.pow(2).sum()
        grad_norm = grad_norm.sqrt()

        # take the gradients wrt grad_norm. backward() will accumulate
        # the gradients into the .grad attributes
        optimizer.zero_grad()
        grad_norm.backward()

        # do an optimization step
        optimizer.step()

    print("Loss in Epoch {}/100 = {}".format(epoch + 1, loss.item()))

but when I plot the loss, its value fluctuates instead of decreasing.

What should I do? Thank you :slight_smile:

Can you let us know why you are doing this?
You are essentially taking the RMS of the gradients over all the parameters.
I suspect the resulting gradient values are too large for each parameter update, which might be causing the fluctuation. Clipping the gradients before the optimizer step might solve the problem.
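A minimal, self-contained sketch of what that clipping could look like, assuming the setup from the question (the tiny `nn.Linear` model, the SGD optimizer, and the random data here are placeholders, not the original code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
my_model = nn.Linear(4, 1)                       # stand-in for the real network
myloss = nn.MSELoss()
optimizer = torch.optim.SGD(my_model.parameters(), lr=0.01)

train_x = torch.randn(8, 4)                      # dummy batch
train_y = torch.randn(8, 1)

mypredict = my_model(train_x)
loss = myloss(mypredict, train_y)

# First-order gradients; create_graph=True makes grad_norm differentiable
grad_params = torch.autograd.grad(loss, my_model.parameters(), create_graph=True)
grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grad_params))

optimizer.zero_grad()
grad_norm.backward()                             # second-order grads land in .grad

# Clip the accumulated gradients to a maximum total norm before stepping
total_norm = torch.nn.utils.clip_grad_norm_(my_model.parameters(), max_norm=1.0)
optimizer.step()
```

`clip_grad_norm_` rescales all `.grad` tensors in place so their combined norm is at most `max_norm`, and returns the total norm measured before clipping, which is handy to log and see how large the updates were getting.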