Running into None gradients when optimizing function

I am trying to optimize a custom function, “copt2”, in PyTorch using the standard optimizers. I got an error when trying to use Rprop, so I switched to Adam. Adam runs, but the objective does not decrease, and after checking it seems the gradient is never populated (x0.grad stays None). What is causing this? Is it the max in the definition of copt2, which is used to shift the input of exp so that it doesn't blow up?

And if so, can I compute the max outside the copt2 definition and optimize properly?
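For context, the s shift is just the usual log-sum-exp stabilization, so it should not change the value of copt2, only keep exp from overflowing. A quick standalone check of that identity (the matrix below is an arbitrary stand-in, and I drop the triu masking for brevity):

import torch

eps = 1e-3
xxt = torch.rand(5, 5)                                    # arbitrary example matrix
s = torch.max(xxt)
shifted = s + eps * torch.log(torch.exp((xxt - s) / eps).sum())
direct = eps * torch.logsumexp(xxt.flatten() / eps, dim=0)
print(torch.allclose(shifted, direct))                    # True: the shift only prevents overflow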

import torch
from torch import linalg as LA

x0 = torch.randn((18, 6), requires_grad=True)
epsilon = 1e-3
def copt2(x):
    N = x.size()[0]
    dimen = int(x.size()[1] / 2)
    norm = LA.vector_norm(x, dim=1, keepdim=True)
    x = torch.tensor(x/norm, requires_grad=True)  # normalize the rows
    x1 = x[..., :dimen]
    x2 = x[..., dimen:]
    xxt1 = torch.matmul(x1, x1.T)
    xxt2 = torch.matmul(x2, x2.T)
    xxt3 = torch.matmul(x2, x1.T)
    xxt4 = torch.matmul(x1, x2.T)
    xxt = (xxt1 + xxt2) ** 2 + (xxt3 - xxt4) ** 2
    xxt = torch.triu(xxt, 1)
    s = torch.max(xxt)                      # shift so the exp below does not overflow
    expxxt = torch.exp((xxt - s) / epsilon)
    u = torch.triu(expxxt, 1).sum()
    f = s + epsilon * torch.log(u)          # smoothed max of the off-diagonal entries
    return f
# initialize
y = copt2(x0)
print(y)
en_opt = [y.item()]

torch.autograd.set_detect_anomaly(True)
optimizer = torch.optim.Adam([x0], lr=0.0001)
for i in range(20):
    optimizer.zero_grad()
    y = copt2(x0)
    y.backward(retain_graph=True)
    optimizer.step()
    en_opt.append(y.item())

    if (i + 1) % 20 == 0:
        print(x0.grad)       # expected a gradient here, but it prints None
        print(i + 1, y)

Output:

tensor(0.8736, grad_fn=<AddBackward0>)
None
20 tensor(0.8736, grad_fn=<AddBackward0>)

Why do you do x = torch.tensor(x/norm, requires_grad=True)? This creates a brand-new tensor that is unrelated to the given input, and that is what breaks gradient propagation back to x0. You should just do x = x / norm.
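In other words, keep the normalization as a regular differentiable op. A minimal sketch of just the changed lines inside copt2 (everything else stays as posted):

    norm = LA.vector_norm(x, dim=1, keepdim=True)
    x = x / norm  # stays in the autograd graph, so gradients can reach x0
    # not: x = torch.tensor(x/norm, requires_grad=True), which builds a fresh leaf tensor with no grad_fn

With that change, x0.grad is populated after y.backward() and Adam can actually update x0. (The retain_graph=True is also unnecessary here, since copt2 rebuilds the graph on every call.)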

@albanD Yeah, I just figured out that that was the problem. I was about to delete the question when I saw your answer. Thanks.