Loss value is kept constant in the training

I am new to Pytorch and I am implementing a simple feedforward neural network and the loss function does not seem to decrease.

Because of some other tests I have done, the problem seems to be in the computations I do to compute pred, since if I slightly change the network so that it spits out a 2-dimensional vector for each entry and save it as pred, everything works perfectly.

Do you see the problem in defining pred here? Thanks

As far as I understand, what is happening is that XH forgets the dependency on X, and hence even params does so.

import torch
import numpy as np
from torch import nn

dt = 0.1

class Neural_Network(nn.Module):
    def __init__(self, ):
        super(Neural_Network, self).__init__()
    
        self.l1 = nn.Linear(2,300)
        self.nl = nn.Tanh()
        self.l2 = nn.Linear(300,1)
    
    
    def forward(self, X):
        z = self.l1(X)
        z = self.nl(z)
        o = self.l2(z)
        return o



N = 1000
X = torch.rand(N,2,requires_grad=True)
y = torch.rand(N,1)
NN = Neural_Network()

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(NN.parameters(), lr=1e-5)

epochs = 200

for i in range(epochs):  # trains the NN 1,000 times

    HH = torch.mean(NN(X))
    gradH =  torch.autograd.grad(HH, X)[0]
    XH= torch.cat((gradH[:,1].unsqueeze(0),-gradH[:,0].unsqueeze(0)),dim=0).t()
    pred = X + dt*XH

    #Optimize and improve the weights
    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


    print (" Loss: ", loss.detach().numpy())  # mean sum squared loss

P.S. With these X and y the loss is not expected to go to zero, I have added them here like them just for simplicity. I will apply this architecture to data points which are expected to satisfy this model. However I am just interested in seeing the loss decreasing.