I am new to PyTorch, and I am implementing a simple feedforward neural network whose loss does not seem to decrease.

From some other tests I have done, the problem seems to lie in the computations I use to build *pred*: if I slightly change the network so that it outputs a 2-dimensional vector for each entry and save that directly as *pred*, everything works perfectly.
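For reference, the working variant is roughly the following (the name `Neural_Network2` is mine, and `y` would need shape `(N, 2)` to match):

```
import torch
from torch import nn

# Variant that trains fine: the network itself outputs the 2-D prediction.
class Neural_Network2(nn.Module):
    def __init__(self):
        super(Neural_Network2, self).__init__()
        self.l1 = nn.Linear(2, 300)
        self.nl = nn.Tanh()
        self.l2 = nn.Linear(300, 2)  # two outputs instead of one

    def forward(self, X):
        return self.l2(self.nl(self.l1(X)))

# pred = Neural_Network2()(X)  # used directly as pred; the loss decreases
```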

Do you see a problem in how *pred* is defined here? Thanks.

As far as I understand, what is happening is that XH loses its dependency on X, and hence the loss loses its dependency on the network parameters as well (see the quick check after the code below).

```
import torch
from torch import nn

dt = 0.1

class Neural_Network(nn.Module):
    def __init__(self):
        super(Neural_Network, self).__init__()
        self.l1 = nn.Linear(2, 300)
        self.nl = nn.Tanh()
        self.l2 = nn.Linear(300, 1)

    def forward(self, X):
        z = self.l1(X)
        z = self.nl(z)
        o = self.l2(z)
        return o

N = 1000
X = torch.rand(N, 2, requires_grad=True)
y = torch.rand(N, 1)

NN = Neural_Network()
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(NN.parameters(), lr=1e-5)

epochs = 200
for i in range(epochs):  # training loop
    # scalar field learned by the network, averaged over the batch
    HH = torch.mean(NN(X))
    # gradient of HH with respect to the inputs X
    gradH = torch.autograd.grad(HH, X)[0]
    # rotate the gradient: XH = (dH/dx2, -dH/dx1)
    XH = torch.cat((gradH[:, 1].unsqueeze(0), -gradH[:, 0].unsqueeze(0)), dim=0).t()
    # one explicit Euler step of size dt
    pred = X + dt * XH

    # optimize and improve the weights
    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print("Loss: ", loss.detach().numpy())  # sum of squared errors (reduction='sum')
```
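A quick check that seems to support my hypothesis, run with the same `NN` and `X` as above (`torch.autograd.grad` defaults to `create_graph=False`):

```
HH = torch.mean(NN(X))
gradH = torch.autograd.grad(HH, X)[0]  # create_graph defaults to False
print(gradH.requires_grad)  # False: gradH is detached from the graph
print(gradH.grad_fn)        # None, so nothing built on gradH can reach the parameters
```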

P.S. With these X and y the loss is not expected to go to zero; I have used random data here just for simplicity. I will apply this architecture to data points that are expected to satisfy this model, but for now I am only interested in seeing the loss decrease.
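For context, the update rule that *pred* implements (reading off the construction of XH) is one explicit Euler step of a Hamiltonian flow, with the network output playing the role of the Hamiltonian $H$:

$$
X_{\text{pred}} = X + \Delta t \, J \nabla H(X),
\qquad
J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},
$$

so that XH $= \left(\partial H/\partial x_2,\; -\partial H/\partial x_1\right)$.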