The loss function does not decrease. Is my SGD set up correctly?

Hello everyone,

I have a controlled stochastic process X_t and I am trying to use Method 1 (p. 14) developed in "Convergence Analysis of Machine Learning Algorithms for the Numerical Solution of Mean Field Control and Games: II – The Finite Horizon Case" to minimize the loss associated with the control. The main idea is to treat the control as a function of the process X_t and to approximate this function with a neural network, training it by applying SGD to a loss function.
My first concern is that the article assumes the activation function is 2π-periodic, but there does not seem to be a practical 2π-periodic activation function (I used ReLU in my code).
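
The only practical 2π-periodic choice I could think of is a sine nonlinearity; for reference, this is the (hypothetical) variant I considered but did not actually run:

import torch
import torch.nn as nn

# Hypothetical variant of my network with a 2π-periodic activation (torch.sin)
# instead of ReLU; not used in the runs shown below.
class PeriodicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 50)
        self.layer2 = nn.Linear(50, 50)
        self.layer3 = nn.Linear(50, 1)

    def forward(self, x):
        y = torch.sin(self.layer1(x))  # sin is 2π-periodic
        y = torch.sin(self.layer2(y))
        return self.layer3(y)
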
Secondly, the loss function is approximated by simulating the process X_{t_0}, …, X_{t_Nt} N times, where t_0, …, t_Nt is the time grid.
The problem is that when I plot the loss computed at each iteration, it does not decrease. My guess is that the gradient computed during .backward() is wrong, but I do not understand why. I also have trouble understanding where I should put requires_grad = True. I have rewritten my code many times but I still can't figure out what is wrong with it.
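
To check my understanding of autograd, I wrote a minimal toy example (nothing to do with the control problem) where SGD does make the loss decrease, so I suspect the issue is in how I build the trajectories rather than in the optimizer call itself:

import torch
import torch.nn as nn
import torch.optim as optim

# Toy sanity check: drive a network's output towards zero with SGD.
# Here the loss does decrease, so plain loss.backward() + optimizer.step()
# behave as I expect.
net = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Linear(50, 1))
opt = optim.SGD(net.parameters(), lr=0.01)
x = torch.randn(100, 2)           # fixed random inputs
for step in range(200):
    loss = net(x).pow(2).mean()   # mean squared output
    opt.zero_grad()
    loss.backward()               # gradients w.r.t. net.parameters()
    opt.step()

Here is my actual code: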

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from time import time
import numpy as np
from math import *
import matplotlib.pyplot as plt
from progress.bar import Bar


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# neural network with 2 hidden layers and ReLU activations
class nets(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 50)
        self.layer2 = nn.Linear(50, 50)
        self.layer3 = nn.Linear(50, 1)

    def forward(self, x):
        # ReLU is used for both hidden-layer activations
        y = self.layer1(x)
        y = torch.relu(y)
        y = self.layer2(y)
        y = torch.relu(y)
        y = self.layer3(y)
        return y

# this function returns tensors X (states) and A (controls) where each row
# corresponds to a simulation and each column to a time step
def env_simple(T, N, Nt):
    dt = T/Nt
    X = torch.zeros((N,Nt+1), device = device)
    A = torch.zeros((N,Nt+1), device = device)
    for i in range(N):
        for j in range(Nt+1):
            A[i,j] = rdn.forward( torch.tensor([ 0,X[i,j] ],dtype=torch.float) )
            if (j == 0):
                X[i,j] = rdn.forward( torch.tensor([ 0,X[i,j] ],dtype=torch.float) )
            else:
                dW = torch.normal(mean = 0.0, std = sqrt(dt), size = (1,1))
                # dX_t = alpha_t*dt + dW_t, i.e.
                # X_{t_{j+1}} = X_{t_j} + phi_theta(t_j, X_{t_j})*dt + dW
                X[i,j] = X[i,j-1] + A[i,j-1]*dt + dW
                A[i,j] = rdn.forward( torch.tensor([ dt*j,X[i,j] ],dtype=torch.float) )
    return X, A

#f(X_t, alpha_t) = X_t**2 + alpha_t**2 + E[alpha_t]**2
def loss_fn(X, A, N):
    return (X.pow(2).sum()/N) + (A.pow(2).sum()/N) + (A.pow(2).sum()/N**2)


rdn = nets()
rdn.to(device)
optimizer = optim.SGD(params = rdn.parameters(), lr = 0.001, momentum = 0.9)

#rdn.requires_grad_(True)
m, n, nt = 10000, 1000, 10
T = 1
dt = T/nt
bar = Bar('Processing', max=m)
cout = []
for pp in range(m) :
    x, a = env_simple(T, n, nt)
    loss = loss_fn(x, a, n)
    cout.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    bar.next()
bar.finish()

plt.plot(cout)
plt.show()

Thank you for taking the time to help me!

Edit: since I am trying to get the gradient with respect to rdn.parameters(), I changed the line:

loss.backward() 

by:

loss.backward(rdn.parameters()) 

but I got this error instead:

  File "<tmp 6>", line 75, in <module>
    loss.backward(rdn.parameters())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/autograd/__init__.py", line 143, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/autograd/__init__.py", line 33, in _make_grads
    raise RuntimeError("Mismatch in shape: grad_output["
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([50, 2]) and output[0] has a shape of torch.Size([]).
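
If it helps with the diagnosis: after reverting to the plain loss.backward(), I plan to add a small check right after the backward call to see whether any gradient reaches the parameters at all (this is just a diagnostic, not part of the script above):

# diagnostic: print the gradient norm of each parameter right after backward()
for name, p in rdn.named_parameters():
    grad_norm = p.grad.norm().item() if p.grad is not None else float('nan')
    print(f"{name}: grad norm = {grad_norm:.3e}")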