 # The loss function does not decrease. Is the SGD right?

Hello everyone,

I have a controlled stochastic process `X_t`, and I am trying to use Method 1 (p. 14) developed in "Convergence Analysis of Machine Learning Algorithms for the Numerical Solution of Mean Field Control and Games: II – The Finite Horizon Case" to minimize the loss associated with the control. The main idea is to treat the control as a function of my process `X_t` and to approximate this function with a neural network by applying SGD to a loss function.
My first concern is that the article assumes the activation function is 2π-periodic, but there does not seem to be a practical 2π-periodic activation function (I used the ReLU function in my code).
Secondly, the loss function is approximated by simulating the process `X_{t_0}, …, X_{t_{N_t}}` N times, where `t_0, …, t_{N_t}` is the time grid.
The problem is that when I plot the loss computed at each iteration, it does not decrease. My guess is that the gradient computed by `.backward()` is wrong, but I do not understand why. I also have trouble understanding where I should put `requires_grad = True`. I have rewritten my code many times, but I still can't work out what is wrong with it.
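
On the two practical points above, here is a minimal sketch rather than a definitive answer. It checks that `torch.sin` is 2π-periodic (so it could stand in for the periodic activation, though that is only an assumption about what the article intends), that the parameters of an `nn.Module` already have `requires_grad=True` by default, and that rebuilding an input with `torch.tensor([...])` from a value that depends on the network creates a fresh tensor with no gradient history, whereas `torch.stack` keeps it. The names `layer`, `x`, `cut` and `kept` are made up for the illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# torch.sin has period 2*pi, so it can serve as a 2*pi-periodic activation.
z = torch.linspace(-3.0, 3.0, 7)
sin_is_periodic = torch.allclose(
    torch.sin(z), torch.sin(z + 2 * math.pi), atol=1e-5
)

# Hypothetical one-layer stand-in for the control network.
layer = nn.Linear(2, 1)

# Parameters of an nn.Module require gradients by default, so nothing
# has to be flagged by hand on the parameters or on the inputs.
params_require_grad = all(p.requires_grad for p in layer.parameters())

# x stands in for a simulated state that depends on the parameters.
x = layer(torch.zeros(2)).squeeze()

# torch.tensor copies the values only: the result is a new tensor
# with no link back to the parameters.
cut = torch.tensor([0.0, x])

# torch.stack is a differentiable operation, so the link is kept.
kept = torch.stack([torch.zeros(()), x])

print(sin_is_periodic, params_require_grad)
print(cut.requires_grad, kept.requires_grad)
```

If this reading is right, `requires_grad=True` does not need to be set by hand anywhere in this setup; what matters is that every operation between the parameters and the loss stays inside the graph.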

``````
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from time import time
import numpy as np
from math import sqrt
import matplotlib.pyplot as plt
from progress.bar import Bar

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Neural network with 2 hidden layers and ReLU activations
class nets(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(2, 50)
        self.layer2 = nn.Linear(50, 50)
        self.layer3 = nn.Linear(50, 1)

    def forward(self, x):
        # ReLU is used for both activation functions
        y = self.layer1(x)
        y = torch.relu(y)
        y = self.layer2(y)
        y = torch.relu(y)
        y = self.layer3(y)
        return y

# This function returns tensors where each row corresponds to a simulation
# and each column to a time step
def env_simple(T, N, Nt):
    dt = T/Nt
    X = torch.zeros((N,Nt+1), device = device)
    A = torch.zeros((N,Nt+1), device = device)
    for i in range(N):
        for j in range(Nt+1):
            A[i,j] = rdn.forward( torch.tensor([ 0,X[i,j] ],dtype=torch.float) )
            if (j == 0):
                X[i,j] = rdn.forward( torch.tensor([ 0,X[i,j] ],dtype=torch.float) )
            else:
                dW = torch.normal(mean = 0.0, std = sqrt(dt), size = (1,1))
                # dX_t = alpha_t*dt + dW_t
                # X_t{i+1} = X_ti + phi_theta(ti, X_ti)*deltaT + deltaW
                X[i,j] = X[i,j-1] + A[i,j-1]*(dt)+dW
                A[i,j] = rdn.forward( torch.tensor([ dt*j,X[i,j] ],dtype=torch.float) )
    return X, A

# f(X_t, alpha_t) = X_t**2 + alpha_t**2 + E[alpha_t]**2
def loss_fn(X, A, N):
    return (X.pow(2).sum()/N)+(A.pow(2).sum()/N) + (A.pow(2).sum()/N**2)

rdn = nets()
rdn.to(device)
optimizer = optim.SGD(params = rdn.parameters(), lr = 0.001, momentum = 0.9)

m, n, nt = 10000, 1000, 10
T = 1
dt = T/nt
bar = Bar('Processing', max=m)
cout = []
for pp in range(m):
    x, a = env_simple(T, n, nt)
    loss = loss_fn(x, a, n)
    cout.append(loss.item())
    loss.backward()
    optimizer.step()
    bar.next()
bar.finish()

plt.plot(cout)
plt.show()
``````
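
For comparison, a training step in PyTorch usually clears the accumulated gradients with `optimizer.zero_grad()` before each `backward()` call, since gradients are otherwise summed across iterations. The following is a sketch under stated assumptions, not the article's reference implementation: all N paths are simulated at once, the state is updated out of place so the graph from the loss back to the parameters is preserved, `X_0 = 0` is assumed for every path, the mean-field term is taken as the squared sample mean of the control at each time step (one reading of the comment `E[alpha_t]**2`), and everything runs on the CPU. `ControlNet` and `simulate` are hypothetical names.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)


class ControlNet(nn.Module):
    """Hypothetical control network alpha = phi_theta(t, x)."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(2, 50), nn.ReLU(),
            nn.Linear(50, 50), nn.ReLU(),
            nn.Linear(50, 1),
        )

    def forward(self, t, x):
        # t and x have shape (N,); the network sees rows (t, x).
        return self.body(torch.stack([t, x], dim=1)).squeeze(1)


def simulate(net, T, N, Nt):
    """Euler scheme for dX_t = alpha_t dt + dW_t, all N paths at once."""
    dt = T / Nt
    x = torch.zeros(N)  # assumption: X_0 = 0 for every path
    xs, alphas = [], []
    for j in range(Nt + 1):
        t = torch.full((N,), j * dt)
        a = net(t, x)
        xs.append(x)
        alphas.append(a)
        if j < Nt:
            dW = math.sqrt(dt) * torch.randn(N)
            x = x + a * dt + dW  # out-of-place update keeps the graph
    # Both results have shape (N, Nt + 1): one row per path.
    return torch.stack(xs, dim=1), torch.stack(alphas, dim=1)


def loss_fn(X, A):
    # Per time step: E[X^2] + E[alpha^2] + (E[alpha])^2, with the
    # expectations replaced by sample means over the N paths.
    per_step = X.pow(2).mean(0) + A.pow(2).mean(0) + A.mean(0).pow(2)
    return per_step.sum()


net = ControlNet()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

T, N, Nt = 1.0, 256, 10
losses = []
for step in range(20):
    optimizer.zero_grad()          # drop gradients from the previous step
    X, A = simulate(net, T, N, Nt)
    loss = loss_fn(X, A)
    loss.backward()                # no arguments needed for a scalar loss
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Because each iteration draws fresh noise, the recorded loss is itself noisy, so a smoothed curve or a larger N may be needed before a trend becomes visible.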

Thank you for taking the time to help me!

Edit: since I am trying to get the gradient with respect to `rdn.parameters()`, I changed the line:

``````
loss.backward()
``````

to:

``````
loss.backward(rdn.parameters())
``````

but I got this error instead:

``````
  File "<tmp 6>", line 75, in <module>
    loss.backward(rdn.parameters())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward