Solving an ODE using PyTorch

I am trying to solve an ODE using PyTorch. The ODE has the form

du/dt = cos(2πt)
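For reference, this ODE has a closed-form solution: integrating cos(2πt) with the boundary condition u(0) = 1 gives u(t) = 1 + sin(2πt)/(2π), which the network's output can be compared against. A quick sketch:

```python
import math

def u_exact(t):
    # analytic solution of du/dt = cos(2*pi*t) with u(0) = 1
    return 1.0 + math.sin(2 * math.pi * t) / (2 * math.pi)

print(u_exact(0.0))  # 1.0, the boundary condition
```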

I parameterise my neural network as a two-layer linear network
with a leaky-ReLU activation in between. The network takes a 1-dimensional input and returns a 1-dimensional output, with a hidden layer of size 32.

def f(x):
    # forward pass: two linear layers with a leaky-ReLU activation in between
    l1 = torch.matmul(W1.T, x).reshape(-1, 1)
    l1_act = torch.nn.functional.leaky_relu(l1)
    l2 = torch.matmul(W2.T, l1_act)
    return l2

def g(t):
    # trial solution, chosen so that the boundary condition
    # u(0) = 1 is satisfied by construction
    return t * f(t) + torch.tensor([1.])
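As a quick sanity check on this ansatz: at t = 0 the t*f(t) term vanishes no matter what the network outputs, so u(0) = 1 holds by construction. A minimal standalone check, using an arbitrary stand-in for f (plain floats instead of tensors, just to illustrate):

```python
def f_stub(t):
    # arbitrary stand-in for the network output; its value at t = 0 is irrelevant
    return 42.0 * t + 7.0

def g_trial(t):
    # trial solution: t * f(t) + 1 satisfies u(0) = 1 by construction
    return t * f_stub(t) + 1.0

print(g_trial(0.0))  # 1.0 regardless of f_stub
```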

def loss(t, eps):
    # mean squared mismatch between a finite-difference estimate of du/dt
    # and the analytical right-hand side cos(2*pi*t)
    return torch.mean(((g(t + eps) - g(t)) / eps - torch.cos(2 * math.pi * t)) ** 2)
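The finite-difference quotient above depends on the choice of eps; as a cross-check, the derivative dg/dt can also be computed exactly with torch.autograd.grad. A sketch, assuming g maps a 1-D tensor of times to a 1-D tensor of values elementwise (loss_autograd is my name for this variant, not part of the code above):

```python
import math
import torch

def loss_autograd(g, t):
    # t: 1-D tensor of sample times; track gradients with respect to t
    t = t.clone().requires_grad_(True)
    u = g(t)
    # du/dt at every sample via autograd, no finite-difference step needed
    du_dt, = torch.autograd.grad(u.sum(), t, create_graph=True)
    return torch.mean((du_dt - torch.cos(2 * math.pi * t)) ** 2)

# the exact solution should give a (near-)zero loss
g_exact = lambda t: torch.sin(2 * math.pi * t) / (2 * math.pi) + 1.0
t = torch.linspace(0.0, 1.0, 50)
print(loss_autograd(g_exact, t))
```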

This is the model setup:

import math
import torch
import matplotlib.pyplot as plt
from tqdm import tqdm

x = torch.tensor([.01])
eps = torch.tensor([0.000345])
T0 = 0
T1 = 1
nsamples = 100
t = torch.linspace(T0, T1, nsamples)

# torch.autograd.Variable is deprecated; requires_grad=True on the tensor is enough
W1 = torch.ones(1, 32, requires_grad=True)
W2 = torch.ones(32, 1, requires_grad=True)

I generate 100 data points between 0 and 1 and train the network on them for 5000 epochs. My training loop looks something like this:

learning_rate = 1e-3
for it in tqdm(range(5000)):
    err = 0
    for ti in t:
        ti = ti.reshape(1)
        err += loss(ti, eps)
    err = err / nsamples
    err.backward()
    with torch.no_grad():
        grad_w1 = W1.grad.clone()
        grad_w2 = W2.grad.clone()
        W1 -= learning_rate * W1.grad
        W2 -= learning_rate * W2.grad
        W1.grad.zero_()
        W2.grad.zero_()
    if it % 100 == 0:
        # plot the current gradients to monitor training
        fig, ax = plt.subplots(1, 2, figsize=(6, 3))
        ax[0].plot(grad_w1.flatten())
        ax[1].plot(grad_w2.flatten())

What I notice is that the gradient updates go to 0 and the network doesn't manage to learn. I had been following along with an example in Julia, where the code seems to work, so I was wondering if there is something wrong in my specification. If someone can point me in the right direction it would be great. This is my second attempt: I also tried an nn.Module with nn.Linear layers and the Adam optimizer, and the error behaves the same way there.