I am trying to understand autograd better and would like to implement the following example. Despite searching, I haven't found much on this elsewhere, and no working example.
In addition to minimizing the mean squared error, I would like to take into account the norm of the Hessian of the model. That is, I want to add regularization by the sum of squares of the second derivatives of the model, where the second derivatives are taken with respect to the inputs (not the parameters).
I am aware that doing so becomes resource-intensive for large networks; I consider this a learning experience for now.
Hopefully someone can explain. A minimal example should look like this (the Hessian term is in the comments):
import math
import random
import torch
import torch.nn
# some functions that we might try to interpolate
def func( x1, x2 ):
    return math.sin( 2. * math.pi * x1 * x2 )
# generate input and outputs
N = 500
x1s = [ random.uniform(-1,1) for i in range(N) ]
x2s = [ random.uniform(-1,1) for i in range(N) ]
random.shuffle(x1s)
random.shuffle(x2s)
xs = [ [x1s[i],x2s[i]] for i in range(N) ]
ys = [ func(*x) for x in xs ]
data_x = torch.tensor( xs )                  # shape (N, 2)
data_y = torch.tensor( ys ).reshape( N, 1 )  # shape (N, 1)
# model
model = torch.nn.Sequential(
    torch.nn.Linear( 2, 10 ),
    torch.nn.ReLU(),
    torch.nn.Linear( 10, 11 ),
    torch.nn.ReLU(),
    torch.nn.Linear( 11, 12 ),
    torch.nn.ReLU(),
    torch.nn.Linear( 12, 1 )
)
# train the neural network
optimizer = torch.optim.Adam( model.parameters(), lr = 0.01 )
num_epochs = 100
for epoch in range(num_epochs):
    # turn on training mode
    model.train()
    model.zero_grad()
    # hand-written MSE
    loss = torch.mean( ( model( data_x ) - data_y )**2 )
    # What I want is similar to
    # loss = torch.mean( ( model( data_x ) - data_y )**2 ) + sum_of_squares_of_entries( Hessian_in_x )
    # Here, Hessian_in_x is the Hessian matrix of the model output, with second derivatives taken in the input
    loss.backward()
    optimizer.step()
    # print the loss at the end of each epoch
    print( 'Epoch [{}/{}], Loss: {:.4f}'.format( epoch + 1, num_epochs, loss.item() ) )
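For reference, here is a sketch of what I believe the penalty could look like, using two nested `torch.autograd.grad` calls with `create_graph=True` so the penalty itself remains differentiable with respect to the parameters. The helper name `hessian_penalty` is my own, and I assume the model maps a batch of shape (N, d) to (N, 1). One caveat I am aware of: a ReLU network is piecewise linear, so its input Hessian is zero almost everywhere; a smooth activation such as `torch.nn.Tanh` would be needed for this term to actually do anything.

```python
import torch

def hessian_penalty( model, x ):
    # x: (N, d) batch of inputs; model is assumed to map (N, d) -> (N, 1)
    x = x.detach().requires_grad_( True )
    y = model( x )
    # first derivatives w.r.t. the inputs; create_graph=True keeps the
    # graph alive so we can differentiate a second time
    g, = torch.autograd.grad( y.sum(), x, create_graph=True )   # (N, d)
    penalty = 0.
    for j in range( x.shape[1] ):
        # row j of each per-sample Hessian: d( dy/dx_j ) / dx
        h_j, = torch.autograd.grad( g[:, j].sum(), x, create_graph=True )  # (N, d)
        penalty = penalty + ( h_j**2 ).sum( dim=1 )  # sum of squares of row j
    return penalty.mean()
```

Summing the outputs over the batch before calling `grad` should be valid here because each output row depends only on its own input row, so the per-sample gradients do not mix. In the loop above this would then be used as `loss = torch.mean( ( model( data_x ) - data_y )**2 ) + 0.01 * hessian_penalty( model, data_x )`, with the weight 0.01 just a placeholder.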