My custom model can be simplified and represented with:
These are tensor operations between constant tensors (shown in grey) and a vector of torch.nn.Parameter(...), which contains the terms I want to train by gradient descent.
I created an nn.Module subclass that implements these tensor operations in the forward method, while the constructor initialises the constant tensors and the parameter vector.
How can I implement (mini-)batch gradient descent with the model above, considering that the forward method carries out the computation for only one input at a time?
Thank you in advance for helping!
The simplest thing to do would be to make your nn.Module support a batch dimension, replacing your current matrix products with batched ones via torch.bmm or torch.matmul.
If you are using nightly builds, you might be able to get that automatically by using the new vmap function. @richard will be able to help you if you want to go down that path.
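To make the vmap suggestion concrete, here is a minimal sketch. It assumes a PyTorch version that ships torch.func.vmap (the stable successor of the nightly prototype mentioned above); the function single_forward and all the tensors in it are illustrative stand-ins for the model's per-sample forward, not code from the original posts:

```python
import torch
from torch.func import vmap

# Illustrative per-sample computation: y = a*x + b for a single input x,
# with a and b selected from a 2x1 parameter vector (assumed shapes)
params = torch.nn.Parameter(torch.randn(2, 1) * 0.1)
A = torch.tensor([[1.0, 0.0]])
B = torch.tensor([[0.0, 1.0]])

def single_forward(x):
    # x is one observation of shape [1]; this mirrors a single-sample forward
    return (A @ params) * x + B @ params

batch = torch.randn(8, 1)           # mini-batch of 8 scalar inputs
y = vmap(single_forward)(batch)     # vectorises over the leading dimension
print(y.shape)                      # one [1, 1] output per sample: [8, 1, 1]
```

The appeal of vmap is that you write (and reason about) the single-sample function only, and the batching is applied mechanically on top.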
Hey alban, thank you for replying!
I may have solved the problem: torch already seems able to process mini-batches for this kind of computation without any further effort!
The only thing is to make sure the input is a tensor of size [mini_batch, variable_to_optimise.shape], so that broadcasting takes care of the batch dimension.
I forgot to post the (very simple) class I was using to test how the batch computation worked; in case anyone wants to try it, here it is:
import torch

class LineModel(torch.nn.Module):  # class name added here; the post omitted it
    def __init__(self):
        super().__init__()
        self.params = torch.nn.Parameter(torch.nn.init.normal_(torch.empty(2, 1).float(), mean=0.0, std=0.1))
        self.A = torch.tensor([1, 0], dtype=torch.float32).view(1, 2)  # constants: requires_grad is False by default
        self.B = torch.tensor([0, 1], dtype=torch.float32).view(1, 2)

    def forward(self, obs):
        y = (self.A @ self.params) * obs + self.B @ self.params
        return y  # missing return added
It implements the line equation y = a*x + b, where I want to train a and b given the inputs x and the targets y_ground_truth.
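A quick way to confirm that batching works "for free" here is to push a whole mini-batch through the forward pass and inspect the output shape. The snippet below is self-contained; the class name LineModel and the batch size are illustrative:

```python
import torch

# Self-contained copy of the posted class (the name "LineModel" is assumed)
class LineModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.params = torch.nn.Parameter(
            torch.nn.init.normal_(torch.empty(2, 1).float(), mean=0.0, std=0.1))
        self.A = torch.tensor([1, 0], dtype=torch.float32).view(1, 2)
        self.B = torch.tensor([0, 1], dtype=torch.float32).view(1, 2)

    def forward(self, obs):
        return (self.A @ self.params) * obs + self.B @ self.params

model = LineModel()
x = torch.randn(32, 1)   # mini-batch of 32 scalar inputs
y = model(x)             # (A @ params) is [1, 1]; broadcasting expands it to [32, 1]
print(y.shape)           # torch.Size([32, 1])
```

This works because elementwise multiplication and addition broadcast the [1, 1] results of A @ params and B @ params against the [32, 1] input, so no bmm or explicit batching is needed for this particular computation.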
criterion = torch.nn.MSELoss(reduction='mean')
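For completeness, here is a sketch of the mini-batch training loop the original question asked about, using the MSELoss criterion above. Everything besides the posted class body and the criterion line is an assumption for illustration: the class name LineModel, the synthetic data (ground truth a=3, b=0.5), the optimiser choice, the learning rate, and the epoch count.

```python
import torch

# Self-contained copy of the posted class (the name "LineModel" is assumed)
class LineModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.params = torch.nn.Parameter(
            torch.nn.init.normal_(torch.empty(2, 1).float(), mean=0.0, std=0.1))
        self.A = torch.tensor([1, 0], dtype=torch.float32).view(1, 2)
        self.B = torch.tensor([0, 1], dtype=torch.float32).view(1, 2)

    def forward(self, obs):
        return (self.A @ self.params) * obs + self.B @ self.params

torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).view(64, 1)   # one mini-batch of 64 inputs
y_true = 3.0 * x + 0.5                      # synthetic targets: a=3, b=0.5

model = LineModel()
criterion = torch.nn.MSELoss(reduction='mean')
optimiser = torch.optim.SGD(model.parameters(), lr=0.5)  # illustrative lr

for epoch in range(500):
    optimiser.zero_grad()
    loss = criterion(model(x), y_true)      # whole batch in one forward pass
    loss.backward()                         # grads averaged over the batch
    optimiser.step()

a, b = model.params.flatten().tolist()      # should approach 3.0 and 0.5
```

Because reduction='mean' averages the loss over the batch, the gradient step is exactly the mini-batch gradient-descent update; swapping SGD for Adam or feeding shuffled mini-batches from a DataLoader changes nothing structurally.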
Hope it helps!