My custom model can be simplified and represented with:
These are tensor operations between constant tensors (shown in grey) and a vector of torch.nn.Parameter(...), which contains the terms I want to train by gradient descent.
I created an nn.Module subclass that implements these tensor operations in the forward method, while the constructor initialises the constant tensors and the parameter vector.
How can I implement (mini-)batch gradient descent with the model above, considering that the forward method carries out the computation for only one input at a time?
Thank you in advance for helping!
The simplest thing to do would be to make your nn.Module support a batch dimension, replacing your current matrix products with batched ones via torch.bmm or torch.matmul.
If you are using nightly builds, you might be able to get that automatically by using the new vmap function. @richard will be able to help you if you want to go down that path.
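To make the vmap suggestion concrete, here is a minimal sketch. It assumes a PyTorch version that ships torch.func.vmap (the stable successor of the nightly prototype mentioned above); the function single_forward and all the tensors in it are illustrative stand-ins for the model's per-sample forward, not code from the original posts:

```python
import torch
from torch.func import vmap

# Illustrative per-sample computation: y = a*x + b for a single input x,
# with a and b selected from a 2x1 parameter vector (assumed shapes)
params = torch.nn.Parameter(torch.randn(2, 1) * 0.1)
A = torch.tensor([[1.0, 0.0]])
B = torch.tensor([[0.0, 1.0]])

def single_forward(x):
    # x is one observation of shape [1]; this mirrors a single-sample forward
    return (A @ params) * x + B @ params

batch = torch.randn(8, 1)           # mini-batch of 8 scalar inputs
y = vmap(single_forward)(batch)     # vectorises over the leading dimension
print(y.shape)                      # one [1, 1] output per sample: [8, 1, 1]
```

The appeal of vmap is that you write (and reason about) the single-sample function only, and the batching is applied mechanically on top.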
Hey alban, thank you for replying!
I may have solved the problem: torch already seems able to process mini-batches for this kind of computation without any further effort!
The only thing is to make sure the input is a tensor of size [mini_batch, variable_to_optimise.shape], so that broadcasting takes care of the batch dimension.
I forgot to post the (very simple) class I was using to test how the batch computation worked; in case anyone wants to try it, here it is:
import torch

class LineModel(torch.nn.Module):  # class name added here; the post omitted it
    def __init__(self):
        super().__init__()
        self.params = torch.nn.Parameter(torch.nn.init.normal_(torch.empty(2, 1).float(), mean=0.0, std=0.1))
        self.A = torch.tensor([1, 0], dtype=torch.float32).view(1, 2)  # constants: requires_grad is False by default
        self.B = torch.tensor([0, 1], dtype=torch.float32).view(1, 2)

    def forward(self, obs):
        y = (self.A @ self.params) * obs + self.B @ self.params
        return y  # missing return added
It implements the line equation y = a*x + b, where I want to train a and b given the inputs x and the targets y_ground_truth.
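A quick way to confirm that batching works "for free" here is to push a whole mini-batch through the forward pass and inspect the output shape. The snippet below is self-contained; the class name LineModel and the batch size are illustrative:

```python
import torch

# Self-contained copy of the posted class (the name "LineModel" is assumed)
class LineModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.params = torch.nn.Parameter(
            torch.nn.init.normal_(torch.empty(2, 1).float(), mean=0.0, std=0.1))
        self.A = torch.tensor([1, 0], dtype=torch.float32).view(1, 2)
        self.B = torch.tensor([0, 1], dtype=torch.float32).view(1, 2)

    def forward(self, obs):
        return (self.A @ self.params) * obs + self.B @ self.params

model = LineModel()
x = torch.randn(32, 1)   # mini-batch of 32 scalar inputs
y = model(x)             # (A @ params) is [1, 1]; broadcasting expands it to [32, 1]
print(y.shape)           # torch.Size([32, 1])
```

This works because elementwise multiplication and addition broadcast the [1, 1] results of A @ params and B @ params against the [32, 1] input, so no bmm or explicit batching is needed for this particular computation.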
criterion = torch.nn.MSELoss(reduction='mean')
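For completeness, here is a sketch of the mini-batch training loop the original question asked about, using the MSELoss criterion above. Everything besides the posted class body and the criterion line is an assumption for illustration: the class name LineModel, the synthetic data (ground truth a=3, b=0.5), the optimiser choice, the learning rate, and the epoch count.

```python
import torch

# Self-contained copy of the posted class (the name "LineModel" is assumed)
class LineModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.params = torch.nn.Parameter(
            torch.nn.init.normal_(torch.empty(2, 1).float(), mean=0.0, std=0.1))
        self.A = torch.tensor([1, 0], dtype=torch.float32).view(1, 2)
        self.B = torch.tensor([0, 1], dtype=torch.float32).view(1, 2)

    def forward(self, obs):
        return (self.A @ self.params) * obs + self.B @ self.params

torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).view(64, 1)   # one mini-batch of 64 inputs
y_true = 3.0 * x + 0.5                      # synthetic targets: a=3, b=0.5

model = LineModel()
criterion = torch.nn.MSELoss(reduction='mean')
optimiser = torch.optim.SGD(model.parameters(), lr=0.5)  # illustrative lr

for epoch in range(500):
    optimiser.zero_grad()
    loss = criterion(model(x), y_true)      # whole batch in one forward pass
    loss.backward()                         # grads averaged over the batch
    optimiser.step()

a, b = model.params.flatten().tolist()      # should approach 3.0 and 0.5
```

Because reduction='mean' averages the loss over the batch, the gradient step is exactly the mini-batch gradient-descent update; swapping SGD for Adam or feeding shuffled mini-batches from a DataLoader changes nothing structurally.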
Hope it helps!