I’m working on a problem that involves taking a vector X = [1, c, x_1, ..., x_N] (currently N ~ 100, but much larger in the future) and learning a specific quadratic form, Y'PY + b, where Y = [1, c, X_bar] and X_bar is the average of the x's.

The current way I’ve been doing this is by using a bilinear layer with a bias term, e.g. something like:

    import torch
    from torch import nn

    class vf_bilin_3(nn.Module):
        def __init__(self):
            super().__init__()
            self.Pb = nn.Bilinear(3, 3, 1, bias=True)  # Y -> Y'PY + b

        def forward(self, x):
            # x has shape [M, N + 2] with columns [1, c, x_1, ..., x_N]
            X_bar = torch.mean(x[:, 2:], dim=1, keepdim=True)  # average of the x's, shape [M, 1]
            Y = torch.cat((x[:, 0:2], X_bar), dim=1)  # [1, c, X_bar], shape [M, 3]
            output = self.Pb(Y, Y)
            return output

But I’m concerned that this won’t scale well when we have large numbers of observations per batch, or huge N, or on the GPU, etc.

I’d be very grateful if someone could point me in the right direction, or provide any optimization tips for this network.

output depends on X_bar but not on the individual values of
the components of X. So if you call .forward() repeatedly (for
example in your training loop) you will be needlessly computing X_bar multiple times.

Move the computation of X_bar out of .forward() – and thus
out of the loop – to avoid this unnecessary computation.

(In a similar vein, but less important in terms of computation, I
would also move the construction of Y out of .forward().)
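
As a minimal sketch of that suggestion (not code from the original post): the helper `make_Y` and the class name `VfBilinPre` are hypothetical names introduced here, and the random tensor `x` is placeholder data. The model takes the already-built Y as its input, so X_bar is computed once, outside of `.forward()`.

```python
import torch
from torch import nn


def make_Y(x):
    """Build Y = [1, c, X_bar] from rows of x = [1, c, x_1, ..., x_N] (hypothetical helper)."""
    X_bar = x[:, 2:].mean(dim=1, keepdim=True)  # shape [M, 1]
    return torch.cat((x[:, :2], X_bar), dim=1)  # shape [M, 3]


class VfBilinPre(nn.Module):  # hypothetical name; expects precomputed Y
    def __init__(self):
        super().__init__()
        self.Pb = nn.Bilinear(3, 3, 1, bias=True)  # Y -> Y'PY + b

    def forward(self, Y):
        return self.Pb(Y, Y)


torch.manual_seed(0)
M, N = 8, 100  # placeholder sizes
# Placeholder data: a column of ones followed by c and x_1, ..., x_N
x = torch.cat((torch.ones(M, 1), torch.randn(M, 1 + N)), dim=1)

Y = make_Y(x)  # computed once, before any training loop
model = VfBilinPre()
out = model(Y)  # shape [M, 1]
```

With this split, a training loop would iterate over `Y` directly and never touch the N-dimensional part of `x` again.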

Thanks so much @KFrank, that sounds like the right advice.

One question, though. I understand that the weight updating step should be different from the precomputing inputs step, but I thought forward was the model entry point. How can I separate the two in practice?

The idea would be to preprocess your data before you feed it to your
model.

Depending on the details of your use case, you could, for example,
read your original data off of disk (independent of pytorch), construct
your Y vectors, and then write this preprocessed data back to disk.
When you run your PyTorch model, just use the preprocessed data
as your input data.
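
One way this workflow could look, as a sketch under stated assumptions: the file name `Y_preprocessed.pt`, the temporary directory, the random `x`, and the `targets` tensor are all placeholders invented for illustration, and `torch.save` / `torch.load` stand in for whatever storage format you actually use.

```python
import os
import tempfile

import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
M, N = 32, 100  # placeholder sizes

# Placeholder "original" data with columns [1, c, x_1, ..., x_N], plus placeholder targets
x = torch.cat((torch.ones(M, 1), torch.randn(M, 1 + N)), dim=1)
targets = torch.randn(M, 1)

# Preprocessing step: build Y = [1, c, X_bar] once and write it to disk
X_bar = x[:, 2:].mean(dim=1, keepdim=True)
Y = torch.cat((x[:, :2], X_bar), dim=1)
path = os.path.join(tempfile.mkdtemp(), "Y_preprocessed.pt")  # hypothetical location
torch.save({"Y": Y, "targets": targets}, path)

# Training step (possibly a separate script): load the preprocessed tensors
data = torch.load(path)
loader = DataLoader(TensorDataset(data["Y"], data["targets"]), batch_size=8, shuffle=True)

for Y_batch, target_batch in loader:
    # Each batch is already [batch, 3]; feed it straight to a model that expects Y
    pass
```

The preprocessed file holds 3 values per sample instead of N + 2, which also shrinks what has to be read in each epoch.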

Note, you will only get (significant) savings if you are going to run
(a lot) more than one epoch in your training loop. (By “epoch” I
mean that you run each sample in your dataset through your model
once.) If you are only going to run any given data sample through
your model once (or just a few times), it doesn’t matter whether you
perform the X_bar computation ahead of time or during training.
So in such a case it might be more convenient to leave the X_bar
computation in your .forward().