I have a model that takes as input a tensor of shape `(c, n, n)` and transforms it into a tensor of shape `(c, m, m)` by multiplying it on both sides with the parameter `filters`, which has shape `(m, n)`, where `m << n`.
For the loss, the output tensor of shape `(c, m, m)` is used to compute a pairwise distance matrix of size `c`-by-`c` (in this problem, `c` is the number of classes, each having an associated `m`-by-`m` matrix).
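Concretely, the forward pass looks roughly like the sketch below. The sizes, the squared Frobenius distance, and the final scalar loss are all illustrative placeholders; the actual distance and loss in the repo may differ.

```python
import torch

c, n, m = 1700, 100, 10  # illustrative sizes; c is the number of classes

filters = torch.randn(m, n, requires_grad=True)  # the (m, n) parameter
X = torch.randn(c, n, n)  # input: one n-by-n matrix per class

# Transform each n-by-n matrix to m-by-m: Y_i = filters @ X_i @ filters.T
Y = torch.einsum('ij,cjk,lk->cil', filters, X, filters)  # shape (c, m, m)

# Pairwise c-by-c distance matrix (squared Frobenius distance used here
# only as a stand-in for the distance the loss actually uses)
diffs = Y.unsqueeze(0) - Y.unsqueeze(1)   # shape (c, c, m, m)
dists = diffs.pow(2).sum(dim=(-2, -1))    # shape (c, c)
loss = -dists.mean()  # illustrative scalar loss built from the distances
```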
I am using the L-BFGS algorithm to optimize this, and it works great for most problems. However, in one problem where `c` is large (1700), I'm running into memory problems. What confuses me is that, according to the L-BFGS documentation, memory usage should be determined by the size of the parameter. Here, however, shrinking the parameter `filters` by reducing `n` or `m` doesn't prevent the memory crash; only reducing the number of classes `c` does. And the parameter dimensions `m` and `n` are comparable to those of other problems that worked fine with fewer classes `c`.
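For reference, the optimization uses the standard closure-based `torch.optim.LBFGS` pattern, roughly like this (continuing the sketch above; the hyperparameters are illustrative, not the repo's actual settings):

```python
# Standard closure-based L-BFGS loop over the single `filters` parameter.
optimizer = torch.optim.LBFGS([filters], lr=1.0, history_size=10)

def closure():
    optimizer.zero_grad()
    Y = torch.einsum('ij,cjk,lk->cil', filters, X, filters)
    diffs = Y.unsqueeze(0) - Y.unsqueeze(1)
    loss = -diffs.pow(2).sum(dim=(-2, -1)).mean()
    loss.backward()
    return loss

for _ in range(20):
    optimizer.step(closure)
```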
Is there some reason why this might happen? And is there a way to find a work-around, since it doesn't seem to be the number of parameters that's causing the issue?
The code is in this repo: GitHub - dherrera1911/blur-spectral-sqfa. The model and data seem a bit too complicated to distill into a minimal example, but I can try if needed.