Hi. I have a weight matrix (call it Q) that I'd like to factorize as a product of low-rank matrices to reduce model parameters. That is, I want q_left @ q_right = Q, where q_left has shape (m, k) and q_right has shape (k, n), with k < min(m, n). Is there a function in PyTorch to do this, or can I implement it like this:
import math
import torch
import torch.nn as nn

m = 512
k = 85  # with k = 85: (512 * 85) + (85 * 512) << (512 * 512)
n = 512
q_left = nn.Parameter(torch.FloatTensor(m, k).uniform_(-math.sqrt(1 / n) / 2, math.sqrt(1 / n) / 2))
q_right = nn.Parameter(torch.FloatTensor(k, n).uniform_(-math.sqrt(1 / n) / 2, math.sqrt(1 / n) / 2))
q = torch.matmul(q_left, q_right)
This approach will work, but it has the disadvantage that even though you don't train the full set of m*n elements of q, you still create and work with all of them on every forward pass. It's simpler and cheaper never to explicitly create q and, instead, to apply q_right and q_left successively to the data tensor flowing through your network.
Something like this:
module = torch.nn.Sequential(
    # linear layer 1
    # non-linear activation 1
    # ...
    torch.nn.Linear(n, k, bias=False),  # q_right; no bias between q_right and q_left
    # no activation between q_right and q_left
    torch.nn.Linear(k, m),  # q_left, bias if you want it
    # non-linear activation after the "q layer"
    # ...
)
This – on the forward pass – is mathematically the same as having a
single linear layer for q:
# ...
torch.nn.Linear(n, m),  # the full q matrix
# non-linear activation after "q layer"
# ...
where you have somehow set q = q_left @ q_right (a matrix product, not an elementwise one).
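To see that the two formulations really do agree on the forward pass, here is a minimal check (the variable names are mine, and I drop the biases so the comparison is exact). It copies the matrix product of the two factor weights into a single full Linear layer and compares outputs:

```python
import torch

torch.manual_seed(0)
m, k, n = 512, 85, 512

# factored version: x is mapped through q_right, then q_left
q_right = torch.nn.Linear(n, k, bias=False)
q_left = torch.nn.Linear(k, m, bias=False)

# full version: one Linear whose weight is the product of the two factors
full = torch.nn.Linear(n, m, bias=False)
with torch.no_grad():
    # nn.Linear stores its weight as (out_features, in_features),
    # so the effective full weight is W_left @ W_right, with shape (m, n)
    full.weight.copy_(q_left.weight @ q_right.weight)

x = torch.randn(8, n)
y_factored = q_left(q_right(x))
y_full = full(x)
print(torch.allclose(y_factored, y_full, atol=1e-5))  # True (up to float error)
```

The key detail is the nn.Linear weight convention: because each layer computes x @ W.T, chaining q_right then q_left gives x @ (W_left @ W_right).T, i.e. exactly one big Linear with weight W_left @ W_right.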
(As an aside, another technique people sometimes use is to first train the full m×n q layer – or take it from a pre-trained network – then factor it into q_left @ q_right, and then further train or fine-tune.)