Factorization of a weight matrix as a product of low-rank matrices

Hi. I have a weight matrix (call it Q). I'd like to factorize it as a product of low-rank matrices to reduce the number of model parameters. That is, I'd like q_left @ q_right = Q, where q_left has shape (m × k) and q_right has shape (k × n), with k < min(m, n). Is there a function in PyTorch to do this, or can I implement it like this:

import math
import torch
import torch.nn as nn

m = 512
k = 85  # (512 * 85) + (85 * 512) = 87,040 parameters, vs. 512 * 512 = 262,144
n = 512

# initialize both factors uniformly in [-sqrt(1/n)/2, sqrt(1/n)/2]
q_left = nn.Parameter(torch.empty(m, k).uniform_(-math.sqrt(1 / n) / 2, math.sqrt(1 / n) / 2))
q_right = nn.Parameter(torch.empty(k, n).uniform_(-math.sqrt(1 / n) / 2, math.sqrt(1 / n) / 2))
q = torch.matmul(q_left, q_right)

Hi Fawaz!

This approach will work, but it has the disadvantage that even though
you don't train the full set of m*n elements of q, you still create
and work with all of them on every forward pass. It's simpler and cheaper
never to explicitly form q and, instead, to apply q_right and q_left
successively to the data tensor flowing through your network.

Something like this:

module = torch.nn.Sequential(
    # linear layer 1
    # non-linear activation 1
    # ...
    torch.nn.Linear(n, k, bias=False),  # q_right; no bias between q_right and q_left
    # no activation between q_right and q_left
    torch.nn.Linear(k, m),  # q_left, bias if you want it
    # non-linear activation after the "q layer"
    # ...
)
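
To make the saving concrete, with the shapes from the question (m = n = 512, k = 85) the factorized pair carries roughly a third of the parameters of the full layer. A quick check (the counts include the bias of the second Linear):

low_rank = torch.nn.Sequential(
    torch.nn.Linear(512, 85, bias=False),  # q_right: 512 * 85 weights
    torch.nn.Linear(85, 512),              # q_left: 85 * 512 weights + 512 biases
)
full = torch.nn.Linear(512, 512)           # 512 * 512 weights + 512 biases

print(sum(p.numel() for p in low_rank.parameters()))  # 87552
print(sum(p.numel() for p in full.parameters()))      # 262656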

On the forward pass, this two-layer version is mathematically the same
as having a single linear layer for q:

    # ...
    torch.nn.Linear(n, m),  # the full q matrix
    # non-linear activation after the "q layer"
    # ...
where you have somehow set the weight of that single layer to q = q_left @ q_right (a matrix product, not an elementwise one).
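
You can verify this equivalence numerically. A quick check, with the shapes from the question and bias disabled everywhere for simplicity:

import torch

m, k, n = 512, 85, 512
lin_right = torch.nn.Linear(n, k, bias=False)
lin_left = torch.nn.Linear(k, m, bias=False)

full = torch.nn.Linear(n, m, bias=False)
with torch.no_grad():
    # Linear(n, m) stores its weight with shape (m, n); composing the two
    # factor layers applies q_right first, so the equivalent full weight
    # is q_left @ q_right
    full.weight.copy_(lin_left.weight @ lin_right.weight)

x = torch.randn(8, n)
print(torch.allclose(full(x), lin_left(lin_right(x)), atol=1e-5))  # True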

(As an aside, another technique people sometimes use is to first train
the full m*n q layer, or take it from a pre-trained network, and then
factor it into q_left @ q_right, and then further train or fine-tune.)
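
One standard way to do that factorization is a truncated SVD, which gives the best rank-k approximation of the matrix in the Frobenius norm (Eckart-Young). A minimal sketch (the helper name factor_low_rank is just for illustration):

import torch

def factor_low_rank(q, k):
    # truncated SVD of q (shape (m, n)): keep only the top-k singular triples
    U, S, Vh = torch.linalg.svd(q, full_matrices=False)
    q_left = U[:, :k] * S[:k]   # shape (m, k); singular values folded into the left factor
    q_right = Vh[:k, :]         # shape (k, n)
    return q_left, q_right

q = torch.randn(512, 512)
q_left, q_right = factor_low_rank(q, 85)
print((q - q_left @ q_right).norm() / q.norm())  # relative approximation error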

Best.

K. Frank


Thank you for your detailed reply, @KFrank. I greatly appreciate the answer.