# Factorization of a weight matrix as products of low-rank matrices

Hi. I have a weight matrix (call it `Q`). I'd like to factorize it as a product of low-rank matrices to reduce the number of model parameters. That is, I'd like `q_left @ q_right = Q`, where `q_left` has shape `(m, k)` and `q_right` has shape `(k, n)`, with `k < min(m, n)`. Is there a function in PyTorch to do this, or can I implement it like this:

```python
import math

import torch
import torch.nn as nn

m = 512
k = 85  # (512 * 85) + (85 * 512) = 87,040 parameters, far fewer than 512 * 512 = 262,144
n = 512

q_left = nn.Parameter(torch.FloatTensor(m, k).uniform_(-math.sqrt(1 / n) / 2, math.sqrt(1 / n) / 2))
q_right = nn.Parameter(torch.FloatTensor(k, n).uniform_(-math.sqrt(1 / n) / 2, math.sqrt(1 / n) / 2))
q = torch.matmul(q_left, q_right)
```
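
To make the comment about the parameter counts concrete, here is the arithmetic spelled out (plain Python, nothing PyTorch-specific):

```python
m, k, n = 512, 85, 512

full_params = m * n              # 262,144 for the full 512 x 512 matrix
low_rank_params = m * k + k * n  # 87,040 for the two rank-85 factors

# the factorization saves parameters whenever k < m * n / (m + n), here 256
print(low_rank_params, full_params, low_rank_params < full_params)
```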

Hi Fawaz!

This approach will work, but it has the disadvantage that even though
you don't train the full set of the `m * n` elements of `q`, you still create
and work with them. It's simpler and cheaper never to explicitly create
`q` and, instead, apply `q_right` and `q_left` successively to the data.

Something like this:

```python
module = torch.nn.Sequential(
    # linear layer 1
    # non-linear activation 1
    # ...
    torch.nn.Linear(n, k, bias=False),  # q_right -- no bias between q_right and q_left
    # no activation between q_right and q_left
    torch.nn.Linear(k, m),              # q_left -- bias if you want it
    # non-linear activation after the "q layer"
    # ...
)
```

On the forward pass, this is mathematically the same as having a
single linear layer for `q`:

```python
    # ...
    torch.nn.Linear(n, m),  # the full q matrix
    # non-linear activation after the "q layer"
    # ...
```

where you have somehow set `q = q_left @ q_right` (a matrix product; in terms of the `Linear` weights, the full weight is `q_left @ q_right`).
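
You can verify that equivalence directly. A minimal sketch (the variable names are just for illustration; the bias is dropped from both layers so the comparison is exact):

```python
import torch

m, k, n = 512, 85, 512

low_rank = torch.nn.Sequential(
    torch.nn.Linear(n, k, bias=False),  # q_right
    torch.nn.Linear(k, m, bias=False),  # q_left
)

full = torch.nn.Linear(n, m, bias=False)
with torch.no_grad():
    # the effective weight of the stacked pair is q_left @ q_right
    full.weight.copy_(low_rank[1].weight @ low_rank[0].weight)

x = torch.randn(4, n)
print(torch.allclose(low_rank(x), full(x), atol=1e-5))  # True (up to float round-off)
```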

(As an aside, another technique people sometimes use is to first train
the full `m * n` `q` layer, or take it from a pre-trained network, then
factor it into `q_left @ q_right`, and then further train or fine-tune.)
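
One standard way to do that factorization (not the only one) is a truncated singular value decomposition, keeping the `k` largest singular values. A minimal sketch, using a random matrix as a stand-in for a pre-trained weight:

```python
import torch

m, k, n = 512, 85, 512
Q = torch.randn(m, n)  # stand-in for a pre-trained weight matrix

# truncated SVD: keep only the k largest singular values
U, S, Vh = torch.linalg.svd(Q, full_matrices=False)
sqrt_S = torch.sqrt(S[:k])
q_left = U[:, :k] * sqrt_S          # shape (m, k)
q_right = sqrt_S[:, None] * Vh[:k]  # shape (k, n)

# q_left @ q_right is the best rank-k approximation of Q in the Frobenius norm
print((Q - q_left @ q_right).norm() / Q.norm())
```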

Best.

K. Frank
