I want to reuse the weights of my linear layer, but transposed. Unfortunately, using a functional linear layer and passing the weights in explicitly during my module's forward call seems inefficient. Is there another way to do it?
Specifically, what I’ve done so far looks like:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bla(nn.Module):
    def __init__(self, args):
        super(Bla, self).__init__()
        self.fc = nn.Linear(100, 256)

    def forward(self, x):
        # reuse self.fc's weight, no bias; F.linear appears slow here
        return F.linear(x, self.fc.weight)
I tried einsum as well (since x is a batch of items of shape self.fc.weight.shape each) but didn’t get a performance improvement.
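Roughly, the einsum variant I tried looked like the following (just a sketch; I'm assuming x has shape (batch, 256, 100), one self.fc.weight-shaped matrix per item, so this matches what F.linear computes above):

    # Sketch of the einsum attempt; assumes x: (batch, 256, 100) and
    # self.fc.weight: (256, 100). 'bij,kj->bik' contracts the 100-dim
    # axis, i.e. the same x @ weight.T that F.linear computes.
    out = torch.einsum('bij,kj->bik', x, self.fc.weight)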
Note: I still want gradients to propagate through this operation! And I don't want to use self.fc's bias.
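For reference, this is the quick check I use to confirm gradients still reach self.fc.weight (again assuming a (batch, 256, 100) input; Bla's unused args argument just gets None here):

    m = Bla(None)
    x = torch.randn(4, 256, 100)
    m(x).sum().backward()
    print(m.fc.weight.grad is not None)  # True: gradients flow through F.linear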