I want to reuse the weights of my linear layer, but transposed. Unfortunately, using a functional linear layer and passing the weights in explicitly during my module's forward call seems inefficient. Is there another way to do it?
Specifically, what I’ve done so far looks like:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bla(nn.Module):
    def __init__(self, args):
        super(Bla, self).__init__()
        self.fc = nn.Linear(100, 256)

    def forward(self, x):
        # reuse self.fc's weight, no bias; F.linear appears slow here
        return F.linear(x, self.fc.weight)
I tried einsum as well (since x is a batch of items of shape self.fc.weight.shape each) but didn’t get a performance improvement.
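Roughly, the einsum variant I tried looked like the following (just a sketch; I'm assuming x has shape (batch, 256, 100), one self.fc.weight-shaped matrix per item, so this matches what F.linear computes above):

    # Sketch of the einsum attempt; assumes x: (batch, 256, 100) and
    # self.fc.weight: (256, 100). 'bij,kj->bik' contracts the 100-dim
    # axis, i.e. the same x @ weight.T that F.linear computes.
    out = torch.einsum('bij,kj->bik', x, self.fc.weight)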
Note: I still want gradients to propagate through this operation! And I don't want to use self.fc's bias.
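For reference, this is the quick check I use to confirm gradients still reach self.fc.weight (again assuming a (batch, 256, 100) input; Bla's unused args argument just gets None here):

    m = Bla(None)
    x = torch.randn(4, 256, 100)
    m(x).sum().backward()
    print(m.fc.weight.grad is not None)  # True: gradients flow through F.linear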