import torch
import torch.nn as nn

class ModelOne(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(300, 10))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        return x @ self.weights + self.bias

class ModelTwo(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(300, 10)

    def forward(self, x):
        return self.linear(x)
If so, then why does

mo = ModelOne()
[len(param) for param in mo.parameters()]

give

[300, 10]

while

mt = ModelTwo()
[len(param) for param in mt.parameters()]

gives

[10, 10]?
It turns out that

[param.size() for param in mo.parameters()]

gives

[torch.Size([300, 10]), torch.Size([10])]

while

[param.size() for param in mt.parameters()]

gives

[torch.Size([10, 300]), torch.Size([10])]
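(The confusing len numbers follow from how Python's len works on a tensor: it returns only the size of the first dimension, i.e. tensor.size(0). A quick sketch to confirm this, using nothing beyond standard torch behaviour:)

import torch

w = torch.randn(300, 10)
print(len(w))       # 300 -- len() of a tensor is its first dimension, size(0)
print(w.size(0))    # 300
print(len(w.t()))   # 10  -- transposing swaps the dims, so len() changes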
spanev (Serge Panev) replied on August 26, 2019:
nn.Linear stores its weight with shape (out_features, in_features) and transposes it before the multiplication, i.e. it computes x @ W.t() + b, so the two networks are functionally equivalent.
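As a quick sanity check of that claim (a minimal sketch; the variable names are mine, not from the thread), you can reproduce nn.Linear's output from its stored weight and bias by hand:

import torch
import torch.nn as nn

lin = nn.Linear(300, 10)
x = torch.randn(4, 300)

# lin.weight has shape (10, 300); nn.Linear computes x @ W.t() + b
manual = x @ lin.weight.t() + lin.bias
print(torch.allclose(lin(x), manual))  # True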
To match both the parameter shapes and the behaviour of nn.Linear, you have to do:
class ModelOne(nn.Module):
    def __init__(self):
        super().__init__()
        # Store the weight as (out_features, in_features), like nn.Linear does
        self.weights = nn.Parameter(torch.randn(10, 300))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        # Transpose back for the multiplication, matching nn.Linear's x @ W.t() + b
        return x @ self.weights.t() + self.bias
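To confirm the two models are now interchangeable, one can copy the nn.Linear parameters into the rewritten ModelOne and compare outputs (a hedged sketch; the copy step and test input are mine, not from the thread):

mo = ModelOne()
mt = ModelTwo()

# Copy nn.Linear's parameters into the hand-rolled model; the shapes now line up
with torch.no_grad():
    mo.weights.copy_(mt.linear.weight)
    mo.bias.copy_(mt.linear.bias)

x = torch.randn(4, 300)
print(torch.allclose(mo(x), mt(x)))          # True
print([p.size() for p in mo.parameters()])   # [torch.Size([10, 300]), torch.Size([10])]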