import torch
import torch.nn as nn

class ModelOne(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(300, 10))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        return x @ self.weights + self.bias

class ModelTwo(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(300, 10)

    def forward(self, x):
        return self.linear(x)
If so, then why does

mo = ModelOne()
[len(param) for param in mo.parameters()]

give

[300, 10]

while

mt = ModelTwo()
[len(param) for param in mt.parameters()]

gives

[10, 10]?
It turns out that

[param.size() for param in mo.parameters()]

gives

[torch.Size([300, 10]), torch.Size([10])]

while

[param.size() for param in mt.parameters()]

gives

[torch.Size([10, 300]), torch.Size([10])]
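(The confusing len numbers follow from how Python's len works on a tensor: it returns only the size of the first dimension, i.e. tensor.size(0). A quick sketch to confirm this, using nothing beyond standard torch behaviour:)

import torch

w = torch.randn(300, 10)
print(len(w))       # 300 -- len() of a tensor is its first dimension, size(0)
print(w.size(0))    # 300
print(len(w.t()))   # 10  -- transposing swaps the dims, so len() changes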
spanev (Serge Panev) replied on August 26, 2019:
nn.Linear stores its weight with shape (out_features, in_features) and transposes it before the multiplication, i.e. it computes x @ W.t() + b, so the two networks are functionally equivalent.
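As a quick sanity check of that claim (a minimal sketch; the variable names are mine, not from the thread), you can reproduce nn.Linear's output from its stored weight and bias by hand:

import torch
import torch.nn as nn

lin = nn.Linear(300, 10)
x = torch.randn(4, 300)

# lin.weight has shape (10, 300); nn.Linear computes x @ W.t() + b
manual = x @ lin.weight.t() + lin.bias
print(torch.allclose(lin(x), manual))  # True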
To match both the parameter shapes and the behaviour of nn.Linear, you have to do:
class ModelOne(nn.Module):
    def __init__(self):
        super().__init__()
        # Store the weight as (out_features, in_features), like nn.Linear does
        self.weights = nn.Parameter(torch.randn(10, 300))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        # Transpose back for the multiplication, matching nn.Linear's x @ W.t() + b
        return x @ self.weights.t() + self.bias
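To confirm the two models are now interchangeable, one can copy the nn.Linear parameters into the rewritten ModelOne and compare outputs (a hedged sketch; the copy step and test input are mine, not from the thread):

mo = ModelOne()
mt = ModelTwo()

# Copy nn.Linear's parameters into the hand-rolled model; the shapes now line up
with torch.no_grad():
    mo.weights.copy_(mt.linear.weight)
    mo.bias.copy_(mt.linear.bias)

x = torch.randn(4, 300)
print(torch.allclose(mo(x), mt(x)))          # True
print([p.size() for p in mo.parameters()])   # [torch.Size([10, 300]), torch.Size([10])]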