Tried to do real simple nn.Linear model from scratch forward() is not giving correct answer

GGPYTORCH000 · April 15, 2024, 9:49pm

i wrote small class Linear which is similar to nn.Linear interms of initialization and forwarding. It worked however when I switched my code from nn.Linear to my own Linear, forward compute is not giving correct result.
I thought fwd of linear is just simple matmul between X and Weight + bias
def forward (X:torch.tensor):
out = torch.matmul(self.weight, X)
#out = torch.matmul(X, self.weight)
out += self.bias
return out

To give more intext when following X and W values are supplied to Linear (my own) class result is
tensor([[0.3684, 1.2859]]), which I checkeked manually

X=torch.tensor([[-0.9708, 0.9610]])
W=torch.tensor([[ 0.6627, -0.4245],
… [ 0.5373, 0.2294]])
b=torch.tensor([0.4954, 0.6533] )
torch.matmul(X, W) + b
tensor([[0.3684, 1.2859]])

But with nn.linear class, it is returning
tensor([[-0.3565, -0.2904]],

Now if the W is 0, then results of both are correct meaning, bias part is working (wondeful) it is just matmul part that is something wrong.

ptrblck · April 16, 2024, 12:12am

Your matmul is wrong as you need to pass the transposed weight to it:

batch_size = 2
in_features = 3
out_features = 4

x = torch.randn(batch_size, in_features)
lin = nn.Linear(in_features, out_features)
print(lin.weight.shape)
# torch.Size([4, 3]) # [out_features, in_features]
print(lin.bias.shape)
# torch.Size([4]) # [out_features]

ref = lin(x)

w = lin.weight
b = lin.bias

out = torch.matmul(x, w.T) + b
print(out - ref)
# tensor([[0., 0., 0., 0.],
#         [0., 0., 0., 0.]], grad_fn=<SubBackward0>)

GGPYTORCH000 · April 16, 2024, 1:51am

thx!! i am able to move fwd now. (salute)