Hi!
I am new to PyTorch, and my model contains a bilinear layer (two inputs plus one bias).
I therefore implemented a simple module:
class Bilinear(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Bilinear, self).__init__()
        self.W_a = nn.Parameter(torch.Tensor(input_size, input_size))
        self.W_b = nn.Parameter(torch.Tensor(hidden_size, input_size))
        self.b = nn.Parameter(torch.Tensor(input_size))

    def forward(self, x, h):
        return self.W_a.t().matmul(x) + self.b + self.W_b.t().matmul(h)
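For context, here is a self-contained sketch of the same module with an explicit initialization step added. Note that `nn.Parameter(torch.Tensor(n, m))` only allocates memory without setting the values, so the parameters start out as whatever was in that memory; the uniform init range below is an assumption for illustration, not necessarily the scheme `nn.Linear` uses internally.

```python
import torch
import torch.nn as nn


class BilinearManual(nn.Module):
    """Hand-rolled bilinear layer with an explicit parameter init.

    Without reset_parameters(), the weights hold uninitialized
    memory, which can contain arbitrarily large values.
    """

    def __init__(self, input_size, hidden_size):
        super(BilinearManual, self).__init__()
        self.W_a = nn.Parameter(torch.Tensor(input_size, input_size))
        self.W_b = nn.Parameter(torch.Tensor(hidden_size, input_size))
        self.b = nn.Parameter(torch.Tensor(input_size))
        self.reset_parameters()

    def reset_parameters(self):
        # Assumed init scheme: uniform in [-1/sqrt(n), 1/sqrt(n)].
        bound = 1.0 / self.W_a.size(0) ** 0.5
        for p in self.parameters():
            nn.init.uniform_(p, -bound, bound)

    def forward(self, x, h):
        return self.W_a.t().matmul(x) + self.b + self.W_b.t().matmul(h)


layer = BilinearManual(input_size=4, hidden_size=3)
out = layer(torch.randn(4), torch.randn(3))  # finite, shape (4,)
```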
and another module using the built-in Linear module, which should be mathematically the same:
class Bilinear(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Bilinear, self).__init__()
        self.linear_i = nn.Linear(input_size, input_size)
        self.linear_h = nn.Linear(hidden_size, input_size, bias=False)

    def forward(self, x, h):
        return self.linear_i(x) + self.linear_h(h)
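For completeness, a quick check (a sketch, assuming `x` has `input_size` features and `h` has `hidden_size` features) that the `nn.Linear` version computes the same function as the manual formula once the same weights are used. `nn.Linear` stores its weight as `(out_features, in_features)`, so `linear(x)` is `x @ weight.t() + bias`:

```python
import torch
import torch.nn as nn

input_size, hidden_size = 4, 3
linear_i = nn.Linear(input_size, input_size)
linear_h = nn.Linear(hidden_size, input_size, bias=False)

x = torch.randn(input_size)
h = torch.randn(hidden_size)

# Re-express the manual bilinear formula using the weights that
# nn.Linear already initialized for us.
manual = (linear_i.weight.matmul(x) + linear_i.bias
          + linear_h.weight.matmul(h))
builtin = linear_i(x) + linear_h(h)

# Identical up to floating-point rounding.
same = torch.allclose(manual, builtin, atol=1e-6)
```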
Both run on the CPU.
Can someone explain why the outcomes of these two modules differ so much? (Sorry, the complete model would be too complex to explain here.)
The second one is much more stable. The first one sometimes leads to a loss of nan (with NLLLoss).
I am not asking for help with my concrete problem, but for help understanding what is going on under the hood of PyTorch that causes such large discrepancies between built-in and manually rebuilt modules, and how to avoid common pitfalls when writing low-level modules in PyTorch.
Thank you very much!