I have started learning PyTorch and am looking for some help understanding the basics. I implemented two classes, `A` and `B` (shown below), and expected them to do the same thing.

However, this is not the case. When training `A`, the losses start in the three-digit range, while the same data and fit loop with `B` (using `nn.Linear`) start with five-digit losses. Both then move in the right direction, but I am still wondering what the difference is.

FWIW, I went through the forums and found the topic of initialization. In `C`, I tried to imitate what I found in Linear's `__init__()` and `reset_parameters()` as well as I could, but it did not change the initial losses reported.

```
import math

import torch
import torch.nn.functional as F
from torch import nn
from torch.nn import init


class A(nn.Module):
    """Hand-written linear layer; weights drawn with torch.rand."""

    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.rand(224*224*3, 1) / math.sqrt(224*224*3))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, xb):
        # Flatten each image to a vector, then apply the affine map.
        return xb.view(xb.size(0), -1) @ self.weights + self.bias


class B(nn.Module):
    """Same model, but using nn.Linear with its default initialization."""

    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(224*224*3, 1)

    def forward(self, xb):
        return self.lin(xb.view(xb.size(0), -1))


class C(nn.Module):
    """My attempt to imitate the initialization of nn.Linear by hand."""

    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.Tensor(224*224*3, 1))
        init.kaiming_uniform_(self.weights, a=math.sqrt(224*224*3))
        bound = 1 / math.sqrt(224*224*3)
        self.bias = nn.Parameter(torch.Tensor(1))
        init.uniform_(self.bias, -bound, bound)

    def forward(self, xb):
        return xb.view(xb.size(0), -1) @ self.weights + self.bias


# FWIW, here is also the fit loop.
# train_dl is the DataLoader over my training data (not shown).
epochs = 12
lr = 1e-8
model = B()  # or A()
n = 0
for epoch in range(epochs):
    for xb, yb in train_dl:
        yb_ = model(xb)
        loss = F.mse_loss(yb_, yb)
        n += 1
        if n % 20 == 0:
            print(f'loss: {loss.item():05.2f}')
        loss.backward()
        # Plain SGD step, done by hand.
        with torch.no_grad():
            for p in model.parameters():
                p -= p.grad * lr
            model.zero_grad()
    print(epoch, loss.item(), math.sqrt(loss.item()))
```
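
To narrow down where the difference might come from, here is a minimal sketch that compares only the initial weights of the three variants, without any data. The names `w_a`, `w_b`, `w_c`, `summarize` and `stats` are made up for this check and are not part of the models above, and the seed is arbitrary.

```python
import math

import torch
from torch import nn
from torch.nn import init

torch.manual_seed(0)  # arbitrary seed, only for repeatability
n_in = 224 * 224 * 3

# A: torch.rand samples from [0, 1), so every weight is non-negative.
w_a = torch.rand(n_in, 1) / math.sqrt(n_in)

# B: default nn.Linear init; the weight is stored as (out_features, in_features).
w_b = nn.Linear(n_in, 1).weight.detach()

# C: the same init call as in my class C, applied to an (n_in, 1) tensor.
w_c = torch.empty(n_in, 1)
init.kaiming_uniform_(w_c, a=math.sqrt(n_in))


def summarize(w):
    """Return min, max and mean of a weight tensor as plain floats."""
    return {"min": w.min().item(), "max": w.max().item(), "mean": w.mean().item()}


stats = {name: summarize(w) for name, w in [("A", w_a), ("B", w_b), ("C", w_c)]}
for name, s in stats.items():
    print(name, s)
```

If I read this right, the weights in `A` are all non-negative, while `nn.Linear` samples symmetrically around zero, so the two models do not start from the same kind of weights. Also, since `nn.Linear` stores its weight as `(out_features, in_features)`, calling `kaiming_uniform_` on an `(n_in, 1)` tensor probably derives the fan-in from a different dimension than `nn.Linear` does. Whether this fully explains the gap in the initial losses is something I have not verified against my data.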