Only one layer of weights updates, why?

I created a NN with one hidden layer, so I have two tensors of weights.
Then I train it, so I expect both sets of weights to update. The problem is that only the second set of weights updates; the first stays the same!
Why? I have tried several possible solutions, but nothing works.

Thank you for your help.

This is the code (I removed the imports and the dataset loading). I print "par" before and after training, and it looks like the first set of weights does not change:

class NN1(nn.Module):
    def __init__(self, D_1, D_2, H, D_out):
        super().__init__()  # required so nn.Module registers the parameters
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(D_1*D_2, H, bias=False),
            nn.ReLU(),  # activation implied by the stack's name
            nn.Linear(H, D_out, bias=False),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NN1(28, 28, 756, 10)
par = list(model.parameters())

def train_loop(model, training_data, batch_size, eta, epochs):
    optimizer = torch.optim.SGD(par, lr=eta)
    loss_fn = nn.CrossEntropyLoss()

    train_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)

    for epoch in range(epochs):
        for batch_idx, (X, y) in enumerate(train_dataloader):
            ypred = model(X)
            train_loss = loss_fn(ypred, y)
            optimizer.zero_grad()  # clear gradients from the previous batch
            train_loss.backward()  # compute gradients
            optimizer.step()       # update the weights
            if batch_idx % (6400 // batch_size) == 0:
                print(f'Epoch [{epoch + 1}/{epochs}] Batch [{batch_idx}/{len(train_dataloader)}] Training loss: {train_loss.item()}')
    return model


Just printing the parameters might not show enough decimals, so clone the original parameter and subtract the updated one from it. Also check the gradients via the .grad attribute to see how small they are.
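A minimal, self-contained sketch of that check (the toy model, shapes, and names here are illustrative, not the original code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-layer model standing in for the original network
model = nn.Sequential(nn.Linear(4, 3, bias=False),
                      nn.Linear(3, 2, bias=False))

# Snapshot the parameters before training
before = [p.detach().clone() for p in model.parameters()]

# One SGD step on random data
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

# Compare: max absolute change per layer, and max gradient magnitude
for p0, p in zip(before, model.parameters()):
    print((p - p0).abs().max().item(), p.grad.abs().max().item())
```

If the maximum absolute difference is non-zero, the layer did update, even if `print(p)` looks unchanged.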

Thank you very much.

Unfortunately I have tried both things, but no luck. When I clone the original parameter and subtract the updated one from it, I get a tensor of all zeros, so I believe they are exactly equal (whereas for the second set of parameters the difference is non-zero).

When I print the gradients during the training loop, they appear to be exactly 0 (I don't know if there is an approximation I can't see, but I don't think so).

By reading some other posts I figured out what was going on. Actually, there was no real problem: when I printed the gradients and/or the weights, the output was truncated, and the part I could see simply was not being updated, while other parts that I could not see were being updated.
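For reference, the truncation is easy to reproduce: PyTorch summarizes tensors with more than roughly 1000 elements when printing, so a change in the hidden middle rows never shows up. A small sketch (shapes chosen to match a 28*28 input, values illustrative):

```python
import torch

# Tensors above the print threshold are summarized with '...',
# so an update in the hidden middle rows is invisible in the output.
w = torch.zeros(784, 10)   # same element count as a flattened-image weight
w[100, 5] = 1.0            # change an entry that the printed summary hides

print(w)                   # shows only edge rows/columns plus '...'
print((w != 0).any())      # tensor(True): a check that does not rely on printing

# To print everything instead, raise the summarization threshold:
torch.set_printoptions(threshold=10_000)
```

Comparing tensors numerically (as suggested above) or raising the threshold avoids being misled by the summarized output.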