for raw_data, raw_target in train_loader:
    data = raw_data.view((batch_size, 1, 784))
    logits = model(data)
    zeros[:, :, :] = 0
    zeros[rows, 0, raw_target] = 1
    loss = criterion(logits, zeros.float())
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

When you run this code, it prints the following (the bias and weights are taken from model.parameters()):

weight[0][0][0] before training: tensor(0.7576)
Bias[0] before training: tensor(0.3681)
weight[0][0][0] after training: tensor(0.7576)
Bias[0] after training tensor(-48.1940)

I think some of the weights are updating. When I sum all of the weights, I get a different value after training than before training, but the two values are still very close.

sum(weight) before training: tensor(1.00000e+06 *
3.9201)
Bias[0] before training: tensor(0.3681)
sum(weight) after training: tensor(1.00000e+06 *
3.8118)
Bias[0] after training tensor(-47.6196)
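
Comparing a single element or the overall sum makes it hard to tell which parameters actually move. A minimal sketch of a per-parameter check follows. It assumes a hypothetical stand-in model (`toy_model`, a plain `nn.Linear`), random data, and an MSE loss, none of which come from the code above. The helpers `snapshot` and `max_abs_change` are made up for illustration.

```python
import torch
import torch.nn as nn


def snapshot(model):
    """Copy every parameter; clone() keeps the copy from tracking in-place updates."""
    return {name: p.detach().clone() for name, p in model.named_parameters()}


def max_abs_change(before, model):
    """Largest absolute change of each parameter since the snapshot was taken."""
    return {
        name: (p.detach() - before[name]).abs().max().item()
        for name, p in model.named_parameters()
    }


# Hypothetical stand-ins for the real model, optimizer, and loss (assumptions).
torch.manual_seed(0)
toy_model = nn.Linear(784, 10)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
toy_criterion = nn.MSELoss()

before = snapshot(toy_model)

# One random batch with one-hot targets, mimicking the shape of the MNIST setup.
data = torch.rand(32, 784)
target = torch.zeros(32, 10)
target[torch.arange(32), torch.randint(0, 10, (32,))] = 1

toy_optimizer.zero_grad()
loss = toy_criterion(toy_model(data), target)
loss.backward()

# Gradient norms show whether each parameter receives a learning signal at all.
grad_norms = {name: p.grad.norm().item() for name, p in toy_model.named_parameters()}
toy_optimizer.step()

changes = max_abs_change(before, toy_model)
for name in changes:
    print(name, "grad norm:", grad_norms[name], "max |change|:", changes[name])
```

If a parameter shows a gradient norm of zero, or a change that is tiny compared with the others, that points at the forward computation or the learning rate rather than at the optimizer call itself.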

I am not entirely sure about this, but do you really want batch_size to be one dimension of your weight w?
Shouldn’t the weight be the same for every element in the batch, so that you just expand it in the forward computation, like this: