Why does the sum of the model parameters increase after each iteration during training?

Hi,

I am training a model with an input batch size of 16. After each batch iteration, the sum of the model parameters increases. Why?
Here is my code:

model.train()
err = 0.0
for i in range(0, 335, 16):
    # sum of all parameter values (weights and biases) before the update
    y = 0
    for k, x in enumerate(model.parameters()):
        y += x.sum()
    print(y)

    inputs = data.cuda()
    target = labels.cuda()
    outputs = model(inputs)
    loss = criterion(outputs, target)
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
    curr_err = loss.item()
    err += curr_err
    torch.cuda.empty_cache()

Here is the output showing the parameter sum increasing:

tensor(421.4539, device='cuda:0', grad_fn=<ThAddBackward>)
tensor(422.1019, device='cuda:0', grad_fn=<ThAddBackward>)----]16/335
tensor(423.4700, device='cuda:0', grad_fn=<ThAddBackward>)----]32/335
tensor(424.6810, device='cuda:0', grad_fn=<ThAddBackward>)----]48/335
tensor(425.5378, device='cuda:0', grad_fn=<ThAddBackward>)----]64/335 

...

tensor(434.1239, device='cuda:0', grad_fn=<ThAddBackward>)----]304/335
tensor(434.4677, device='cuda:0', grad_fn=<ThAddBackward>)==>-]320/335

Ideally it should be 421.4539 each time.

Please help me out.

What happens if you print only x.sum()?

Sorry, the post was not complete at that time. x.sum() is summing both the weights and the biases of the model.

Right, thanks for updating the question!

I think .sum() will add the parameters themselves, not the number of parameters.

This topic seems to have a clean solution for counting parameters.
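
For reference, here is a small sketch (with a made-up one-layer model; any nn.Module behaves the same way) of the difference between counting the parameters and summing their values:

import torch.nn as nn

model = nn.Linear(10, 2)  # hypothetical stand-in model

# number of parameters: count of scalar entries in every weight/bias tensor
num_params = sum(p.numel() for p in model.parameters())

# sum of the parameter values themselves (what your loop prints)
param_sum = sum(p.sum() for p in model.parameters())

print(num_params)  # stays constant during training
print(param_sum)   # changes after every optimizer step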

Yes, it is adding the parameters themselves, not the number of parameters. Each parameter is either a weight tensor or a bias tensor, and I am taking the sum of those tensors.

Oh so you really want the parameters’ values! Sorry for my misunderstanding.

In that case, why do you want them to stay at 421.4539? If they are trainable parameters, they will get modified by back-prop (and possibly weight decay, if you set it up) after each mini-batch…
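
Just to illustrate (a minimal sketch with a dummy model and a dummy loss, not your actual setup), a single optimizer step already changes the parameter sum; note that computing the sum under torch.no_grad() keeps the logging out of the autograd graph:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-2)

def param_sum(m):
    # no_grad so the logging itself does not build a graph
    with torch.no_grad():
        return sum(p.sum() for p in m.parameters()).item()

print(param_sum(model))       # value before the update

out = model(torch.randn(8, 4))
loss = out.pow(2).mean()      # dummy loss
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(param_sum(model))       # different value after the update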

So can I stop this back-prop modification? Is that possible, or will the parameters keep changing each time? If it is possible, please let me know. Thanks

Well if you want to train a network, back-prop is strongly encouraged…
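
For completeness, if you really wanted some parameters to stay fixed (which means those parameters will not learn anything), the usual approach is to mark them as non-trainable. This is only a sketch, and the submodule name 'features' is made up:

# sketch only: 'features' is a hypothetical submodule of your model
for p in model.features.parameters():
    p.requires_grad = False   # frozen: no gradients, no updates

# pass only the still-trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=0.01)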

And by the way, your training loop range(0, 16, 331) does not really make sense, as the range arguments are start, stop, step (with stop not included).

Oh sorry, that was a typo in range(). Thanks