Model gradients can't be changed

I am trying to modify my gradients by multiplying them by a constant. As a test, I multiply the gradients by zero, so I expect the loss to stay constant.

However, it looks like the model still updates the weights of the network even though I have forced the weight gradients to be zero. Why are the weights still being updated?

import numpy as np
import torch

GRADIENT_MULTIPLIER = 0.
model.train()
for epoch in range(100):
    for ind, (input_data, labels) in enumerate(train_iterator):
        optimizer.zero_grad()

        logits = model(input_data, labels)
        loss = model.loss(logits, labels)

        loss.backward()

        # Scale every gradient in place (here multiplied by zero).
        for p in model.parameters():
            if p.grad is not None:
                p.grad *= GRADIENT_MULTIPLIER

        # Global L2 norm of all gradients, to confirm they really are zero.
        global_norm = np.sqrt(sum(torch.sum(p.grad ** 2).item()
                                  for p in model.parameters() if p.grad is not None))
        print(f"global_norm : {global_norm}")

        optimizer.step()

        print(f"loss : {loss.item()}")

Outputs are:

global_norm : 0.0
loss : 0.6851892471313477
global_norm : 0.0
loss : 0.6985365748405457
global_norm : 0.0
loss : 0.6622101664543152
global_norm : 0.0
loss : 0.45273280143737793
global_norm : 0.0
loss : 0.8967741131782532
global_norm : 0.0
loss : 0.28941503167152405

Hi,

What is your optimizer? Note that some optimizers will still change the weights even when the gradients are zero (for example, when weight decay is used).
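To illustrate, here is a minimal self-contained sketch with a made-up toy parameter (not your model): with plain SGD plus weight_decay, a parameter whose gradient is exactly zero still moves, because weight decay adds weight_decay * p to the gradient inside the step.

import torch

# Hypothetical toy parameter, just for illustration.
w = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.SGD([w], lr=0.1, weight_decay=0.01)

w.grad = torch.zeros_like(w)      # gradient forced to exactly zero
before = w.detach().clone()
opt.step()
print(torch.equal(before, w))     # False: weight decay still shrinks w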


I’m using Adam for my optimizer. Specifically: optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

For Adam, if any “momentum-like” term (the running averages of the gradient and of its square) is non-zero, the weights will still be updated even with a gradient of 0. So this is expected.
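To make that concrete, here is a minimal sketch with a toy parameter (again hypothetical, not your model): one genuine gradient step seeds Adam’s running averages, and after that even an all-zero gradient keeps moving the weights.

import torch

w = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.Adam([w], lr=1e-4)

# Step 1: a genuine non-zero gradient populates Adam's running averages.
opt.zero_grad()
w.sum().backward()
opt.step()

# Step 2: the gradient is exactly zero, but the running averages are not.
opt.zero_grad()
w.grad = torch.zeros_like(w)
before = w.detach().clone()
opt.step()
print((w - before).abs().max().item())   # > 0: the weights still changed

Since the running averages only decay geometrically toward zero, the parameters keep drifting for many steps after the gradients become zero, which matches the changing loss you observed.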
