Hi,
I have a problem with optimizer.step(): it never seems to update my weights. Here is my training code:
previous = model.head[12].weight
for i, (image, target) in enumerate(bar):
    image, target = image.to(device), target.to(device)
    b_size = image.size()[0]
    output = model(image)
    loss = criterion(output, target)
    now = model.head[12].weight
    print('=' * 10)
    print(previous, now)
    print(model.head[12].weight.grad)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
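For context, the rest of the setup looks roughly like this (bar in the loop is the batch iterator, e.g. a tqdm-wrapped DataLoader). The backbone, the head layers before index 12, the loss, and the optimizer below are simplified stand-ins, not my exact model:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Placeholder backbone; the real one is larger.
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 32))
        # 12 placeholder modules (indices 0-11) followed by the final
        # nn.Linear(32, 1), so model.head[12] is that last layer.
        layers = []
        for _ in range(6):
            layers += [nn.Linear(32, 32), nn.ReLU()]
        layers.append(nn.Linear(32, 1))
        self.head = nn.Sequential(*layers)

    def forward(self, x):
        return self.head(self.backbone(x))

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
criterion = nn.BCEWithLogitsLoss()                          # stand-in loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # stand-in optimizer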
The layer at index 12, model.head[12], is nn.Linear(32, 1).
The previous and now variables always print exactly the same values:
tensor([[ 0.0034, -0.2354, -0.1383, -0.2392, -0.5482, 0.4603, 0.1331, 0.3651,
-0.1942, -0.1649, 0.2016, 0.3514, 0.2886, -0.0520, 0.1155, -0.3433,
-0.3107, 0.1798, 0.0551, -0.0161, 0.5021, 0.1313, 0.3340, 0.2885,
0.3380, -0.1972, 0.0067, 0.4465, -0.0814, -0.3551, 0.1053, 0.2180]],
device='cuda:0', requires_grad=True)
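It is not just a rounding coincidence in the printout; adding a check like this at the end of the loop body (hypothetical snippet, not in the code above) prints True on every iteration:

# Added after optimizer.step(); True every iteration, so the two
# tensors really do hold identical values.
print(torch.equal(previous, now))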
This happens even though that layer does get gradients:
# Iter 1: None
# Iter 2:
tensor([[-0.4360, -0.2952, -0.6197, -0.0216, -1.0340, 0.7521, 0.2088, 0.7050,
-0.6158, 0.4012, 1.0144, -0.5273, 0.6514, 0.2796, -0.6339, -0.8657,
-0.2789, 0.2030, -0.0475, 0.4096, 0.9676, 0.3229, 0.5224, 0.1033,
0.5484, -0.3745, -0.1743, 0.3179, -0.5385, -0.2508, 0.3502, 0.7884]],
device='cuda:0')
# Iter 3:
tensor([[-0.5874, -0.7407, -0.6531, 0.2646, -1.6728, 0.4518, 0.1702, 1.0924,
-1.2420, 0.1817, 0.7247, 0.9656, 0.0686, 0.1332, 0.3749, -1.8020,
-2.0433, 0.4756, -0.8883, -0.4920, 1.8353, -0.5856, -0.2931, 0.8012,
1.2388, -0.1963, 0.3465, 1.1411, -0.7416, -0.5985, -0.1804, 0.0322]],
device='cuda:0')
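One check I have not tried yet is confirming that model.head[12].weight is among the tensors the optimizer actually updates. A sketch of what I have in mind:

# Sanity check: is the layer's weight one of the parameters the
# optimizer steps over?
w = model.head[12].weight
registered = any(w is p for group in optimizer.param_groups
                 for p in group['params'])
print(registered)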
Is this a bug, or did I do something wrong?