Optimizer not taking Random Inputs

I was just curious about neural networks, so I did a small experiment. I took a simple linear layer, nn.Linear(2, 2). The training loop is simple:

losses = []

for f , t in zip(fea , tar):

    preds = layer(f)

    loss = loss_f(preds , t)
    losses.append(loss)

    optim.zero_grad()
    loss.backward()

    optim.step()
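For reference, a minimal self-contained sketch of this setup looks like the following; fea and tar are just placeholder random tensors for illustration:

import torch
import torch.nn as nn

# Placeholder data standing in for the real features/targets
fea = torch.rand(100, 2)
tar = torch.rand(100, 2)

layer = nn.Linear(2, 2)
loss_f = nn.MSELoss()
optim = torch.optim.Adam(layer.parameters())

losses = []
for f, t in zip(fea, tar):
    preds = layer(f)            # forward pass through the linear layer
    loss = loss_f(preds, t)
    losses.append(loss.item())  # store the scalar value, not the graph

    optim.zero_grad()
    loss.backward()
    optim.step()                # updates layer.weight and layer.bias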

Here the optimizer is torch.optim.Adam(layer.parameters()) and the loss is nn.MSELoss(), as in the sketch above. It works fine and updates the weights/parameters. But as soon as I make a slight change to the optimizer:

x = nn.Parameter(torch.rand(2 , 2) , dtype = torch.float32)

optim = torch.optim.Adam([x])

the weights are not updated. Printing x and the optimizer's param group shows:

Parameter containing:
tensor([[0.5747, 0.4457],
        [0.5285, 0.7974]], requires_grad=True)
{'params': [Parameter containing:
  tensor([[0.5747, 0.4457],
          [0.5285, 0.7974]], requires_grad=True)],
 'lr': 0.001,
 'betas': (0.9, 0.999),
 'eps': 1e-08,
 'weight_decay': 0,
 'amsgrad': False,
 'maximize': False,
 'foreach': None,
 'capturable': False,
 'differentiable': False,
 'fused': None}

Why is it not updated even though it has the same size (and even when the shape is different)? Does the optimizer really check whether it is the actual weights of a layer?

Is there any way I can make this work?

Your line x = nn.Parameter(torch.rand(2 , 2) , dtype = torch.float32) is wrong, as nn.Parameter won't accept a dtype argument and will fail with:

TypeError: Parameter.__new__() got an unexpected keyword argument 'dtype'

Fixing this issue properly allows x to be optimized:


x = nn.Parameter(torch.rand(2, 2, dtype=torch.float32))  # dtype goes to torch.rand, not nn.Parameter
optim = torch.optim.Adam([x], lr=1.)

for _ in range(10):
    print(x)
    out = x * 2          # any differentiable op using x
    loss = out.mean()
    optim.zero_grad()
    loss.backward()
    optim.step()         # x is updated, since it is in the optimizer's parameter list

Output:

Parameter containing:
tensor([[0.6680, 0.9391],
        [0.6531, 0.9501]], requires_grad=True)
Parameter containing:
tensor([[-0.3320, -0.0609],
        [-0.3469, -0.0499]], requires_grad=True)
Parameter containing:
tensor([[-1.3320, -1.0609],
        [-1.3469, -1.0499]], requires_grad=True)
Parameter containing:
tensor([[-2.3320, -2.0609],
        [-2.3469, -2.0499]], requires_grad=True)
Parameter containing:
tensor([[-3.3320, -3.0609],
        [-3.3469, -3.0499]], requires_grad=True)
Parameter containing:
tensor([[-4.3320, -4.0609],
        [-4.3469, -4.0499]], requires_grad=True)
Parameter containing:
tensor([[-5.3320, -5.0609],
        [-5.3469, -5.0499]], requires_grad=True)
Parameter containing:
tensor([[-6.3320, -6.0609],
        [-6.3469, -6.0499]], requires_grad=True)
Parameter containing:
tensor([[-7.3320, -7.0609],
        [-7.3469, -7.0499]], requires_grad=True)
Parameter containing:
tensor([[-8.3320, -8.0609],
        [-8.3469, -8.0499]], requires_grad=True)
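If you want to use x in place of the layer in your original training loop, one possible sketch (treating x as a 2x2 weight matrix without a bias, and reusing placeholder fea/tar tensors) is:

import torch
import torch.nn as nn

x = nn.Parameter(torch.rand(2, 2, dtype=torch.float32))
optim = torch.optim.Adam([x])
loss_f = nn.MSELoss()

fea = torch.rand(100, 2)  # placeholder features
tar = torch.rand(100, 2)  # placeholder targets

for f, t in zip(fea, tar):
    preds = f @ x             # use x as a 2x2 weight matrix (no bias)
    loss = loss_f(preds, t)

    optim.zero_grad()
    loss.backward()
    optim.step()              # Adam updates x because x is in its parameter list

The optimizer does not check whether a parameter belongs to a layer; it simply updates every tensor in its parameter list that received a gradient.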

Sorry for the typo in nn.Parameter()

I have one last question. Does the shape play a vital role? For example, does anything change if I give it a shape of (4, 4) rather than (2, 2)?

Yes, the shape matters for the operation you use the parameter in. A 4x4 parameter will also work, depending on how you plan to use it; see the sketch below.
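For instance, with a shape-agnostic objective like the mean used above, a 4x4 parameter is optimized exactly the same way (a minimal sketch):

import torch
import torch.nn as nn

x = nn.Parameter(torch.rand(4, 4))   # 4x4 instead of 2x2
optim = torch.optim.Adam([x], lr=1.)

for _ in range(3):
    loss = (x * 2).mean()            # mean() works for any shape
    optim.zero_grad()
    loss.backward()
    optim.step()

The shape only becomes a constraint when the surrounding operation requires it, e.g. multiplying a (N, 2) input by x requires x to have 2 rows.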


Thank you for solving my issue!