"bool value of Tensor with more than one value is ambiguous" when trying to set different lr for different layers

hsiangyu · August 12, 2020, 2:48pm

Hello everyone, I have recently run into a tough issue (at least for me) while trying to set different learning rates for different layers.
My segmentation model consists of a pretrained encoder and a decoder, so I want to set a lower lr for the pretrained encoder. The encoder consists of a set of modules with similar names: conv_encode1, conv_encode2, …
So I have written some code like this:

encode_params = []
for i in range(5):
    module = eval('model.conv_encode{}'.format(i+1))
    params = list(map(id, module.parameters()))
    encode_params += params
decode_params = filter(lambda p: id(p) not in encode_params,
                     model.parameters())
optimizer = torch.optim.SGD([{'params': decode_params},
                             {'params': encode_params, 'lr': base_lr / 10}],
                            lr=base_lr, momentum=0.9)

However, when I run my code I get the error in the topic title: RuntimeError: bool value of Tensor with more than one value is ambiguous.
I also get the same error when I run just this single line:

list(decode_params)

I have no idea what is going wrong, and I would appreciate it if anyone could point out the mistakes in my code.
This is the first time I have posted a topic on the PyTorch forum, so please forgive any inappropriate language.

mariosasko · August 12, 2020, 6:54pm

This is not a good place to use eval. Furthermore, you should use sets to separate the encoder parameters from the decoder parameters instead of id. With this in mind, a better way to write the first part is:

encode_params = set()
for i in range(5):
    module = getattr(model, 'conv_encode{}'.format(i+1))
    encode_params.update(module.parameters())

model_params = set(model.parameters())
decode_params = model_params - encode_params

optimizer = torch.optim.SGD([{'params': list(decode_params)},
                             {'params': list(encode_params), 'lr': base_lr / 10}],
                            lr=base_lr, momentum=0.9)

Try to print everything that you suspect may cause the issue.

If you can’t solve the problem yourself, please provide some additional information so it’s easier for us to find a solution (the error’s stack trace would be helpful, and printing decode_params).

hsiangyu · August 13, 2020, 1:27am

It works! I am really grateful for your help, but I am still a little confused about why substituting set for id works in my case.

mariosasko · August 13, 2020, 11:46am

One issue is that you were passing ids instead of tensors to the optimizer ({'params': encode_params}), so this would raise an error eventually, but not the one you mention. Other than that, it’s not obvious what the problem is in your original code, so please provide the entire stack trace (in both cases) to make it easier for us.
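To illustrate the point about ids versus tensors, here is a minimal sketch of an id-based split that keeps the tensors for the optimizer and uses their ids only for lookup. TinySegNet is a hypothetical stand-in for your model (I don't know its real layers), and encode_ids is a name I made up for this example:

```python
import torch
from torch import nn


# Hypothetical stand-in for the segmentation model: five encoder blocks
# named conv_encode1..conv_encode5 plus a single decoder layer.
class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        for i in range(5):
            setattr(self, 'conv_encode{}'.format(i + 1), nn.Conv2d(3, 3, 3))
        self.decoder = nn.Conv2d(3, 1, 1)


model = TinySegNet()
base_lr = 1e-4

# Keep the parameter tensors themselves for the optimizer...
encode_params = []
for i in range(5):
    module = getattr(model, 'conv_encode{}'.format(i + 1))
    encode_params.extend(module.parameters())

# ...and keep their ids separately, only for the membership test.
encode_ids = {id(p) for p in encode_params}

# int-vs-int comparison, so Tensor.__eq__ is never called here.
decode_params = [p for p in model.parameters() if id(p) not in encode_ids]

optimizer = torch.optim.SGD(
    [{'params': decode_params},
     {'params': encode_params, 'lr': base_lr / 10}],
    lr=base_lr, momentum=0.9)
```

Unlike the set difference, this version keeps the parameters in the order model.parameters() yields them, which may or may not matter for your use case.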

hsiangyu · August 15, 2020, 10:26am

I ran the initial versions of my code again and got the following errors.

When running the code below:

  ...: base_lr = 1e-4
  ...: encode_params = []
  ...: for i in range(5):
  ...:     module = eval('model.conv_encode{}'.format(i+1))
  ...:     params = list(module.parameters())
  ...:     encode_params += params
  ...: decode_params = filter(lambda p: p not in encode_params,
  ...:                      model.parameters())
  ...: optimizer = torch.optim.SGD([{'params': decode_params},
  ...:                              {'params': encode_params, 'lr': base_lr / 10}],
  ...:                             lr=base_lr, momentum=0.9)
Traceback (most recent call last):
  File "D:\Anaconda3\envs\torch\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-36fa8cf864b8>", line 57, in <module>
    lr=base_lr, momentum=0.9)
  File "D:\Anaconda3\envs\torch\lib\site-packages\torch\optim\sgd.py", line 64, in __init__
    super(SGD, self).__init__(params, defaults)
  File "D:\Anaconda3\envs\torch\lib\site-packages\torch\optim\optimizer.py", line 51, in __init__
    self.add_param_group(param_group)
  File "D:\Anaconda3\envs\torch\lib\site-packages\torch\optim\optimizer.py", line 195, in add_param_group
    param_group['params'] = list(params)
  File "<ipython-input-2-36fa8cf864b8>", line 53, in <lambda>
    decode_params = filter(lambda p: p not in encode_params,
RuntimeError: The size of tensor a (64) must match the size of tensor b (7) at non-singleton dimension 3

When running the code below (I think this version may be entirely wrong):

  ...: base_lr = 1e-4
  ...: encode_params = []
  ...: for i in range(5):
  ...:     module = eval('model.conv_encode{}'.format(i+1))
  ...:     params = list(module.parameters())
  ...:     encode_params += params
  ...: decode_params = filter(lambda p: id(p) not in encode_params,
  ...:                      model.parameters())
  ...: optimizer = torch.optim.SGD([{'params': decode_params},
  ...:                              {'params': encode_params, 'lr': base_lr / 10}],
  ...:                             lr=base_lr, momentum=0.9)
Traceback (most recent call last):
  File "D:\Anaconda3\envs\torch\lib\site-packages\IPython\core\interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-91e3d95a8795>", line 11, in <module>
    lr=base_lr, momentum=0.9)
  File "D:\Anaconda3\envs\torch\lib\site-packages\torch\optim\sgd.py", line 64, in __init__
    super(SGD, self).__init__(params, defaults)
  File "D:\Anaconda3\envs\torch\lib\site-packages\torch\optim\optimizer.py", line 51, in __init__
    self.add_param_group(param_group)
  File "D:\Anaconda3\envs\torch\lib\site-packages\torch\optim\optimizer.py", line 195, in add_param_group
    param_group['params'] = list(params)
  File "<ipython-input-5-91e3d95a8795>", line 7, in <lambda>
    decode_params = filter(lambda p: id(p) not in encode_params,
RuntimeError: bool value of Tensor with more than one value is ambiguous

hsiangyu · August 15, 2020, 10:33am

The initial version of my code is rather popular on many Chinese blogs and forums about PyTorch, but I am not sure why almost nobody has reported issues with this mistaken code on Chinese forums. It would be really kind of you to help identify the mistakes in the code (especially the reason why using a list rather than a set can raise an error). I will post a link to this thread and a translated version there to help people avoid similar mistakes.

mariosasko · August 15, 2020, 11:28am

Now I can see the error clearly. When the optimizer starts consuming the filter iterator, the membership test id(p) not in encode_params throws an error.

This snippet reproduces the error:

import torch

ls = [torch.randn(3), torch.randn(2, 3)]
ls[0] in ls  # returns True (same object)
torch.randn(4) in ls  # error: tensor sizes do not match
id(torch.randn(3)) in ls  # error again: ambiguous bool value

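You can think of the membership test as applying a check like this to each value in the list:

# Roughly what `obj in ls` does for each `value` in `ls`
def matches(obj, value):
    if obj is value:
        return True
    return bool(obj == value)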

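This explains why the first example returns True and the other two throw an error: ls[0] is found by identity, while the other two fall through to ==, which is elementwise for tensors. The result is either a size-mismatch error or a multi-element tensor that cannot be converted to a single bool.
A set avoids this because it looks items up by hash first, and a tensor's hash is based on its identity, so two different tensors are never compared with ==. This is why you should use sets instead of lists to solve this issue.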
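If you want to check the difference yourself, here is a small sketch that runs the same kind of membership tests against a list and a set. The helper in_list and the variable names are just made up for this example; it returns the string 'RuntimeError' instead of crashing so the results can be collected:

```python
import torch

a, b = torch.randn(3), torch.randn(2, 3)
as_list = [a, b]
as_set = {a, b}


def in_list(obj):
    """Membership test against the list, reporting errors as a string."""
    try:
        return obj in as_list
    except RuntimeError:
        return 'RuntimeError'


# List: identity hit, then two lookups that fall through to Tensor.__eq__.
list_results = [in_list(a), in_list(torch.randn(4)), in_list(id(a))]

# Set: tensors are looked up by their identity-based hash, so a different
# tensor is simply reported as absent without any elementwise comparison.
set_results = [a in as_set, torch.randn(3) in as_set]

print(list_results)  # [True, 'RuntimeError', 'RuntimeError']
print(set_results)   # [True, False]
```

Note that the set only ever answers "is this exact tensor object in here?", which is what you want when splitting parameters into groups.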