When using torch.nn.parallel.DistributedDataParallel to train my network, I got a "please add find_unused_parameters=True into DistributedDataParallel" error. After adding this flag to DistributedDataParallel, I can train the network normally.
I know the error occurs because the network has some unused parameters. Is there a tool to find out which ones they are?
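For reference, this is roughly how I wrap the model now (a sketch; MyModel and the torchrun-style launch stand in for my actual setup):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")      # assumes a torchrun-style launch
local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun for each process
model = MyModel().cuda(local_rank)           # MyModel stands in for my network
model = DDP(
    model,
    device_ids=[local_rank],
    find_unused_parameters=True,             # the flag the error message asks for
)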
Hi.
I have the same issue. I trained the model for a few steps, saved checkpoints with torch.save, and compared the parameters to find the ones that were never updated (i.e. unused).
For example:
import torch

# Compare two saved checkpoints; parameters whose values did not change
# between step 1 and step 5 are likely unused.
sd1 = torch.load("./work_dir/step_1.pth")["state_dict"]
sd4 = torch.load("./work_dir/step_5.pth")["state_dict"]
for k in sd1:
    v1 = sd1[k]
    v4 = sd4[k]
    if (v1 == v4).all():
        print(k)
An easy way to find unused parameters is to train your model on a single node without the DDP wrapper. After the loss.backward() call and before optimizer.step(), add the lines below:
for name, param in model.named_parameters():
    if param.grad is None:
        print(name)
This will print any parameter that did not take part in the loss calculation; its grad will be None.
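For context, a minimal sketch of where that check sits in a plain single-process training step (model, data_loader, criterion, and optimizer stand in for your own objects):

model.train()
for inputs, targets in data_loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # parameters that took no part in the forward pass / loss keep grad == None
    for name, param in model.named_parameters():
        if param.grad is None:
            print(name)
    optimizer.step()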