How to find the unused parameters in a network

When using torch.nn.parallel.DistributedDataParallel to train a network, I got a "please add find_unused_parameters=True into DistributedDataParallel" error. After adding this flag to DistributedDataParallel, I can train the network normally.

I know the error occurs because the network has some unused parameters. I wonder whether there are any tools to find out what they are.
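For reference, the flag is passed when constructing the DDP wrapper, roughly like this (the process-group setup, model, and local_rank here are placeholders for whatever your own script defines):

import torch
import torch.nn as nn

# Assumes torch.distributed.init_process_group(...) has already been
# called and that `model` and `local_rank` exist in your script.
model = model.to(local_rank)
model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=[local_rank],
    find_unused_parameters=True,  # tolerate params that receive no gradient
)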


Hi,

I have the same issue and also need to find those unused parameters. Please let me know if you have found any solutions.

Thanks


Hi,
I have the same issue too. I trained the model for a few steps, saved checkpoints with torch.save, and compared the parameters to find the ones that were not updated (i.e. unused).
For example:

import torch

# Load the state dicts from two checkpoints saved a few steps apart.
sd1 = torch.load("./work_dir/step_1.pth")["state_dict"]
sd5 = torch.load("./work_dir/step_5.pth")["state_dict"]

# A parameter that is identical in both checkpoints was never updated,
# which usually means it is unused.
for k in sd1:
    if torch.equal(sd1[k], sd5[k]):
        print(k)

Hi~,

I made sure all the trainable parameters are actually used in the forward pass, and the problem was solved. Hope this helps you.
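Just as an illustration (this is a hypothetical toy module, not anyone's real model): a parameter that is registered but never touched in forward() is exactly what DDP reports as unused.

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Trainable parameter that forward() never touches ->
        # DDP would report it as unused.
        self.unused = nn.Parameter(torch.zeros(4))

    def forward(self, x):
        # Fix: make every trainable parameter participate, e.g.
        # return self.linear(x) + self.unused
        return self.linear(x)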

Regards,

An easy way to find unused params is to train your model on a single node without the DDP wrapper. After loss.backward() and before the optimizer.step() call, add the lines below:

# A parameter whose .grad is still None after backward() never
# contributed to the loss, i.e. it is unused.
for name, param in model.named_parameters():
    if param.grad is None:
        print(name)

This will print any param that did not get used in the loss calculation; its grad will be None.
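Putting it together, a rough single-GPU sketch (model, data, loss, and optimizer here are placeholders for whatever your own script uses):

import torch

# Assumes `model`, `inputs`, `targets`, `criterion`, and `optimizer`
# come from your own training script; no DDP wrapper is involved.
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# Parameters whose grad is still None never contributed to the loss.
for name, param in model.named_parameters():
    if param.grad is None:
        print(name)

optimizer.step()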


Very good method! Thank you very much!