Optimizer problem when copying a model

kharrat_salma · August 4, 2022, 8:31am

Hello, I have to train n clients where each client has its own model

so I instantiated a model model=Net() and then created n copies:

models.append(copy.deepcopy(model))

after every batch I aggregate all weights and do the following:
models[i].load_state_dict(average_weights)

I noticed that there is no learning, and the loss gets stuck.

Any suggestion for this problem?

sudomaze · August 4, 2022, 11:48am

Why do you have multiple models and then average their weights?

Each model might learn something different, and taking the average of the weights, especially during training, won’t result in a better estimation to solve the problem at hand.