Implementation 1
FedAvg: averaging and loading the weights in the same loop

    global_dict = global_model.state_dict()
    for k in global_dict.keys():
        global_dict[k] = torch.stack(
            [client_models[i].state_dict()[k].float()
             for i in range(len(client_models))], 0
        ).mean(0)
        # load inside the loop, once per key
        global_model.load_state_dict(global_dict)
        for model in client_models:
            model.load_state_dict(global_model.state_dict())
Implementation 2
FedAvg: averaging and loading the weights in different loops

    global_dict = global_model.state_dict()
    for k in global_dict.keys():
        global_dict[k] = torch.stack(
            [client_models[i].state_dict()[k].float()
             for i in range(len(client_models))], 0
        ).mean(0)
    # load once, after the averaging loop has finished
    global_model.load_state_dict(global_dict)
    for model in client_models:
        model.load_state_dict(global_model.state_dict())
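For reference, the averaging-then-broadcasting step can be run in isolation with toy models. This is a sketch, not the full training code: the two-client setup, the `nn.Linear(2, 1)` architecture, and the saved weight copies are illustrative assumptions added here to make the snippet self-contained.

```python
import torch
import torch.nn as nn

# Toy setup: two clients and one global model with identical architectures.
torch.manual_seed(0)
global_model = nn.Linear(2, 1)
client_models = [nn.Linear(2, 1) for _ in range(2)]

# Keep copies of the pre-averaging client weights (for verification only).
orig_weights = [m.weight.detach().clone() for m in client_models]

# FedAvg step: element-wise mean of each parameter tensor across clients.
global_dict = global_model.state_dict()
for k in global_dict.keys():
    global_dict[k] = torch.stack(
        [client_models[i].state_dict()[k].float()
         for i in range(len(client_models))], 0
    ).mean(0)
global_model.load_state_dict(global_dict)  # load once, after the loop

# Broadcast the averaged weights back to every client.
for model in client_models:
    model.load_state_dict(global_model.state_dict())
```

After this step every client holds exactly the averaged global weights, so the next local training round starts from a common initialization.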
Implementation 2 gives me a test accuracy for the global model similar to the client models, but implementation 1 does not. I don't know the reason behind this.
Also, I am using a pre-trained model with additional layers, and I want to keep the pre-trained layers frozen during training on each client.
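One common way to do this is to set `requires_grad = False` on the pre-trained parameters before local training. The sketch below uses a hypothetical `nn.Sequential` stand-in (the layer sizes and the "last layer is the new head" split are assumptions, not taken from the original model):

```python
import torch
import torch.nn as nn

# Hypothetical model: pre-trained layers followed by one newly added layer.
model = nn.Sequential(
    nn.Linear(4, 8),   # stands in for the pre-trained layers
    nn.ReLU(),
    nn.Linear(8, 2),   # stands in for the newly added layer
)

# Freeze everything, then re-enable gradients for the new layer only.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Give the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
```

Since the frozen layers start identical on every client and never change locally, averaging them in the FedAvg step should be a no-op, so freezing does not conflict with the aggregation logic.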