Test accuracy is 0.1 with the RPC-based parameter server from the tutorial

Hello!
I implemented the RPC-based parameter server framework by following this link, but ran into a problem.

If the network being trained has no batchnorm layers, both training and testing give good results. If it does contain batchnorm layers (for example, resnet), training looks normal, but in the testing phase, i.e. after model.eval(), the test accuracy is always 0.1. My test code is as follows.

import torch

def get_accuracy(test_loader, model):
    # Switch to eval mode: batchnorm layers now use running_mean / running_var
    model.eval()
    correct_sum = 0
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            out = model(data)
            # Predicted class index per sample, shape (batch, 1)
            pred = out.argmax(dim=1, keepdim=True)
            # The output may come back on another device, so move it next to target
            pred = pred.to(device)
            correct = pred.eq(target.view_as(pred)).sum().item()
            correct_sum += correct
    print(f"Accuracy {correct_sum / len(test_loader.dataset)}")

After running into this problem, I did some reading. The cause might be the running_mean and running_var of the batchnorm layers, but I don't know what to do with them. I am very confused. Can you give me some advice? Thank you!

Thanks for your question! model.eval() with batchnorm has been a recurring source of confusion; see, for example: Model.eval() gives incorrect loss for model with batchnorm layers - #11 by meetshah1995.

Could you try to follow the advice in that thread to see if it improves the eval accuracy? Also, what sort of accuracy do you get if you disable model.eval()?
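To check whether the running statistics are involved, it may also help to look at them directly on the model instance you call get_accuracy on. Below is a minimal sketch, not part of the tutorial: batchnorm_stats is a hypothetical helper name, and the small nn.Sequential model is only a stand-in for your network. If num_batches_tracked is still 0 after training, that copy of the model never updated its batchnorm buffers in train mode.

```python
import torch
import torch.nn as nn

# Hypothetical helper (not from the tutorial): summarise the buffers of every
# batchnorm layer so you can see whether they were updated during training.
def batchnorm_stats(model):
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    stats = {}
    for name, module in model.named_modules():
        # Layers built with track_running_stats=False have no running buffers
        if isinstance(module, bn_types) and module.track_running_stats:
            stats[name] = (
                int(module.num_batches_tracked),          # batches seen in train mode
                module.running_mean.abs().mean().item(),  # 0.0 right after init
                module.running_var.mean().item(),         # 1.0 right after init
            )
    return stats

# Stand-in model for illustration only; replace with your own network
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
print(batchnorm_stats(model))  # freshly initialised buffers

model.train()
with torch.no_grad():
    model(torch.randn(16, 3, 32, 32))  # one forward pass in train mode
print(batchnorm_stats(model))  # buffers have been updated once
```

Calling this right before get_accuracy, on the same object you pass to it, shows whether the evaluated model carries the statistics accumulated during training.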