Getting different results with code, but I think they should be the same

I’m getting different results when I use these two pieces of code interchangeably, but I don’t know why; I think they should give the same results. It’s the optimizer configuration for my model.

Code 1

    return torch.optim.Adam(model.parameters(), weight_decay=1e-6, lr=0.001)

Code 2

    # Collect the weight and bias tensors of all DownConv/UpConv modules
    # into separate lists so each can go into its own param group.
    updown_weights = []
    updown_bias = []
    for m in model.modules():
        if isinstance(m, (mcnnsae_parts.DownConv, mcnnsae_parts.UpConv)):
            for name, p in m.named_parameters():
                if 'bias' in name:
                    updown_bias.append(p)
                if 'weight' in name:
                    updown_weights.append(p)

    # Both groups get the same weight decay; the lr passed to the constructor
    # is used as the default for every group.
    return torch.optim.Adam([
        {'params': updown_bias, 'weight_decay': 1e-6},
        {'params': updown_weights, 'weight_decay': 1e-6}],
        lr=0.001, weight_decay=1e-6)
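
For reference, here is a quick way to check whether the param groups above actually cover the same parameters as model.parameters() would (a sketch reusing the names from the snippet above):

    # Compare the set of parameters Code 1 optimizes (all of model.parameters())
    # against the set collected into the two param groups in Code 2.
    all_params = {id(p) for p in model.parameters()}
    grouped_params = {id(p) for p in updown_bias + updown_weights}
    print('parameters missing from the param groups:', len(all_params - grouped_params))
    print('parameters not owned by the model:', len(grouped_params - all_params))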

How are you comparing these two approaches? Are you seeding the runs or reusing the same state_dicts? Is each approach deterministic on its own, but the two give different results when compared to each other?
In the latter case, how large are the differences?
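
For example, a minimal sketch of reusing the same initial weights for both runs (assuming build_model() stands in for however you construct your model; the name is just a placeholder):

    import torch

    # Save the freshly initialized weights once...
    model = build_model()
    torch.save(model.state_dict(), 'init_state.pt')

    # ...and load them back before creating the optimizer in each run,
    # so both optimizer configurations start from identical parameters.
    model.load_state_dict(torch.load('init_state.pt'))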

I’m seeding the code using the following:

    import random

    import numpy as np
    import torch

    seed = 1024
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)

In addition to these flags, you could also use torch.set_deterministic(True), which should raise errors for known non-deterministic operations, as described in the reproducibility docs.
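
A minimal sketch of how that could sit next to the seeding code above (the CUBLAS_WORKSPACE_CONFIG line comes from the reproducibility docs and only matters on CUDA; in newer PyTorch releases the call is named torch.use_deterministic_algorithms(True)):

    import os
    import torch

    # Some CUDA ops (e.g. cuBLAS matmuls) need this set before they run
    # in order to have a deterministic implementation.
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

    # Raise an error whenever an op without a deterministic implementation is used.
    torch.set_deterministic(True)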