Getting different results from two pieces of code that I think should be the same

I’m getting different results when I swap between these two pieces of code, and I don’t know why. I expect them to give the same results. Both are the code for my optimizer configuration.

Code 1

    return torch.optim.Adam(model.parameters(), weight_decay=1e-6, lr=0.001)

Code 2

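    updown_weights = []
    updown_bias = []
    for m in model.modules():
        # Only parameters of DownConv / UpConv blocks are collected here.
        if isinstance(m, (mcnnsae_parts.DownConv, mcnnsae_parts.UpConv)):
            for name, p in m.named_parameters():
                # The append calls were missing from the post; reconstructed from the list names.
                if 'bias' in name:
                    updown_bias.append(p)
                if 'weight' in name:
                    updown_weights.append(p)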

    return torch.optim.Adam([
        {'params': updown_bias, 'weight_decay': 1e-6},
        {'params': updown_weights, 'weight_decay': 1e-6}],
        lr=0.001, weight_decay=1e-6)

How are you comparing these two approaches? Are you seeding the code or reusing the same state_dicts? Is each approach deterministic on its own, with the two only yielding different results when compared to each other?
In the latter case, how large are the differences?
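
One way to answer these questions is to start both optimizers from an identical copy of the model, feed them the same seeded data, and measure the largest parameter difference afterwards. The following is only a rough sketch: `make_model`, `single_group`, `split_groups` and `train` are hypothetical helpers, and the small `nn.Sequential` stands in for the actual network, which isn't shown in this thread. It also prints how many parameter tensors each optimizer holds, since a filtered parameter list only updates the tensors it was given.

```python
import copy
import torch
import torch.nn as nn

def make_model():
    # Hypothetical stand-in for the real network from the post.
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 1, 3, padding=1)
    )

def single_group(model):
    # Mirrors "Code 1": one parameter group with all parameters.
    return torch.optim.Adam(model.parameters(), weight_decay=1e-6, lr=0.001)

def split_groups(model):
    # Mirrors "Code 2": separate bias / weight groups with the same settings.
    weights = [p for n, p in model.named_parameters() if n.endswith("weight")]
    biases = [p for n, p in model.named_parameters() if n.endswith("bias")]
    return torch.optim.Adam(
        [{"params": biases, "weight_decay": 1e-6},
         {"params": weights, "weight_decay": 1e-6}],
        lr=0.001, weight_decay=1e-6)

def count_params(optimizer):
    # Number of parameter tensors the optimizer will actually update.
    return sum(len(g["params"]) for g in optimizer.param_groups)

def train(model, optimizer, steps=5):
    torch.manual_seed(0)  # both runs see the same random inputs
    for _ in range(steps):
        x = torch.randn(8, 1, 16, 16)
        loss = (model(x) - x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.manual_seed(0)
base = make_model()
model_a = copy.deepcopy(base)  # identical initial weights for both runs
model_b = copy.deepcopy(base)
opt_a, opt_b = single_group(model_a), split_groups(model_b)
print("tensors per optimizer:", count_params(opt_a), count_params(opt_b))

train(model_a, opt_a)
train(model_b, opt_b)

max_diff = max(
    (pa - pb).abs().max().item()
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(f"max abs parameter difference: {max_diff:.3e}")
```

If the two counts differ in your real model, the two setups are not optimizing the same set of parameters, and that alone would explain different results.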

I’m seeding the code using the following:

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
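
For reference, the two flags above only affect cuDNN; the random number generators have to be seeded separately. A minimal sketch of what a full seeding helper could look like is below. The name `seed_everything` is made up for illustration and is not a PyTorch function, and it assumes only Python's `random` and `torch` are used as randomness sources.

```python
import random
import torch

def seed_everything(seed: int = 0) -> None:
    # Hypothetical helper: seed the RNGs and set the cuDNN flags from the post.
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored if CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Reseeding with the same value should reproduce the same random draw.
seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)
print(torch.equal(a, b))
```

If NumPy or a DataLoader with worker processes is involved, those need their own seeding as well.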

In addition to these flags, you could also use torch.set_deterministic(True) (renamed to torch.use_deterministic_algorithms(True) in later PyTorch releases), which should raise errors for known non-deterministic methods, as described in the reproducibility docs.
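
A small sketch of switching this on in a way that works across versions, assuming the installed release has either the older `torch.set_deterministic` or the newer `torch.use_deterministic_algorithms`. The wrapper `enable_strict_determinism` is a hypothetical name used only for this example.

```python
import torch

def enable_strict_determinism() -> str:
    # Prefer the newer API name when the installed release provides it.
    if hasattr(torch, "use_deterministic_algorithms"):
        torch.use_deterministic_algorithms(True)
        return "use_deterministic_algorithms"
    # Fall back to the older name mentioned in this thread.
    torch.set_deterministic(True)
    return "set_deterministic"

used = enable_strict_determinism()
print("enabled via torch." + used)
```

With this enabled, an operation that has no deterministic implementation should raise a RuntimeError when it is called instead of silently producing run-to-run differences.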