Questions about optimizer behavior in a model with ModuleDict

Hi, I wonder what happens if I use my optimizer with a model built around a ModuleDict, for example:

{key1:m1, key2:m2, key3:m3}

and my training process looks like this:

for epoch in ...:
    for key in keys:
        output = model[key](x)  # because for each key I only train certain layers of this model
        loss = f(output, true)
        loss.backward()
        optimizer.step()

Is it OK to train all the layers of this model in one epoch this way? Moreover, I think it would also be different to use this approach:

for epoch in ...:
    loss = 0
    for key in keys:
        output = model[key](x)  # because for each key I only train certain layers of this model
        loss += f(output, true)
    loss.backward()
    optimizer.step()
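
To make the question concrete, here is a minimal self-contained sketch of both variants; the ModuleDict contents, dummy data, loss, and optimizer are just placeholders I made up for illustration:

    import torch
    import torch.nn as nn

    # Placeholder setup: three independent sub-modules in a ModuleDict.
    model = nn.ModuleDict({
        "key1": nn.Linear(10, 1),
        "key2": nn.Linear(10, 1),
        "key3": nn.Linear(10, 1),
    })
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(8, 10)    # dummy input
    true = torch.randn(8, 1)  # dummy target
    f = nn.MSELoss()          # placeholder loss

    # Variant 1: call backward() and step() once per key.
    for epoch in range(2):
        for key in model.keys():
            output = model[key](x)
            loss = f(output, true)
            loss.backward()
            optimizer.step()

    # Variant 2: accumulate the losses over all keys, then one backward()/step() per epoch.
    for epoch in range(2):
        loss = 0.0
        for key in model.keys():
            output = model[key](x)
            loss = loss + f(output, true)
        loss.backward()
        optimizer.step()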

I am confused about the difference between these two processes. Thanks.

Your code is not properly formatted, so I don’t know when the backward calls are used.
However, note that optimizers with internal states (e.g. Adam) will update parameters even if their .grad attributes are set to zero, as long as a valid internal state is available.
You might thus want to delete the .grad attributes via optimizer.zero_grad(set_to_none=True) to avoid this.
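
To illustrate what I mean (a toy sketch with a single parameter and arbitrary values): once Adam has built up its internal buffers, a parameter whose .grad is an all-zero tensor is still updated, while a parameter whose .grad was set to None is skipped:

    import torch

    # Single parameter, arbitrary values, just to show the effect.
    param = torch.nn.Parameter(torch.ones(3))
    optimizer = torch.optim.Adam([param], lr=0.1)

    # One real step to populate Adam's internal state (exp_avg / exp_avg_sq).
    param.grad = torch.ones(3)
    optimizer.step()

    # Case 1: gradients zeroed but kept as tensors -> Adam still moves the
    # parameter, driven purely by its internal momentum.
    optimizer.zero_grad(set_to_none=False)
    before = param.detach().clone()
    optimizer.step()
    print(torch.equal(param, before))  # False, the parameter changed

    # Case 2: gradients deleted (.grad is None) -> the parameter is skipped entirely.
    optimizer.zero_grad(set_to_none=True)
    before = param.detach().clone()
    optimizer.step()
    print(torch.equal(param, before))  # True, the parameter is unchanged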

Hi, I have fixed the formatting of my code. Now I will try your suggestion.