Hi, I wonder what happens if I use my optimizer with a data structure containing a module dict, for example
`{key1: m1, key2: m2, key3: m3}`,
and my training process is like:
```python
for epoch in ...:
    for key in keys:
        output = model[key](x)  # for one key, I only train certain layers of the model
        loss = f(output, true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
Is it OK for me to train all the layers in this model within one epoch this way? Moreover, I think it is different from this approach:
```python
for epoch in ...:
    optimizer.zero_grad()
    loss = 0
    for key in keys:
        output = model[key](x)  # for one key, I only train certain layers of the model
        loss += f(output, true)
    loss.backward()
    optimizer.step()
```
I am confused about the difference between these two processes. Thanks.
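For reference, here is a minimal runnable sketch of the two variants, assuming an `nn.ModuleDict` of small linear layers sharing a single optimizer (the layer sizes, loss function, and data are illustrative placeholders, not the original model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.ModuleDict({
    "key1": nn.Linear(4, 1),
    "key2": nn.Linear(4, 1),
    "key3": nn.Linear(4, 1),
})
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x = torch.randn(8, 4)
true = torch.randn(8, 1)

# Variant 1: one backward() and step() per key,
# so the parameters are updated key by key.
for key in model.keys():
    optimizer.zero_grad()
    loss = loss_fn(model[key](x), true)
    loss.backward()
    optimizer.step()

# Variant 2: accumulate the per-key losses into one scalar,
# then a single backward() and step() on the summed loss.
optimizer.zero_grad()
total = sum(loss_fn(model[key](x), true) for key in model.keys())
total.backward()
optimizer.step()
```

In variant 2 the gradients of all per-key losses are summed before the single update, whereas variant 1 applies each key's update to parameters already modified by the previous keys' steps.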