It shouldn’t work at all, since you call zero_grad not at the beginning of the training loop but right in the middle. Also, since one of your models is nested inside the other, you don’t need a separate optimizer for each; one optimizer on the outer model is enough, because its parameters() already include the inner model's.
import torch
import torch.nn as nn

class OuterModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Registering the inner model as an attribute in __init__ makes
        # its parameters part of OuterModel.parameters().
        self.basemodel = InnerModel()

    def forward(self, x):
        x = self.basemodel(x)
        return x

model = OuterModel()
# Create a single optimizer and pass model.parameters() to it
# (SGD and lr here are just placeholders; use whatever you had).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# training loop:
optimizer.zero_grad()
out = model(x)
loss = loss_func(out, label)
loss.backward()
optimizer.step()
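With the inner model registered in __init__, model.parameters() yields the parameters of both models, so a single optimizer.step() updates everything; there is nothing to zero or step separately on the inner model.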
I moved the zero_grad call to the beginning of the training loop, but the NaN situation has not changed.
For some reason, I need to use two models instead of one big model.
Could you give me some advice?
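Since the NaN persists after the zero_grad fix, one common way to localize it is PyTorch's anomaly detection plus a loss sanity check. A minimal sketch, reusing model, x, label, and loss_func from above; the gradient-clipping line and max_norm value are assumptions on my part, not something from this thread:

import torch

# Raise an error at the exact backward op that first produces NaN/Inf.
torch.autograd.set_detect_anomaly(True)

optimizer.zero_grad()
out = model(x)
loss = loss_func(out, label)

# If the loss is already non-finite, the problem is in the forward
# pass or the data, not in backward.
assert torch.isfinite(loss), f"non-finite loss: {loss.item()}"

loss.backward()

# Assumed mitigation: clip exploding gradients (max_norm=1.0 is arbitrary).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()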