Overall architecture: [model_1] → [model_2]
optimizer_1 = SGD(model_1.parameters(), lr=lr)
optimizer_2 = SGD(model_2.parameters(), lr=lr)

out_1 = model_1(input)
loss_1 = criterion(out_1, target)
optimizer_1.zero_grad()
loss_1.backward()
optimizer_1.step()

# out_1 is detached, so loss_2.backward() should not reach model_1
out_2 = model_2(out_1.detach())
loss_2 = criterion(out_2, target)
optimizer_2.zero_grad()
loss_2.backward()
optimizer_2.step()
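For anyone reproducing this: a minimal runnable version of the loop above (the linear models, MSE criterion, and learning rates are placeholder assumptions), with a check that loss_2.backward() leaves model_1's gradients untouched:

```python
import torch
from torch import nn
from torch.optim import SGD

torch.manual_seed(0)

# Stand-ins for the two sub-models (shapes/criterion assumed for illustration)
model_1 = nn.Linear(4, 4)
model_2 = nn.Linear(4, 4)
criterion = nn.MSELoss()

optimizer_1 = SGD(model_1.parameters(), lr=0.1)
optimizer_2 = SGD(model_2.parameters(), lr=0.1)

input = torch.randn(8, 4)
target = torch.randn(8, 4)

# --- model_1 update ---
out_1 = model_1(input)
loss_1 = criterion(out_1, target)
optimizer_1.zero_grad()
loss_1.backward()
optimizer_1.step()

# Snapshot model_1's gradients before the second backward pass
grads_before = [p.grad.clone() for p in model_1.parameters()]

# --- model_2 update on the detached activations ---
out_2 = model_2(out_1.detach())
loss_2 = criterion(out_2, target)
optimizer_2.zero_grad()
loss_2.backward()
optimizer_2.step()

# detach() cut the graph, so loss_2.backward() left model_1's grads untouched
unchanged = all(
    torch.equal(g, p.grad) for g, p in zip(grads_before, model_1.parameters())
)
print(unchanged)  # True
```

This prints True, confirming that in isolation detach() does block gradient flow from loss_2 back into model_1.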
When I train only model_1, I get 90% accuracy,
but if I also train model_2, model_1's accuracy drops significantly.
Why does model_2 affect model_1 even though I use separate optimizers and detach()?