I have two networks, say A and B. A takes its input as a tuple (data_disk, b_output), where data_disk is data read from disk and b_output is the output of network B. Is stepping the optimizers of both A and B enough to train both networks?
Yes, that should be enough.
You can pass the parameters of both models to a single optimizer or use a separate optimizer for each.
As long as the gradients are valid, both models will be updated. The key point is that b_output must not be detached from the computation graph; otherwise backpropagating the loss through A will never reach B's parameters, and B's optimizer step will have nothing to apply.
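A minimal sketch of the single-optimizer option, using hypothetical toy modules (`NetA`, `NetB`, and the layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for network B.
class NetB(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 8)

    def forward(self, x):
        return torch.relu(self.fc(x))

# Hypothetical stand-in for network A: consumes (data_disk, b_output).
class NetA(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4 + 8, 1)

    def forward(self, data_disk, b_output):
        return self.fc(torch.cat([data_disk, b_output], dim=1))

net_a, net_b = NetA(), NetB()

# One optimizer over both parameter sets (two separate optimizers
# stepped one after the other would work just as well).
opt = torch.optim.Adam(
    list(net_a.parameters()) + list(net_b.parameters()), lr=1e-3
)

data_disk = torch.randn(16, 4)   # stand-in for data read from disk
target = torch.randn(16, 1)

opt.zero_grad()
b_out = net_b(data_disk)         # do NOT call .detach() here
pred = net_a(data_disk, b_out)
loss = nn.functional.mse_loss(pred, target)
loss.backward()                  # gradients flow through A into B
opt.step()                       # updates parameters of both networks
```

After `loss.backward()`, both `net_a` and `net_b` have populated `.grad` tensors, so a single `opt.step()` updates both. If you had called `b_out.detach()`, B's gradients would stay `None` and B would never train.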