Training multiple models independently

Hello,

I am implementing the actor-critic algorithm. My actor and critic are both neural nets, and I want to train them simultaneously but independently.

What I currently have looks more or less like this:

import torch.optim as optim

actor = Net1()   # actor network (defined elsewhere)
critic = Net2()  # critic network (defined elsewhere)

# one optimizer per model, so each set of weights is updated separately
optim_a = optim.SGD(actor.parameters(), lr=0.1, momentum=0.9)
optim_c = optim.SGD(critic.parameters(), lr=0.1, momentum=0.9)

optim_a.zero_grad()
optim_c.zero_grad()

# the critic consumes the actor's output, so the two graphs are connected
output_c = actor(input)
output_a = critic(output_c, other_inputs)

loss = output_c + output_a  # assumes both terms are scalars
loss.backward()

optim_a.step()
optim_c.step()

But when I train, the weights are not changing even though the loss is not 0; that is, something seems to go wrong when the backpropagation is carried out. Furthermore, I would ideally like to have this:

output_c.backward()
optim_c.step()
output_a.backward()
optim_c.step()

This is because I am not sure that, when I add the outputs up as in the first block, each optimizer modifies its own weights independently (I assume it does). I emphasize this because the output of the critic depends on the actor.

So, I'd appreciate any hint on how to do this, or a different way to implement it.

I am aware of the actor-critic PyTorch tutorial, but it defines a single neural network with two heads that serve as actor and critic, so both share some of the weights. Ultimately, if that is the only option, I'll implement a shared network instead of two independent ones.

Thanks

Could you print out some .grad attributes of some parameters of both models after the loss.backward() call?
If they are None, you might be breaking the computation graph somewhere in your forward passes.
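For example, something along these lines, reusing the names from the snippet above (actor, critic, and loss as defined in the question):

loss.backward()
for name, param in actor.named_parameters():
    print(name, param.grad)  # None suggests the graph to the actor is broken
for name, param in critic.named_parameters():
    print(name, param.grad)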

I don’t understand this section.
Both approaches (loss.backward() and calling backward on both outputs) should create the same gradients. You could check it again using some dummy data and comparing the .grad attributes.

The difference would be that optimizer.step() is called twice, which would make a difference, e.g. if you are using momentum.
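Here is a rough sketch of that check with dummy data; the layer sizes, the .sum() reductions (just to get scalar outputs), and the single-argument critic are simplifications, not taken from the original post:

import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Linear(4, 3)   # dummy stand-in for Net1
critic = nn.Linear(3, 1)  # dummy stand-in for Net2
x = torch.randn(2, 4)

params = list(actor.parameters()) + list(critic.parameters())

# approach 1: sum the outputs and call backward once
out_c = actor(x)
out_a = critic(out_c)
(out_c.sum() + out_a.sum()).backward()
grads_1 = [p.grad.clone() for p in params]

# approach 2: call backward on each output separately (gradients accumulate)
for p in params:
    p.grad = None
out_c = actor(x)
out_a = critic(out_c)
out_c.sum().backward(retain_graph=True)  # keep the graph for the second backward
out_a.sum().backward()
grads_2 = [p.grad.clone() for p in params]

print(all(torch.allclose(g1, g2) for g1, g2 in zip(grads_1, grads_2)))  # True

The accumulated gradients match; the remaining difference is only how often each optimizer.step() is called.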

Thanks. I'll print the attributes to see if I am breaking the training somewhere. If that doesn't work, I'll edit my question and word it better to make it clearer.

It seems it was working the entire time; I was just misreading the weight updates. Something similar to what happened here.


Hello Piotr, I was wondering if PyTorch internally parallelizes the forward passes of the two models here (actor() and critic()). I am trying to build something similar where I have two models that cannot use the whole GPU (as far as I noticed, it uses 4 GiB of the 32 GiB when I checked with the nvidia-smi command).
That's why I want to train the two models on a single GPU and hopefully get a decent speedup compared to training them sequentially.

To achieve this, I initialized the two models inside an nn.Module child class and called them on separate lines in forward, but the run took even longer than twice the duration of a single training run. Can you point me in the right direction?

No, these models won't run in parallel, as the execution is sequential due to the data dependency:

output_c = actor(input)
output_a = critic(output_c, other_inputs)

I.e., critic cannot start executing before output_c has been calculated.

If the models do not depend on each other, you could execute them in parallel, provided the GPU has free resources. Here is another related post.
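If you want to try that, here is a minimal sketch (not from this thread) that launches two independent models on separate CUDA streams; this only helps if a single model leaves spare compute on the GPU, otherwise the kernels still execute one after the other:

import torch
import torch.nn as nn

# hypothetical independent models and inputs
model_a = nn.Linear(1024, 1024).cuda()
model_b = nn.Linear(1024, 1024).cuda()
x_a = torch.randn(64, 1024, device='cuda')
x_b = torch.randn(64, 1024, device='cuda')

stream_a = torch.cuda.Stream()
stream_b = torch.cuda.Stream()

torch.cuda.synchronize()
with torch.cuda.stream(stream_a):
    out_a = model_a(x_a)
with torch.cuda.stream(stream_b):
    out_b = model_b(x_b)
torch.cuda.synchronize()  # wait for both streams before using the outputs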
