I’m editing code from torchbeast to fit my use case. I want to use a discriminator for calculating the reward at a certain state.
I noticed that torchbeast keeps a copy of the model for inference, called actor_model, and uses the original model for learning.
This is part of polybeast.py:
    def learn(...):
        ...
        for tensors in learner_queue:
            ...
            optimizer.zero_grad()
            total_loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), flags.grad_norm_clipping)
            optimizer.step()
            scheduler.step()
            actor_model.load_state_dict(model.state_dict())
            ...
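To make the pattern concrete, here is a minimal sketch of what I mean by mirroring that model/actor_model split for a discriminator. The names (discriminator, actor_discriminator) and the tiny architecture are placeholders I made up, not the actual torchbeast code:

```python
import copy

import torch
import torch.nn as nn

# Hypothetical discriminator; stands in for the real reward model.
discriminator = nn.Linear(8, 1)

# Separate copy used only for inference by actor threads,
# analogous to torchbeast's actor_model.
actor_discriminator = copy.deepcopy(discriminator)

# ... after each discriminator optimizer step, the learner would
# push fresh weights to the inference copy:
actor_discriminator.load_state_dict(discriminator.state_dict())

# Actors query the inference copy without building a graph.
x = torch.randn(4, 8)
with torch.no_grad():
    out = actor_discriminator(x)
print(out.shape)
```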
I tried the same approach with the discriminator, but I ran out of memory and had to fall back to minibatches. I also thought of using a lock so that only one thread uses the discriminator at a time. Is there another solution to this problem, or is it impossible?
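For reference, this is roughly what the minibatch-plus-lock workaround looks like. Everything here (the discriminator architecture, the batch size, the function name discriminator_reward) is a placeholder sketch of the idea, not my real code:

```python
import threading

import torch
import torch.nn as nn

# Placeholder discriminator model.
discriminator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

# Serialize access so only one thread runs the discriminator at a time.
disc_lock = threading.Lock()

def discriminator_reward(states, minibatch_size=32):
    """Compute rewards in minibatches to bound peak memory."""
    rewards = []
    with disc_lock, torch.no_grad():
        for chunk in states.split(minibatch_size):
            rewards.append(discriminator(chunk).squeeze(-1))
    return torch.cat(rewards)

states = torch.randn(100, 8)
r = discriminator_reward(states)
print(r.shape)  # one scalar reward per state
```

The torch.no_grad() context matters here: without it, reward computation would keep activations around for backprop, which is part of what blows up memory.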