Can I train an asynchronous actor-critic model without keeping a copy of the model?

I’m editing code from torchbeast to fit my use case. I want to use a discriminator to compute the reward at a given state.

I noticed that torchbeast keeps a copy of the model, called actor_model, for inference, and uses the original model for learning.
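As I understand it, the pattern looks roughly like this (a minimal sketch with a hypothetical tiny model standing in for the real network, not torchbeast's actual code):

```python
import torch.nn as nn

# Hypothetical stand-in for the actual network architecture.
model = nn.Linear(4, 2)        # the learner's model, updated by gradient steps
actor_model = nn.Linear(4, 2)  # the inference copy queried by actor threads

# After each learner step, the learner's weights are copied
# into the inference model so actors act with fresh parameters.
actor_model.load_state_dict(model.state_dict())
```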

This is part of the learn function (abridged):

def learn(...):
    for tensors in learner_queue:
        ...
        nn.utils.clip_grad_norm_(model.parameters(), flags.grad_norm_clipping)


I tried the same approach with the discriminator, but I ran out of memory and had to fall back to minibatches. I also thought of using a lock so that only one thread uses the discriminator at a time. Is there another solution to this problem, or is it impossible?
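For concreteness, the minibatch-plus-lock workaround I have in mind looks like this (a sketch; the discriminator, batch size, and function name are all placeholders, not torchbeast code):

```python
import threading

import torch
import torch.nn as nn

discriminator = nn.Linear(8, 1)  # hypothetical discriminator network
disc_lock = threading.Lock()     # serialize access across actor threads

def discriminator_reward(states, batch_size=32):
    """Score states in minibatches under no_grad to bound memory use."""
    rewards = []
    with disc_lock, torch.no_grad():
        for i in range(0, states.shape[0], batch_size):
            rewards.append(discriminator(states[i:i + batch_size]))
    return torch.cat(rewards, dim=0)
```

This bounds peak memory (only one minibatch of activations lives at a time, and no_grad skips the autograd graph), but the lock serializes all actors, which seems to defeat the point of asynchrony.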