Can I train an asynchronous actor-critic model without keeping a copy of the model?

I’m editing code from torchbeast to fit my use case. I want to use a discriminator to compute the reward at a given state.
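
For concreteness, the reward I have in mind is roughly the discriminator's score of a state, along these lines (just a sketch; discriminator and states are placeholders for my own modules, not torchbeast code):

import torch

with torch.no_grad():  # the reward is just a scalar signal, no autograd graph needed
    rewards = discriminator(states)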

I noticed that torchbeast keeps a copy of the model, called actor_model, for inference, and uses the original model for learning.

This is part of polybeast.py:

def learn(
    ...
    for tensors in learner_queue:
        ...
        optimizer.zero_grad()
        total_loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), flags.grad_norm_clipping)
        optimizer.step()
        scheduler.step()

        # sync the inference copy with the freshly updated learner weights
        actor_model.load_state_dict(model.state_dict())
        ...

I tried the same thing with the discriminator, but I ran out of memory and had to fall back to minibatches. I thought of using a lock so that only one thread uses the discriminator at a time (see the sketch below). Is there any other solution to this problem, or is it impossible?
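
This is what I mean by the lock idea (again just a sketch; compute_reward, discriminator, and states are placeholder names from my setup, not torchbeast API):

import threading

import torch

discriminator_lock = threading.Lock()

def compute_reward(discriminator, states):
    # Serialize access so only one actor thread touches the shared
    # discriminator at a time, instead of each thread holding its own copy.
    with discriminator_lock:
        with torch.no_grad():  # inference only, no autograd graph
            return discriminator(states)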