RuntimeError: Expected to mark a variable ready only once in a multi-GPU setting

I’m trying to run a Hugging Face model on multiple GPUs. When I process multiple inputs that depend on each other through a single module (shared weights), I get RuntimeError: Expected to mark a variable ready only once. If I use the module only once, to process a single input, I don’t get this error.

To make it clearer, here is the structure:

import torch.nn as nn

class Model(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.encoder = ...

    def forward(self, input_ids, ...):
        # First pass over the full input
        encoder_outputs = self.encoder(input_ids, ...)

        # filter encoder_outputs and construct another tensor called 'input_ids_selected'

        # Second pass through the same encoder (shared weights)
        encoder_outputs = self.encoder(input_ids_selected, ...)

        return encoder_outputs

If I remove the line encoder_outputs = self.encoder(input_ids_selected, ...), the error goes away. I should add that, to filter encoder_outputs from the first encoder pass, I’m using other modules (linear layers) to find the important input_ids and keep them in input_ids_selected. You can think of this as a two-step summarizer.
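
For a concrete picture of the two-pass structure, here is a minimal CPU-only sketch. It is not my real model: the names TwoPassModel, embed, scorer and keep, the toy embedding-plus-linear encoder, and the top-k selection rule are all assumptions made for illustration, and it does not use multiple GPUs, so it will not reproduce the error by itself.

```python
import torch
import torch.nn as nn


class TwoPassModel(nn.Module):
    """Toy stand-in: one shared encoder that is called twice in forward."""

    def __init__(self, vocab_size=100, hidden=16, keep=4):
        super().__init__()
        # Hypothetical encoder; the real one is a Hugging Face model.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.Linear(hidden, hidden)
        # Linear layer that scores each token for selection (assumed rule).
        self.scorer = nn.Linear(hidden, 1)
        self.keep = keep

    def encode(self, input_ids):
        return torch.tanh(self.encoder(self.embed(input_ids)))

    def forward(self, input_ids):
        # First pass over the full input.
        encoder_outputs = self.encode(input_ids)
        # Score every token, then keep the top-k positions in original order.
        scores = self.scorer(encoder_outputs).squeeze(-1)  # (batch, seq)
        top = scores.topk(self.keep, dim=1).indices.sort(dim=1).values
        input_ids_selected = input_ids.gather(1, top)
        # Second pass through the same encoder (shared weights).
        return self.encode(input_ids_selected)


model = TwoPassModel()
input_ids = torch.randint(0, 100, (2, 10))
out = model(input_ids)
# Picking indices with topk is not differentiable, so in this toy version
# the scorer gets no gradient; only embed and encoder do.
out.sum().backward()
```

In this sketch the selection step only produces integer indices, so the scorer parameters take part in the forward pass but receive no gradient. I don’t know whether that detail matters for the error in my real setup.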