Multitask model prediction accuracy lower after converting to C++

I have a multitask model that uses ModuleDict. Its forward(self, a, a_len, b, b_len, task_id) takes a task_id argument.

After converting the PyTorch model checkpoint for C++ via tracing, the C++ prediction accuracy is 5% lower than the PyTorch prediction accuracy.

If I instead train each task individually, i.e., as a separate model and checkpoint file, and then convert each of them to C++, the accuracy matches.

Tracing only covers the specific code path executed during tracing; all other paths and control-flow information cannot be recorded.
You could trace the various submodules and combine them using scripting to get a single exportable model encompassing everything.
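
Roughly, the combination could look like the toy sketch below (module names and shapes are made up, and the exact API depends on your PyTorch version): each per-task head is traced once, and a scripted wrapper keeps the branch on task_id as real control flow.

import torch
import torch.nn as nn

class Head(nn.Module):  # stand-in for one task's submodule
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

example = torch.randn(1, 4)
head_a = torch.jit.trace(Head(), example)  # e.g. task 0
head_b = torch.jit.trace(Head(), example)  # e.g. task 1

class Combined(nn.Module):
    def __init__(self, head_a, head_b):
        super().__init__()
        self.head_a = head_a
        self.head_b = head_b

    def forward(self, x, task_id: int):
        # scripting preserves this branch; tracing alone would bake in one path
        if task_id == 0:
            return self.head_a(x)
        return self.head_b(x)

combined = torch.jit.script(Combined(head_a, head_b))
combined.save("combined.pt")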

Best regards

Thomas

@tom Thank you so much. I haven't fully understood your answer yet. The link is nice; however, I don't know how to apply it to for-loops, i.e., iterating over all possible task_id values as input. Would you please provide an example?

Below is a code snippet of the Model class. It contains ModuleDicts whose keys are task ids. The forward function takes task_id as an argument.

Is this control flow, similar to if-else, since ModuleDict internally uses for-loops?

So, when tracing the model, shall we trace ALL task_ids, or is tracing only one task_id (the variable channel in the code below) enough?

channel = torch.ones(1, dtype=torch.int64)
traced_script_module = torch.jit.trace(model, (premise, premise_length, hypotheses, hypotheses_length, channel))

output = traced_script_module(premise, premise_length, hypotheses, hypotheses_length, channel)
traced_script_module.save('deploy-trace-multitask.pt')

Code snippet of the Model class's definition:

        self._word_embedding = nn.Embedding(self.vocab_size,
                                            self.embedding_dim,
                                            padding_idx=padding_idx,
                                            _weight=embeddings)

        if self.dropout:
            self._rnn_dropout = RNNDropout(p=self.dropout) #shared by all tasks
            # self._rnn_dropout = nn.Dropout(p=self.dropout)

        self._encoding = Seq2SeqEncoder(nn.LSTM,
                                        self.embedding_dim,
                                        self.hidden_size,
                                        bidirectional=True)

        #multi-task
        self._attention = nn.ModuleDict({})
        self._projection = nn.ModuleDict({})
        self._classification = nn.ModuleDict({})
        for channel in channels_list:
            self.update(channel)

        # Initialize all weights and biases in the model.
        self.apply(_init_esim_weights)

    def update(self, channel):
        channel = str(channel)
        self._attention.update({channel : SoftmaxAttention()})

        self._projection.update({channel : nn.Sequential(nn.Linear(4*2*self.hidden_size, self.hidden_size), nn.ReLU())})

        self._classification.update({channel : nn.Sequential(nn.Dropout(p=self.dropout),
                                             nn.Linear(4*self.hidden_size,
                                                       self.hidden_size),
                                             nn.Tanh(),
                                             nn.Dropout(p=self.dropout),
                                             nn.Linear(self.hidden_size,
                                                       self.num_classes))})

    def forward(self,
                premises,
                premises_lengths,
                hypotheses,
                hypotheses_lengths,
                channel_tensor): #must be a tensor
        """
        Args:
            premises: A batch of variable-length sequences of word indices
                representing premises. The batch is assumed to be of size
                (batch, premises_length).
            premises_lengths: A 1D tensor containing the lengths of the
                premises in 'premises'.
            hypotheses: A batch of variable-length sequences of word indices
                representing hypotheses. The batch is assumed to be of size
                (batch, hypotheses_length).
            hypotheses_lengths: A 1D tensor containing the lengths of the
                hypotheses in 'hypotheses'.
            channel_tensor: A tensor holding the id of the task (channel)
                whose submodules should be used.

        Returns:
            logits: A tensor of size (batch, num_classes) containing the
                logits for each output class of the model.
            probabilities: A tensor of size (batch, num_classes) containing
                the probabilities of each output class in the model.
        """
        channel_id = channel_tensor.item()
        channel = str(channel_id)
        premises_mask = get_mask(premises, premises_lengths).to(self.device)
        hypotheses_mask = get_mask(hypotheses, hypotheses_lengths)\
            .to(self.device)

        embedded_premises = self._word_embedding(premises)
        embedded_hypotheses = self._word_embedding(hypotheses)

        if self.dropout:
            embedded_premises = self._rnn_dropout(embedded_premises)
            embedded_hypotheses = self._rnn_dropout(embedded_hypotheses)

        encoded_premises = self._encoding(embedded_premises,
                                          premises_lengths)
        encoded_hypotheses = self._encoding(embedded_hypotheses,
                                            hypotheses_lengths)

        attended_premises, attended_hypotheses =\
            self._attention[channel](encoded_premises, premises_mask,
                            encoded_hypotheses, hypotheses_mask)
        """ rest of the code are omitted """

You need to trace the tasks separately and then write the bit combining them in TorchScript, calling the traced modules. The link has a small recipe for that.

Best regards

Thomas

@tom Thank you for your patience, sir. Would you please clarify the step of “combining them”?

I did check the for-loop example in the link, which accumulates a result in each iteration.

However, here each channel's result is final and independent of the others. When using the multitask model, the channel is exposed as an argument. For example, passing in channel "games" produces a final score on its own (no need to combine it with other channels), while passing in channel "fashion" produces another final score on its own.

What should be returned in loop_in_traced_fn? Returning any single channel's result would discard the other channels'.

@torch.jit.script
def loop_in_traced_fn(premise, premise_length, hypotheses, hypotheses_length, channel_list):
    for channel in channel_list:
        result = model(premise, premise_length, hypotheses, hypotheses_length, channel)
        # what should be returned here, since each channel's result excludes the others?

#channel_list = ["games", "fashion", "news", "food"]
traced = torch.jit.trace(loop_in_traced_fn, premise, premise_length, hypotheses, hypotheses_length, channel_list)

You can accumulate them in a list and return the list: mylist = [] and mylist.append(...) should work… (Or add them, torch.cat them, whatever…).
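
For instance, a rough sketch of the list-accumulation pattern in script (the x * i line is just a placeholder for the call into channel i's traced model):

from typing import List
import torch

@torch.jit.script
def collect_outputs(x: torch.Tensor, num_channels: int) -> torch.Tensor:
    results: List[torch.Tensor] = []
    for i in range(num_channels):
        results.append(x * i)  # placeholder for channel i's model output
    return torch.cat(results, dim=0)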

Best regards

Thomas

Thank you, Thomas.

  1. Would you please point me to the exact example you are referring to? I tried several but had no luck yet. If we write a wrapper class Composition, trace each subtask separately, and combine them by calling forward(self, inputs) where inputs is ["games", "fashion", "news", "food"], then the final saved traced C++ model won't be able to accept a single channel like "games", right?

  2. For my code in the previous post: it seems that torch.jit.trace(loop_in_traced_fn, inputs) can ONLY accept the inputs as a tuple, so that loop_in_traced_fn receives the tensors inside the inputs tuple. Am I correct?

It is still not working, with the error message shown below. I have also tried all kinds of inputs, such as a list of tuples, a list of tensors, etc.


RuntimeError: 
Tensor cannot be used as a tuple:
@torch.jit.script
def loop_in_traced_fn(channel_tuple):
    result = []
    for channel in channel_tuple:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...  <--- HERE
        print(channel)
        channel = torch.tensor([channel], dtype=torch.int64)
        output = model(premise, premise_length, hypotheses, hypotheses_length, channel)
        result.append(output)
    return result

Could you do the following, please?

  • Trace the individual models as you did before. Store them in a dictionary with string keys.
  • Write as simple a Python method as you can think of that does what you want the final, combined model to do, using only the dictionary with the traced models (and any inputs); a rough sketch follows below.
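
Something along these lines, untested and with purely illustrative names, before worrying about scripting it:

# plain Python only, not scripted yet; traced_models maps a channel key
# (e.g. "0", "1", ...) to that channel's traced model
def combined_forward(traced_models, premises, premises_lengths,
                     hypotheses, hypotheses_lengths, channel):
    key = str(int(channel.item()))
    return traced_models[key](premises, premises_lengths,
                              hypotheses, hypotheses_lengths, channel)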

Best regards

Thomas

@tom Thank you very much, Sir. Yes, I followed what you said. Below is the error message. What am I missing?

TypeError: 'dict' object for attribute 'module_dict' is not a valid constant.
Valid constants are:
  1. a nn.ModuleList
  2. a value of type {bool, float, int, str, NoneType, function, device, layout, dtype}
  3. a list or tuple of (2)

Below is a snippet of how I combine them.

#trace each channel's model separately
channel_tensors = []
module_dict = dict()
for channel_id in channels_list:
    channel = torch.tensor([channel_id], dtype=torch.int64)
    channel_tensors.append(channel)
    traced_script_module = torch.jit.trace(model, (premise, premise_length, hypotheses, hypotheses_length, channel))
    module_dict[channel_id] = traced_script_module

#combine each channel's model together
class MyScriptModule(torch.jit.ScriptModule):
    __constants__ = ['module_dict']

    def __init__(self, module_dict):
        super(MyScriptModule, self).__init__()
        self.module_dict = module_dict

    @torch.jit.script_method
    def forward(self,
                premises,
                premises_lengths,
                hypotheses,
                hypotheses_lengths,
                channel): #channel must be a tensor
        channel_id = channel.item()
        return self.module_dict[channel_id](premises, premises_lengths, hypotheses, hypotheses_lengths, channel)

my_script_module = MyScriptModule(module_dict)
my_script_module.save("deploy-trace-multitask.channel_all.pt")

@tom I also tried another approach, by defining a simple Python function to trace, and got the errors below.

    _jit_script_compile(mod, ast, _rcb, get_default_args(fn))
RuntimeError: 
python value of type 'dict' cannot be used as a value:
@torch.jit.script
def loop_in_traced_fn(premise, premise_length, hypotheses, hypotheses_length, channel):
    channel_id = channel.item()
    result = module_dict[channel_id](premise, premise_length, hypotheses, hypotheses_length, channel)
             ~~~~~~~~~~~ <--- HERE
    return result

Below is the code snippet.

#trace each channel's model separately
channel_tensors = []
module_dict = dict()
for channel_id in channels_list:
    channel = torch.tensor([channel_id], dtype=torch.int64)
    channel_tensors.append(channel)
    traced_script_module = torch.jit.trace(model, (premise, premise_length, hypotheses, hypotheses_length, channel))
    module_dict[channel_id] = traced_script_module

@torch.jit.script
def loop_in_traced_fn(premise, premise_length, hypotheses, hypotheses_length, channel):
    channel_id = channel.item()
    result = module_dict[channel_id](premise, premise_length, hypotheses, hypotheses_length, channel)
    return result

channel = channel_tensors[0]
traced = torch.jit.trace(loop_in_traced_fn, (premise, premise_length, hypotheses, hypotheses_length, channel))
traced.save("deploy-trace-multitask.channel_all.pt")

The Python method in both approaches reports an error message along the lines of "python dict type is NOT supported in Script". The dict is used to map each channel to its traced model. This is the confusing part.

I guess this Python method also needs to be traced (or scripted) and saved as the big wrapper model over all channels' individually traced models.
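
For reference, one direction I have not verified yet (names are only illustrative, and whether it works may depend on the PyTorch version) would be to avoid the Python dict entirely: register each channel's traced model as a named submodule and dispatch with explicit if branches, which scripting should preserve.

import torch

class MultiChannelWrapper(torch.nn.Module):
    def __init__(self, games_model, fashion_model):
        super().__init__()
        # each attribute holds one channel's traced model
        self.games = games_model
        self.fashion = fashion_model

    def forward(self, premises, premises_lengths,
                hypotheses, hypotheses_lengths, channel):
        channel_id = int(channel.item())
        if channel_id == 0:
            return self.games(premises, premises_lengths,
                              hypotheses, hypotheses_lengths, channel)
        return self.fashion(premises, premises_lengths,
                            hypotheses, hypotheses_lengths, channel)

# wrapper = torch.jit.script(MultiChannelWrapper(module_dict[0], module_dict[1]))
# wrapper.save("deploy-trace-multitask.channel_all.pt")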

It turns out that it's a bit more tricky; I'm still working on it…

Thank you so much for your kind help. The main issue is how to trace and save the Python method that combines all channels' traced modules.

Hi, Thomas. Is it possible? Thanks.

Does anyone know, please?