How to use TorchScript with Dropout for training?

My model is:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_input, num_hidden, num_classes, dropout,
                 activation='tanh'):
        super(Net, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc1 = nn.Linear(num_input, num_hidden)
        self.fc2 = nn.Linear(num_hidden, num_classes)

        if activation == 'tanh':
            self.activation_f = torch.tanh
        elif activation == 'relu':
            self.activation_f = torch.relu

    def forward(self, x):
        x = self.activation_f(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

I instantiate my model, for instance, as:

model = Net(14, 512, 2, 0.2).to(device)

However, once I trace the model with TorchScript as:

traced_model = torch.jit.trace(model, torch.zeros([1, 14], dtype=torch.float))

I receive the following error:

IndexError: The shape of the mask [2] at index 0 does not match the shape of the indexed tensor [1, 2] at index 0

I know that if I use model.eval() I don't receive any error, but I want to use my model for training, not evaluation. Does anybody know a solution or workaround for this problem?

PS: I am using PyTorch version 1.4.

I am unable to reproduce this error with Python 3.8.3 and pytorch-1.4.0 from conda.

Based on your error message, it seems like the input shape might be the problem. Try using an input with shape [14] during tracing:

traced_model = torch.jit.trace(model, torch.zeros([14], dtype=torch.float))

Stepping back a bit, I would advise you to use scripting (torch.jit.script) rather than tracing for this use case. Control flow is not visible to tracing, and from what I remember, the backward pass for dropout decides whether an incoming gradient should be backpropagated based on whether the corresponding input was passed through during the forward pass.
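
Something like the following should work with scripting. This is a minimal sketch, not your exact class: I renamed it ScriptableNet and replaced the stored activation function with a string attribute plus explicit control flow in forward, since TorchScript handles that more predictably than a function assigned to self.

import torch
import torch.nn as nn

class ScriptableNet(nn.Module):
    def __init__(self, num_input, num_hidden, num_classes, dropout,
                 activation='tanh'):
        super(ScriptableNet, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc1 = nn.Linear(num_input, num_hidden)
        self.fc2 = nn.Linear(num_hidden, num_classes)
        self.activation = activation  # plain string attribute, scriptable

    def forward(self, x):
        x = self.fc1(x)
        # scripting compiles this branch as real control flow
        if self.activation == 'tanh':
            x = torch.tanh(x)
        else:
            x = torch.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

scripted_model = torch.jit.script(ScriptableNet(14, 512, 2, 0.2))
scripted_model.train()  # dropout stays active in training mode

Another advantage of scripting here is that nn.Dropout checks self.training at run time, so switching between train() and eval() keeps working on the scripted module, whereas tracing bakes in whatever mode the module was in when it was traced.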

Hi, but it is not possible to use an input of size [14]; where would the batch dimension go ([1, 14])? I don't think the dimension is the problem, since once I set the dropout rate to 0.0 there is no error message and it works just fine.
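
For reference, this is what I mean (the same Net class as above, only with the dropout probability set to 0.0), and it traces without any error:

model_no_dropout = Net(14, 512, 2, 0.0).to(device)
traced_model = torch.jit.trace(model_no_dropout, torch.zeros([1, 14], dtype=torch.float))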

I still cannot reproduce your error. Can you provide more details on your setup? There is an environment details collection script in the PyTorch repository.
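
For example, you can run it from the command line and paste the output here:

python -m torch.utils.collect_env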